Connect to Text Files with the Microsoft Jet ODBC Text Driver
by Steve Kass
SQL Server provides several solutions for importing and exporting text files. Data transformation services (DTS) supports text as a data source or destination; the bcp and osql utilities can import and export text files; and T-SQL provides BULK INSERT for importing text. Another solution is less widely known: the Jet text file driver can connect SQL Server directly to text files via a linked server. This ODBC driver handles some tasks better than any of the other solutions, and it provides pure T-SQL solutions to many problems.
This is the first of (at least) two articles on the Jet text driver. It shows how to connect a directory of text files as a linked server and includes a practical example. If youíre near a development server as you read this, open the Query Analyzer and follow along.
Connecting a directory of text files as a linked server is easy. Hereís how:
Thatís all thatís required to add the linked server txtsvr as a connection to the text files in E:\txtsvr, but there are no tables in txtsvr yet. Also missing is a required file, schema.ini, which must contain a description of each tableís format. You can learn more about schema.ini from the Microsoft MSDN Library, but for now, just use Notepad or your favorite text editor to create the file e:\txtsvr\schema.ini as shown in Listing 1. In addition, create an empty text file in e:\txtsvr named table1.txt.
The file e:\txtsvr\table1.txt is now an empty table in the linked server txtsvr. We can see that if we list information about all the tables in txtserv with the system stored procedure sp_tables_ex:
EXEC sp_tables_ex txtsvr
Youíll see that the name of the one table in txtsvr is table1#txt, according to sp_tables_ex. The table name for a file in txtsvr will always contain Ď#í in place of the filenameís Ď.í character.
We canít issue a CREATE TABLE statement against this linked server, but we have created a table easily enough. We can, however, run INSERT and SELECT queries, if we use the four-part naming convention <server>Ö<table> (thatís three dots):
insert into txtsvr...table1#txt(CustomerNumber,CustomerName,EntryDate)
select orderid, shipname, orderdate
Be sure to specify the column names explicitly in any INSERT query. The ordinal positions of the columns donít always match the order given in schema.ini, and so you may be in for a surprise if you omit the column names. This INSERT query exports data from Northwind..orders to e:\txtsvr\table1.txt. It also doesnít make much sense, but itís just an example. Weíll get to a practical example very soon. Here are the first few lines of e:\txtsvr\table1.txt after we run the query. Tab characters separate the three columns of data in each row.
10248 "Vins et alcools Chevalier" 7/4/1996 0:00:00
10249 "Toms Spezialitšten" 7/5/1996 0:00:00
10250 "Hanari Carnes" 7/8/1996 0:00:00
Now that this file contains data, a SELECT query is worth trying. Remember, this is both a text file in a Windows directory and a table in the linked server txtsvr.
select CustomerName, convert(varchar(10),EntryDate,101) as [Date]
where datepart(weekday,EntryDate) = 4
Output (results abridged):
Hanari Carnes 07/10/1996
Ernst Handel 07/17/1996
Folk och fš HB 07/24/1996
Now we certainly donít need the Jet text file driver or a linked server just to retrieve this information. This was simply an example to introduce the technique.
If youíre running the examples as you read, you probably noticed that these queries were slow. Each one takes several seconds at best. If you run them a second time, however, youíll see a significant improvement. And if youíre concerned with the time it might take if table1.txt contained thousands, or hundreds of thousands, of rows, rest assured that this approach will do as well as DTS, bcp, or BULK INSERT in its speed, and in other respects, it has some real advantages.
If youíve ever wanted to generate a Windows directory listing, you might have found the extended stored procedure xp_cmdshell useful. The one-liner
EXEC master..xp_cmdshell 'dir "e:\Program Files\Ocelot"'
shows me whatís in my Ocelot program directory (see Listing 2).
It takes some work to turn this into a usable table containing the name, last modification date, and size of each non-directory file. Among the difficulties are the commas in the size (pretend you donít know about DIR/-C), the non-standard AM/PM indication, and the various extra lines of information.
One of the most welcome features of the Jet text file driver is how well it accommodates text files contain a mix of formatted and non-formatted lines. Weíll add a table specification to schema.ini that describes the ďgoodĒ lines as fixed-length data, and then view this mixed-bag file as a table. The lines that donít contain a date, size and filename in neat columns, or that are blank or too long or too short, wonít cause any trouble, as they do if we try to use bcp or BULK INSERT to import this information into a three-column table.
While there may be a way to specify the one-character AM/PM indicator as a datetime format, I didnít find it. As a result, I put four columns into the schema for a DIR output file, and I added an entry to schema.ini (Listing 3).
Youíll note that Iím importing the file size column as currency. That serves two purposes. First, it means I donít have to do anything special to accommodate the commas, since the driver welcomes them in monetary data. Second, it eliminates the possibility of overflowing a 4-byte integer if a file is much bigger than 2 GB. I chose 20 for the filename size so the results would fit on this page. Whatís more important to know is that any filenames longer than 20 characters will be quietly truncated to 20 characters when data is selected through the linked server. Conveniently, directory.txt contains the entire filename, so if the truncation is a problem, you donít have to generated the directory listing again. Simply replace the 20 with a larger number in schema.ini.
My specific goal will be to get the following results from txtsvr...directory#txt, if E:\txtsvr\directory.txt contains a raw directory listing.
Name FileSize Date
-------------------- -------------------- -------------------
CHANGE.SQL 874 2001-12-28 04:00:00
CHAR.SQL 1298 2001-12-28 04:00:00
CLUSTER.OCELOT 884 2001-12-28 04:00:00
This goal is just two simple steps away. First, put the raw directory listing into directory.txt. Iíll do this the hard way (shown in Listing 4), as you might want to in a stored procedure, with the directory as a parameter. Note the use of quotename(), which handles spaces in the directory name and guards against malicious ďSQL injection.Ē
Second, query the table directory#txt for the data you want, understanding that if the first 17 characters of a line do not contain a date in the format mm/dd/yyyy hh:nn, directory#txt will show NULL in the Date column, and if the 21 character-wide size column doesnít contain a valid currency value, Size will be NULL. This particular query (Listing 5) retrieves only the files, not the directories, by requiring the Size column to be non-null.
Thatís it. But just in case you decide you need something else, the entire raw directory listing is still in directory.txt. If you later want to see a list of the directories in E:\Program Files\Ocelot, use Listing 5 again, but change the WHERE condition to where [Size] is null and FileDate is not null.
In the next article, youíll learn how powerful the Jet driver is for handling ragged file, unusual formats, and large quantities of data.
Col2=CustomerName Text Width 40
Volume in drive E is DISK3PART01
Volume Serial Number is 841B-E016
Directory of e:\Program Files\Ocelot
12/31/2001 09:09p <DIR> .
12/31/2001 09:09p <DIR> ..
12/28/2001 04:00a 874 CHANGE.SQL
12/28/2001 04:00a 1,298 CHAR.SQL
12/28/2001 04:00a 884 CLUSTER.OCELOT
12/28/2001 04:00a 1,493 COMPOUND.SQL
Col1=FileDate DateTime Width 17
Col2=AMPM Text Width 1
Col3=Size Currency Width 21
Col4=Name Text Width 20
declare @directory varchar(80)
set @directory = 'E:\Program Files\Ocelot'
declare @cmd varchar(300)
set @cmd =
'dir ' + cast(quotename(@directory,'"') as varchar(320))
+ ' > E:\txtsvr\Directory.txt'
exec master..xp_cmdshell @cmd
cast([Size] as bigint) as FileSize,
case when ampm='a' then 0 else 12 end,
120) as [Date]
where [Size] is not null
and FileDate is not null
order by fName