Connect to Text Files with the Microsoft Jet ODBC Text Driver

by Steve Kass

 

 

SQL Server provides several solutions for importing and exporting text files.  Data transformation services (DTS) supports text as a data source or destination; the bcp and osql utilities can import and export text files; and T-SQL provides BULK INSERT for importing text.  Another solution is less widely known: the Jet text file driver can connect SQL Server directly to text files via a linked server.  This ODBC driver handles some tasks better than any of the other solutions, and it provides pure T-SQL solutions to many problems.

 

This is the first of (at least) two articles on the Jet text driver.  It shows how to connect a directory of text files as a linked server and includes a practical example.  If you’re near a development server as you read this, open the Query Analyzer and follow along.

 

 

Set up the linked server

 

Connecting a directory of text files as a linked server is easy.  Here’s how:

 

  1. In Windows, create a new directory for the linked server.  For this article, that directory will be E:\txtsvr.
  2. Add that directory as a linked server with sp_addlinkedserver:
    EXEC sp_addlinkedserver txtsvr, 'Jet 4.0',
      'Microsoft.Jet.OLEDB.4.0', 'e:\txtsvr', NULL, 'Text'

 

That’s all that’s required to add the linked server txtsvr as a connection to the text files in E:\txtsvr, but there are no tables in txtsvr yet.  Also missing is a required file, schema.ini, which must contain a description of each table’s format.  You can  learn more about schema.ini from the Microsoft MSDN Library, but for now, just use Notepad or your favorite text editor to create the file e:\txtsvr\schema.ini as shown in Listing 1.  In addition, create an empty text file in e:\txtsvr named table1.txt.

 

The file e:\txtsvr\table1.txt is now an empty table in the linked server txtsvr.  We can see that if we list information about all the tables in txtserv with the system stored procedure sp_tables_ex:

 

EXEC sp_tables_ex txtsvr

 

You’ll see that the name of the one table in txtsvr is table1#txt, according to sp_tables_ex.  The table name for a file in txtsvr will always contain ‘#’ in place of the filename’s ‘.’ character.

 

 

Text files are tables

 

We can’t issue a CREATE TABLE statement against this linked server, but we have created a table easily enough.  We can, however, run INSERT and SELECT queries, if we use the four-part naming convention <server>…<table> (that’s three dots):

 

insert into txtsvr...table1#txt(CustomerNumber,CustomerName,EntryDate)

select orderid, shipname, orderdate

from Northwind..orders

 

Be sure to specify the column names explicitly in any INSERT query.  The ordinal positions of the columns don’t always match the order given in schema.ini, and so you may be in for a surprise if you omit the column names.  This INSERT query exports data from Northwind..orders to e:\txtsvr\table1.txt.  It also doesn’t make much sense, but it’s just an example.  We’ll get to a practical example very soon.  Here are the first few lines of e:\txtsvr\table1.txt after we run the query.  Tab characters separate the three columns of data in each row.

 

10248     "Vins et alcools Chevalier"  7/4/1996 0:00:00

10249     "Toms Spezialitäten"    7/5/1996 0:00:00

10250     "Hanari Carnes"    7/8/1996 0:00:00

 

Now that this file contains data, a SELECT query is worth trying.  Remember, this is both a text file in a Windows directory and a table in the linked server txtsvr.

 

select CustomerName, convert(varchar(10),EntryDate,101) as [Date]

from txtsvr...table1#txt

where datepart(weekday,EntryDate) = 4

 

Output (results abridged):

 

CustomerName                             Date      

---------------------------------------- ----------

Hanari Carnes                            07/10/1996

Ernst Handel                             07/17/1996

Folk och fä HB                           07/24/1996

 

Now we certainly don’t need the Jet text file driver or a linked server just to retrieve this information.  This was simply an example to introduce the technique.

 

If you’re running the examples as you read, you probably noticed that these queries were slow.  Each one takes several seconds at best.  If you run them a second time, however, you’ll see a significant improvement.  And if you’re concerned with the time it might take if table1.txt contained thousands, or hundreds of thousands, of rows, rest assured that this approach will do as well as DTS, bcp, or BULK INSERT in its speed, and in other respects, it has some real advantages.

 

 

SELECT a Windows directory listing

 

If you’ve ever wanted to generate a Windows directory listing, you might have found the extended stored procedure xp_cmdshell useful.  The one-liner

 

EXEC master..xp_cmdshell 'dir "e:\Program Files\Ocelot"'

 

shows me what’s in my Ocelot program directory (see Listing 2).

 

 

It takes some work to turn this into a usable table containing the name, last modification date, and size of each non-directory file.  Among the difficulties are the commas in the size (pretend you don’t know about DIR/-C), the non-standard AM/PM indication, and the various extra lines of information.

 

One of the most welcome features of the Jet text file driver is how well it accommodates text files contain a mix of formatted and non-formatted lines.  We’ll add a table specification to schema.ini that describes the “good” lines as fixed-length data, and then view this mixed-bag file as a table.  The lines that don’t contain a date, size and filename in neat columns, or that are blank or too long or too short, won’t cause any trouble, as they do if we try to use bcp or BULK INSERT to import this information into a three-column table.

 

While there may be a way to specify the one-character AM/PM indicator as a datetime format, I didn’t find it.  As a result, I put four columns into the schema for a DIR output file, and I added an entry to schema.ini (Listing 3).

 

 

You’ll note that I’m importing the file size column as currency.  That serves two purposes.  First, it means I don’t have to do anything special to accommodate the commas, since the driver welcomes them in monetary data.  Second, it eliminates the possibility of overflowing a 4-byte integer if a file is much bigger than 2 GB.  I chose 20 for the filename size so the results would fit on this page.  What’s more important to know is that any filenames longer than 20 characters will be quietly truncated to 20 characters when data is selected through the linked server.  Conveniently, directory.txt contains the entire filename, so if the truncation is a problem, you don’t have to generated the directory listing again.  Simply replace the 20 with a larger number in schema.ini.

 

My specific goal will be to get the following results from txtsvr...directory#txt, if E:\txtsvr\directory.txt contains a raw directory listing.

 

Name                 FileSize             Date                

-------------------- -------------------- -------------------

CHANGE.SQL           874                  2001-12-28 04:00:00

CHAR.SQL             1298                 2001-12-28 04:00:00

CLUSTER.OCELOT       884                  2001-12-28 04:00:00

 

(results abridged)

 

This goal is just two simple steps away.  First, put the raw directory listing into directory.txt.  I’ll do this the hard way (shown in Listing 4), as you might want to in a stored procedure, with the directory as a parameter.  Note the use of quotename(), which handles spaces in the directory name and guards against malicious “SQL injection.”

 

Second, query the table directory#txt for the data you want, understanding that if the first 17 characters of a line do not contain a date in the format mm/dd/yyyy  hh:nn, directory#txt will show NULL in the Date column, and if the 21 character-wide size column doesn’t contain a valid currency value, Size will be NULL.  This particular query (Listing 5) retrieves only the files, not the directories, by requiring the Size column to be non-null.

 

That’s it.  But just in case you decide you need something else, the entire raw directory listing is still in directory.txt.  If you later want to see a list of the directories in E:\Program Files\Ocelot, use Listing 5 again, but change the WHERE condition to  where [Size] is null and FileDate is not null.

 

In the next article, you’ll learn how powerful the Jet driver is for handling ragged file, unusual formats, and large quantities of data.

 

 

Listing 1.

[table1.txt]

Format=TabDelimited

CharacterSet=OEM

ColNameHeader=True

Col1=CustomerNumber Long

Col2=CustomerName Text Width 40

Col3=EntryDate DateTime

 

 

Listing 2.

output                                                                                                                                                                                                                                                          

-----------------------------------------------------------

 Volume in drive E is DISK3PART01

 Volume Serial Number is 841B-E016

NULL

 Directory of e:\Program Files\Ocelot

NULL

12/31/2001  09:09p      <DIR>          .

12/31/2001  09:09p      <DIR>          ..

12/28/2001  04:00a                 874 CHANGE.SQL

12/28/2001  04:00a               1,298 CHAR.SQL

12/28/2001  04:00a                 884 CLUSTER.OCELOT

12/28/2001  04:00a               1,493 COMPOUND.SQL

 

(results abridged)

 

 

Listing 3.

[directory.txt]

Format=FixedLength

CharacterSet=OEM

ColNameHeader=False

DateTimeFormat=mm/dd/yyyy  hh:nn

Col1=FileDate DateTime Width 17

Col2=AMPM Text Width 1

Col3=Size Currency Width 21

Col4=Name Text Width 20

 

 

Listing 4.

declare @directory varchar(80)

set @directory = 'E:\Program Files\Ocelot'

declare @cmd varchar(300)

set @cmd =

  'dir ' + cast(quotename(@directory,'"') as varchar(320))

         + ' > E:\txtsvr\Directory.txt'

exec master..xp_cmdshell @cmd

 

 

Listing 5.

 

select

  [Name],

  cast([Size] as bigint) as FileSize,

  convert(varchar(19),

          dateadd(hour,

                  case when ampm='a' then 0 else 12 end,

                  FileDate),

          120) as [Date]

  from txtsvr...Directory#txt

  where [Size] is not null

  and FileDate is not null

  order by fName