Managed Extensions: Using the .NET OLE DB Classes to Read Text Files

Welcome to this week’s installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task using either C# or Managed C++ Extensions.

A previous .NET Tips & Techniques article illustrated how to access comma-delimited text files using the .NET ODBC classes. This week’s tip covers performing the same task using the .NET OLE DB classes.

Note: If you’re trying to decide which database-access technology to use—ODBC or OLE DB—I posted an article on my consulting Web site that compares ODBC to OLE DB.

Using the OLE DB Classes

Much like any of the database-access classes provided by the .NET BCL (Base Class Library), the OLE DB classes that you’ll use most often represent a connection to the data source (OleDbConnection), a command object that will be executed against the data source (OleDbCommand), and a reader that is used to enumerate the results of a command (OleDbDataReader). Therefore, the standard process for reading any type of file using OLE DB classes is the following:

  1. Instantiate an OleDbConnection object, passing to it a connection string that must include the name of the OLE DB provider for the underlying data source.
  2. Call the OleDbConnection::Open method.
  3. Instantiate an OleDbCommand object and specify to it the SQL that you want to be executed against the data store.
  4. Call one of the OleDbCommand execute methods—ExecuteNonQuery, ExecuteReader,
    or ExecuteNonScalar:

      ExecuteNonQuery
      Used for a command that will not return a result set, but will return a value indicating the number of rows affected
      ExecuteReader
      Returns a result set based on the executed command
      ExecuteNonScalar
      Returns only the first column of the first row of the result set, ignoring the balance of the result set
  5. In the case of using the OleDbCommand::ExecuteReader:
    1. A reader object is returned. It is enumerated by using the OleDbDataReader::Read method to read all rows returned (indicated by the OleDbDataReader::FieldCount property).
    2. For each row, the OleDbDataReader::GetName method is used to determine the current column name, and then you can either use the generic OleDbDataReader::GetValue method to get the actual data value or methods specific to the data being read—such as GetBytes, GetChar, GetDateTime, and so on.

Specifying the Connection String

When formatting an OLE DB connection string, be aware of the following three main parts:

  • Provider—Always set this to the OLE DB provider that you will be using. In the case of accessing text files, this value is Provider=Microsoft.Jet.OLEDB.4.0;
  • Data Source—When accessing text files, set this value to the fully qualified file name (e.g., Data Source=myData.txt;)
  • Extended Properties—This is where things get a bit interesting. This parameter allows you to specify several properties that are specific to the provider being used. Specifically, you can specify things such as whether the file has a column row (as its first row) and how the data is formatted. Here are some example Extended Properties settings:

    No header row, file is delimited
    Extended Properties=’text;HDR=No;FMT=Delimited’

    Header row, file is delimited
    Extended Properties=’text;HDR=Yes;FMT=Delimited’

Note that I didn’t mention how to change the FMT parameter of the Extended Properties. This is because if your file is delimited by something other than commas, you’ll need to define a schema.ini file.

The schema.ini File

Because the Jet OLE DB provider uses the Microsoft Text Driver to access text files, you can define a schema.ini file just as you would if you were using ODBC to access the data. This is covered in a previous article, so I won’t repeat all that here. However, bear in mind that if a conflict exists between what you specify in the Extended Properties part of the OLE DB connection string and a parameter you specify in the schema.ini file, the schema.ini file will take precedence. For example, if you specify HDR=No in the connection string, but specify ColNameHeader=TRUE in the schema.ini file, the file will be parsed as if the first row is a column header.

Demo Application

This article includes a simple demo that reads and displays the text from a small text file. Here’s a screen shot of that application:

Now, here’s the actual code from the function in the application that uses much of what you’ve read in this article (with the addition of some basic error-handling and clean-up):

void CReadingCsvFilesWithOledbDlg::DisplayCsvFile()
{
#pragma push_macro("new")
#undef new
  CWaitCursor wc;

  m_lstCsvData.DeleteAllItems();
  DeleteAllColumns(m_lstCsvData);

  OleDbConnection* connection = NULL;
  OleDbDataReader* reader = NULL;
  try
  {
    StringBuilder* connString = new StringBuilder();

    connString->Append(S"Provider=Microsoft.Jet.OLEDB.4.0;");
    connString->AppendFormat(S"Data Source={0};",
                             Path::GetDirectoryName(m_strFileName));
    connString->Append(S"Extended Properties='text;HDR=No;
                         FMT=TabDelimited'");

    connection = new OleDbConnection(connString->ToString());
    connection->Open();

    CString strSelect;
    strSelect.Format(_T("SELECT * FROM [%s]"),
                     Path::GetFileName(m_strFileName));
    OleDbCommand* command = new OleDbCommand(strSelect, connection);

    reader = command->ExecuteReader();
    int iCurrRow = 0;
    while (reader->Read())
    {
      int idx;
      for (int iCurrCol = 0; iCurrCol < reader->FieldCount;
           iCurrCol++)
      {
        if (0 == iCurrRow)
        {
          // assumes a schema.ini file
          m_lstCsvData.InsertColumn(iCurrCol, (CString)reader->
                                    GetName(iCurrCol));
        }

        if (0 == iCurrCol)
        {
          idx = m_lstCsvData.InsertItem(iCurrRow, (CString)
                                       (reader->GetValue(iCurrCol)->
                                       ToString()));
        }
        else
          m_lstCsvData.SetItemText(idx, iCurrCol, (CString)
                                  (reader->GetValue(iCurrCol)->
                                   ToString()));
      }
      iCurrRow++;
    }
    SizeAllColumns(m_lstCsvData);
  }
  catch(Exception* e)
  {
    m_strFileName = _T("");
    UpdateData(FALSE);

    AfxMessageBox((CString)e->Message);
  }
  __finally
  {
    if (NULL != reader) reader->Close();
    if (NULL != connection) connection->Close();
  }
#pragma pop_macro("new")
}

After looking at the screen shot and the following code, you should be able to surmise that I have a schema.ini file that specifies the names of the columns that are displayed in the listview control.

Using the Registry

If you do not want to use a schema.ini file—for example, if you wanted a given machine to always handle text files the same way without having to maintain a bunch of separate schema.ini files and without having to hard-code the Extended Properties parameter to each OLE DB connection string—you can control how the Microsoft Text Driver works with your text files by modifying the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format

Because these values are to be used in lieu of having a schema.ini file, you can find the valid values on the Microsoft Web site.

The following figure shows what the Registry key looks like on my development machine:

More by Author

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Must Read