Reading a CSV file into a DataTable

Published on 19 May 2009

If you don't share my opinion that CSV should be banned by an international convention, it might be because you've never had to write code to read a CSV file.

"Ah, but thats easy", you say, "Just use String.Split()". Well, that will work if your CSV file is an academic example that doesn't contain strings that were typed by real human beings. But real-world CSV files generally contain some or all of the following horrors:

- some strings enclosed in speechmarks, but some not
- empty fields
- speechmarks within strings
- commas within strings
- carriage returns within strings

To demonstrate, here is a CSV example that exhibits all of the traits listed above:

  
"First Name","Last Name","Address","Town","Postcode"  
David,O'Leary,"12 Acacia Avenue",London,NW5 3DF  
June,Robinson,"14, Abbey Court","Putney",SW6 4FG  
Greg,Hampton,"",,  
Stephen,James,"""Dunroamin"" 45 Bridge Street",Bristol,BS2 6TG  
Daisy,Smith,"Flat 6,  
Porchester Court,  
24 Dragon Road",Newcastle,NE3 4FG
```(Note that the last line has two carriage returns in the Address, which is why it sprawls over three lines. Also speechmarks within speechmark-delimited strings have to be escaped somehow; in this example they are escaped by doubling them up.)  
  
It is possible to deal with pathological cases of CSV using [carefully](http://me7486.blogspot.com/2008/06/regular-expression-for-parsing-csv-file_729.html) tuned [Regular Expressions](http://www.hotblue.com/article0000.aspx?a=0006). It is also possible to use an OdbcConnection and the [Microsoft Text ODBC Driver](http://www.thejackol.com/2004/07/01/connect-to-a-csv-file-using-odbc-c/).  
  
But needing to read a CSV file is a very common affliction - surely someone has built an open-source framework to deal with it reliably? Surprisingly, such frameworks are quite hard to come by, but I did eventually find a brilliant one:  
  
LumenWorks.Framework.IO.Csv by Sebastien Lorien  
Downloadable from The Code Project, here:  
[A Fast CSV Reader by Sebastien Lorien](http://www.codeproject.com/KB/database/CsvReader.aspx)  
  
You can download the binaries or the source, its much faster than Regex or ODBC, and it can handle cases like the mangled CSV snippet shown above. It is also open-source, under the MIT Licence.  
  
The design is based on the StreamReader class, and Sebastien gives a few example on the [CodeProject page](http://www.codeproject.com/KB/database/CsvReader.aspx). However the examples mostly show how to bind a CSV straight to a DataGrid. So here is an example of how to use the library to read a CSV file into a DataTable:  

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

using System.IO;
using LumenWorks.Framework.IO.Csv;

namespace CSVTest
{
class LumenWorksCSVExample
{

    /// <summary>  
    /// Example code to fetch a CSV file and turn it into a DataTable  
    /// using the LumenWorks.Framework.IO.Csv library  
    /// </summary>  
    /// <param name="filePath">The full path of the CSV file</param>  
    /// <returns>A DataTable with all-string columns</returns>  
    public DataTable FetchFromCSVFile(string filePath)  
    {  
        DataTable csvTable = new DataTable();  
        using (CsvReader csvReader =  
            new CsvReader(new StreamReader(filePath),true))  
        {  
            csvTable.Load(csvReader);  
        }  
        return csvTable;  
    }  
}  

}

  
If Load() is a bit too automatic and you need more control over the loading of the rows, maybe to only get certain rows or columns, or to do column type conversions, this longer example might be more useful:  

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

using System.IO;
using LumenWorks.Framework.IO.Csv;

namespace CSVTest
{
class LumenWorksCSVExample2
{
///


/// Example code to fetch a CSV file and turn it into a DataTable
/// using the LumenWorks.Framework.IO.Csv library
/// Note: This example uses loops instead of DataTable.Load()
///

/// The full path of the CSV file
/// A DataTable with all-string columns
public DataTable FetchFromCSVFileLong(string filePath)
{
char delimiter = ',';
bool hasHeader = true;
DataTable csvTable = new DataTable();
using (CsvReader csvReader =
new CsvReader(new StreamReader(filePath), hasHeader, delimiter))
{

            int fieldCount = csvReader.FieldCount;  
            string\[\] headers = csvReader.GetFieldHeaders();  

            // this bit could be modified to fine-tune the columns  
            foreach (string headerLabel in headers)  
                csvTable.Columns.Add(headerLabel, typeof(string));  

            while (csvReader.ReadNextRecord())  
            {  
                DataRow newRow = csvTable.NewRow();  
                // this bit could be modified to do type conversions, skip columns, etc  
                for (int i = 0; i < fieldCount; i++)  
                    newRow\[i\] = csvReader\[i\];  

                csvTable.Rows.Add(newRow);  
                  
            }  
        }  
        return csvTable;  
    }  
}  

}