Wednesday, July 8, 2009

Flexible text format support using regular expressions

I was writing a small database application that lets users import records from plain text files into an Access database. One of the problems is that different users have different formats: some files are tab delimited, some are comma delimited; some have fixed-width fields, while others don't. You can handle them with a switch statement, but as the number of formats grows, so does the ugliness of your code. To make matters worse, you often don't know what the formats will be at coding time.

So I needed to support custom formats. They had to be flexible enough for users to describe, yet easy for the application to interpret. It looked like a daunting task until I came across the idea of using regular expressions.
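A minimal sketch of the idea, written in Python rather than the project's C#. The format names, patterns and field names (`id`, `name`, `amount`) are hypothetical. Each format is a single regular expression with named groups, so supporting a new layout means writing a new pattern instead of adding another branch to a switch statement.

```python
import re

# Hypothetical format definitions: each maps a format name to a regex whose
# named groups become the record's field names.
FORMATS = {
    # Tab delimited: three fields separated by tabs.
    "tab": r"^(?P<id>[^\t]*)\t(?P<name>[^\t]*)\t(?P<amount>[^\t]*)$",
    # Comma delimited: the same fields separated by commas (no quoting support).
    "csv": r"^(?P<id>[^,]*),(?P<name>[^,]*),(?P<amount>[^,]*)$",
    # Fixed width: a 5-character id, a 10-character name, the rest is amount.
    "fixed": r"^(?P<id>.{5})(?P<name>.{10})(?P<amount>.*)$",
}


def parse_line(fmt, line):
    """Return a dict of field name -> stripped value, or None if no match."""
    match = re.match(FORMATS[fmt], line)
    if match is None:
        return None
    # Strip padding so fixed-width fields compare cleanly.
    return {key: value.strip() for key, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_line("csv", "42,Widget,9.99")["name"])        # Widget
    print(parse_line("fixed", "00042Widget    9.99")["id"])  # 00042
```

The parsing code never changes. Only the pattern table does, which is what makes user-defined formats possible.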

Using the Code

It's very simple to use, since there isn't much in it beyond the idea of using regular expressions. The demo project contains two formats described in formats.xml, plus two sample input files.
You need to:

1. Add flex_format.cs to your project.
2. Load the format definitions stored in the XML file during initialization.
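Step 2 might look roughly like the following Python sketch. The XML layout here, with `<format name="...">` elements each holding a `<pattern>` in a CDATA section, is a hypothetical schema and not necessarily the structure of the demo's formats.xml. The patterns are compiled once at load time and reused for every input line.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical formats.xml content; CDATA avoids escaping '<' in named groups.
SAMPLE_XML = r"""<formats>
  <format name="csv"><pattern><![CDATA[^(?P<id>[^,]*),(?P<name>[^,]*)$]]></pattern></format>
  <format name="tab"><pattern><![CDATA[^(?P<id>[^\t]*)\t(?P<name>[^\t]*)$]]></pattern></format>
</formats>"""


def load_formats(xml_text):
    """Compile each <format name="..."><pattern>...</pattern></format> entry."""
    root = ET.fromstring(xml_text)
    formats = {}
    for node in root.findall("format"):
        pattern = node.findtext("pattern")
        if pattern is None:
            raise ValueError("format %r has no <pattern>" % node.get("name"))
        formats[node.get("name")] = re.compile(pattern)
    return formats


if __name__ == "__main__":
    formats = load_formats(SAMPLE_XML)
    print(formats["csv"].match("42,Widget").group("name"))  # Widget
```

To read a real file instead of a string, `ET.parse(path).getroot()` returns the same root element that `ET.fromstring` does.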

See the full details: