Vince's CSV Parser
This is the detailed documentation for Vince's CSV library. For quick examples, go to this project's GitHub page.
Dealing with Variable Length CSV Rows
See "How does automatic delimiter detection work?"
How does automatic delimiter detection work?
First, the CSV reader attempts to parse the first 100 lines of a CSV file as if the delimiter were a pipe, tab, comma, etc. Out of all the possible delimiter choices, the delimiter that produces the highest number of rows × columns (where all rows are of a consistent length) is chosen as the winner.
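The first heuristic can be sketched in C++ roughly as follows. This is a simplified illustration, not the library's csv::internals::CSVGuesser: the helpers `column_count` and `guess_by_row_times_cols` are hypothetical, and fields are assumed to contain no quoted delimiters, so a row's column count is simply the number of delimiters plus one.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of columns `line` would have if split on `delim`.
// Assumes no quoted fields, so every delimiter separates two columns.
std::size_t column_count(const std::string& line, char delim) {
    return static_cast<std::size_t>(std::count(line.begin(), line.end(), delim)) + 1;
}

// Hypothetical sketch of the first heuristic: for each candidate delimiter,
// parse the first (up to) 100 lines; if every row has the same length, score
// the delimiter as rows * columns and keep the highest score.
// Returns '\0' if there are no lines or no candidate gives consistent rows.
char guess_by_row_times_cols(const std::vector<std::string>& lines,
                             const std::vector<char>& candidates) {
    const std::size_t n_rows = std::min<std::size_t>(lines.size(), 100);
    if (n_rows == 0) return '\0';

    char best = '\0';
    std::size_t best_score = 0;
    for (char delim : candidates) {
        const std::size_t cols = column_count(lines[0], delim);
        bool consistent = true;
        for (std::size_t i = 1; i < n_rows && consistent; ++i)
            consistent = (column_count(lines[i], delim) == cols);

        // Only a strictly higher score replaces the current winner,
        // so ties go to the earlier candidate.
        if (consistent && n_rows * cols > best_score) {
            best_score = n_rows * cols;
            best = delim;
        }
    }
    return best;
}
```

A delimiter that never appears still yields consistent one-column rows, but any delimiter that splits every row into two or more columns outscores it.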
However, if the CSV file has leading comments or has fewer than 100 lines, a second heuristic is used. The CSV reader again parses the first 100 lines using each candidate delimiter, but tallies up the length of each row parsed. Then the delimiter whose most common row length n is the largest is chosen as the winner, and the line number where the first row of length n occurs is chosen as the starting row.
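A comparable sketch of the second heuristic is below. As before, `GuessResult`, `guess_by_mode_length`, and the no-quoting assumption are illustrative only and not the library's API. For each candidate it tallies the column count of each of the first 100 lines, takes the most common count n, keeps the delimiter with the largest n, and records the index of the first line with n columns as the starting row.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: column count of `line` split on `delim` (no quoting assumed).
std::size_t column_count(const std::string& line, char delim) {
    return static_cast<std::size_t>(std::count(line.begin(), line.end(), delim)) + 1;
}

// Hypothetical result type for this sketch.
struct GuessResult {
    char delim = '\0';           // chosen delimiter ('\0' if there were no lines)
    std::size_t row_length = 0;  // most common row length n for that delimiter
    std::size_t start_row = 0;   // index of the first line with n columns
};

GuessResult guess_by_mode_length(const std::vector<std::string>& lines,
                                 const std::vector<char>& candidates) {
    const std::size_t n_rows = std::min<std::size_t>(lines.size(), 100);
    GuessResult best;

    for (char delim : candidates) {
        // Tally: row length -> number of lines with that length.
        std::map<std::size_t, std::size_t> tally;
        for (std::size_t i = 0; i < n_rows; ++i)
            ++tally[column_count(lines[i], delim)];

        // Most common row length. std::map iterates in ascending key order,
        // and ">=" lets the longer length win a tie.
        std::size_t mode = 0, mode_count = 0;
        for (const auto& entry : tally) {
            if (entry.second >= mode_count) {
                mode = entry.first;
                mode_count = entry.second;
            }
        }

        // Keep the delimiter whose most common row length is the largest.
        if (mode > best.row_length) {
            best.delim = delim;
            best.row_length = mode;
            for (std::size_t i = 0; i < n_rows; ++i) {
                if (column_count(lines[i], delim) == mode) {
                    best.start_row = i;  // first row of length n
                    break;
                }
            }
        }
    }
    return best;
}
```

With two comment lines followed by three comma-separated rows of three fields, the comma wins with n = 3 and the starting row is index 2, so the comments are skipped.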
Because you can subclass csv::CSVReader, you can implement your own guessing heuristic. csv::internals::CSVGuesser may be a helpful guide in doing so.
This library already does a lot of work behind the scenes to use threads to squeeze performance from your CPU. However, ambitious users who are in the mood for experimenting should follow these guidelines:
CSVRow objects together and create separate threads to process each column
csv::CSVField objects
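The column-per-thread guideline might look something like the following sketch. It uses plain `std::vector<std::string>` rows as a stand-in for csv::CSVRow so that it compiles on its own; the `Row` alias and `column_sums` function are hypothetical and not part of the library, and every field is assumed to be numeric because `std::stod` throws otherwise.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a batch of parsed rows. In real code these would be
// csv::CSVRow objects; plain strings keep this sketch self-contained.
using Row = std::vector<std::string>;

// Sum each column of `rows`, using one thread per column.
// The rows are shared read-only, and each thread writes only to its own
// element of `sums`, so no locking is needed.
std::vector<double> column_sums(const std::vector<Row>& rows, std::size_t n_cols) {
    std::vector<double> sums(n_cols, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(n_cols);

    for (std::size_t col = 0; col < n_cols; ++col) {
        workers.emplace_back([&rows, &sums, col] {
            double total = 0.0;
            for (const Row& row : rows)
                if (col < row.size()) total += std::stod(row[col]);  // assumes numeric fields
            sums[col] = total;
        });
    }
    for (std::thread& t : workers) t.join();  // wait for every column to finish
    return sums;
}
```

Grouping rows first and splitting the work by column keeps each thread's output independent; one thread per row would instead create many short-lived threads for very little work each.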