Vince's CSV Parser
Stuff that is generally not of interest to end-users.
Classes | |
class | ThreadSafeDeque |
A std::deque wrapper that lets multiple reader and writer threads access it concurrently and lets reader threads wait for the deque to become populated. | |
class | IBasicCSVParser |
Abstract base class which provides CSV parsing logic. | |
class | StreamParser |
A class for parsing CSV data from a std::stringstream or a std::ifstream. | |
class | MmapParser |
Parser for memory-mapped files. | |
struct | ColNames |
A data structure for handling column name information. | |
struct | GuessScore |
struct | RawCSVField |
A barebones class used for describing CSV fields. | |
class | CSVFieldList |
A class used for efficiently storing RawCSVField objects and expanding as necessary. | |
struct | RawCSVData |
A class for storing raw CSV data and associated metadata. | |
Typedefs | |
using | ColNamesPtr = std::shared_ptr< ColNames > |
using | ParseFlagMap = std::array< ParseFlags, 256 > |
An array which maps ASCII chars to a parsing flag. | |
using | WhitespaceMap = std::array< bool, 256 > |
An array which maps ASCII chars to a flag indicating if it is whitespace. | |
using | RawCSVDataPtr = std::shared_ptr< RawCSVData > |
Enumerations | |
enum class | ParseFlags { QUOTE_ESCAPE_QUOTE = 0 , QUOTE = 2 | 1 , NOT_SPECIAL = 4 , DELIMITER = 4 | 2 , NEWLINE = 4 | 2 | 1 } |
An enum used for describing the significance of each character with respect to CSV parsing. | |
Functions | |
size_t | get_file_size (csv::string_view filename) |
std::string | get_csv_head (csv::string_view filename) |
std::string | get_csv_head (csv::string_view filename, size_t file_size) |
Read the first 500KB of a CSV file. | |
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap | make_parse_flags (char delimiter) |
Create an array v such that, for a character whose (signed) char value is i, v[i + 128] labels it according to the ParseFlags enum. | |
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap | make_parse_flags (char delimiter, char quote_char) |
Create an array v such that, for a character whose (signed) char value is i, v[i + 128] labels it according to the ParseFlags enum. | |
HEDLEY_CONST CONSTEXPR_17 WhitespaceMap | make_ws_flags (const char *ws_chars, size_t n_chars) |
Create an array v such that, for a character c whose (signed) char value is i, v[i + 128] is true if c is a whitespace character. | |
WhitespaceMap | make_ws_flags (const std::vector< char > &flags) |
template<typename T > | |
bool | is_equal (T a, T b, T epsilon=0.001) |
constexpr ParseFlags | quote_escape_flag (ParseFlags flag, bool quote_escape) noexcept |
Transform the ParseFlags given the context of whether or not the current field is quote escaped. | |
STATIC_ASSERT (ParseFlags::DELIMITER < ParseFlags::NEWLINE) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, false)==ParseFlags::NOT_SPECIAL) | |
Optimizations for reducing branching in parsing loop. | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, false)==ParseFlags::QUOTE) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, false)==ParseFlags::DELIMITER) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, false)==ParseFlags::NEWLINE) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, true)==ParseFlags::NOT_SPECIAL) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, true)==ParseFlags::QUOTE_ESCAPE_QUOTE) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, true)==ParseFlags::NOT_SPECIAL) | |
STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, true)==ParseFlags::NOT_SPECIAL) | |
std::string | format_row (const std::vector< std::string > &row, csv::string_view delim) |
std::vector< std::string > | _get_col_names (csv::string_view head, CSVFormat format) |
Return a CSV's column names. | |
GuessScore | calculate_score (csv::string_view head, CSVFormat format) |
CSVGuessResult | _guess_format (csv::string_view head, const std::vector< char > &delims) |
Guess the delimiter used by a delimiter-separated values file. | |
std::string | json_escape_string (csv::string_view s) noexcept |
template<typename T = int> | |
T | csv_abs (T x) |
Calculate the absolute value of a number. | |
template<> | |
int | csv_abs (int x) |
template<> | |
long int | csv_abs (long int x) |
template<> | |
long long int | csv_abs (long long int x) |
template<> | |
float | csv_abs (float x) |
template<> | |
double | csv_abs (double x) |
template<> | |
long double | csv_abs (long double x) |
template<typename T , csv::enable_if_t< std::is_arithmetic< T >::value, int > = 0> | |
int | num_digits (T x) |
Calculate the number of digits in a number. | |
template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0> | |
std::string | to_string (T value) |
to_string() for unsigned integers | |
template<typename T > | |
HEDLEY_CONST CONSTEXPR_14 long double | pow10 (const T &n) noexcept |
Compute 10 to the power of n. | |
template<> | |
HEDLEY_CONST CONSTEXPR_14 long double | pow10 (const unsigned &n) noexcept |
Compute 10 to the power of n. | |
template<size_t Bytes> | |
CONSTEXPR_14 long double | get_int_max () |
Given a byte size, return the largest number that can be stored in an integer of that size. | |
template<size_t Bytes> | |
CONSTEXPR_14 long double | get_uint_max () |
Given a byte size, return the largest number that can be stored in an unsigned integer of that size. | |
HEDLEY_PRIVATE CONSTEXPR_14 DataType | _process_potential_exponential (csv::string_view exponential_part, const long double &coeff, long double *const out) |
Given a pointer to the start of the exponential part of a number (possibly) written in scientific notation, parse the exponent. | |
HEDLEY_PRIVATE HEDLEY_PURE CONSTEXPR_14 DataType | _determine_integral_type (const long double &number) noexcept |
Given the absolute value of an integer, determine what numeric type it fits in. | |
CONSTEXPR_14 DataType | data_type (csv::string_view in, long double *const out, const char decimalSymbol) |
Distinguishes numeric from other text values. | |
Variables | |
constexpr const int | UNINITIALIZED_FIELD = -1 |
const int | PAGE_SIZE = 4096 |
Size of a memory page in bytes. | |
constexpr size_t | ITERATION_CHUNK_SIZE = 10000000 |
For functions that lazy load a large CSV, this determines how many bytes are read at a time. | |
CONSTEXPR_VALUE_14 long double | CSV_INT8_MAX = get_int_max<1>() |
Largest number that can be stored in an 8-bit integer. | |
CONSTEXPR_VALUE_14 long double | CSV_INT16_MAX = get_int_max<2>() |
Largest number that can be stored in a 16-bit integer. | |
CONSTEXPR_VALUE_14 long double | CSV_INT32_MAX = get_int_max<4>() |
Largest number that can be stored in a 32-bit integer. | |
CONSTEXPR_VALUE_14 long double | CSV_INT64_MAX = get_int_max<8>() |
Largest number that can be stored in a 64-bit integer. | |
CONSTEXPR_VALUE_14 long double | CSV_UINT8_MAX = get_uint_max<1>() |
Largest number that can be stored in an 8-bit unsigned integer. | |
CONSTEXPR_VALUE_14 long double | CSV_UINT16_MAX = get_uint_max<2>() |
Largest number that can be stored in a 16-bit unsigned integer. | |
CONSTEXPR_VALUE_14 long double | CSV_UINT32_MAX = get_uint_max<4>() |
Largest number that can be stored in a 32-bit unsigned integer. | |
CONSTEXPR_VALUE_14 long double | CSV_UINT64_MAX = get_uint_max<8>() |
Largest number that can be stored in a 64-bit unsigned integer. | |
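The make_ws_flags and make_parse_flags entries above describe a 256-entry lookup table indexed by a character's value plus 128, so that negative char values still map to a valid index. A minimal sketch of the whitespace variant follows. The names WhitespaceMapSketch, make_ws_flags_sketch, and is_ws_sketch are hypothetical and are not part of the library; the cast through signed char is an assumption made here so the sketch also behaves on platforms where plain char is unsigned.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the WhitespaceMap typedef listed above.
using WhitespaceMapSketch = std::array<bool, 256>;

// Map a char to an index in [0, 255] by shifting its signed value by 128.
inline std::size_t char_index_sketch(char c) {
    return static_cast<std::size_t>(
        static_cast<int>(static_cast<signed char>(c)) + 128);
}

// Build a table where entry (c + 128) is true if c is one of ws_chars.
inline WhitespaceMapSketch make_ws_flags_sketch(const char* ws_chars,
                                                std::size_t n_chars) {
    WhitespaceMapSketch flags{};  // value-initialised: every entry is false
    for (std::size_t i = 0; i < n_chars; ++i) {
        flags[char_index_sketch(ws_chars[i])] = true;
    }
    return flags;
}

// Constant-time lookup: no loop over the whitespace set is needed.
inline bool is_ws_sketch(const WhitespaceMapSketch& flags, char c) {
    return flags[char_index_sketch(c)];
}
```

The point of the table is that a trimming loop can test each character with a single array read instead of comparing it against every whitespace character in turn.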
Stuff that is generally not of interest to end-users.
enum class csv::internals::ParseFlags |
strong |
An enum used for describing the significance of each character with respect to CSV parsing.
Definition at line 166 of file common.hpp.
std::vector< std::string > csv::internals::_get_col_names (csv::string_view head, CSVFormat format)
Return a CSV's column names.
[in] | head | Leading portion of the CSV file's contents |
[in] | format | Format of the CSV file |
Definition at line 28 of file csv_reader.cpp.
CSVGuessResult csv::internals::_guess_format (csv::string_view head, const std::vector< char > &delims)
Guess the delimiter used by a delimiter-separated values file.
For each delimiter, find out which row length was most common. The delimiter with the longest mode row length wins. Then, the line number of the header row is the first row with the mode row length.
Definition at line 93 of file csv_reader.cpp.
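The heuristic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: GuessSketch, count_fields_sketch, and guess_format_sketch are hypothetical names, fields are counted naively (quoting is ignored), and ties between equally common row lengths are broken in favour of the shorter length.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not the library's CSVGuessResult).
struct GuessSketch {
    char delim;          // winning delimiter
    std::size_t header;  // zero-based index of the presumed header row
};

// Naive field count: number of delimiters plus one. Quoting is ignored.
inline std::size_t count_fields_sketch(const std::string& line, char delim) {
    std::size_t n = 1;
    for (char c : line) {
        if (c == delim) ++n;
    }
    return n;
}

inline GuessSketch guess_format_sketch(const std::string& head,
                                       const std::vector<char>& delims) {
    // Split the head into lines once.
    std::vector<std::string> lines;
    std::istringstream in(head);
    for (std::string line; std::getline(in, line);) lines.push_back(line);

    GuessSketch best{delims.empty() ? ',' : delims.front(), 0};
    std::size_t best_mode = 0;

    for (char d : delims) {
        // Tally how often each row length occurs for this delimiter.
        std::map<std::size_t, std::size_t> freq;
        for (const std::string& l : lines) ++freq[count_fields_sketch(l, d)];

        // The mode is the row length with the highest tally.
        std::size_t mode = 0, mode_count = 0;
        for (const auto& kv : freq) {
            if (kv.second > mode_count) {
                mode = kv.first;
                mode_count = kv.second;
            }
        }

        // The delimiter with the longest mode row length wins.
        if (mode > best_mode) {
            best_mode = mode;
            best.delim = d;
            // The header is the first row that has the mode row length.
            for (std::size_t i = 0; i < lines.size(); ++i) {
                if (count_fields_sketch(lines[i], d) == mode) {
                    best.header = i;
                    break;
                }
            }
        }
    }
    return best;
}
```

With a head such as "junk line\na,b,c\n1,2,3\n4,5,6\n" and candidate delimiters comma, semicolon, and tab, the comma yields a mode row length of 3 while the others yield 1, so the comma wins and the header is taken to be the second line.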
CONSTEXPR_14 DataType csv::internals::data_type (csv::string_view in, long double *const out, const char decimalSymbol)
Distinguishes numeric from other text values.
Used by various type casting functions, like csv_parser::CSVReader::read_row()
[in] | in | String value to be examined |
[out] | out | Pointer to long double where results of numeric parsing get stored |
[in] | decimalSymbol | Character separating the integral and decimal parts; defaults to '.' if omitted |
Definition at line 242 of file data_type.hpp.
std::string csv::internals::format_row (const std::vector< std::string > &row, csv::string_view delim)
Format a CSV row as a string.
Definition at line 9 of file csv_reader.cpp.
template<size_t Bytes> CONSTEXPR_14 long double csv::internals::get_int_max ()
Given a byte size, return the largest number that can be stored in an integer of that size.
Note: Provides a platform-agnostic way of mapping names like "long int" to byte sizes
Definition at line 105 of file data_type.hpp.
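As a rough illustration of the byte-size-to-maximum mapping, the sketch below assumes two's-complement signed integers, so the signed maximum for N bytes is 2^(8N-1) - 1 and the unsigned maximum is 2^(8N) - 1. The names pow2_sketch, int_max_sketch, and uint_max_sketch are hypothetical; the library's own get_int_max and get_uint_max may be written differently.

```cpp
#include <cstddef>

// Hypothetical helper: 2^n as a long double, by repeated doubling.
constexpr long double pow2_sketch(std::size_t n) {
    return n == 0 ? 1.0L : 2.0L * pow2_sketch(n - 1);
}

// Largest value of a signed integer occupying `Bytes` bytes
// (assumes two's complement: one bit is used for the sign).
template<std::size_t Bytes>
constexpr long double int_max_sketch() {
    return pow2_sketch(Bytes * 8 - 1) - 1.0L;
}

// Largest value of an unsigned integer occupying `Bytes` bytes.
template<std::size_t Bytes>
constexpr long double uint_max_sketch() {
    return pow2_sketch(Bytes * 8) - 1.0L;
}

static_assert(int_max_sketch<1>() == 127.0L, "8-bit signed max");
static_assert(uint_max_sketch<2>() == 65535.0L, "16-bit unsigned max");
```

Keying on a byte count rather than a type name means a call such as int_max_sketch<sizeof(long int)>() gives the right bound whether long int is 4 or 8 bytes on the target platform.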
template<typename T > bool csv::internals::is_equal (T a, T b, T epsilon=0.001) |
inline |
Returns true if two floating point values are about the same.
Definition at line 154 of file common.hpp.
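One common way to implement an "about the same" comparison like is_equal is an absolute-difference test against a tolerance. The sketch below takes that approach as an assumption; is_equal_sketch is a hypothetical name, and whether the library compares with a strict or non-strict inequality is not stated in this page.

```cpp
#include <cmath>

// Hypothetical sketch: true if a and b differ by less than epsilon.
// Uses an absolute tolerance, which suits values of moderate magnitude.
template<typename T>
inline bool is_equal_sketch(T a, T b, T epsilon = static_cast<T>(0.001)) {
    return std::fabs(a - b) < epsilon;
}
```

For example, is_equal_sketch(1.0, 1.0005) is true under the default tolerance, while is_equal_sketch(1.0, 1.01) is false.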
csv::internals::STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, false)==ParseFlags::NOT_SPECIAL)
Optimizations for reducing branching in parsing loop.
Idea: The meaning of all non-quote characters changes depending on whether or not the parser is in a quote-escaped mode (0 or 1)
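The enum values and the STATIC_ASSERT lines listed earlier pin down the required behaviour, and one branch-free way to satisfy all of them is a bit mask. The sketch below is written under that assumption and is not taken from the library source; quote_escape_flag_sketch is a hypothetical name, and the enum is repeated only to keep the example self-contained.

```cpp
// Values copied from the ParseFlags enum documented on this page.
enum class ParseFlags {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};

// Hypothetical sketch: when quote_escape is true, clear the two low bits.
// QUOTE (3) becomes QUOTE_ESCAPE_QUOTE (0); DELIMITER (6) and NEWLINE (7)
// become NOT_SPECIAL (4). When quote_escape is false the mask is all ones
// and the flag passes through unchanged. No branch is needed.
constexpr ParseFlags quote_escape_flag_sketch(ParseFlags flag,
                                              bool quote_escape) noexcept {
    return static_cast<ParseFlags>(
        static_cast<int>(flag) &
        ~(static_cast<int>(ParseFlags::QUOTE) * static_cast<int>(quote_escape)));
}

static_assert(quote_escape_flag_sketch(ParseFlags::QUOTE, true) ==
                  ParseFlags::QUOTE_ESCAPE_QUOTE, "quote inside quoted field");
static_assert(quote_escape_flag_sketch(ParseFlags::DELIMITER, true) ==
                  ParseFlags::NOT_SPECIAL, "delimiter inside quoted field");
static_assert(quote_escape_flag_sketch(ParseFlags::NEWLINE, false) ==
                  ParseFlags::NEWLINE, "newline outside quoted field");
```

The layout of the enum is what makes this work: every flag that loses its meaning inside a quoted field shares the NOT_SPECIAL bit (4), so masking off the low bits collapses it to NOT_SPECIAL.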
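template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0> std::string csv::internals::to_string (T value) |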
inline |
to_string() for unsigned integers
to_string() for floating point numbers
to_string() for signed integers
Definition at line 82 of file csv_writer.hpp.
const int csv::internals::PAGE_SIZE = 4096 |
Size of a memory page in bytes.
Used by csv::internals::CSVFieldList when allocating blocks.
Definition at line 145 of file common.hpp.