Vince's CSV Parser
csv::internals Namespace Reference

Stuff that is generally not of interest to end-users. More...

Classes

class  ThreadSafeDeque
 A std::deque wrapper which allows multiple read and write threads to concurrently access it along with providing read threads the ability to wait for the deque to become populated. More...
 
class  IBasicCSVParser
 Abstract base class which provides CSV parsing logic. More...
 
class  StreamParser
 A class for parsing CSV data from a std::stringstream or a std::ifstream. More...
 
class  MmapParser
 Parser for memory-mapped files. More...
 
struct  ColNames
 A data structure for handling column name information. More...
 
struct  GuessScore
 
struct  RawCSVField
 A barebones class used for describing CSV fields. More...
 
class  CSVFieldList
 A class used for efficiently storing RawCSVField objects and expanding as necessary. More...
 
struct  RawCSVData
 A class for storing raw CSV data and associated metadata. More...
 

Typedefs

using ColNamesPtr = std::shared_ptr< ColNames >
 
using ParseFlagMap = std::array< ParseFlags, 256 >
 An array which maps ASCII chars to a parsing flag.
 
using WhitespaceMap = std::array< bool, 256 >
 An array which maps ASCII chars to a flag indicating if it is whitespace.
 
using RawCSVDataPtr = std::shared_ptr< RawCSVData >
 

Enumerations

enum class  ParseFlags {
  QUOTE_ESCAPE_QUOTE = 0 , QUOTE = 2 | 1 , NOT_SPECIAL = 4 , DELIMITER = 4 | 2 ,
  NEWLINE = 4 | 2 | 1
}
 An enum used for describing the significance of each character with respect to CSV parsing. More...
 

Functions

size_t get_file_size (csv::string_view filename)
 
std::string get_csv_head (csv::string_view filename)
 
std::string get_csv_head (csv::string_view filename, size_t file_size)
 Read the first 500KB of a CSV file.
 
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags (char delimiter)
 Create an array v such that, for a character with ASCII number i, v[i + 128] labels it according to the ParseFlags enum.
 
HEDLEY_CONST CONSTEXPR_17 ParseFlagMap make_parse_flags (char delimiter, char quote_char)
 Create an array v such that, for a character with ASCII number i, v[i + 128] labels it according to the ParseFlags enum.
 
HEDLEY_CONST CONSTEXPR_17 WhitespaceMap make_ws_flags (const char *ws_chars, size_t n_chars)
 Create an array v such that, for a character c with ASCII number i, v[i + 128] is true if c is a whitespace character.
 
WhitespaceMap make_ws_flags (const std::vector< char > &flags)
 
template<typename T >
bool is_equal (T a, T b, T epsilon=0.001)
 
constexpr ParseFlags quote_escape_flag (ParseFlags flag, bool quote_escape) noexcept
 Transform the ParseFlags given the context of whether or not the current field is quote escaped.
 
 STATIC_ASSERT (ParseFlags::DELIMITER< ParseFlags::NEWLINE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, false)==ParseFlags::NOT_SPECIAL)
 Optimizations for reducing branching in parsing loop. More...
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, false)==ParseFlags::QUOTE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, false)==ParseFlags::DELIMITER)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, false)==ParseFlags::NEWLINE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NOT_SPECIAL, true)==ParseFlags::NOT_SPECIAL)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::QUOTE, true)==ParseFlags::QUOTE_ESCAPE_QUOTE)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::DELIMITER, true)==ParseFlags::NOT_SPECIAL)
 
 STATIC_ASSERT (quote_escape_flag(ParseFlags::NEWLINE, true)==ParseFlags::NOT_SPECIAL)
 
std::string format_row (const std::vector< std::string > &row, csv::string_view delim)
 
std::vector< std::string > _get_col_names (csv::string_view head, CSVFormat format)
 Return a CSV's column names. More...
 
GuessScore calculate_score (csv::string_view head, CSVFormat format)
 
CSVGuessResult _guess_format (csv::string_view head, const std::vector< char > &delims)
 Guess the delimiter used by a delimiter-separated values file. More...
 
std::string json_escape_string (csv::string_view s) noexcept
 
template<typename T = int>
csv_abs (T x)
 Calculate the absolute value of a number.
 
template<>
int csv_abs (int x)
 
template<>
long int csv_abs (long int x)
 
template<>
long long int csv_abs (long long int x)
 
template<>
float csv_abs (float x)
 
template<>
double csv_abs (double x)
 
template<>
long double csv_abs (long double x)
 
template<typename T , csv::enable_if_t< std::is_arithmetic< T >::value, int > = 0>
int num_digits (T x)
 Calculate the number of digits in a number.
 
template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0>
std::string to_string (T value)
 to_string() for unsigned integers More...
 
template<typename T >
HEDLEY_CONST CONSTEXPR_14 long double pow10 (const T &n) noexcept
 Compute 10 to the power of n.
 
template<>
HEDLEY_CONST CONSTEXPR_14 long double pow10 (const unsigned &n) noexcept
 Compute 10 to the power of n.
 
template<size_t Bytes>
CONSTEXPR_14 long double get_int_max ()
 Given a byte size, return the largest number that can be stored in an integer of that size. More...
 
template<size_t Bytes>
CONSTEXPR_14 long double get_uint_max ()
 Given a byte size, return the largest number that can be stored in an unsigned integer of that size.
 
HEDLEY_PRIVATE CONSTEXPR_14 DataType _process_potential_exponential (csv::string_view exponential_part, const long double &coeff, long double *const out)
 Given the start of what is (possibly) the exponential part of a number written in scientific notation, parse the exponent.
 
HEDLEY_PRIVATE HEDLEY_PURE CONSTEXPR_14 DataType _determine_integral_type (const long double &number) noexcept
 Given the absolute value of an integer, determine what numeric type it fits in.
 
CONSTEXPR_14 DataType data_type (csv::string_view in, long double *const out, const char decimalSymbol)
 Distinguishes numeric from other text values. More...
 

Variables

constexpr const int UNINITIALIZED_FIELD = -1
 
const int PAGE_SIZE = 4096
 Size of a memory page in bytes. More...
 
constexpr size_t ITERATION_CHUNK_SIZE = 10000000
 For functions that lazy load a large CSV, this determines how many bytes are read at a time.
 
CONSTEXPR_VALUE_14 long double CSV_INT8_MAX = get_int_max<1>()
 Largest number that can be stored in an 8-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT16_MAX = get_int_max<2>()
 Largest number that can be stored in a 16-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT32_MAX = get_int_max<4>()
 Largest number that can be stored in a 32-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_INT64_MAX = get_int_max<8>()
 Largest number that can be stored in a 64-bit integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT8_MAX = get_uint_max<1>()
 Largest number that can be stored in an 8-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT16_MAX = get_uint_max<2>()
 Largest number that can be stored in a 16-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT32_MAX = get_uint_max<4>()
 Largest number that can be stored in a 32-bit unsigned integer.
 
CONSTEXPR_VALUE_14 long double CSV_UINT64_MAX = get_uint_max<8>()
 Largest number that can be stored in a 64-bit unsigned integer.
 

Detailed Description

Stuff that is generally not of interest to end-users.

Enumeration Type Documentation

◆ ParseFlags

An enum used for describing the significance of each character with respect to CSV parsing.

See also
quote_escape_flag
Enumerator
QUOTE_ESCAPE_QUOTE 

A quote inside or terminating a quote_escaped field.

QUOTE 

Characters which may signify a quote escape.

NOT_SPECIAL 

Characters with no special meaning or escaped delimiters and newlines.

DELIMITER 

Characters which signify a new field.

NEWLINE 

Characters which signify a new row.

Definition at line 166 of file common.hpp.
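
As an illustration of how these flags pair with a 256-entry lookup table such as ParseFlagMap, the sketch below builds a table indexed by a character's signed value plus 128. It is a minimal sketch under stated assumptions, not the library's make_parse_flags: the names FlagSketch, FlagMapSketch, flag_index, and build_flags_sketch are hypothetical, and treating both '\r' and '\n' as NEWLINE is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Local mirror of the enumerator values listed above (not the library's own type).
enum class FlagSketch {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};

using FlagMapSketch = std::array<FlagSketch, 256>;

// Map a char to a table slot in [0, 255] by shifting its signed value by 128.
inline std::size_t flag_index(char c) {
    return static_cast<std::size_t>(static_cast<signed char>(c) + 128);
}

// Build a lookup table for one delimiter and one quote character.
// Assumption: both '\r' and '\n' are treated as row terminators.
inline FlagMapSketch build_flags_sketch(char delimiter, char quote_char) {
    assert(delimiter != quote_char);  // the two roles must be distinguishable
    FlagMapSketch table;
    table.fill(FlagSketch::NOT_SPECIAL);
    table[flag_index(delimiter)] = FlagSketch::DELIMITER;
    table[flag_index(quote_char)] = FlagSketch::QUOTE;
    table[flag_index('\r')] = FlagSketch::NEWLINE;
    table[flag_index('\n')] = FlagSketch::NEWLINE;
    return table;
}
```

With such a table, classifying a character during parsing is a single array lookup rather than a chain of comparisons.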

Function Documentation

◆ _get_col_names()

std::vector< std::string > csv::internals::_get_col_names ( csv::string_view  head,
CSVFormat  format 
)

Return a CSV's column names.

Parameters
[in]  head    Leading portion of the CSV file's contents
[in]  format  Format of the CSV file

Definition at line 28 of file csv_reader.cpp.

◆ _guess_format()

CSVGuessResult csv::internals::_guess_format ( csv::string_view  head,
const std::vector< char > &  delims 
)

Guess the delimiter used by a delimiter-separated values file.

For each candidate delimiter, find the most common row length (the mode). The delimiter with the longest mode row length wins, and the header row is taken to be the first row with that length.

Definition at line 93 of file csv_reader.cpp.
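
The heuristic described above can be sketched as follows. This is a simplified illustration, not the library's code: GuessSketch, count_fields, and guess_delim_sketch are hypothetical names, quoted fields are ignored when counting, and ties are resolved in favour of the earlier candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not the library's CSVGuessResult).
struct GuessSketch {
    char delim;
    std::size_t header_row;
};

// Number of fields in a line, counting every delimiter (quoting is ignored).
inline std::size_t count_fields(const std::string& line, char delim) {
    std::size_t n = 1;
    for (char c : line) {
        if (c == delim) ++n;
    }
    return n;
}

inline GuessSketch guess_delim_sketch(const std::string& head,
                                      const std::vector<char>& delims) {
    assert(!delims.empty());
    GuessSketch best{delims.front(), 0};
    std::size_t best_len = 0;

    for (char d : delims) {
        // Tally row lengths and remember the first row seen with each length.
        std::map<std::size_t, std::size_t> freq;
        std::map<std::size_t, std::size_t> first_row;
        std::istringstream in(head);
        std::string line;
        std::size_t row = 0;
        while (std::getline(in, line)) {
            const std::size_t len = count_fields(line, d);
            if (freq[len]++ == 0) first_row[len] = row;
            ++row;
        }

        // The mode is the row length that occurs most often for this delimiter.
        std::size_t mode_len = 0, mode_count = 0;
        for (const auto& kv : freq) {
            if (kv.second > mode_count) {
                mode_len = kv.first;
                mode_count = kv.second;
            }
        }

        // The delimiter with the longest mode row length wins.
        if (mode_len > best_len) {
            best_len = mode_len;
            best = GuessSketch{d, first_row[mode_len]};
        }
    }
    return best;
}
```

Because the header is the first row with the winning length, leading comment lines that contain no delimiters are skipped naturally.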

◆ data_type()

CONSTEXPR_14 DataType csv::internals::data_type ( csv::string_view  in,
long double *const  out,
const char  decimalSymbol 
)

Distinguishes numeric from other text values.

Used by various type casting functions, like csv_parser::CSVReader::read_row().

Rules

  • Leading and trailing whitespace ("padding") ignored
  • A string of just whitespace is NULL
Parameters
[in]   in             String value to be examined
[out]  out            Pointer to long double where results of numeric parsing get stored
[in]   decimalSymbol  the character separating integral and decimal part, defaults to '.' if omitted

Definition at line 242 of file data_type.hpp.
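
The two rules above (padding is ignored, whitespace-only input is NULL) can be illustrated with a reduced classifier. This is only a sketch under stated assumptions: KindSketch and classify_sketch are hypothetical names, only spaces and tabs count as padding, exponents are not handled, and the result has three categories rather than the library's DataType values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, reduced set of categories for this sketch.
enum class KindSketch { Null, Number, Text };

inline KindSketch classify_sketch(const std::string& in, long double* out,
                                  char decimal_symbol = '.') {
    assert(out != nullptr);  // this sketch requires an output slot

    // Rule: leading and trailing padding is ignored.
    const std::string padding = " \t";
    const std::size_t first = in.find_first_not_of(padding);
    // Rule: a string of just whitespace (or an empty string) is NULL.
    if (first == std::string::npos) return KindSketch::Null;
    const std::size_t last = in.find_last_not_of(padding);
    const std::string body = in.substr(first, last - first + 1);

    std::size_t i = 0;
    bool negative = false;
    if (body[i] == '-' || body[i] == '+') {
        negative = (body[i] == '-');
        ++i;
    }

    long double value = 0, scale = 1;
    bool seen_digit = false, seen_decimal = false;
    for (; i < body.size(); ++i) {
        const char c = body[i];
        if (c >= '0' && c <= '9') {
            seen_digit = true;
            if (seen_decimal) {
                scale /= 10;
                value += (c - '0') * scale;
            } else {
                value = value * 10 + (c - '0');
            }
        } else if (c == decimal_symbol && !seen_decimal) {
            seen_decimal = true;
        } else {
            return KindSketch::Text;  // any other character makes it text
        }
    }
    if (!seen_digit) return KindSketch::Text;

    *out = negative ? -value : value;
    return KindSketch::Number;
}
```

The out parameter is written only when the input is numeric, mirroring the [out] role described in the parameter table.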

◆ format_row()

std::string csv::internals::format_row ( const std::vector< std::string > &  row,
csv::string_view  delim 
)

Print a CSV row.

Definition at line 9 of file csv_reader.cpp.

◆ get_int_max()

template<size_t Bytes>
CONSTEXPR_14 long double csv::internals::get_int_max ( )

Given a byte size, return the largest number that can be stored in an integer of that size.

Note: Provides a platform-agnostic way of mapping names like "long int" to byte sizes.

Definition at line 105 of file data_type.hpp.
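
For reference, the largest value of a signed two's-complement integer occupying Bytes bytes is 2^(8 * Bytes - 1) - 1, and the unsigned maximum is 2^(8 * Bytes) - 1. The sketch below computes both as long double. The names pow2_sketch, int_max_sketch, and uint_max_sketch are hypothetical, and this is not necessarily how get_int_max is implemented.

```cpp
#include <cstddef>

// 2 to the power n, computed recursively so it stays a C++11 constexpr function.
constexpr long double pow2_sketch(std::size_t n) {
    return n == 0 ? 1.0L : 2.0L * pow2_sketch(n - 1);
}

// Largest value of a signed integer that is `Bytes` bytes wide.
template <std::size_t Bytes>
constexpr long double int_max_sketch() {
    return pow2_sketch(8 * Bytes - 1) - 1.0L;
}

// Largest value of an unsigned integer that is `Bytes` bytes wide.
template <std::size_t Bytes>
constexpr long double uint_max_sketch() {
    return pow2_sketch(8 * Bytes) - 1.0L;
}
```

Keying on byte size rather than type name sidesteps the fact that types such as long int differ in width between platforms.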

◆ is_equal()

template<typename T >
bool csv::internals::is_equal ( T  a,
T  b,
T  epsilon = 0.001 
)
inline

Returns true if two floating-point values are approximately equal (within epsilon).

Definition at line 154 of file common.hpp.
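
A tolerance comparison of this kind is commonly written as an absolute-difference check. The sketch below shows one such form. The name approx_equal_sketch is hypothetical, and whether the library compares with a strict or non-strict inequality is an assumption here.

```cpp
// True when |a - b| is strictly smaller than epsilon.
// The absolute difference is spelled out by hand so the function can be constexpr.
template <typename T>
constexpr bool approx_equal_sketch(T a, T b, T epsilon = static_cast<T>(0.001)) {
    return (a > b ? a - b : b - a) < epsilon;
}
```

An absolute tolerance like this suits values of moderate magnitude; for very large or very small numbers a relative tolerance is usually the safer choice.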

◆ STATIC_ASSERT()

csv::internals::STATIC_ASSERT ( quote_escape_flag(ParseFlags::NOT_SPECIAL, false) == ParseFlags::NOT_SPECIAL )

Optimizations for reducing branching in parsing loop.

Idea: The meaning of all non-quote characters changes depending on whether or not the parser is in a quote-escaped mode (0 or 1).
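
One branch-free transform that satisfies every STATIC_ASSERT listed on this page is to clear the two low bits of the flag whenever the parser is in quote-escaped mode. The sketch below demonstrates this with a local copy of the enumerator values. ParseFlagsSketch and escape_sketch are hypothetical names, and the sketch is offered as one consistent possibility rather than as the library's definition of quote_escape_flag.

```cpp
// Local mirror of the enumerator values documented for ParseFlags.
enum class ParseFlagsSketch {
    QUOTE_ESCAPE_QUOTE = 0,
    QUOTE = 2 | 1,
    NOT_SPECIAL = 4,
    DELIMITER = 4 | 2,
    NEWLINE = 4 | 2 | 1
};

// When quote_escape is true, mask off the QUOTE bits (the low two bits):
//   QUOTE (3)     -> 0 (QUOTE_ESCAPE_QUOTE)
//   DELIMITER (6) -> 4 (NOT_SPECIAL)
//   NEWLINE (7)   -> 4 (NOT_SPECIAL)
// When quote_escape is false, the mask is all ones and the flag is unchanged.
constexpr ParseFlagsSketch escape_sketch(ParseFlagsSketch flag,
                                         bool quote_escape) noexcept {
    return static_cast<ParseFlagsSketch>(
        static_cast<int>(flag) &
        ~(static_cast<int>(ParseFlagsSketch::QUOTE) * quote_escape));
}
```

The multiplication by the bool stands in for an if statement, which is what removes a branch from the inner parsing loop.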

◆ to_string()

template<typename T , csv::enable_if_t< std::is_unsigned< T >::value, int > = 0>
std::string csv::internals::to_string ( T  value )
inline

to_string() for unsigned integers

to_string() for floating point numbers

to_string() for signed integers

Definition at line 82 of file csv_writer.hpp.

Variable Documentation

◆ PAGE_SIZE

const int csv::internals::PAGE_SIZE = 4096

Size of a memory page in bytes.

Used by csv::internals::CSVFieldList when allocating blocks.

Definition at line 145 of file common.hpp.