This class provides a complete interface to CSV files and data. It offers tools to enable you to read and write to and from Strings or IO objects, as needed.
Reading
From a File
A Line at a Time
CSV.foreach("path/to/file.csv") do |row| # use row here... end
All at Once
arr_of_arrs = CSV.read("path/to/file.csv")
From a String
A Line at a Time
CSV.parse("CSV,data,String") do |row| # use row here... end
All at Once
arr_of_arrs = CSV.parse("CSV,data,String")
Writing
To a File
CSV.open("path/to/file.csv", "wb") do |csv| csv << ["row", "of", "CSV", "data"] csv << ["another", "row"] # ... end
To a String
csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end
Convert a Single Line
csv_string = ["CSV", "data"].to_csv # to CSV csv_array = "CSV,String".parse_csv # from CSV
Shortcut Interface
CSV                { |csv_out| csv_out << %w{my data here} }  # to $stdout
CSV(csv = "")      { |csv_str| csv_str << %w{my data here} }  # to a String
CSV($stderr)       { |csv_err| csv_err << %w{my data here} }  # to $stderr
CSV($stdin)        { |csv_in|  csv_in.each { |row| p row } }  # from $stdin
Advanced Usage
Wrap an IO Object
csv = CSV.new(io, options)
# ... read (with gets() or each()) from and write (with <<) to csv here ...
CSV and Character Encodings (M17n or Multilingualization)
This new CSV parser is m17n savvy. The parser works in the Encoding of the IO or String object being read from or written to. Your data is never transcoded (unless you ask Ruby to transcode it for you) and will literally be parsed in the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the Encoding of your data. This is accomplished by transcoding the parser itself into your Encoding.
Some transcoding must take place, of course, to accomplish this multiencoding support. For example, :col_sep, :row_sep, and :quote_char must be transcoded to match your data. Hopefully this makes the entire process feel transparent, since CSV’s defaults should just magically work for your data. However, you can set these values manually in the target Encoding to avoid the translation.
It’s also important to note that while all of CSV’s core parser is now Encoding agnostic, some features are not. For example, the built-in converters will try to transcode data to UTF-8 before making conversions. Again, you can provide custom converters that are aware of your Encodings to avoid this translation. It’s just too hard for me to support native conversions in all of Ruby’s Encodings.
Anyway, the practical side of this is simple: make sure IO and String objects passed into CSV have the proper Encoding set and everything should just work. CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(), CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
One minor exception comes when generating CSV into a String with an Encoding that is not ASCII compatible. There’s no existing data for CSV to use to prepare itself and thus you will probably need to manually specify the desired Encoding for most of those cases. It will try to guess using the fields in a row of output, though, when using CSV::generate_line() or Array#to_csv().
I try to point out any other Encoding issues in the documentation of methods as they come up.
This has been tested to the best of my ability with all non-“dummy” Encodings Ruby ships with. However, it is brave new code and may have some bugs. Please feel free to report any issues you find with it.
The version of the installed library.
A FieldInfo Struct contains details about a field’s position in the data source it was read from. CSV will pass this Struct to some blocks that make decisions based on field structure. See CSV.convert_fields() for an example.
index - The zero-based index of the field in its row.
line - The line of the data source this row is from.
header - The header for the column, when available.
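For instance, a converter block that accepts two arguments receives a FieldInfo and can act on specific columns. A minimal sketch, assuming a made-up "joined" column and sample data (the date library is required for Date.parse):

  require "csv"
  require "date"

  # Convert only the "joined" column to Date objects by checking the
  # FieldInfo handed to a two-argument converter.
  table = CSV.parse("name,joined\nAlice,2001-02-03\n", headers: true,
                    converters: lambda { |field, info|
                      info.header == "joined" ? Date.parse(field) : field
                    })
  table[0]["joined"]  # => #<Date: 2001-02-03 ...>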
The encoding used by all converters.
This Hash holds the built-in converters of CSV that can be accessed by name. You can select Converters with CSV.convert() or through the options Hash passed to CSV::new().
:integer - Converts any field Integer() accepts.
:float - Converts any field Float() accepts.
:numeric - A combination of :integer and :float.
:date - Converts any field Date::parse() accepts.
:date_time - Converts any field DateTime::parse() accepts.
:all - All built-in converters. A combination of :date_time and :numeric.
All built-in converters transcode field data to UTF-8 before attempting a conversion. If your data cannot be transcoded to UTF-8 the conversion will fail and the field will remain unchanged.
This Hash is intentionally left unfrozen and users should feel free to add values to it that can be accessed by all CSV objects.
To add a combo field, the value should be an Array of names. Combo fields can be nested with other combo fields.
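A sketch of registering a custom named converter and a combo entry; the :upcase and :everything names and behavior are illustrative, not built in:

  require "csv"

  # A one-argument converter registered under a name of our choosing...
  CSV::Converters[:upcase] = lambda do |field|
    field.respond_to?(:upcase) ? field.upcase : field
  end
  # ...and a combo that chains the built-ins with it.
  CSV::Converters[:everything] = [:all, :upcase]

  CSV.parse_line("1,b,c", converters: :everything)  # => [1, "B", "C"]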
This Hash holds the built-in header converters of CSV that can be accessed by name. You can select HeaderConverters with CSV.header_convert() or through the options Hash passed to CSV::new().
:downcase - Calls downcase() on the header String.
:symbol - The header String is downcased, spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called.
All built-in header converters transcode header data to UTF-8 before attempting a conversion. If your data cannot be transcoded to UTF-8 the conversion will fail and the header will remain unchanged.
This Hash is intentionally left unfrozen and users should feel free to add values to it that can be accessed by all CSV objects.
To add a combo field, the value should be an Array of names. Combo fields can be nested with other combo fields.
The options used when no overrides are given by calling code. They are:
:col_sep - ","
:row_sep - :auto
:quote_char - '"'
:field_size_limit - nil
:converters - nil
:unconverted_fields - nil
:headers - false
:return_headers - false
:header_converters - nil
:skip_blanks - false
:force_quotes - false
:skip_lines - nil
The encoded :quote_char used in parsing and writing. See CSV::new for details.
The limit for field size, if any. See CSV::new for details.
The regex marking a line as a comment. See CSV::new for details.
The line number of the last row read from this file. Fields with nested line-end characters will not affect this count.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1087
def self.filter(*args)
# parse options for input, output, or both
in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
if args.last.is_a? Hash
args.pop.each do |key, value|
case key.to_s
when /\Ain(?:put)?_(.+)\Z/
in_options[$1.to_sym] = value
when /\Aout(?:put)?_(.+)\Z/
out_options[$1.to_sym] = value
else
in_options[key] = value
out_options[key] = value
end
end
end
# build input and output wrappers
input = new(args.shift || ARGF, in_options)
output = new(args.shift || $stdout, out_options)
# read, yield, write
input.each do |row|
yield row
output << row
end
end
This method is a convenience for building Unix-like filters for CSV data. Each row is yielded to the provided block which can alter it as needed. After the block returns, the row is appended to output, altered or not.
The input and output arguments can be anything CSV::new() accepts (generally String or IO objects). If not given, they default to ARGF and $stdout.
The options parameter is also filtered down to CSV::new() after some clever key parsing. Any key beginning with :in_ or :input_ will have that leading identifier stripped and will only be used in the options Hash for the input object. Keys starting with :out_ or :output_ affect only output. All other keys are assigned to both objects.
The :output_row_sep option defaults to $INPUT_RECORD_SEPARATOR ($/).
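A hedged sketch of a small filter script; the separators chosen here are only illustrative:

  require "csv"

  # Read semicolon-separated CSV from ARGF, append a field count to each row,
  # and write comma-separated CSV to $stdout.
  CSV.filter(in_col_sep: ";", out_col_sep: ",") do |row|
    row << row.size
  end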
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1128
def self.foreach(path, options = Hash.new, &block)
return to_enum(__method__, path, options) unless block
open(path, options) do |csv|
csv.each(&block)
end
end
This method is intended as the primary interface for reading CSV files. You pass a path and any options you wish to set for the read. Each row of the file will be passed to the provided block in turn.
The options parameter can be anything CSV::new() understands. This method also understands an additional :encoding parameter that you can use to specify the Encoding of the data in the file to be read. You must provide this unless your data is in Encoding::default_external(). CSV will use this to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read. For example, encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
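For example, a sketch that streams a Windows-1251 encoded file as UTF-8 rows (the path, encoding, and column name are illustrative):

  require "csv"

  CSV.foreach("data.csv", headers: true, encoding: "Windows-1251:UTF-8") do |row|
    p row["name"]
  end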
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1153
def self.generate(*args)
# add a default empty String, if none was given
if args.first.is_a? String
io = StringIO.new(args.shift)
io.seek(0, IO::SEEK_END)
args.unshift(io)
else
encoding = args[-1][:encoding] if args.last.is_a?(Hash)
str = String.new
str.force_encoding(encoding) if encoding
args.unshift(str)
end
csv = new(*args) # wrap
yield csv # yield for appending
csv.string # return final String
end
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
Note that a passed String is modified by this method. Call dup() before passing if you need a new String.
The options parameter can be anything CSV::new() understands. This method understands an additional :encoding parameter when not passed a String, to set the base Encoding for the output. CSV needs this hint if you plan to output non-ASCII compatible data.
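A minimal sketch of the in-place modification noted above:

  require "csv"

  out = "existing\n".dup
  CSV.generate(out) { |csv| csv << ["one", "two"] }
  out  # => "existing\none,two\n"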
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1183
def self.generate_line(row, options = Hash.new)
options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
encoding = options.delete(:encoding)
str = String.new
if encoding
str.force_encoding(encoding)
elsif field = row.find { |f| not f.nil? }
str.force_encoding(String(field).encoding)
end
(new(str, options) << row).string
end
This method is a shortcut for converting a single row (Array) into a CSV String.
The options parameter can be anything CSV::new() understands. This method understands an additional :encoding parameter to set the base Encoding for the output. This method will try to guess your Encoding from the first non-nil field in row, if possible, but you may need to use this parameter as a backup plan.
The :row_sep option defaults to $INPUT_RECORD_SEPARATOR ($/) when calling this method.
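For example, assuming the default "\n" row separator:

  require "csv"

  CSV.generate_line(["a", "b", nil, "c,d"])  # => "a,b,,\"c,d\"\n"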
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1047
def self.instance(data = $stdout, options = Hash.new)
# create a _signature_ for this method call, data object and options
sig = [data.object_id] +
options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
# fetch or create the instance for this signature
@@instances ||= Hash.new
instance = (@@instances[sig] ||= new(data, options))
if block_given?
yield instance # run block, if given, returning result
else
instance # or return the instance
end
end
This method will return a CSV instance, just like CSV::new(), but the instance will be cached and returned for all future calls to this method for the same data object (tested by Object#object_id()) with the same options.
If a block is given, the instance is passed to the block and the return value becomes the return value of the block.
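A short sketch of the caching behavior:

  require "csv"

  a = CSV.instance($stdout, col_sep: "\t")
  b = CSV.instance($stdout, col_sep: "\t")
  a.equal?(b)  # => true, the same cached object is returned

  CSV.instance { |csv| csv << %w[one two] }  # writes "one,two\n" to $stdout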
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1508
def initialize(data, options = Hash.new)
if data.nil?
raise ArgumentError.new("Cannot parse nil as CSV")
end
# build the options for this read/write
options = DEFAULT_OPTIONS.merge(options)
# create the IO object we will read from
@io = data.is_a?(String) ? StringIO.new(data) : data
# honor the IO encoding if we can, otherwise default to ASCII-8BIT
@encoding = raw_encoding(nil) ||
( if encoding = options.delete(:internal_encoding)
case encoding
when Encoding; encoding
else Encoding.find(encoding)
end
end ) ||
( case encoding = options.delete(:encoding)
when Encoding; encoding
when /\A[^:]+/; Encoding.find($&)
end ) ||
Encoding.default_internal || Encoding.default_external
#
# prepare for building safe regular expressions in the target encoding,
# if we can transcode the needed characters
#
@re_esc = "\\".encode(@encoding).freeze rescue ""
@re_chars = /#{%"[-\\]\\[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
init_separators(options)
init_parsers(options)
init_converters(options)
init_headers(options)
init_comments(options)
@force_encoding = !!(encoding || options.delete(:encoding))
options.delete(:internal_encoding)
options.delete(:external_encoding)
unless options.empty?
raise ArgumentError, "Unknown options: #{options.keys.join(', ')}."
end
# track our own lineno since IO gets confused about line-ends is CSV fields
@lineno = 0
end
This constructor will wrap either a String or IO object passed in data for reading and/or writing. In addition to the CSV instance methods, several IO methods are delegated. (See CSV::open() for a complete list.) If you pass a String for data, you can later retrieve it (after writing to it, for example) with CSV.string().
Note that a wrapped String will be positioned at the beginning (for reading). If you want it at the end (for writing), use CSV::generate(). If you want any other positioning, pass a preset StringIO object instead.
You may set any reading and/or writing preferences in the options Hash. Available options are:
:col_sep - The String placed between each field. This String will be transcoded into the data’s Encoding before parsing.
:row_sep - The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence. A sequence will be selected even if it occurs in a quoted field, assuming that you would have the same line endings there. If none of those sequences is found, if data is ARGF, STDIN, STDOUT, or STDERR, or if the stream is only available for output, the default $INPUT_RECORD_SEPARATOR ($/) is used. Obviously, discovery takes a little time. Set manually if speed is important. Also note that IO objects should be opened in binary mode on Windows if this feature will be used as the line-ending translation can cause problems with resetting the document position to where it was before the read ahead. This String will be transcoded into the data’s Encoding before parsing.
:quote_char - The character used to quote fields. This has to be a single character String. This is useful for applications that incorrectly use ' as the quote character instead of the correct ". CSV will always consider a double sequence of this character to be an escaped quote. This String will be transcoded into the data’s Encoding before parsing.
:field_size_limit - This is a maximum size CSV will read ahead looking for the closing quote for a field. (In truth, it reads to the first line ending beyond this size.) If a quote cannot be found within the limit CSV will raise a MalformedCSVError, assuming the data is faulty. You can use this limit to prevent what are effectively DoS attacks on the parser. However, this limit can cause a legitimate parse to fail and thus is set to nil, or off, by default.
:converters - An Array of names from the Converters Hash and/or lambdas that handle custom conversion. A single converter doesn’t have to be in an Array. All built-in converters try to transcode fields to UTF-8 before converting. The conversion will fail if the data cannot be transcoded, leaving the field unchanged.
:unconverted_fields - If set to true, an unconverted_fields() method will be added to all returned rows (Array or CSV::Row) that will return the fields as they were before conversion. Note that :headers supplied by Array or String were not fields of the document and thus will have an empty Array attached.
:headers - If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers. If set to an Array, the contents will be used as the headers. If set to a String, the String is run through a call of CSV::parse_line() with the same :col_sep, :row_sep, and :quote_char as this instance to produce an Array of headers. This setting causes CSV#shift() to return rows as CSV::Row objects instead of Arrays and CSV#read() to return CSV::Table objects instead of an Array of Arrays.
:return_headers - When false, header rows are silently swallowed. If set to true, header rows are returned in a CSV::Row object with identical headers and fields (save that the fields do not go through the converters).
:write_headers - When true and :headers is set, a header row will be added to the output.
:header_converters - Identical in functionality to :converters save that the conversions are only made to header rows. All built-in converters try to transcode headers to UTF-8 before converting. The conversion will fail if the data cannot be transcoded, leaving the header unchanged.
:skip_blanks - When set to a true value, CSV will skip over any empty rows. Note that this setting will not skip rows that contain column separators, even if the rows contain no actual data. If you want to skip rows that contain separators but no content, consider using :skip_lines, or inspecting fields.compact.empty? on each row.
:force_quotes - When set to a true value, CSV will quote all CSV fields it creates.
:skip_lines - When set to an object responding to match, every line matching it is considered a comment and ignored during parsing. When set to a String, it is first converted to a Regexp. When set to nil no line is considered a comment. If the passed object does not respond to match, ArgumentError is thrown.
See CSV::DEFAULT_OPTIONS for the default settings.
Options cannot be overridden in the instance methods for performance reasons, so be sure to set what you want here.
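A hedged sketch combining several of these options on a wrapped String (the sample data is made up):

  require "csv"

  csv = CSV.new("# produced 2001-01-01\nname;age\nAlice;30\n",
                col_sep:    ";",
                headers:    true,
                skip_lines: /\A#/,
                converters: :integer)
  row = csv.shift
  row["age"]  # => 30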
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1258
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end
# wrap a File opened with the remaining +args+ with no newline
# decorator
file_opts = {universal_newline: false}.merge(options)
begin
f = File.open(*args, file_opts)
rescue ArgumentError => e
raise unless /needs binmode/ =~ e.message and args.size == 1
args << "rb"
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
retry
end
begin
csv = new(f, options)
rescue Exception
f.close
raise
end
# handle blocks like Ruby's open(), not like the CSV library
if block_given?
begin
yield csv
ensure
csv.close
end
else
csv
end
end
This method opens an IO object, and wraps that with CSV. This is intended as the primary interface for writing a CSV file.
You must pass a filename and may optionally add a mode for Ruby’s open(). You may also pass an optional Hash containing any options CSV::new() understands as the final argument.
This method works like Ruby’s open() call, in that it will pass a CSV object to a provided block and close it when the block terminates, or it will return the CSV object when no block is provided. (Note: This is different from the Ruby 1.8 CSV library which passed rows to the block. Use CSV::foreach() for that behavior.)
You must provide a mode with an embedded Encoding designator unless your data is in Encoding::default_external(). CSV will check the Encoding of the underlying IO object (set by the mode you pass) to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read just as you can with a normal call to IO::open(). For example, "rb:UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
An opened CSV object will delegate to many IO methods for convenience. You may call:
- binmode()
- binmode?()
- close()
- close_read()
- close_write()
- closed?()
- eof()
- eof?()
- external_encoding()
- fcntl()
- fileno()
- flock()
- flush()
- fsync()
- internal_encoding()
- ioctl()
- isatty()
- path()
- pid()
- pos()
- pos=()
- reopen()
- seek()
- stat()
- sync()
- sync=()
- tell()
- to_i()
- to_io()
- truncate()
- tty?()
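A hedged sketch of both the block and non-block forms (paths and encodings are illustrative):

  require "csv"

  CSV.open("path/to/file.csv", "rb:UTF-32BE:UTF-8", headers: true) do |csv|
    csv.each { |row| p row }
  end

  csv  = CSV.open("path/to/file.csv", "rb")  # no block: close it yourself
  rows = csv.read
  csv.close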
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1303
def self.parse(*args, &block)
csv = new(*args)
if block.nil? # slurp contents, if no block is given
begin
csv.read
ensure
csv.close
end
else # or pass each row to a provided block
csv.each(&block)
end
end
This method can be used to easily parse CSV out of a String. You may either provide a block which will be called with each row of the String in turn, or just use the returned Array of Arrays (when no block is given).
You pass your str to read from, and an optional options Hash containing anything CSV::new() understands.
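For example:

  require "csv"

  CSV.parse("1,2,3\n4,5,6\n")                  # => [["1", "2", "3"], ["4", "5", "6"]]
  CSV.parse("1,2,3\n4,5,6\n") { |row| p row }  # each row Array in turn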
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1323
def self.parse_line(line, options = Hash.new)
new(line, options).shift
end
This method is a shortcut for converting a single line of a CSV String into an Array. Note that if line contains multiple rows, anything beyond the first row is ignored.
The options parameter can be anything CSV::new() understands.
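For example:

  require "csv"

  CSV.parse_line("1,2,3")           # => ["1", "2", "3"]
  CSV.parse_line("1,2,3\n4,5,6\n")  # => ["1", "2", "3"] (the second row is ignored)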
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1338
def self.read(path, *options)
open(path, *options) { |csv| csv.read }
end
Use to slurp a CSV file into an Array of Arrays. Pass the path to the file and any options CSV::new() understands. This method also understands an additional :encoding parameter that you can use to specify the Encoding of the data in the file to be read. You must provide this unless your data is in Encoding::default_external(). CSV will use this to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read. For example, encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
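A short sketch (the path is illustrative); with :headers set the result is a CSV::Table instead of an Array of Arrays:

  require "csv"

  arr_of_arrs = CSV.read("path/to/file.csv")
  table       = CSV.read("path/to/file.csv", headers: true)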
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1343
def self.readlines(*args)
read(*args)
end
Alias for CSV::read().
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1354
def self.table(path, options = Hash.new)
read( path, { headers: true,
converters: :numeric,
header_converters: :symbol }.merge(options) )
end
A shortcut for:
CSV.read( path, { headers:           true,
                  converters:        :numeric,
                  header_converters: :symbol }.merge(options) )
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1665
def <<(row)
# make sure headers have been assigned
if header_row? and [Array, String].include? @use_headers.class
parse_headers # won't read data for Array or String
self << @headers if @write_headers
end
# handle CSV::Row objects and Hashes
row = case row
when self.class::Row then row.fields
when Hash then @headers.map { |header| row[header] }
else row
end
@headers = row if header_row?
@lineno += 1
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
if @io.is_a?(StringIO) and
output.encoding != (encoding = raw_encoding)
if @force_encoding
output = output.encode(encoding)
elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
@io.set_encoding(compatible_encoding)
@io.seek(0, IO::SEEK_END)
end
end
@io << output
self # for chaining
end
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2171
def add_converter(var_name, const, name = nil, &converter)
if name.nil? # custom converter
instance_variable_get("@#{var_name}") << converter
else # named converter
combo = const[name]
case combo
when Array # combo converter
combo.each do |converter_name|
add_converter(var_name, const, converter_name)
end
else # individual named converter
instance_variable_get("@#{var_name}") << combo
end
end
end
The actual work method for adding converters, used by both CSV.convert() and CSV.header_convert().
This method requires the var_name of the instance variable to place the converters in, the const Hash to lookup named converters in, and the normal parameters of the CSV.convert() and CSV.header_convert() methods.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2258
def add_unconverted_fields(row, fields)
class << row
attr_reader :unconverted_fields
end
row.instance_eval { @unconverted_fields = fields }
row
end
This method injects an instance variable unconverted_fields into row and an accessor method for row called unconverted_fields(). The variable is set to the contents of fields.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1714
def convert(name = nil, &converter)
add_converter(:converters, self.class::Converters, name, &converter)
end
You can use this method to install a CSV::Converters built-in, or provide a block that handles a custom conversion.
If you provide a block that takes one argument, it will be passed the field and is expected to return the converted value or the field itself. If your block takes two arguments, it will also be passed a CSV::FieldInfo Struct, containing details about the field. Again, the block should return a converted field or the field itself.
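A small sketch installing a built-in converter by name and a custom block converter (the block’s mapping is illustrative):

  require "csv"

  csv = CSV.new("1,2.5,three\n")
  csv.convert(:numeric)                                  # built-in, by name
  csv.convert { |field| field == "three" ? 3 : field }   # custom one-argument block
  csv.shift  # => [1, 2.5, 3]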
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2194
def convert_fields(fields, headers = false)
# see if we are converting headers or fields
converters = headers ? @header_converters : @converters
fields.map.with_index do |field, index|
converters.each do |converter|
break if field.nil?
field = if converter.arity == 1 # straight field converter
converter[field]
else # FieldInfo converter
header = @use_headers && !headers ? @headers[index] : nil
converter[field, FieldInfo.new(index, lineno, header)]
end
break unless field.is_a? String # short-circuit pipeline for speed
end
field # final state of each field, converted or original
end
end
Processes fields with @converters, or @header_converters if headers is passed as true, returning the converted field set. Any converter that changes the field into something other than a String halts the pipeline of conversion for that field. This is primarily an efficiency shortcut.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1581
def converters
@converters.map do |converter|
name = Converters.rassoc(converter)
name ? name.first : converter
end
end
Returns the current list of converters in effect. See CSV::new for details. Built-in converters will be returned by name, while others will be returned as is.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1745
def each
if block_given?
while row = shift
yield row
end
else
to_enum
end
end
Yields each row of the data source in turn.
Support for Enumerable.
The data source must be open for reading.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2281
def encode_re(*chunks)
Regexp.new(encode_str(*chunks))
end
Builds a regular expression in @encoding. All chunks will be transcoded to that encoding.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2289
def encode_str(*chunks)
chunks.map { |chunk| chunk.encode(@encoding.name) }.join('')
end
Builds a String in @encoding. All chunks will be transcoded to that encoding.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2273
def escape_re(str)
str.gsub(@re_chars) {|c| @re_esc + c}
end
This method is an encoding safe version of Regexp::escape(). It will escape any characters that would change the meaning of a regular expression in the encoding of str. Regular expression characters that cannot be transcoded to the target encoding will be skipped and no escaping will be performed if a backslash cannot be transcoded.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1624
def force_quotes?() @force_quotes end
Returns true if all output fields are quoted. See CSV::new for details.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1729
def header_convert(name = nil, &converter)
add_converter( :header_converters,
self.class::HeaderConverters,
name,
&converter )
end
Identical to CSV#convert(), but for header rows.
Note that this method must be called before header rows are read to have any effect.
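For example, a sketch converting headers to Symbols before any rows are read (the sample data is made up):

  require "csv"

  csv = CSV.new("First Name,Age\nAlice,30\n", headers: true)
  csv.header_convert(:symbol)
  csv.shift.headers  # => [:first_name, :age]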
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1612
def header_converters
@header_converters.map do |converter|
name = HeaderConverters.rassoc(converter)
name ? name.first : converter
end
end
Returns the current list of converters in effect for headers. See CSV::new for details. Built-in converters will be returned by name, while others will be returned as is.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1771
def header_row?
@use_headers and @headers.nil?
end
Returns true if the next row read will be a header row.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1597
def headers
@headers || true if @use_headers
end
Returns nil if headers will not be used, true if they will but have not yet been read, or the actual headers after they have been read. See CSV::new for details.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2156
def init_comments(options)
@skip_lines = options.delete(:skip_lines)
@skip_lines = Regexp.new(@skip_lines) if @skip_lines.is_a? String
if @skip_lines and not @skip_lines.respond_to?(:match)
raise ArgumentError, ":skip_lines has to respond to matches"
end
end
Stores the pattern of comments to skip from the provided options.
The pattern must respond to .match, else ArgumentError is raised. Strings are converted to a Regexp.
See also CSV.new.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2109
def init_converters(options, field_name = :converters)
if field_name == :converters
@unconverted_fields = options.delete(:unconverted_fields)
end
instance_variable_set("@#{field_name}", Array.new)
# find the correct method to add the converters
convert = method(field_name.to_s.sub(/ers\Z/, ""))
# load converters
unless options[field_name].nil?
# allow a single converter not wrapped in an Array
unless options[field_name].is_a? Array
options[field_name] = [options[field_name]]
end
# load each converter...
options[field_name].each do |converter|
if converter.is_a? Proc # custom code block
convert.call(&converter)
else # by name
convert.call(converter)
end
end
end
options.delete(field_name)
end
Loads any converters requested during construction.
If field_name is set to :converters (the default), field converters are set. When field_name is :header_converters, header converters are added instead.
The :unconverted_fields option is also activated for :converters calls, if requested.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2139
def init_headers(options)
@use_headers = options.delete(:headers)
@return_headers = options.delete(:return_headers)
@write_headers = options.delete(:write_headers)
# headers must be delayed until shift(), in case they need a row of content
@headers = nil
init_converters(options, :header_converters)
end
Stores header row settings and loads header converters, if needed.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2078
def init_parsers(options)
# store the parser behaviors
@skip_blanks = options.delete(:skip_blanks)
@field_size_limit = options.delete(:field_size_limit)
# prebuild Regexps for faster parsing
esc_row_sep = escape_re(@row_sep)
esc_quote = escape_re(@quote_char)
@parsers = {
# for detecting parse errors
quote_or_nl: encode_re("[", esc_quote, "\r\n]"),
nl_or_lf: encode_re("[\r\n]"),
stray_quote: encode_re( "[^", esc_quote, "]", esc_quote,
"[^", esc_quote, "]" ),
# safer than chomp!()
line_end: encode_re(esc_row_sep, "\\z"),
# illegal unquoted characters
return_newline: encode_str("\r\n")
}
end
Pre-compiles parsers and stores them by name for access during reads.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1979
def init_separators(options)
# store the selected separators
@col_sep = options.delete(:col_sep).to_s.encode(@encoding)
@row_sep = options.delete(:row_sep) # encode after resolving :auto
@quote_char = options.delete(:quote_char).to_s.encode(@encoding)
if @quote_char.length != 1
raise ArgumentError, ":quote_char has to be a single character String"
end
#
# automatically discover row separator when requested
# (not fully encoding safe)
#
if @row_sep == :auto
if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
(defined?(Zlib) and @io.class == Zlib::GzipWriter)
@row_sep = $INPUT_RECORD_SEPARATOR
else
begin
#
# remember where we were (pos() will raise an exception if @io is pipe
# or not opened for reading)
#
saved_pos = @io.pos
while @row_sep == :auto
#
# if we run out of data, it's probably a single line
# (ensure will set default value)
#
break unless sample = @io.gets(nil, 1024)
# extend sample if we're unsure of the line ending
if sample.end_with? encode_str("\r")
sample << (@io.gets(nil, 1) || "")
end
# try to find a standard separator
if sample =~ encode_re("\r\n?|\n")
@row_sep = $&
break
end
end
# tricky seek() clone to work around GzipReader's lack of seek()
@io.rewind
# reset back to the remembered position
while saved_pos > 1024 # avoid loading a lot of data into memory
@io.read(1024)
saved_pos -= 1024
end
@io.read(saved_pos) if saved_pos.nonzero?
rescue IOError # not opened for reading
# do nothing: ensure will set default
rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
# do nothing: ensure will set default
rescue SystemCallError # pipe
# do nothing: ensure will set default
ensure
#
# set default if we failed to detect
# (stream not opened for reading, a pipe, or a single line of data)
#
@row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
end
end
end
@row_sep = @row_sep.to_s.encode(@encoding)
# establish quoting rules
@force_quotes = options.delete(:force_quotes)
do_quote = lambda do |field|
field = String(field)
encoded_quote = @quote_char.encode(field.encoding)
encoded_quote +
field.gsub(encoded_quote, encoded_quote * 2) +
encoded_quote
end
quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
@quote = if @force_quotes
do_quote
else
lambda do |field|
if field.nil? # represent +nil+ fields as empty unquoted fields
""
else
field = String(field) # Stringify fields
# represent empty fields as empty quoted fields
if field.empty? or
field.count(quotable_chars).nonzero?
do_quote.call(field)
else
field # unquoted field
end
end
end
end
end
Stores the indicated separators for later use.
If auto-discovery was requested for @row_sep, this method will read ahead in the @io and try to find one. ARGF, STDIN, STDOUT, STDERR, and any stream open only for output get the default @row_sep of $INPUT_RECORD_SEPARATOR ($/).
This method also establishes the quoting rules used for CSV output.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1932
def inspect
str = ["<#", self.class.to_s, " io_type:"]
# show type of wrapped IO
if @io == $stdout then str << "$stdout"
elsif @io == $stdin then str << "$stdin"
elsif @io == $stderr then str << "$stderr"
else str << @io.class.to_s
end
# show IO.path(), if available
if @io.respond_to?(:path) and (p = @io.path)
str << " io_path:" << p.inspect
end
# show encoding
str << " encoding:" << @encoding.name
# show other attributes
%w[ lineno col_sep row_sep
quote_char skip_blanks ].each do |attr_name|
if a = instance_variable_get("@#{attr_name}")
str << " " << attr_name << ":" << a.inspect
end
end
if @use_headers
str << " headers:" << headers.inspect
end
str << ">"
begin
str.join('')
rescue # any encoding error
str.map do |s|
e = Encoding::Converter.asciicompat_encoding(s.encoding)
e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
end.join('')
end
end
Returns a simplified description of the key CSV attributes in an ASCII compatible String.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2223
def parse_headers(row = nil)
if @headers.nil? # header row
@headers = case @use_headers # save headers
# Array of headers
when Array then @use_headers
# CSV header String
when String
self.class.parse_line( @use_headers,
col_sep: @col_sep,
row_sep: @row_sep,
quote_char: @quote_char )
# first row is headers
else row
end
# prepare converted and unconverted copies
row = @headers if row.nil?
@headers = convert_fields(@headers, true)
@headers.each { |h| h.freeze if h.is_a? String }
if @return_headers # return headers
return self.class::Row.new(@headers, row, true)
elsif not [Array, String].include? @use_headers.class # skip to field row
return shift
end
end
self.class::Row.new(@headers, convert_fields(row)) # field row
end
This method is used to turn a finished row into a CSV::Row. Header rows are also dealt with here, either by returning a CSV::Row with identical headers and fields (save that the fields do not go through the converters) or by reading past them to return a field row. Headers are also saved in @headers for use in future rows.
When nil, row is assumed to be a header row not based on an actual row of the stream.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 2299
def raw_encoding(default = Encoding::ASCII_8BIT)
if @io.respond_to? :internal_encoding
@io.internal_encoding || @io.external_encoding
elsif @io.is_a? StringIO
@io.string.encoding
elsif @io.respond_to? :encoding
@io.encoding
else
default
end
end
Returns the encoding of the internal IO object or the default if the encoding cannot be determined.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1760
def read
rows = to_a
if @use_headers
Table.new(rows)
else
rows
end
end
Slurps the remaining rows and returns an Array of Arrays.
The data source must be open for reading.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1604
def return_headers?() @return_headers end
Returns true if headers will be returned as a row of results. See CSV::new for details.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1649
def rewind
@headers = nil
@lineno = 0
@io.rewind
end
Rewinds the underlying IO object and resets CSV’s lineno() counter.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1782
def shift
#########################################################################
### This method is purposefully kept a bit long as simple conditional ###
### checks are faster than numerous (expensive) method calls. ###
#########################################################################
# handle headers not based on document content
if header_row? and @return_headers and
[Array, String].include? @use_headers.class
if @unconverted_fields
return add_unconverted_fields(parse_headers, Array.new)
else
return parse_headers
end
end
#
# it can take multiple calls to <tt>@io.gets()</tt> to get a full line,
# because of \r and/or \n characters embedded in quoted fields
#
in_extended_col = false
csv = Array.new
loop do
# add another read to the line
unless parse = @io.gets(@row_sep)
return nil
end
parse.sub!(@parsers[:line_end], "")
if csv.empty?
#
# I believe a blank line should be an <tt>Array.new</tt>, not Ruby 1.8
# CSV's <tt>[nil]</tt>
#
if parse.empty?
@lineno += 1
if @skip_blanks
next
elsif @unconverted_fields
return add_unconverted_fields(Array.new, Array.new)
elsif @use_headers
return self.class::Row.new(Array.new, Array.new)
else
return Array.new
end
end
end
next if @skip_lines and @skip_lines.match parse
parts = parse.split(@col_sep, -1)
if parts.empty?
if in_extended_col
csv[-1] << @col_sep # will be replaced with a @row_sep after the parts.each loop
else
csv << nil
end
end
# This loop is the hot path of csv parsing. Some things may be non-dry
# for a reason. Make sure to benchmark when refactoring.
parts.each do |part|
if in_extended_col
# If we are continuing a previous column
if part[-1] == @quote_char && part.count(@quote_char) % 2 != 0
# extended column ends
csv.last << part[0..-2]
if csv.last =~ @parsers[:stray_quote]
raise MalformedCSVError,
"Missing or stray quote in line #{lineno + 1}"
end
csv.last.gsub!(@quote_char * 2, @quote_char)
in_extended_col = false
else
csv.last << part
csv.last << @col_sep
end
elsif part[0] == @quote_char
# If we are starting a new quoted column
if part[-1] != @quote_char || part.count(@quote_char) % 2 != 0
# start an extended column
csv << part[1..-1]
csv.last << @col_sep
in_extended_col = true
else
# regular quoted column
csv << part[1..-2]
if csv.last =~ @parsers[:stray_quote]
raise MalformedCSVError,
"Missing or stray quote in line #{lineno + 1}"
end
csv.last.gsub!(@quote_char * 2, @quote_char)
end
elsif part =~ @parsers[:quote_or_nl]
# Unquoted field with bad characters.
if part =~ @parsers[:nl_or_lf]
raise MalformedCSVError, "Unquoted fields do not allow " +
"\\r or \\n (line #{lineno + 1})."
else
raise MalformedCSVError, "Illegal quoting in line #{lineno + 1}."
end
else
# Regular ole unquoted field.
csv << (part.empty? ? nil : part)
end
end
# Replace tacked on @col_sep with @row_sep if we are still in an extended
# column.
csv[-1][-1] = @row_sep if in_extended_col
if in_extended_col
# if we're at eof?(), a quoted field wasn't closed...
if @io.eof?
raise MalformedCSVError,
"Unclosed quoted field on line #{lineno + 1}."
elsif @field_size_limit and csv.last.size >= @field_size_limit
raise MalformedCSVError, "Field size exceeded on line #{lineno + 1}."
end
# otherwise, we need to loop and pull some more data to complete the row
else
@lineno += 1
# save fields unconverted fields, if needed...
unconverted = csv.dup if @unconverted_fields
# convert fields, if needed...
csv = convert_fields(csv) unless @use_headers or @converters.empty?
# parse out header rows and handle CSV::Row conversions...
csv = parse_headers(csv) if @use_headers
# inject unconverted fields and accessor, if requested...
if @unconverted_fields and not csv.respond_to? :unconverted_fields
add_unconverted_fields(csv, unconverted)
end
# return the results
break csv
end
end
end
The primary read method for wrapped Strings and IOs: a single row is pulled from the data source, parsed, and returned as an Array of fields (if header rows are not used) or a CSV::Row (when header rows are used).
The data source must be open for reading.
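A short sketch of pull-style reading:

  require "csv"

  csv = CSV.new("1,2,3\n4,5,6\n")
  csv.shift  # => ["1", "2", "3"]
  csv.shift  # => ["4", "5", "6"]
  csv.shift  # => nil, end of data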
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1622
def skip_blanks?() @skip_blanks end
Returns true if blank lines are skipped by the parser. See CSV::new for details.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1591
def unconverted_fields?() @unconverted_fields end
Returns true if unconverted_fields() is added to parsed results. See CSV::new for details.
# File tmp/rubies/ruby-2.3.8/lib/csv.rb, line 1606
def write_headers?() @write_headers end
Returns true if headers are written in output. See CSV::new for details.