You can find the Javadoc package list at the bottom of this page.

Introducing Commons CSV

Apache Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.

Common CSV formats are predefined in the CSVFormat class:

CSV Formats
CSVFormat	Description	Since Version
DEFAULT	IO for the Standard Comma Separated Value format, like RFC 4180 but allowing empty lines.	1.0
EXCEL	IO for the Microsoft Excel CSV. format.	1.0
INFORMIX_UNLOAD	IO for the Informix UNLOAD TO file_name command.	1.3
INFORMIX_UNLOAD_CSV	IO for the Informix UNLOAD CSV TO file_name command with escaping disabled.	1.3
MONGODB_CSV	IO for the MongoDB CSV `mongoexport` command.	1.7
MONGODB_TSV	IO for the MongoDB Tab Separated Values (TSV)`mongoexport` command.	1.7
MYSQL	IO for the MySQL CSV format.	1.0
ORACLE	IO for the Oracle CSV format of the SQL*Loader utility.	1.6
POSTGRESQL_CSV	IO for the PostgreSQL CSV format used by the `COPY` operation.	1.5
POSTGRESQL_TEXT	IO for the PostgreSQL Text format used by the `COPY` operation.	1.5
RFC4180	IO for the RFC-4180 format defined byRFC 4180.	1.0
TDF	IO for the Tab Delimited Format (also known as Tab Separated Values).	1.0

Custom formats can be created using a fluent style API.

Parsing Standard CSV Files

Parsing files with Apache Commons CSV is relatively straight forward. Pick a CSVFormat and go from there.

Parsing an Excel CSV File

To parse an Excel CSV file, write:

        
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.EXCEL.parse(in);
for (CSVRecord record : records) {
    String lastName = record.get("Last Name");
    String firstName = record.get("First Name");
}

Parsing Custom CSV Files

You can define your own using IO rules by building your own CSVFormat instance. Starting with CSVFormat.builder() lets you start from a predefined format and customize. For example:

      
CSVFormat myFormat = CSVFormat.DEFAULT.builder()
    .setCommentMarker('#')
    .setEscape('+')
    .setIgnoreSurroundingSpaces(true)
    .setQuote('"')
    .setQuoteMode(QuoteMode.ALL)
    .get()

Handling Byte Order Marks

To handle files that start with a Byte Order Mark (BOM), like some Excel CSV files, you need an extra step to deal with the optional BOM bytes. Using the BOMInputStream class from Apache Commons IO simplifies this task; for example:

        
try (Reader reader = new InputStreamReader(BOMInputStream.builder()
        .setPath(path)
        .get(), "UTF-8");
        CSVParser parser = CSVFormat.EXCEL.builder()
                .setHeader()
                .get()
                .parse(reader)) {
    for (CSVRecord record : parser) {
        String string = record.get("ColumnA");
        // ...
    }
}

You might find it handy to create something like this:

        
/**
 * Creates a reader capable of handling BOMs.
 *
 * @param path The path to read.
 * @return a new InputStreamReader for UTF-8 bytes.
 * @throws IOException if an I/O error occurs.
 */
public InputStreamReader newReader(final Path path) throws IOException {
    return new InputStreamReader(BOMInputStream.builder()
            .setPath(path)
            .get(), StandardCharsets.UTF_8);
}

Using Headers

Apache Commons CSV provides several ways to access record values. The simplest way is to access values by their index in the record. However, columns in CSV files often have a name, for example: ID, CustomerNo, Birthday, etc. The CSVFormat class provides an API for specifying these header names and CSVRecord on the other hand has methods to access values by their corresponding header name.

Accessing column values by index

To access a record value by index, no special configuration of the CSVFormat is necessary:

        
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
for (CSVRecord record : records) {
    String columnOne = record.get(0);
    String columnTwo = record.get(1);
}

Defining a header manually

Indices may not be the most intuitive way to access record values. For this reason it is possible to assign names to each column in the file:

        
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
  .setHeader("ID", "CustomerNo", "Name")
  .build()
  .parse(in);
for (CSVRecord record : records) {
    String id = record.get("ID");
    String customerNo = record.get("CustomerNo");
    String name = record.get("Name");
}

Note that column values can still be accessed using their index.

Using an enum to define a header

Using String values all over the code to reference columns can be error prone. For this reason, it is possible to define an enum to specify header names. Note that the enum constant names are used to access column values. This may lead to enums constant names which do not follow the Java coding standard of defining constants in upper case with underscores:

        
public enum Headers {
    ID, CustomerNo, Name
}
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
  .setHeader(Headers.class)
  .build()
  .parse(in);
for (CSVRecord record : records) {
    String id = record.get(Headers.ID);
    String customerNo = record.get(Headers.CustomerNo);
    String name = record.get(Headers.Name);
}

Again it is possible to access values by their index and by using a String (for example "CustomerNo").

Header auto detection

Some CSV files define header names in their first record. If configured, Apache Commons CSV can parse the header names from the first record:

        
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.builder()
  .setHeader()
  .setSkipHeaderRecord(true)
  .build()
  .parse(in);
for (CSVRecord record : records) {
    String id = record.get("ID");
    String customerNo = record.get("CustomerNo");
    String name = record.get("Name");
}

This will use the values from the first record as header names and skip the first record when iterating.

Printing with headers

To print a CSV file with headers, you specify the headers in the format:

        
Appendable out = ...;
CSVPrinter printer = CSVFormat.DEFAULT.builder()
  .setHeader("H1", "H2")
  .build()
  .print(out);

To print a CSV file with JDBC column labels, you specify the ResultSet in the format:

        
try (ResultSet resultSet = ...) {
    CSVPrinter printer = CSVFormat.DEFAULT.builder()
      .setHeader(resultSet)
      .build()
      .print(out);
}

Working with JDBC

Exporting JDBC Result Sets

To export row data from a JDBC ResultSet , use CSVPrinter.printRecords(ResultSet) :

        
        final StringWriter sw = new StringWriter();
        final CSVFormat csvFormat = CSVFormat.DEFAULT;
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:my_test;", "sa", "")) {
            try (Statement stmt = connection.createStatement();
                    CSVPrinter printer = new CSVPrinter(sw, csvFormat);
                    ResultSet resultSet = stmt.executeQuery("select ID, NAME, TEXT, BIN_DATA from TEST")) {
                printer.printRecords(resultSet);
            }
        }
        final String csv = sw.toString();
        System.out.println(csv);

Limiting rows from JDBC Result Sets

SQL lets you limit how many rows a SELECT statement returns with the LIMIT clause.

When you can't or don't want to change the SQL used to generate rows, JDBC lets you limit how many rows a JDBC Statement returns with the Statement.setMaxRows(int) method.

When you get a JDBC ResultSet from an API like DatabaseMetaData.getProcedures(...), there is no SQL or JDBC Statement to use to set a limit, the ResultSet class does not have an API to limit rows.

To simplify limiting ResultSet rows, Commons CVS offers the CSVFormat.Builder.setMaxRows(long) method. For example:

        
        CSVFormat csvFormat = CSVFormat.DEFAULT
            .setMaxRows(5_000)
            .get();
        try (ResultSet resultSet = ...) {
            csvFormat.printer().printRecords(resultSet);
        }

Using the above, calling CSVPrinter.printRecords(ResultSet) will limit the row count to the maximum number of rows specified in setMaxRows().

Note that setMaxRows() works with the other methods that print a sequence of records.

Packages

Package

Description

org.apache.commons.csv

Apache Commons CSV

Apache Commons CSV 1.14.1 API