Commons Flatfile

This library is intended to provide conveniences for working with flat data structures. There are a few basic components:

Entity API
Data Definition DSL
Object graph representation

Entity API

The flatfile package is built on a number of interfaces whose instances can be combined to represent any flat data structure (and some bumpy ones). All interfaces extend java.io.Serializable.

Interface	Description	Basic implementation(s)
`Entity`	A container for some content of 0 or more bytes	`Field`, `DynamicField`
`FieldOption`	Marker interface for "field" options	Various and open-ended
`EntityCollection`	An `Entity` that is a collection of child Entities	implemented via subinterfaces
`IndexedEntityCollection`	Indexed `EntityCollection`	`EntityArray`
`NamedEntityCollection`	`EntityCollection` whose children are identified by `String` keys	`EntityMap`
`EntityFactory`	Describes an object that can return an `Entity` instance given some `Object` "cue"	`CloningEntityFactory`, `CompositeEntityFactory`, `ParserEntityFactory`

See also Flatfile API

Data Definition DSL

The Basics

The package provides an EntityFactory implementation that associates Entity definitions with String identifiers, read from a custom DSL which is loosely based on COBOL's data definition format. Here are some definitions in this format:

/* Java multi-line comments are supported */
// Java single-line comments are supported
// type foo, length 1:
foo (1), //the comma is optional

/* type bar, length 3 with default value "bar".
   Note that string literals, including unicode chars, are as in Java:
 */
bar (3) "bar"

/* type optionalField, length 10, default value of all underscores
   specified using the 'c'* "fill-character" syntax.
   Again note that character literals, including unicode representation, are as in Java:
 */
optionalField (10) '_'* 

// type baz, default value "baz", length (3) implicit:
baz "baz"

// type delimiter, immutable field of length 3 filled with asterisks:
delimiter (3) '*'*!

// type blah, immutable value "blah", length (4) implicit:
blah "blah"!

// type simpleArray, 3 occurrences of 2 bytes each:
simpleArray (2) [3]

// complex type dateYYYYMMDD:
dateYYYYMMDD {
  year (4)
  month (2)
  day (2)
}

// complex type dateRange with type references:
dateRange {
  start $dateYYYYMMDD
  ? '-'! // anonymous (filler) child with immutable value and implicit length (1)
  end $dateYYYYMMDD
}

// type complexArray, 3 occurrences of a named entity collection:
complexArray {
  a (1)
  b (2)
  c (3)
} [3]

// previous example, initialized to all spaces:
complexArray {
  a (1)
  b (2)
  c (3)
} [3] ' '*

Field Options

That's nice, but it's not always enough. Field options can be used to zero in on the exact behavior you need from a given field definition. Field options supported:

Option name	Function	Type	Values
`pad`	Used when a too-small value is specified	`byte`	`default (byte) 0`
`justify`	Specify field justification when a too-small value is specified	`PadJustifyFieldSupport$Justify` enum	`LEFT (default), RIGHT, CENTER`
`overflow`	Specify behavior on too-large value	`FieldSupport$Overflow` enum	`ERROR (default), IGNORE`
`underflow`	Specify behavior on too-small value	`FieldSupport$Underflow` enum	`ERROR (default), IGNORE`

Example:

//define an integer field:
intField (9) pad='0' justify=RIGHT

//define a field for which overflow is permitted:
truncateMe (20) overflow=IGNORE
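
To make the interaction of these options concrete, here is a minimal, self-contained Java sketch of fitting a value into a fixed-length field. It does not use the Flatfile classes: `PadJustifySketch`, `fit` and the `Policy` enum are hypothetical stand-ins, and the assumptions that `ERROR` rejects the value and that `overflow=IGNORE` keeps the leading bytes are illustrative guesses rather than documented library behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in, not part of Flatfile: fits a value into a fixed-length field.
final class PadJustifySketch {
    enum Justify { LEFT, RIGHT, CENTER }
    enum Policy { ERROR, IGNORE } // stand-in for the overflow/underflow enums

    static byte[] fit(byte[] value, int length, byte pad, Justify justify,
                      Policy overflow, Policy underflow) {
        if (value.length > length) {
            if (overflow == Policy.ERROR) {
                throw new IllegalArgumentException("value too large for length " + length);
            }
            // Assumption: IGNORE keeps the leading bytes and drops the rest.
            return Arrays.copyOf(value, length);
        }
        if (value.length < length && underflow == Policy.ERROR) {
            throw new IllegalArgumentException("value too small for length " + length);
        }
        // Start from a field full of pad bytes, then copy the value into place.
        byte[] result = new byte[length];
        Arrays.fill(result, pad);
        int free = length - value.length;
        int offset;
        switch (justify) {
            case RIGHT:  offset = free;     break;
            case CENTER: offset = free / 2; break;
            default:     offset = 0;        break;
        }
        System.arraycopy(value, 0, result, offset, value.length);
        return result;
    }

    // Convenience overload for single-byte text.
    static String fit(String value, int length, char pad, Justify justify,
                      Policy overflow, Policy underflow) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        byte[] fitted = fit(bytes, length, (byte) pad, justify, overflow, underflow);
        return new String(fitted, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Mirrors intField (9) pad='0' justify=RIGHT, with underflow tolerated.
        System.out.println(fit("42", 9, '0', Justify.RIGHT, Policy.ERROR, Policy.IGNORE));
        // Mirrors truncateMe (20) overflow=IGNORE.
        System.out.println(fit("abcdefghijklmnopqrstuvwxyz", 20, ' ',
                Justify.LEFT, Policy.IGNORE, Policy.IGNORE));
    }
}
```

The two policies are passed explicitly so that the sketch takes no position on the library's defaults.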

Field options are by no means magical: the option-setting syntax shown above applies to any property of an obvious type (String, byte, numeric). Additionally, a String value will be converted to a public static member of a class that implements the FieldOption marker interface. This is more than an implementation detail; it tells you how you can implement Flatfile's Entity interface to satisfy requirements more specific than those covered by the basic package. You can even specify nested properties in EL-style syntax; just surround the property expression with double quotes!
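
The conversion of a String to a public static member can be illustrated with plain Java reflection. The sketch below is self-contained and only approximates the idea: the nested `FieldOption` and `Justify` types are local stand-ins for the library's types, and `resolve` is a hypothetical helper, not a Flatfile method.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical illustration: look up an option value by name among the
// public static members of a type that implements a marker interface.
final class OptionResolverSketch {
    interface FieldOption { }                                    // local stand-in
    enum Justify implements FieldOption { LEFT, RIGHT, CENTER }  // local stand-in

    static Object resolve(Class<? extends FieldOption> type, String name) {
        try {
            Field member = type.getField(name); // finds public fields only
            if (!Modifier.isStatic(member.getModifiers())) {
                throw new IllegalArgumentException(name + " is not static");
            }
            return member.get(null); // static member, so no instance is needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "no option " + name + " on " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // The text "RIGHT" from a definition such as justify=RIGHT
        // resolves to the enum constant itself.
        System.out.println(resolve(Justify.class, "RIGHT"));
    }
}
```

Because enum constants are public static fields, the same lookup works for any custom class that exposes its option values as public static members.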

Dynamically-Sizable Arrays

It is possible to define an IndexedEntityCollection (implemented by EntityArray) whose number of occurrences is not known in advance:

  anySize (2) []
  acceptableRange (2) [1..5]
  minOccurs (2) [1..]
  maxOccurs (2) [..5]
  optional (2) [..1]

Entities defined this way are returned as IndexedEntityCollections for which #isSizable() returns true. In that case, call #setSize() once the correct size is known.
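
One way to read the bracket forms above is as a minimum/maximum pair. The following self-contained Java sketch parses the text between the brackets under that reading. `OccursRangeSketch` is a hypothetical helper, not a Flatfile class, and treating a missing lower bound as 0 and a missing upper bound as unbounded is an assumption.

```java
// Hypothetical helper: interprets "", "3", "1..5", "1..", "..5" as a size range.
final class OccursRangeSketch {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    final int min;
    final int max;

    OccursRangeSketch(int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("bad range: " + min + ".." + max);
        }
        this.min = min;
        this.max = max;
    }

    /** Parses the text found between the square brackets. */
    static OccursRangeSketch parse(String spec) {
        String s = spec.trim();
        if (s.isEmpty()) {
            return new OccursRangeSketch(0, UNBOUNDED); // [] means any size
        }
        int dots = s.indexOf("..");
        if (dots < 0) {
            int n = Integer.parseInt(s);                // [3] means exactly 3
            return new OccursRangeSketch(n, n);
        }
        String lo = s.substring(0, dots).trim();
        String hi = s.substring(dots + 2).trim();
        // Assumption: an omitted bound means "no limit" on that side.
        return new OccursRangeSketch(
                lo.isEmpty() ? 0 : Integer.parseInt(lo),
                hi.isEmpty() ? UNBOUNDED : Integer.parseInt(hi));
    }

    /** True when only one size is possible, so nothing is left to decide. */
    boolean isFixed() {
        return min == max;
    }

    /** True when the given size falls inside the declared range. */
    boolean accepts(int size) {
        return size >= min && size <= max;
    }
}
```

Under this reading, `parse("..1")` accepts sizes 0 and 1, which matches the `optional` definition above.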

EntityCollection Child Delimiters

The EntityCollection implementations returned by the DSL-based EntityFactory support some handy properties:

Property	Type	Description	Default value
`delim`	`byte[]`	Content to be written between each child entity	`byte[0]`
`delimAfter`	`boolean`	Whether a delimiter should follow the final child	`true`
`suppressEmptyChildren`	`boolean`	Whether to suppress children of zero length (and, more importantly, their delimiters)	`true`
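
The effect of these three properties on the written output can be sketched in a few lines of Java. This is an illustration only: `DelimiterSketch.join` is a hypothetical helper that works on strings rather than on Entity instances, and it assumes that a suppressed child contributes neither content nor a delimiter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins child contents the way the table above describes.
final class DelimiterSketch {
    static String join(List<String> children, String delim,
                       boolean delimAfter, boolean suppressEmptyChildren) {
        StringBuilder out = new StringBuilder();
        boolean wroteChild = false;
        for (String child : children) {
            if (suppressEmptyChildren && child.isEmpty()) {
                continue; // skip the child and, with it, its delimiter
            }
            if (wroteChild) {
                out.append(delim); // delimiter between consecutive children
            }
            out.append(child);
            wroteChild = true;
        }
        if (delimAfter && wroteChild) {
            out.append(delim); // trailing delimiter after the final child
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("a", "", "b");
        System.out.println(join(children, ",", true, true));
        System.out.println(join(children, ",", false, false));
    }
}
```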

Dynamically-Sizable Fields

Occasionally there may be a requirement that fields of unknown length be intermingled with fields of predetermined length or value. Here is an example:

  structure {
    "foo="! fooValue (*) // any length
    "bar="! barValue (1..) // at least length 1
    "baz="! bazValue (..10) // at most length 10
    "blah="! blahValue (3..4) // 3 or 4
  } delim="\r\n"

Dynamically-sizable fields, or DynamicFields, support the following options:

Option name	Function	Type	Values
`pad`	Used when a too-small value is specified	`byte`	`default (byte) 0`
`justify`	Specify field justification when a too-small value is specified	`PadJustifyFieldSupport$Justify` enum	`LEFT (default), RIGHT, CENTER`
`overflow`	Specify behavior on too-large value	`FieldSupport$Overflow` enum	`ERROR (default), IGNORE`
`underflow`	Specify behavior on too-small value	`FieldSupport$Underflow` enum	`ERROR (default), IGNORE`

Default Options

You can also set default options for certain types. ParserEntityFactory defines constants that show where this is possible:

Field name	Value
`OPTION_FIELD`	`field`
`OPTION_DYNAMIC_FIELD`	`dynamicField`

You can use the values of these constants, prefaced by an "at" (@) symbol, at the top of the resource to set default options for any supported type:

@field justify=CENTER pad=' '; // semicolon indicates end
@dynamicField underflow=IGNORE

Entity Checks

A final feature of the DSL-based EntityFactory is the idea that it may run a number of checks against entities as they are read from the definition file. The only check implemented at this time is the length check, which is specified by appending a colon and expected length after any entity definition, as shown:

  myRecord {
    a (10)
    ? ' '!
    b (50)
    ? ' '!
    c {
      c1 (20)
      c2 (20)
    }
    ? ' '!
    d (5) [2]
    ? ' '!
    e (24)
    ? ' '!
    f (1)
  } : 140

  multilineRecord {
    foo (2)
    bar (2)
    baz (1)
  } delim=' ' delimAfter=false [10] delim="\r\n" : 90
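
The expected lengths in both definitions can be checked by hand, and the arithmetic is easy to sketch in Java. The helper names below are hypothetical and no Flatfile classes are involved. For multilineRecord, the sketch assumes that the options written before `[10]` apply to each occurrence and those written after it apply to the array as a whole, which is the reading that yields 90.

```java
import java.util.Arrays;

// Hypothetical arithmetic check for the two length-checked definitions above.
final class LengthCheckSketch {
    /** Total length of a collection: child lengths plus delimiters. */
    static int collectionLength(int[] childLengths, int delimLength, boolean delimAfter) {
        int total = 0;
        for (int length : childLengths) {
            total += length;
        }
        int count = childLengths.length;
        int delims = count == 0 ? 0 : (delimAfter ? count : count - 1);
        return total + delims * delimLength;
    }

    /** Total length of an array of identical elements. */
    static int arrayLength(int elementLength, int occurrences,
                           int delimLength, boolean delimAfter) {
        int[] children = new int[occurrences];
        Arrays.fill(children, elementLength);
        return collectionLength(children, delimLength, delimAfter);
    }

    static int myRecordLength() {
        int c = collectionLength(new int[] {20, 20}, 0, true); // c1 + c2
        int d = arrayLength(5, 2, 0, true);                    // d (5) [2]
        // a, filler, b, filler, c, filler, d, filler, e, filler, f
        return collectionLength(new int[] {10, 1, 50, 1, c, 1, d, 1, 24, 1, 1}, 0, true);
    }

    static int multilineRecordLength() {
        // foo, bar, baz separated by one space, no trailing space
        int row = collectionLength(new int[] {2, 2, 1}, 1, false);
        // ten rows, each followed by the two-byte "\r\n" delimiter
        return arrayLength(row, 10, 2, true);
    }

    public static void main(String[] args) {
        System.out.println("myRecord: " + myRecordLength());
        System.out.println("multilineRecord: " + multilineRecordLength());
    }
}
```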

Object Graph Representation

We have covered the core APIs, which attempt to represent flat structures of virtually unlimited complexity in a simple way. Next we saw how the provided DSL lets us build the included entity representations using a terse syntax that aims to be at least as clear as the equivalent Java code. Finally, we can go a step further and provide an efficient means for our flat Entity-based structures to interoperate with Java POJOs.

By implementing the reflection and conversion APIs defined by the Morph project, we can provide, for relatively little investment, a simple means to copy data between Entity graphs and POJO graphs. Inserting Entity-aware Reflectors and Transformers at opportune points in a Morph configuration achieves a surprising amount of basic functionality, and more complex things can be accomplished by extending the APIs. We need to provide examples!
