Hierarchical properties
Many sources of configuration data have a hierarchical or tree-like
nature. They can represent data that is structured in many ways.
Such configuration sources are represented by classes derived from
HierarchicalConfiguration.
Prominent examples of hierarchical configuration sources are XML
documents. They can be read and written using the XMLConfiguration
class. This section explains how to deal with such structured data and
demonstrates the enhanced query facilities supported by
HierarchicalConfiguration. We use XML documents as examples for
structured configuration sources, but the information provided here
(especially the rules for accessing properties) applies to other
hierarchical configurations as well.
Examples for other hierarchical configuration classes are
Accessing properties in hierarchical configurations
We will start with a simple XML document to show some basics
about accessing properties. The following file named gui.xml
is used as example document:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<gui-definition>
  <colors>
    <background>#808080</background>
    <text>#000000</text>
    <header>#008000</header>
    <link normal="#000080" visited="#800080"/>
    <default>${colors.header}</default>
  </colors>
  <rowsPerPage>15</rowsPerPage>
  <buttons>
    <name>OK,Cancel,Help</name>
  </buttons>
  <numberFormat pattern="###\,###.##"/>
</gui-definition>
(As becomes obvious, this tutorial does not bother with good
design of XML documents; the example file is only meant to
demonstrate the different ways of accessing properties.)
To access the data stored in this document it must be loaded
by XMLConfiguration. Like other file based configuration classes
XMLConfiguration supports many ways of specifying the file to
process. One way is to pass the file name to the constructor as
shown in the following code fragment:
try
{
    XMLConfiguration config = new XMLConfiguration("gui.xml");
    // do something with config
}
catch (ConfigurationException cex)
{
    // something went wrong, e.g. the file was not found
}
If no exception was thrown, the properties defined in the
XML document are now available in the configuration object.
Other hierarchical configuration classes that operate on files
have corresponding constructors and methods for loading their data.
The following fragment shows how the properties can be accessed:
String backColor = config.getString("colors.background");
String textColor = config.getString("colors.text");
String linkNormal = config.getString("colors.link[@normal]");
String defColor = config.getString("colors.default");
int rowsPerPage = config.getInt("rowsPerPage");
List<Object> buttons = config.getList("buttons.name");
This listing demonstrates some important points about constructing
keys for accessing properties in hierarchical configuration sources and
about features of HierarchicalConfiguration in general:
- Nested elements are accessed using a dot notation. In
  the example document there is an element <text> in the body of
  the <colors> element. The corresponding key is colors.text.
- The root element is ignored when constructing keys. In
  the example you do not write gui-definition.colors.text, but
  only colors.text.
- Attributes of XML elements are accessed in an XPath-like
  notation, as in colors.link[@normal].
- Interpolation can be used as in PropertiesConfiguration.
  Here the <default> element in the colors section refers to
  another color.
- Lists of properties can be defined in a short form using
  the delimiter character (which is the comma by default).
  In this example the buttons.name property has the three values
  OK, Cancel, and Help, so it is queried using the getList()
  method. This works in attributes, too. Using the static
  setDefaultListDelimiter() method of AbstractConfiguration you can
  globally define a different delimiter character or, by setting
  the delimiter to 0, disable this mechanism completely. Placing a
  backslash before a delimiter character will escape it. This is
  demonstrated in the pattern attribute of the numberFormat
  element.
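The delimiter handling described in the last point can be pictured with a small stand-alone sketch. It is not the library's implementation: the class DelimiterSketch and its split() method are hypothetical names, and the code only models splitting at unescaped delimiters and dropping the escaping backslash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Commons Configuration.
class DelimiterSketch {
    // Splits value at every unescaped delimiter. A backslash directly in
    // front of a delimiter keeps the delimiter as a literal character.
    static List<String> split(String value, char delimiter) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == delimiter) {
                current.append(delimiter); // escaped delimiter: drop the backslash
                i++;
            } else if (c == delimiter) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("OK,Cancel,Help", ','));
        System.out.println(split("###\\,###.##", ','));
    }
}
```

With the values from gui.xml, the button names are split into three entries, while the escaped comma in the number format pattern stays part of a single value.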
The next section shows how data in a more complex XML
document can be processed.
Complex hierarchical structures
Consider the following scenario: An application operates on
database tables and wants to load a definition of the database
schema from its configuration. An XML document provides this
information. It could look as follows:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<database>
  <tables>
    <table tableType="system">
      <name>users</name>
      <fields>
        <field>
          <name>uid</name>
          <type>long</type>
        </field>
        <field>
          <name>uname</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>firstName</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>lastName</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>email</name>
          <type>java.lang.String</type>
        </field>
      </fields>
    </table>
    <table tableType="application">
      <name>documents</name>
      <fields>
        <field>
          <name>docid</name>
          <type>long</type>
        </field>
        <field>
          <name>name</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>creationDate</name>
          <type>java.util.Date</type>
        </field>
        <field>
          <name>authorID</name>
          <type>long</type>
        </field>
        <field>
          <name>version</name>
          <type>int</type>
        </field>
      </fields>
    </table>
  </tables>
</database>
This XML is quite self-explanatory: there is an arbitrary number
of table elements, each of which has a name and a list of fields.
A field in turn consists of a name and a data type. This
XML document (let's call it tables.xml) can be loaded in exactly
the same way as the simple document in the previous section.
When we now want to access some of the properties we face a
problem: the syntax for constructing configuration keys we
learned so far is not powerful enough to access all of the data
stored in the tables document.
Because the document contains a list of tables some properties
are defined more than once. E.g. the configuration key
tables.table.name refers to a name element inside a table
element inside a tables element. This constellation happens to
occur twice in the tables document.
Multiple definitions of a property do not cause problems and are
supported by all Configuration classes. If such a property
is queried using getProperty(), the method recognizes that there
are multiple values for that property and returns a collection
with all these values. So we could write
Object prop = config.getProperty("tables.table.name");
if (prop instanceof Collection)
{
    System.out.println("Number of tables: " + ((Collection<?>) prop).size());
}
An alternative to this code would be the getList() method of
Configuration. If a property is known to have multiple values
(as is the table name property in this example), getList()
allows retrieving all values at once.
Note: it is legal to call getString() or one of the other getter
methods on a property with multiple values; it returns the first
element of the list.
Accessing structured properties
Okay, we can obtain a list with the names of all defined
tables. In the same way we can retrieve a list with the names
of all table fields: just pass the key
tables.table.fields.field.name to the getList() method. In our
example this list would contain 10 elements, the names of all
fields of all tables. This is fine, but how do we know which
field belongs to which table?
When working with such hierarchical structures the configuration keys
used to query properties can have an extended syntax. Each component
of a key can be followed by a numerical value in parentheses that
determines the index of the affected property. So if we have two
table elements we can specify exactly which one we want to address
by appending the corresponding index. This is explained best by
some examples:
We will now provide some configuration keys and show the results
of a getProperty() call with these keys as arguments.
tables.table(0).name
    Returns the name of the first table (all indices are 0 based),
    in this example the string users.
tables.table(0)[@tableType]
    Returns the value of the tableType attribute of the first
    table (system).
tables.table(1).name
    Analogous to the first example, returns the name of the
    second table (documents).
tables.table(2).name
    Here the name of a third table is queried, but because there
    are only two tables the result is null. The fact that a
    null value is returned for invalid indices can be used
    to find out how many values are defined for a certain property:
    just increment the index in a loop as long as valid objects
    are returned.
tables.table(1).fields.field.name
    Returns a collection with the names of all fields that
    belong to the second table. With this kind of key it is
    now possible to find out which fields belong to which table.
tables.table(1).fields.field(2).name
    The additional index after field selects a certain field.
    This expression represents the name of the third field in
    the second table (creationDate).
tables.table.fields.field(0).type
    This key may be a bit unusual but is nevertheless completely
    valid. It selects the data types of the first fields in all
    tables. So here a collection would be returned with the
    values [long, long].
These examples should make the usage of indices quite clear.
Because each configuration key can contain an arbitrary number
of indices it is possible to navigate through complex structures of
hierarchical configurations; each property can be uniquely identified.
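The counting loop mentioned for tables.table(2).name can be sketched as follows. To keep the example self-contained it does not use a configuration object; instead the lookup is passed in as a function, and a plain map stands in for the configuration. The names IndexCountSketch and count() are hypothetical. With a real configuration, a method reference such as config::getProperty could presumably be passed as the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; a map plays the role of the configuration.
class IndexCountSketch {
    // Probes prefix(0)suffix, prefix(1)suffix, ... and stops at the
    // first key for which the lookup returns null.
    static int count(Function<String, Object> lookup, String prefix, String suffix) {
        int index = 0;
        while (lookup.apply(prefix + "(" + index + ")" + suffix) != null) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("tables.table(0).name", "users");
        data.put("tables.table(1).name", "documents");
        System.out.println("Number of tables: "
                + count(data::get, "tables.table", ".name"));
    }
}
```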
Sometimes dealing with long property keys may become inconvenient,
especially if the same properties are accessed again and again. For
this case HierarchicalConfiguration provides a short cut with the
configurationAt() method. This method can be passed a key that
selects exactly one node of the hierarchy of nodes contained in a
hierarchical configuration. Then a new hierarchical configuration
will be returned whose root node is the selected node. So all
property keys passed into that configuration should be relative to
the new root node. For instance, if we are only interested in
information about the first database table, we could do something
like this:
HierarchicalConfiguration sub = config.configurationAt("tables.table(0)");
String tableName = sub.getString("name"); // only need to provide relative path
List<Object> fieldNames = sub.getList("fields.field.name");
For dealing with complex list-like structures there is another
short cut. Often it will be necessary to iterate over all items
in the list and access their (sub) properties. Good examples are
the fields of the tables in our demo configuration. When you want
to process all fields of a table (e.g. for constructing a
CREATE TABLE statement), you will need all information
stored for them in the configuration. An option would be to use
the getList() method to fetch the required data one by one:
List<Object> fieldNames = config.getList("tables.table(0).fields.field.name");
List<Object> fieldTypes = config.getList("tables.table(0).fields.field.type");
// ... further calls for other data that might be stored in the config
But this is not very readable and will fail if not all field
elements contain the same set of data (for instance the
type property may be optional, then the list for
the types can contain fewer elements than the other lists). A
solution to these problems is the configurationsAt() method, a
close relative of the configurationAt() method covered above.
This method evaluates the passed in key and collects all
configuration nodes that match this criterion. Then for each node
a HierarchicalConfiguration object is created with this node as
root node. A list with these configuration objects is returned.
As the following example shows this comes in very handy when
processing list-like structures:
List<HierarchicalConfiguration> fields =
    config.configurationsAt("tables.table(0).fields.field");
for (HierarchicalConfiguration sub : fields)
{
    // sub contains all data about a single field
    String fieldName = sub.getString("name");
    String fieldType = sub.getString("type");
    ...
}
The configurations returned by the configurationAt() and
configurationsAt() methods are in fact instances of the
SubnodeConfiguration class. The API documentation of
this class contains more information about its features and
limitations.
Adding new properties
So far we have learned how to use indices to avoid ambiguities when
querying properties. The same problem occurs when adding new
properties to a structured configuration. As an example let's
assume we want to add a new field to the second table. New properties
can be added to a configuration using the addProperty() method.
Of course, we have to specify exactly where in the tree-like
structure new data is to be inserted. A statement like
// Warning: This might cause trouble!
config.addProperty("tables.table.fields.field.name", "size");
would not be sufficient because it does not contain all needed
information. How is such a statement processed by the
addProperty() method?
addProperty() splits the provided key into its
single parts and navigates through the properties tree along the
corresponding element names. In this example it will start at the
root element and then find the tables element. The
next key part to be processed is table, but here a
problem occurs: the configuration contains two table
properties below the tables element. To get rid of
this ambiguity an index can be specified at this position in the
key that makes clear which of the two properties should be
followed. tables.table(1).fields.field.name, for example,
would select the second table property. If an index
is missing, addProperty() always follows the last
available element. In our example this would be the second
table, too.
The following parts of the key are processed in exactly the same
manner. Under the selected table property there is
exactly one fields property, so this step is not
problematic at all. In the next step the field part
has to be processed. At the current position in the properties tree
there are multiple field (sub) properties. So here we
have the same situation as for the table part.
Because no explicit index is defined the last field
property is selected. The last part of the key passed to
addProperty() (name in this example)
will always be added as new property at the position that has
been reached in the former processing steps. So in our example
the last field property of the second table would
be given a new name sub property and the resulting
structure would look like the following listing:
...
    <table tableType="application">
      <name>documents</name>
      <fields>
        <field>
          <name>docid</name>
          <type>long</type>
        </field>
        <field>
          <name>name</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>creationDate</name>
          <type>java.util.Date</type>
        </field>
        <field>
          <name>authorID</name>
          <type>long</type>
        </field>
        <field>
          <name>version</name>
          <name>size</name>    <!-- newly added property -->
          <type>int</type>
        </field>
      </fields>
    </table>
  </tables>
</database>
This result is obviously not what was desired, but it demonstrates
how addProperty() works: the method follows an
existing branch in the properties tree and adds new leaves to it.
(If the passed in key does not match a branch in the existing tree,
a new branch will be added. E.g. if we pass the key
tables.table.data.first.test, the existing tree can be
navigated until the data part of the key. From here a
new branch is started with the remaining parts data,
first and test.)
If we want a different behavior, we must explicitly tell
addProperty() what to do. In our example with the
new field our intention was to create a new branch for the
field part in the key, so that a new field
property is added to the structure rather than adding sub properties
to the last existing field property. This can be
achieved by specifying the special index (-1) at the
corresponding position in the key as shown below:
config.addProperty("tables.table(1).fields.field(-1).name", "size");
config.addProperty("tables.table(1).fields.field.type", "int");
The first line in this fragment specifies that a new branch is
to be created for the field property (index -1).
In the second line no index is specified for the field, so the
last one is used - which happens to be the field that has just
been created. So these two statements add a fully defined field
to the second table. This is the default pattern for adding new
properties or whole hierarchies of properties: first create a new
branch in the properties tree and then populate its sub properties.
As an additional example let's add a complete new table definition
to our example configuration:
// Add a new table element and define the name
config.addProperty("tables.table(-1).name", "versions");
// Add a new field to the new table
// (an index for the table is not necessary because the latest is used)
config.addProperty("tables.table.fields.field(-1).name", "id");
config.addProperty("tables.table.fields.field.type", "int");
// Add another field to the new table
config.addProperty("tables.table.fields.field(-1).name", "date");
config.addProperty("tables.table.fields.field.type", "java.sql.Date");
...
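The navigation rules described in this section (follow the last element when no index is given, start a new branch for index -1, always add the last key part as a new property) can be modelled with a few lines of plain Java. This is a simplified sketch under those stated rules, not the code used by HierarchicalConfiguration; the names AddPropertySketch, Node, childrenNamed() and addChild() are hypothetical, and attributes, escaping and error handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model; not the Commons Configuration implementation.
class AddPropertySketch {
    static class Node {
        final String name;
        Object value;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        List<Node> childrenNamed(String childName) {
            List<Node> result = new ArrayList<>();
            for (Node child : children) {
                if (child.name.equals(childName)) {
                    result.add(child);
                }
            }
            return result;
        }

        Node addChild(String childName) {
            Node child = new Node(childName);
            children.add(child);
            return child;
        }
    }

    static void addProperty(Node root, String key, Object value) {
        String[] parts = key.split("\\.");
        Node current = root;
        // All parts except the last one select (or create) the branch to follow.
        for (int i = 0; i < parts.length - 1; i++) {
            String name = parts[i];
            Integer index = null;
            int open = name.indexOf('(');
            if (open >= 0) {
                index = Integer.parseInt(name.substring(open + 1, name.length() - 1));
                name = name.substring(0, open);
            }
            List<Node> matches = current.childrenNamed(name);
            if (matches.isEmpty() || (index != null && index == -1)) {
                current = current.addChild(name);           // start a new branch
            } else if (index == null) {
                current = matches.get(matches.size() - 1);  // no index: last element
            } else {
                current = matches.get(index);               // explicit 0-based index
            }
        }
        // The last part of the key is always added as a new property.
        current.addChild(parts[parts.length - 1]).value = value;
    }

    public static void main(String[] args) {
        Node root = new Node("database");
        addProperty(root, "tables.table(-1).name", "versions");
        addProperty(root, "tables.table.fields.field(-1).name", "id");
        addProperty(root, "tables.table.fields.field.type", "int");
        Node table = root.childrenNamed("tables").get(0).childrenNamed("table").get(0);
        System.out.println("fields in new table: "
                + table.childrenNamed("fields").get(0).childrenNamed("field").size());
    }
}
```

The main() method replays the first three addProperty() calls of the listing above against the toy tree, which makes it easy to experiment with what happens when the -1 index is left out.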
For more information about adding properties to a hierarchical
configuration also have a look at the javadocs for
HierarchicalConfiguration.
Escaping special characters
Some characters in property keys or values require special
treatment.
By default the dot character is used as delimiter by most
configuration classes (we will learn how to change this for
hierarchical configurations in a later section). In some
configuration formats, however, dots can be contained in the
names of properties. For instance, in XML the dot is a legal
character that can occur in any tag. The same is true for the names
of properties in Windows ini files. So the following XML
document is completely valid:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<configuration>
  <test.value>42</test.value>
  <test.complex>
    <test.sub.element>many dots</test.sub.element>
  </test.complex>
</configuration>
This XML document can be loaded by XMLConfiguration
without trouble, but when we want to access certain properties
we face a problem: The configuration claims that it does not
store any values for the properties with the keys
test.value or test.complex.test.sub.element!
Of course, it is the dot character contained in the property
names which causes this problem. A dot is always interpreted
as a delimiter between elements. So given the property key
test.value the configuration would look for an
element named test and then for a sub element
with the name value. To change this behavior it is
possible to escape a dot character, thus telling the configuration
that it is really part of an element name. This is simply done
by duplicating the dot. So the following statements will return
the desired property values:
int testVal = config.getInt("test..value");
String complex = config.getString("test..complex.test..sub..element");
Note the duplicated dots wherever the dot does not act as
delimiter. This way it is possible to access properties containing
dots in arbitrary combination. However, as you can see, the
escaping can be confusing sometimes. So if you have a choice,
you should avoid dots in the tag names of your XML configuration
files or other configuration sources.
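How a key with duplicated dots breaks down into element names can be shown with a short stand-alone sketch. It only models the rule stated above (a single dot separates elements, a doubled dot is a literal dot) and ignores indices and attributes; KeySplitSketch and parts() are hypothetical names and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the library's key parser.
class KeySplitSketch {
    // Splits a key at single dots; a doubled dot ("..") yields a literal dot.
    static List<String> parts(String key) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c != '.') {
                current.append(c);
            } else if (i + 1 < key.length() && key.charAt(i + 1) == '.') {
                current.append('.'); // escaped dot belongs to the element name
                i++;
            } else {
                result.add(current.toString());
                current.setLength(0);
            }
        }
        result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parts("test..value"));
        System.out.println(parts("test..complex.test..sub..element"));
    }
}
```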
Another source of problems is related to list delimiter characters
in the values of properties. Like other configuration classes
XMLConfiguration implements list handling.
This means that the values of XML elements and attributes are
checked whether they contain a list delimiter character. If this
is the case, the value is split, and a list property is created.
By default this feature is enabled. Have a look at the
following example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<configuration>
  <pi>3,1415</pi>
</configuration>
Here we use the comma as decimal separator (as is
standard in some languages). However, the configuration will
interpret the comma as list delimiter character and assign the
property pi the two values 3 and 1415. This is not what was
intended.
XML has a natural way of defining list properties by simply
repeating elements. So defining multiple values of a property in
a single element or attribute is a rather atypical use case.
Unfortunately, early versions of Commons Configuration had list
delimiter splitting enabled by default. Later it became obvious
that this feature can cause serious problems related to the
interpretation of property values and the escaping of delimiter
characters. For reasons of backwards compatibility we have to
stick to this approach in the 1.x series though.
In the next major release the handling of lists will probably be
reworked. Therefore it is recommended not to use this feature.
You are safe if you disable it immediately after the creation of
an XMLConfiguration object (and before a file is
loaded). This can be achieved as follows:
XMLConfiguration config = new XMLConfiguration();
config.setDelimiterParsingDisabled(true);
config.setAttributeSplittingDisabled(true);
config.load("config.xml");
Expression engines
In the previous chapters we saw many examples of how properties
in an XMLConfiguration object (or more generally in a
HierarchicalConfiguration object, because this is the
base class which implements this functionality) can be queried or
modified using a special syntax for the property keys. Well, this
was not the full truth. Actually, property keys are not processed
by the configuration object itself, but are delegated to a helper
object, a so-called expression engine.
The separation of the task of interpreting property keys into a
helper object is a typical application of the Strategy
design pattern. In this case it also has the advantage that it
becomes possible to plug different expression engines into a
HierarchicalConfiguration object. So by providing
different implementations of the ExpressionEngine
interface hierarchical configurations can support alternative
expression languages for accessing their data.
Before we discuss the available expression engines that ship
with Commons Configuration, it should be explained how an
expression engine can be associated with a configuration object.
HierarchicalConfiguration and all derived classes
provide a setExpressionEngine() method, which expects
an implementation of the ExpressionEngine interface as
argument. After this method was called, the configuration object will
use the passed expression engine, which means that all property keys
passed to methods like getProperty(), getString(), or
addProperty() must conform to the syntax supported by this engine.
Property keys returned by the getKeys() method will follow this
syntax, too.
In addition to instance specific expression engines that change the
behavior of single configuration objects it is also possible to set
a global expression engine. This engine is shared between all
hierarchical configuration objects for which no specific expression
engine was set. The global expression engine can be set using the
static setDefaultExpressionEngine() method of
HierarchicalConfiguration. By invoking this method with
a custom expression engine the syntax of all hierarchical configuration
objects can be altered at once.
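The relationship between an instance specific engine and the global engine can be pictured with a small stand-alone model of the Strategy pattern. The sketch below does not reproduce the library's ExpressionEngine interface: KeySyntax, Config and their methods are hypothetical names, and the "engines" merely split a key into path elements.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Strategy pattern; not the Commons Configuration API.
class StrategySketch {
    // Stand-in for an expression engine: turns a key into path elements.
    interface KeySyntax {
        List<String> pathOf(String key);
    }

    static final KeySyntax DOTTED = key -> Arrays.asList(key.split("\\."));
    static final KeySyntax SLASHED = key -> Arrays.asList(key.split("/"));

    static class Config {
        private static KeySyntax defaultSyntax = DOTTED; // shared global strategy
        private KeySyntax syntax;                         // instance specific, may be null

        static void setDefaultSyntax(KeySyntax newDefault) { defaultSyntax = newDefault; }

        void setSyntax(KeySyntax newSyntax) { syntax = newSyntax; }

        // Uses the instance specific strategy if one was set, else the global one.
        List<String> resolve(String key) {
            KeySyntax active = (syntax != null) ? syntax : defaultSyntax;
            return active.pathOf(key);
        }
    }

    public static void main(String[] args) {
        Config config = new Config();
        System.out.println(config.resolve("tables.table.name"));
        config.setSyntax(SLASHED);
        System.out.println(config.resolve("tables/table/name"));
    }
}
```

Both calls in main() resolve to the same three path elements; only the key syntax differs, which is the point of delegating key interpretation to an exchangeable helper object.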
The default expression engine
The syntax described so far for property keys of hierarchical
configurations is implemented by a specific implementation of the
ExpressionEngine interface called
DefaultExpressionEngine. An instance of this class
is installed as the global expression engine in
HierarchicalConfiguration. So all newly created
instances of this class will make use of this engine (which is
the reason that our examples above worked).
After reading the examples of property keys provided so far in
this document you should have a sound understanding of
the features and the syntax supported by the
DefaultExpressionEngine class. But it can do a
little bit more for you: it defines a bunch of properties
which can be used to customize most tokens that can appear in a
valid property key. You prefer curly brackets over parentheses
as index markers? You find the duplicated dot as escaped
property delimiter counter-intuitive? Well, simply go ahead and
change it! The following example shows how the syntax of a
DefaultExpressionEngine object is modified. Then
this object is set as the global expression engine, so that from
now on all hierarchical configuration objects will take up this
new syntax:
DefaultExpressionEngine engine = new DefaultExpressionEngine();
// Use a slash as property delimiter
engine.setPropertyDelimiter("/");
// Indices should be provided in curly brackets
engine.setIndexStart("{");
engine.setIndexEnd("}");
// For attributes use simply a @
engine.setAttributeStart("@");
engine.setAttributeEnd(null);
// A Backslash is used for escaping property delimiters
engine.setEscapedDelimiter("\\/");
// Now install this engine as the global engine
HierarchicalConfiguration.setDefaultExpressionEngine(engine);
// Access properties using the new syntax
HierarchicalConfiguration config = ...
String tableName = config.getString("tables/table{0}/name");
String tableType = config.getString("tables/table{0}@tableType");
Tip: Sometimes when processing an XML document you
don't want to distinguish between attributes and "normal"
child nodes. You can achieve this by setting the
AttributeEnd property to null and the
AttributeStart property to the same value as the
PropertyDelimiter property. Then the syntax for
accessing attributes is the same as the syntax for other
properties:
DefaultExpressionEngine engine = new DefaultExpressionEngine();
engine.setAttributeEnd(null);
engine.setAttributeStart(engine.getPropertyDelimiter());
...
Object value = config.getProperty("tables.table(0).name");
// name can either be a child node of table or an attribute
The XPATH expression engine
The expression language provided by the DefaultExpressionEngine
class is powerful enough to address all properties in a
hierarchical configuration, but it is not always convenient to
use. Especially if list structures are involved, it is often
necessary to iterate through the whole list to find a certain
element.
Think about our example configuration that stores information about
database tables. A use case could be to load all fields that belong
to the "users" table. If you knew the index of this
table, you could simply build a property key like
tables.table(<index>).fields.field.name,
but how do you find out the correct index? When using the
default expression engine, the only solution to this problem is
to iterate over all tables until you find the "users"
table.
Life would be much easier if an expression language could be used
which would directly support queries of this kind. In the XML
world, the XPATH syntax has grown popular as a powerful means
of querying structured data. In XPATH a query that selects all
field names of the "users" table would look something
like tables/table[@name='users']/fields/field/name (here
we assume that the table's name is modelled as an attribute).
This is not only much simpler than an iteration over all tables,
but also much more readable: it is quite obvious which fields
are selected by this query.
Given the power of XPATH it is no wonder that we got many
user requests to add XPATH support to Commons Configuration.
Well, here it is!
For enabling XPATH syntax for property keys you need the
XPathExpressionEngine class. This class
implements the ExpressionEngine interface and can
be plugged into a HierarchicalConfiguration object
using the setExpressionEngine() method. It is also
possible to set an instance of this class as the global
expression engine, so that all hierarchical configuration
objects make use of XPATH syntax. The following code fragment
shows how XPATH support can be enabled for a configuration
object:
HierarchicalConfiguration config = ...
config.setExpressionEngine(new XPathExpressionEngine());
// Now we can use XPATH queries:
List<Object> fields = config.getList("tables/table[1]/fields/field/name");
XPATH expressions are not only used for selecting properties
(i.e. for the several getter methods), but also for adding new
properties. For this purpose the keys passed into the
addProperty() method must conform to a special
syntax. They consist of two parts: the first part is an
arbitrary XPATH expression that selects the node where the new
property is to be added, the second part defines the new
element to be added. Both parts are separated by whitespace.
Okay, let's look at an example. Say we want to add a type
property under the first table (as a sibling to the name
element). Then the first part of our key will have to select
the first table element, the second part will simply be
type, i.e. the name of the new property:
config.addProperty("tables/table[1] type", "system");
(Note that indices in XPATH are 1-based, while in the default
expression language they are 0-based.) In this example the part
tables/table[1] selects the target element of the
add operation. This element must exist and must be unique, otherwise
an exception will be thrown. type is the name of the new element
that will be added. If instead of a normal element an attribute
should be added, the example becomes
config.addProperty("tables/table[1] @type", "system");
It is possible to add complete paths at once. Then the single
elements in the new path are separated by "/" characters.
The following example shows how data about a new
table can be added to the configuration. Here we use full paths:
// Add new table "tasks" with name element and type attribute
config.addProperty("tables table/name", "tasks");
// last() selects the last element of this name,
// which is the newest table element
config.addProperty("tables/table[last()] @type", "system");
// Now add fields
config.addProperty("tables/table[last()] fields/field/name", "taskid");
config.addProperty("tables/table[last()]/fields/field[last()] type", "int");
config.addProperty("tables/table[last()]/fields field/name", "name");
config.addProperty("tables/table[last()]/fields field/name", "startDate");
...
The first line of this example adds the path table/name
to the tables element, i.e. a new table
element will be created and added as last child to the
tables element. Then a new name element
is added as child to the new table element. To this
element the value "tasks" is assigned. The next line
adds a type attribute to the new table element. To
obtain the correct table element, to which the
attribute must be added, the XPATH function last()
is used; this function selects the last element with a given
name, which in this case is the new table element.
The following lines all use the same approach to construct a new
element hierarchy: At first complete new branches are added
(fields/field/name), then to the newly created
elements further children are added.
There is one gotcha with the keys described so far: they do
not work with the setProperty() method! This is
because setProperty() has to check whether the
passed in key already exists; therefore it needs a key which can
be interpreted by query methods. If you want to use
setProperty(), you can pass in regular keys (i.e.
without a whitespace separator). The method then tries to figure
out which part of the key already exists in the configuration
and adds new nodes as necessary. In principle such regular keys
can also be used with addProperty(). However, they
do not contain sufficient information to decide where new nodes
should be added.
To make this clearer let's go back to the example with the
tables. Consider that there is a configuration which already
contains information about some database tables. In order to add
a new table element in the configuration
addProperty() could be used as follows:
config.addProperty("tables/table/name", "documents");
In the configuration a <tables> element
already exists, also <table> and
<name> elements. How should the expression
engine know where new node structures are to be added? The
solution to this problem is to provide this information in the
key by stating:
config.addProperty("tables table/name", "documents");
Now it is clear that new nodes should be added as children of the
<tables> element. More information about keys and how they interact
with addProperty() and setProperty() can be found in the Javadocs for
XPathExpressionEngine.
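To illustrate the idea behind the whitespace separator, here is a
simplified sketch based on the plain DOM and XPATH classes of the JDK.
It is not how XPathExpressionEngine is implemented; the class and
method names are hypothetical, and only element paths (no attributes)
are handled. The part of the key before the whitespace is queried to
find the parent node, and the part after it names the elements to be
created.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AddKeySketch {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample document", e);
        }
    }

    /** Adds a value for a key of the form "existing/path new/path". */
    static void addProperty(Document doc, String key, String value) {
        String[] parts = key.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Key needs a whitespace separator: " + key);
        }
        Node parent;
        try {
            // The first part is evaluated relative to the root element.
            parent = (Node) XPATH.evaluate(parts[0], doc.getDocumentElement(),
                    XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid path: " + parts[0], e);
        }
        if (parent == null) {
            throw new IllegalArgumentException("No node found for: " + parts[0]);
        }
        // The second part lists the new elements, each nested in the previous one.
        Node current = parent;
        for (String name : parts[1].split("/")) {
            Element child = doc.createElement(name);
            current.appendChild(child);
            current = child;
        }
        current.setTextContent(value);
    }

    /** Evaluates an expression relative to the root element as a string. */
    static String query(Document doc, String expression) {
        try {
            return XPATH.evaluate(expression, doc.getDocumentElement());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<database><tables><table><name>users</name></table></tables></database>");
        addProperty(doc, "tables table/name", "documents");
        System.out.println(query(doc, "tables/table[last()]/name")); // prints "documents"
    }
}
```

With the key "tables table/name" the sketch finds the existing tables
element and appends a complete new table/name branch to it, leaving the
existing table untouched.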
Note: XPATH support is implemented through
Commons JXPath.
So when making use of this feature, be sure you include the
commons-jxpath jar in your classpath.
This tutorial does not describe XPATH syntax and expressions in detail;
please refer to the corresponding documentation. It is important to
mention that by embedding Commons JXPath the full extent of the XPATH
1.0 standard can be used for constructing property keys.
Validation of XML configuration files
XML parsers provide support for validation of XML documents to ensure
that they conform to a certain DTD or XML Schema. This feature can be
useful for configuration files, too. XMLConfiguration allows this
feature to be enabled when files are loaded.
Validation using a DTD
The easiest way to turn on validation is to simply set the validating
property to true as shown in the following example:
XMLConfiguration config = new XMLConfiguration();
config.setFileName("myconfig.xml");
config.setValidating(true);
// This will throw a ConfigurationException if the XML document does not
// conform to its DTD.
config.load();
Setting the validating flag to true will cause XMLConfiguration to use
a validating XML parser. A custom ErrorHandler is registered with this
parser; it throws exceptions on both simple (recoverable) and fatal
parsing errors.
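What such a setup looks like at the level of the JDK's XML API can be
sketched as follows. This is an illustration of the mechanism only, not
the code used by XMLConfiguration; the class name and the small inline
DTD are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical DTD: a config element containing exactly one name element.
    static final String DOCTYPE =
        "<!DOCTYPE config [ <!ELEMENT config (name)> <!ELEMENT name (#PCDATA)> ]>";

    /** Returns null if the document is valid, otherwise the parser's error message. */
    static String validate(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request a validating parser
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity violations are reported here
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // e.g. documents that are not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(DOCTYPE + "<config><name>x</name></config>")); // prints "null"
        System.out.println(validate(DOCTYPE + "<config><other/></config>") != null); // prints "true"
    }
}
```

Without an error handler that rethrows in error(), a validating parser
would merely report validity problems and continue parsing, which is
why the handler matters here.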
Validation using a Schema
XML parsers also support validating XML documents against an XML
Schema. XMLConfiguration provides a simple mechanism for enabling this:
set the schemaValidation flag to true. This also sets the validating
flag to true, so the two do not need to be set separately. The XML
parser will then use the schema referenced in the XML document to
validate it. Enabling schema validation also enables the parser's
namespace support.
XMLConfiguration config = new XMLConfiguration();
config.setFileName("myconfig.xml");
config.setSchemaValidation(true);
// This will throw a ConfigurationException if the XML document does not
// conform to its Schema.
config.load();
Default Entity Resolution
There is also some support for dealing with DTD files. Often the DTD of
an XML document is stored locally so that it can be accessed quickly.
However, the DOCTYPE declaration of the document points to a location
on the web, as in the following example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
"http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
When working with XML documents directly you would use an
EntityResolver in such a case. The task of such an entity resolver is
to point the XML parser to the location of the file referred to by the
declaration. So in our example the entity resolver would load the DTD
file from a local cache instead of retrieving it from the internet.
XMLConfiguration provides a simple default implementation of an
EntityResolver. This implementation is initialized by calling the
registerEntityId() method with the public IDs of the entities to be
retrieved and their corresponding local URLs. This method has to be
called before the configuration is loaded. To continue our example,
consider that the DTD file for our example document is stored on the
class path. We can register it at XMLConfiguration using the following
code:
XMLConfiguration config = new XMLConfiguration();
// load the URL to the DTD file from class path
URL dtdURL = getClass().getResource("web-app_2.2.dtd");
// register it at the configuration
config.registerEntityId("-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN",
dtdURL);
config.setValidating(true); // enable validation
config.setFileName("web.xml");
config.load();
This basically tells the XML configuration to use the specified URL
when it encounters the given public ID. Note that the call to
registerEntityId() has to be performed before the configuration is
loaded. So you cannot use one of the constructors that directly load
the configuration.
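The underlying idea, a lookup from public IDs to local resources, can
be sketched with a plain EntityResolver from the JDK. The following
example is an assumption-laden illustration and not the resolver
contained in XMLConfiguration: the class name, the public ID and the
DTD are hypothetical, and the DTD is kept in memory instead of being
loaded from a URL so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class LocalDtdResolverSketch {
    // Hypothetical public ID and DTD content used only for this example.
    static final String PUBLIC_ID = "-//Example//DTD Sample 1.0//EN";
    static final String LOCAL_DTD = "<!ELEMENT sample (#PCDATA)>";
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE sample PUBLIC \"" + PUBLIC_ID + "\""
        + " \"http://example.invalid/dtds/sample.dtd\">\n"
        + "<sample>ok</sample>";

    /** Parses the sample document and returns the public IDs served locally. */
    static List<String> parseWithLocalDtd() {
        Map<String, String> registered = new HashMap<>();
        registered.put(PUBLIC_ID, LOCAL_DTD);
        List<String> resolved = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                String dtd = registered.get(publicId);
                if (dtd == null) {
                    return null; // unknown ID: let the parser open the system ID itself
                }
                resolved.add(publicId);
                return new InputSource(new StringReader(dtd));
            });
            builder.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // The DTD is taken from the map, so no network access is needed.
        System.out.println(parseWithLocalDtd());
    }
}
```

Since the resolver answers the request for the external DTD itself, the
system ID in the DOCTYPE declaration is never opened.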
Enhanced Entity Resolution
While the default entity resolver can be used under certain
circumstances, it does not work well when using the
DefaultConfigurationBuilder. Furthermore, in many circumstances the
programmatic nature of registering entities will tie the application
tightly to the XML content. In addition, because it only works with the
public ID it cannot support XML documents using an XML Schema.
XML Entity and URI Resolvers describes using a set of catalog files to
resolve entities. Commons Configuration provides support for this
Catalog Resolver through its own CatalogResolver class.
<?xml version="1.0" encoding="ISO-8859-1"?>
<Employees xmlns="https://commons.apache.org/employee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://commons.apache.org/employee https://commons.apache.org/sample.xsd">
<Employee>
<SSN>555121211</SSN>
<Name>John Doe</Name>
<DateOfBirth>1975-05-15</DateOfBirth>
<EmployeeType>Exempt</EmployeeType>
<Salary>100000</Salary>
</Employee>
</Employees>
The XML sample above is an XML document using a default namespace of
https://commons.apache.org/employee. The schemaLocation attribute lists
namespaces together with hints to the location of their corresponding
schemas. When processing the document the parser will pass the hint, in
this case https://commons.apache.org/sample.xsd, to the entity resolver
as the system ID. More information on using schema locations can be
found at schemaLocation.
The example that follows shows how to use the CatalogResolver when
processing an XMLConfiguration. It should be noted that by using the
setEntityResolver method any EntityResolver may be used, not just those
provided by Commons Configuration.
CatalogResolver resolver = new CatalogResolver();
resolver.setCatalogFiles("local/catalog.xml","http://test.org/catalogs/catalog1.xml");
XMLConfiguration config = new XMLConfiguration();
config.setEntityResolver(resolver);
config.setSchemaValidation(true); // enable schema validation
config.setFileName("config.xml");
config.load();
Extending Validation and Entity Resolution
The mechanisms provided with Commons Configuration will hopefully be
sufficient in most cases; however, there will certainly be
circumstances where they are not. XMLConfiguration provides two
extension mechanisms that should give applications all the flexibility
they need. The first, registering a custom entity resolver, has already
been discussed in the preceding section. The second is a generic way of
setting up the XML parser to use: a preconfigured DocumentBuilder
object can be passed to the setDocumentBuilder() method.
So an application can create a DocumentBuilder object and initialize it
according to its special needs. Then this object must be passed to the
XMLConfiguration instance before invocation of the load() method. When
loading a configuration file, the passed in DocumentBuilder will be
used instead of the default one. Note: If a custom DocumentBuilder is
used, the default implementation of the EntityResolver interface is
disabled. This means that the registerEntityId() method has no effect
in this mode.
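As a rough sketch of what "initialize it according to its special
needs" can mean, the following code creates a DocumentBuilder with two
non-default settings using only JDK classes. The class name and the
chosen settings are merely examples; which options an application needs
depends on its documents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CustomBuilderSketch {
    /** Creates a namespace-aware builder that drops comments from the DOM. */
    static DocumentBuilder createBuilder() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be configured", e);
        }
    }

    /** Parses an XML string with the preconfigured builder. */
    static Document parse(String xml) {
        try {
            return createBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a xmlns='urn:x'><!-- dropped --><b/></a>");
        System.out.println(doc.getDocumentElement().getNamespaceURI());           // prints "urn:x"
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints "1"
    }
}
```

A builder created this way would be handed to the configuration with
setDocumentBuilder() before load() is called. If entities have to be
resolved locally in this mode, the resolver has to be set on the
builder itself, because registerEntityId() has no effect.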