Commons Id

Overview

A Universally Unique Identifier (UUID) is a 128-bit identifier described in Internet Engineering Task Force (IETF) RFC 4122, "A Universally Unique IDentifier (UUID) URN Namespace".

Generators are provided for version 1, 3, 4, and 5 UUIDs. The value held in a UUID is represented by a specific hexadecimal format of its binary fields. An example UUID string representation is: F81D4FAE-7DEC-11D0-A765-00A0C91E6BF6.

A cautionary note: there is no standard regarding binary representation of a UUID other than its string format.

UUID version 4

The version 4 UUID is based on random bytes. The generator fills the 128 bits with random bits, then sets 6 of those bits to flag the version and variant of the UUID. No special configuration or implementation decisions are required to generate version 4 UUIDs.
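The bit manipulation described above can be sketched with the JDK alone. This is a minimal illustration, not the Commons Id generator: the class name RandomUuidSketch is hypothetical, and java.util.UUID stands in for the library's own UUID type, whose API is not shown in this document.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch class; not part of Commons Id.
class RandomUuidSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a version 4 UUID by hand: 128 random bits, 6 of which are overwritten. */
    static UUID randomUuid() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        // 4 version bits in the high nibble of byte 6 are set to 0100 (version 4).
        bytes[6] = (byte) ((bytes[6] & 0x0F) | 0x40);
        // 2 variant bits at the top of byte 8 are set to 10 (the RFC 4122 variant).
        bytes[8] = (byte) ((bytes[8] & 0x3F) | 0x80);
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) {
            msb = (msb << 8) | (bytes[i] & 0xFF);
            lsb = (lsb << 8) | (bytes[i + 8] & 0xFF);
        }
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = randomUuid();
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```

The remaining 122 bits stay random, which is why no node, clock, or state configuration is involved.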

UUID version 3

Version 3 UUIDs are initialized using a name, a namespace, and the MD5 hashing algorithm.
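As a rough illustration of the name-plus-namespace idea, the JDK method java.util.UUID.nameUUIDFromBytes applies MD5 and sets the version 3 bits. It does not take a namespace argument, so this sketch prepends the 16 namespace bytes to the name itself. The class name NameBasedMd5Sketch is hypothetical, and the JDK class is used here in place of the Commons Id generator, whose API is not shown in this document. The namespace constant is the DNS namespace UUID listed in RFC 4122.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch class; not part of Commons Id.
class NameBasedMd5Sketch {
    /** DNS namespace UUID from RFC 4122, Appendix C. */
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    /** Hashes namespace bytes followed by name bytes with MD5 (version 3). */
    static UUID version3(UUID namespace, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(16 + nameBytes.length);
        buffer.putLong(namespace.getMostSignificantBits());
        buffer.putLong(namespace.getLeastSignificantBits());
        buffer.put(nameBytes);
        // nameUUIDFromBytes performs the MD5 hash and sets version and variant bits.
        return UUID.nameUUIDFromBytes(buffer.array());
    }

    public static void main(String[] args) {
        System.out.println(version3(NAMESPACE_DNS, "www.example.org"));
    }
}
```

The same name and namespace always yield the same identifier, which is the defining property of name-based UUIDs.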

UUID version 5

Version 5 UUIDs are initialized using a name, a namespace, and the SHA-1 hashing algorithm.
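The JDK has no built-in version 5 factory, so the steps can be sketched by hand: hash the namespace bytes followed by the name bytes with SHA-1, keep the first 16 bytes of the digest, then set the version and variant bits. The class name NameBasedSha1Sketch is hypothetical and this is not the Commons Id implementation. The namespace constant is the DNS namespace UUID listed in RFC 4122.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical sketch class; not part of Commons Id.
class NameBasedSha1Sketch {
    /** DNS namespace UUID from RFC 4122, Appendix C. */
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    static UUID version5(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 digest is not available", e);
        }
        // Hash the 16 namespace bytes followed by the name bytes.
        sha1.update(ByteBuffer.allocate(16)
                .putLong(namespace.getMostSignificantBits())
                .putLong(namespace.getLeastSignificantBits())
                .array());
        byte[] hash = sha1.digest(name.getBytes(StandardCharsets.UTF_8));
        // The digest is 20 bytes long; only the first 16 are used.
        hash[6] = (byte) ((hash[6] & 0x0F) | 0x50); // version 5
        hash[8] = (byte) ((hash[8] & 0x3F) | 0x80); // RFC 4122 variant
        ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
        return new UUID(first16.getLong(), first16.getLong());
    }

    public static void main(String[] args) {
        System.out.println(version5(NAMESPACE_DNS, "www.example.org"));
    }
}
```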

UUID version 1

The version 1 UUID is a combination of a node identifier (MAC address), a timestamp, and a random seed. The version one generator uses the commons-discovery package to determine which implementations to use. The implementations are specified by the following system properties.

Property	Default
org.apache.commons.id.uuid.clock.Clock	org.apache.commons.id.uuid.clock.SystemClockImpl
org.apache.commons.id.uuid.NodeManager	org.apache.commons.id.uuid.NodeManagerImpl
org.apache.commons.id.uuid.state.State	org.apache.commons.id.uuid.state.ReadOnlyResourceImpl
org.apache.commons.id.uuid.config.resource.filename	[No default; you must configure this explicitly for each JVM instance.]
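Because these are ordinary system properties, they can be passed with -D on the command line or set programmatically. The sketch below sets two of them using only the JDK. The property names come from the table above; the class name UuidConfigSketch and the file name uuid-node-a.state are hypothetical, the package of ReadWriteFileImpl is assumed to match that of the default state implementation, and it is also an assumption that the properties must be set before the generator is first used.

```java
// Hypothetical sketch class; not part of Commons Id.
class UuidConfigSketch {
    static final String STATE_PROPERTY = "org.apache.commons.id.uuid.state.State";
    static final String FILENAME_PROPERTY = "org.apache.commons.id.uuid.config.resource.filename";

    /** Selects a writable state implementation and a per-JVM state file. */
    static void configure(String stateFileName) {
        // Assumed fully qualified name: same package as the default ReadOnlyResourceImpl.
        System.setProperty(STATE_PROPERTY, "org.apache.commons.id.uuid.state.ReadWriteFileImpl");
        // No default exists for this property, so every JVM instance needs its own value.
        System.setProperty(FILENAME_PROPERTY, stateFileName);
    }

    public static void main(String[] args) {
        configure("uuid-node-a.state"); // hypothetical file name
        System.out.println(System.getProperty(STATE_PROPERTY));
        System.out.println(System.getProperty(FILENAME_PROPERTY));
    }
}
```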

Persistent State

The UUID draft specification calls for persisting generator state to stable, non-volatile storage (provisions are made for systems that cannot provide persistent storage). Persisting state decreases the likelihood of duplicating the time and random seed (clock sequence) values, which are two components of the version one identifier. When the previous clock sequence is unknown, the generator must generate new random bytes for the clock sequence. The system time may also be set backwards during normal operation of a system; when that happens, the generator is required to change the clock sequence value.
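The two rules above (a random clock sequence when no state is known, and a changed clock sequence when time moves backwards) can be sketched as a single function. This is an illustration under stated assumptions, not the VersionOneGenerator code: the class name ClockSequenceSketch and the UNKNOWN marker are hypothetical, the clock sequence is treated as the 14-bit field defined in RFC 4122, and "change" is implemented as an increment, which is what the RFC suggests.

```java
import java.security.SecureRandom;

// Hypothetical sketch class; not part of Commons Id.
class ClockSequenceSketch {
    /** Marker meaning "no persisted clock sequence is available". */
    static final int UNKNOWN = -1;
    /** The clock sequence field is 14 bits wide in RFC 4122. */
    static final int CLOCK_SEQUENCE_RANGE = 1 << 14;

    private static final SecureRandom RANDOM = new SecureRandom();

    static int nextClockSequence(int previousSequence, long lastTimestamp, long currentTimestamp) {
        if (previousSequence == UNKNOWN) {
            // No stored state: start from fresh random bits.
            return RANDOM.nextInt(CLOCK_SEQUENCE_RANGE);
        }
        if (currentTimestamp < lastTimestamp) {
            // The clock was set backwards: change the sequence, wrapping at 14 bits.
            return (previousSequence + 1) % CLOCK_SEQUENCE_RANGE;
        }
        // Time moved forward normally: the stored sequence can be reused.
        return previousSequence;
    }

    public static void main(String[] args) {
        System.out.println(nextClockSequence(5, 200L, 100L)); // clock went backwards
    }
}
```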

The State interface in the org.apache.commons.id.uuid.state package defines the persistent state used by the VersionOneGenerator. Three implementations are provided to accommodate different scenarios. The InMemoryStateImpl follows the recommendations of the specification for those instances when no persistent storage is available and the hardware (MAC) address cannot be read. The ReadOnlyResourceImpl implementation is useful for situations or containers that allow resource loading but forbid explicit I/O. Its XML state file contains the node identifier (hardware address) and is loaded as a system resource. Finally, the ReadWriteFileImpl extends the ReadOnlyResourceImpl (both share the same loading of configuration data); however, the ReadWriteFileImpl uses file I/O to write the clock sequence and the last timestamp used back to the file.

The following is an example XML configuration file (be certain to use a different uuid.state file for each virtual machine instance):

   <?xml version="1.0" encoding="UTF-8" ?>
   <!DOCTYPE uuidstate [
      <!ELEMENT uuidstate (node*)>
      <!ELEMENT node EMPTY>
      <!ATTLIST node id ID #REQUIRED>
      <!ATTLIST node clocksequence CDATA #IMPLIED>
      <!ATTLIST node lasttimestamp CDATA #IMPLIED>
   ]>
   <uuidstate synchInterval="3000">
        <node id="AA-BB-CC-DD-EE-11" />
        <node id="22-33-44-55-66-77" />
   </uuidstate>

The "synchInterval" attribute specifies the number of milliseconds between writes to the file that update the "clocksequence" and "lasttimestamp" values. Set this interval large enough to provide adequate performance, but no longer than the time needed to restart the virtual machine and generate the next UUID. See the IETF draft for more on this strategy (section 4.2.1.3, "Writing stable storage").
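To make the file layout concrete, the following sketch reads the node identifiers and the "synchInterval" value from a trimmed copy of the example (the DTD is omitted for brevity) using the JDK's DOM parser. The class StateFileReaderSketch and its methods are hypothetical; how Commons Id itself parses the file is not shown in this document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch class; not part of Commons Id.
class StateFileReaderSketch {
    /** Trimmed copy of the example state file, without the DTD. */
    static final String SAMPLE = String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>",
            "<uuidstate synchInterval=\"3000\">",
            "  <node id=\"AA-BB-CC-DD-EE-11\" />",
            "  <node id=\"22-33-44-55-66-77\" />",
            "</uuidstate>");

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("state file is not well-formed XML", e);
        }
    }

    /** Milliseconds between writes of clock sequence and timestamp. */
    static long synchInterval(Document doc) {
        return Long.parseLong(doc.getDocumentElement().getAttribute("synchInterval"));
    }

    /** Node identifiers in document order. */
    static List<String> nodeIds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("node");
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(synchInterval(doc) + " ms, nodes " + nodeIds(doc));
    }
}
```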

The UUID specification is written from the frame of reference that one or more physical (MAC address) node identifiers belong to a machine. In Java, however, a physical machine hosts one or more virtual machines. The ReadOnlyResourceImpl and ReadWriteFileImpl implementations assume that each virtual machine instance is assigned a distinct configuration file with distinct identifiers/addresses. Without this assumption, a system-wide mutex (mutual exclusion object) is required to prevent multiple virtual machine instances (either different JVMs or concurrent instances of the same JVM) from generating duplicates at the same time using the same clock sequence and identifier. Writing a custom implementation of the NodeManager interface allows one to change this assumption. Several means of locking the node identifier are possible, such as file system locks and sockets, but they are not discussed here because not all of them are appropriate for all application containers.
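As one possible shape for the file system lock mentioned above, the sketch below uses java.nio file locking to claim a lock file that represents a node identifier. It is only an illustration: the class NodeLockSketch is hypothetical, it does not implement the NodeManager interface (whose methods are not shown in this document), and file locking behavior varies by platform and container.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch class; not part of Commons Id.
class NodeLockSketch implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private NodeLockSketch(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock, or null when the lock file is already claimed. */
    static NodeLockSketch tryAcquire(Path lockFile) {
        try {
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                FileLock lock = channel.tryLock(); // null if another process holds it
                if (lock != null) {
                    return new NodeLockSketch(channel, lock);
                }
            } catch (OverlappingFileLockException heldInThisJvm) {
                // The lock is already held elsewhere in this same JVM.
            }
            // Caveat: on some systems, closing any channel for a file releases
            // every lock this JVM holds on that file.
            channel.close();
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            lock.release();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A generator would acquire the lock for a node identifier before using it and hold the lock for as long as that identifier is in use.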

Time Resolution

Another obstacle in UUID generation on various systems is that the time resolution called for in the UUID draft is based on 100-nanosecond intervals since the Gregorian changeover epoch. The Java language provides millisecond precision when retrieving the system time; however, the actual time resolution depends on the operating system and chipset. As a result, calls for the system time in rapid succession produce duplicate time values, and sub-millisecond resolution is available only from performance counters, interrupts, or similar mechanisms. The UUID specification provides a means of compensating for this: it suggests using an artificial time produced from the actual time and a counter that may not exceed the number of 100-nanosecond intervals in the system's effective resolution. The org.apache.commons.id.uuid.clock.Clock interface provides the SPI for UUID timestamps. The SystemClockImpl implementation uses the millisecond resolution of System.currentTimeMillis plus a counter of up to 10,000 (the number of 100-nanosecond intervals in one millisecond).
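The artificial time described above can be sketched as a millisecond reading converted to 100-nanosecond units, shifted to the Gregorian epoch, plus a counter that resets whenever the millisecond value changes. This is not the SystemClockImpl source: the class ArtificialClockSketch is hypothetical, and the time source is injected only so that the behavior can be demonstrated with a fixed clock.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch class; not part of Commons Id.
class ArtificialClockSketch {
    /** 100-nanosecond intervals between 1582-10-15 and 1970-01-01. */
    static final long GREGORIAN_OFFSET = 0x01B21DD213814000L;
    /** 100-nanosecond intervals in one millisecond. */
    static final long INTERVALS_PER_MILLI = 10_000L;

    private final LongSupplier millisSource;
    private long lastMillis = -1L;
    private long counter;

    ArtificialClockSketch(LongSupplier millisSource) {
        this.millisSource = millisSource;
    }

    /** Returns a timestamp in 100-nanosecond units since the Gregorian epoch. */
    synchronized long nextTimestamp() {
        long now = millisSource.getAsLong();
        if (now != lastMillis) {
            // A new millisecond value: restart the counter.
            // A backwards jump also lands here; the clock sequence guards that case.
            lastMillis = now;
            counter = 0;
        } else if (++counter >= INTERVALS_PER_MILLI) {
            // All 10,000 sub-millisecond slots for this clock reading are used.
            throw new IllegalStateException("timestamp counter exhausted for this millisecond");
        }
        return now * INTERVALS_PER_MILLI + GREGORIAN_OFFSET + counter;
    }

    public static void main(String[] args) {
        ArtificialClockSketch clock = new ArtificialClockSketch(System::currentTimeMillis);
        System.out.println(clock.nextTimestamp());
        System.out.println(clock.nextTimestamp());
    }
}
```

A real implementation might wait for the next clock reading instead of throwing when the counter is exhausted; the exception here simply makes the limit visible.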

Now assume your system has an effective resolution of 54 milliseconds (the clock increments only every 54 milliseconds). With a counter limit of 10,000 per clock value, at most 10,000 UUIDs can be generated per 54-millisecond tick, which is fewer than 200 UUIDs per millisecond on average (10,000 / 54 is about 185). For cases where greater numbers must be generated, the ThreadClockImpl is provided as one potential solution. This implementation uses a threaded clock class that increments on a scheduled interval, so that up to (10,000 multiplied by the interval length) UUIDs may be generated per interval. Other methods of increasing generator throughput are described in the UUID draft (such as adding more node identifiers or pre-generating IDs to deal with sporadic demand).

Security

One final issue to consider in UUID generation is security. A version one UUID exposes the node identifier as part of its string format. This may be very undesirable when the identifier is transmitted over a non-secure channel. Another aspect of the security concern relates to privacy, given that a version one UUID may identify a time and place (machine address). Your security requirements may determine the UUID version, the source of the node identifier, and/or the state implementation you choose.
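To show how much a version one UUID reveals, the sketch below recovers the node identifier and creation time from the example string given in the Overview, using only java.util.UUID from the JDK. The class name VersionOneDisclosureSketch is hypothetical. Note that node() and timestamp() throw UnsupportedOperationException for UUIDs that are not version 1.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical sketch class; not part of Commons Id.
class VersionOneDisclosureSketch {
    /** 100-nanosecond intervals between 1582-10-15 and 1970-01-01. */
    static final long GREGORIAN_OFFSET = 0x01B21DD213814000L;

    /** Returns the 48-bit node identifier as 12 hexadecimal digits. */
    static String nodeOf(UUID uuid) {
        return String.format("%012X", uuid.node());
    }

    /** Converts the embedded 100-nanosecond timestamp to an Instant. */
    static Instant createdAt(UUID uuid) {
        long hundredNanos = uuid.timestamp() - GREGORIAN_OFFSET;
        return Instant.ofEpochSecond(hundredNanos / 10_000_000L,
                (hundredNanos % 10_000_000L) * 100L);
    }

    public static void main(String[] args) {
        UUID example = UUID.fromString("F81D4FAE-7DEC-11D0-A765-00A0C91E6BF6");
        System.out.println("node    = " + nodeOf(example));
        System.out.println("created = " + createdAt(example));
    }
}
```

Anyone who receives the identifier can perform the same extraction, which is the privacy concern described above.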
