Apache Commons logo

Commons JCS™

Indexed Disk Auxiliary Cache

The Indexed Disk Auxiliary Cache is an optional plugin for the JCS. It is primarily intended to provide a secondary store to ease the memory burden of the cache. When the memory cache exceeds its maximum size it tells the cache hub that the item to be removed from memory should be spooled to disk. The cache checks to see if any auxiliaries of type "disk" have been configured for the region. If the "Indexed Disk Auxiliary Cache" is used, the item will be spooled to disk.

Disk Indexing

The Indexed Disk Auxiliary Cache follows the fastest pattern of disk caching. Items are stored at the end of a file dedicated to the cache region. The first byte of each disk entry specifies the length of the entry. The start position in the file is saved in memory, referenced by the item's key. Though this still requires memory, it is insignificant given the performance trade off. Depending on the key size, 500,000 disk entries will probably only require about 3 MB of memory. Locating the position of an item is as fast as a map lookup and the retrieval of the item only requires 2 disk accesses.

When items are removed from the disk cache, the location of the available block on the storage file is recorded in skip list set. This allows the disk cache to reuse empty spots, thereby keeping the file size to a minimum.

Purgatory

Writing to the disk cache is asynchronous and made efficient by using a memory staging area called purgatory. Retrievals check purgatory then disk for an item. When items are sent to purgatory they are simultaneously queued to be put to disk. If an item is retrieved from purgatory it will no longer be written to disk, since the cache hub will move it back to memory. Using purgatory insures that there is no wait for disk writes, unnecessary disk writes are avoided for borderline items, and the items are always available.

Persistence

When the disk cache is properly shutdown, the memory index is written to disk and the value file is defragmented. When the cache starts up, the disk cache can be configured to read or delete the index file. This provides an unreliable persistence mechanism.

Size limitation

There are two ways to limit the cache size: using element count and element size. When using element count, in disk store there will be at most MaxKeySize elements. When using element size, there will be at most KeySize kB of elements stored in the data file. The file can be bigger due to fragmentation. The limit does not cover the size of key file so the total space occupied by the cache might be a bit bigger. The mode is chosen using DiskLimitType. Allowed values are: COUNT and SIZE.

Configuration

Configuration is simple and is done in the auxiliary cache section of the cache.ccf configuration file. In the example below, I created an Indexed Disk Auxiliary Cache referenced by DC . It uses files located in the "DiskPath" directory.

The Disk indexes are equipped with an LRU storage limit. The maximum number of keys is configured by the maxKeySize parameter. If the maximum key size is less than 0, no limit will be placed on the number of keys. By default, the max key size is 5000.

jcs.auxiliary.DC=
    org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheFactory
jcs.auxiliary.DC.attributes=
    org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheAttributes
jcs.auxiliary.DC.attributes.DiskPath=g:\dev\jakarta-turbine-stratum\raf
jcs.auxiliary.DC.attributes.MaxKeySize=100000

Additional Configuration Options

The indexed disk cache provides some additional configuration options.

The purgatory size of the Disk cache is equipped with an LRU storage limit. The maximum number of elements allowed in purgatory is configured by the MaxPurgatorySize parameter. By default, the max purgatory size is 5000.

Initial testing indicates that the disk cache performs better when the key and purgatory sizes are limited.

jcs.auxiliary.DC.attributes.MaxPurgatorySize=10000

Slots in the data file become empty when items are removed from the disk cache. The indexed disk cache keeps track of empty slots in the data file, so they can be reused. The slot locations are stored in the recycle bin.

If all the items put on disk are the same size, then the recycle bin will always return perfect matches. However, if the items are of various sizes, the disk cache will use the free spot closest in size but not smaller than the item being written to disk. Since some recycled spots will be larger than the items written to disk, unusable gaps will result. Optimization is intended to remove these gaps.

The Disk cache can be configured to defragment the data file at runtime. Since defragmentation is only necessary if items have been removed, the deframentation interval is determined by the number of removes. Currently there is no way to schedule defragmentation to run at a set time. If you set the OptimizeAtRemoveCount to -1, no optimizations of the data file will occur until shutdown. By default the value is -1.

In version 1.2.7.9 of JCS, the optimization routine was significantly improved. It now occurs in place, without the aid of a temporary file.

jcs.auxiliary.DC.attributes.OptimizeAtRemoveCount=30000

A Complete Configuration Example

In this sample cache.ccf file, I configured the cache to use a disk cache, called DC, by default. Also, I explicitly set a cache region called myRegion1 to use DC. I specified custom settings for all of the Indexed Disk Cache configuration parameters.

##############################################################
##### Default Region Configuration
jcs.default=DC
jcs.default.cacheattributes=org.apache.commons.jcs3.engine.CompositeCacheAttributes
jcs.default.cacheattributes.MaxObjects=100
jcs.default.cacheattributes.MemoryCacheName=org.apache.commons.jcs3.engine.memory.lru.LRUMemoryCache

##############################################################
##### CACHE REGIONS
jcs.region.myRegion1=DC
jcs.region.myRegion1.cacheattributes=org.apache.commons.jcs3.engine.CompositeCacheAttributes
jcs.region.myRegion1.cacheattributes.MaxObjects=1000
jcs.region.myRegion1.cacheattributes.MemoryCacheName=org.apache.commons.jcs3.engine.memory.lru.LRUMemoryCache

##############################################################
##### AUXILIARY CACHES
# Indexed Disk Cache
jcs.auxiliary.DC=org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheFactory
jcs.auxiliary.DC.attributes=org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheAttributes
jcs.auxiliary.DC.attributes.DiskPath=target/test-sandbox/indexed-disk-cache
jcs.auxiliary.DC.attributes.MaxPurgatorySize=10000
jcs.auxiliary.DC.attributes.MaxKeySize=10000
jcs.auxiliary.DC.attributes.OptimizeAtRemoveCount=300000
jcs.auxiliary.DC.attributes.OptimizeOnShutdown=true
jcs.auxiliary.DC.attributes.DiskLimitType=COUNT

Using Thread Pools to Reduce Threads

The Indexed Disk Cache allows you to use fewer threads than active regions. By default the disk cache will use the standard cache event queue which has a dedicated thread. Although the standard queue kills its worker thread after a minute of inactivity, you may want to restrict the total number of threads. You can accomplish this by using a pooled event queue.

The configuration file below defines a disk cache called DC2. It uses an event queue of type POOLED. The queue is named disk_cache_event_queue. The disk_cache_event_queue is defined in the bottom of the file.

##############################################################
################## DEFAULT CACHE REGION  #####################
# sets the default aux value for any non configured caches
jcs.default=DC2
jcs.default.cacheattributes=org.apache.commons.jcs3.engine.CompositeCacheAttributes
jcs.default.cacheattributes.MaxObjects=200001
jcs.default.cacheattributes.MemoryCacheName=org.apache.commons.jcs3.engine.memory.lru.LRUMemoryCache
jcs.default.cacheattributes.UseMemoryShrinker=false
jcs.default.cacheattributes.MaxMemoryIdleTimeSeconds=3600
jcs.default.cacheattributes.ShrinkerIntervalSeconds=60
jcs.default.elementattributes=org.apache.commons.jcs3.engine.ElementAttributes
jcs.default.elementattributes.IsEternal=false
jcs.default.elementattributes.MaxLife=700
jcs.default.elementattributes.IdleTime=1800
jcs.default.elementattributes.IsSpool=true
jcs.default.elementattributes.IsRemote=true
jcs.default.elementattributes.IsLateral=true

##############################################################
################## AUXILIARY CACHES AVAILABLE ################

# Disk Cache Using a Pooled Event Queue -- this allows you
# to control the maximum number of threads it will use.
# Each region uses 1 thread by default in the SINGLE model.
# adding more threads than regions does not help performance.
# If you want to use a separate pool for each disk cache, either use
# the single model or define a different auxiliary for each region and use the Pooled type.
# SINGLE is generally best unless you have a huge # of regions.
jcs.auxiliary.DC2=org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheFactory
jcs.auxiliary.DC2.attributes=org.apache.commons.jcs3.auxiliary.disk.indexed.IndexedDiskCacheAttributes
jcs.auxiliary.DC2.attributes.DiskPath=target/test-sandbox/raf
jcs.auxiliary.DC2.attributes.MaxPurgatorySize=10000
jcs.auxiliary.DC2.attributes.MaxKeySize=10000
jcs.auxiliary.DC2.attributes.OptimizeAtRemoveCount=300000
jcs.auxiliary.DC2.attributes.OptimizeOnShutdown=true
jcs.auxiliary.DC2.attributes.EventQueueType=POOLED
jcs.auxiliary.DC2.attributes.EventQueuePoolName=disk_cache_event_queue

##############################################################
################## OPTIONAL THREAD POOL CONFIGURATION ########

# Disk Cache Event Queue Pool
thread_pool.disk_cache_event_queue.useBoundary=false
thread_pool.disk_cache_event_queue.maximumPoolSize=15
thread_pool.disk_cache_event_queue.minimumPoolSize=1
thread_pool.disk_cache_event_queue.keepAliveTime=3500
thread_pool.disk_cache_event_queue.startUpSize=1