Commons Compress

Examples

Archivers and Compressors

Commons Compress calls all formats that compress a single stream of data compressor formats while all formats that collect multiple entries inside a single (potentially compressed) archive are archiver formats.

The compressor formats supported are gzip, bzip2, xz, lzma, Pack200 and Z; the archiver formats are 7z, ar, arj, cpio, dump, tar and zip. Pack200 is a special case as it can only compress JAR files.

We currently only provide read support for lzma, arj, dump and Z. The arj package can only read uncompressed archives; the 7z package can read archives that use many of the compression and encryption algorithms supported by 7z, but it doesn't support encryption when writing archives.

Common Notes

The stream classes all wrap around streams provided by the calling code and work on them directly without any additional buffering. On the other hand most of them will benefit from buffering, so it is highly recommended that users wrap their streams in Buffered(In|Out)putStreams before using the Commons Compress API.
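
For example, a minimal sketch of the recommended setup when working with files (the file names are illustrative):

InputStream in = new BufferedInputStream(new FileInputStream("archive.tar.gz"));
OutputStream out = new BufferedOutputStream(new FileOutputStream("archive.tar"));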

Factories

Compress provides factory methods to create input/output streams based on the names of the compressor or archiver format as well as factory methods that try to guess the format of an input stream.

To create a compressor writing to a given output by using the algorithm name:

CompressorOutputStream gzippedOut = new CompressorStreamFactory()
    .createCompressorOutputStream(CompressorStreamFactory.GZIP, myOutputStream);

Make the factory guess the input format for a given archiver stream:

ArchiveInputStream input = new ArchiveStreamFactory()
    .createArchiveInputStream(originalInput);

Make the factory guess the input format for a given compressor stream:

CompressorInputStream input = new CompressorStreamFactory()
    .createCompressorInputStream(originalInput);

Note that there is no way to detect the lzma format, so only the two-arg version of createCompressorInputStream can be used for it. Prior to Compress 1.9 the .Z format wasn't auto-detected either.
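
For lzma this means naming the format explicitly; a sketch, assuming myInputStream contains lzma compressed data:

CompressorInputStream lzmaIn = new CompressorStreamFactory()
    .createCompressorInputStream(CompressorStreamFactory.LZMA, myInputStream);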

Unsupported Features

Many of the supported formats have developed different dialects and extensions and some formats allow for features (not yet) supported by Commons Compress.

The ArchiveInputStream class provides a method canReadEntryData that will return false if Commons Compress can detect that an archive uses a feature that is not supported by the current implementation. If it returns false you should not try to read the entry but skip over it.
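
A sketch of a read loop that skips such entries, reusing the input stream from the factory example above (processing of readable entries is left out):

ArchiveEntry entry;
while ((entry = input.getNextEntry()) != null) {
    if (!input.canReadEntryData(entry)) {
        continue; // uses an unsupported feature, skip this entry
    }
    // read the entry's data from input here
}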

Concatenated Streams

For the bzip2, gzip and xz formats a single compressed file may actually consist of several streams that will be concatenated by the command line utilities when decompressing them. Starting with Commons Compress 1.4 the *CompressorInputStreams for these formats support concatenating streams as well, but they won't do so by default. You must use the two-arg constructor and explicitly enable the support.
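
A sketch for gzip, where the second constructor argument enables support for concatenated streams:

GzipCompressorInputStream gzIn = new GzipCompressorInputStream(inputStream, true);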

ar

In addition to the information stored in ArchiveEntry, an ArArchiveEntry stores information about the owner user and group as well as Unix permissions.
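
A sketch that sets owner, group and permissions explicitly via the longer ArArchiveEntry constructor (the values are illustrative):

ArArchiveEntry entry = new ArArchiveEntry(name, size,
    1000,                                // user id
    1000,                                // group id
    0644,                                // Unix permission bits
    System.currentTimeMillis() / 1000);  // last modified, in seconds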

Adding an entry to an ar archive:

ArArchiveEntry entry = new ArArchiveEntry(name, size);
arOutput.putArchiveEntry(entry);
arOutput.write(contentOfEntry);
arOutput.closeArchiveEntry();

Reading entries from an ar archive:

ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = arInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

Traditionally the AR format doesn't allow file names longer than 16 characters. There are two variants that circumvent this limitation in different ways, the GNU/SVR4 variant and the BSD variant. Commons Compress 1.0 to 1.2 can only read archives using the GNU/SVR4 variant; support for the BSD variant was added in Commons Compress 1.3. Commons Compress 1.3 also optionally supports writing archives with file names longer than 16 characters using the BSD dialect; writing the SVR4/GNU dialect is not supported.
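
Writing the BSD dialect has to be enabled explicitly on the output stream; a sketch:

arOutput.setLongFileMode(ArArchiveOutputStream.LONGFILE_BSD);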

It is not possible to detect the end of an AR archive in a reliable way so ArArchiveInputStream will read until it reaches the end of the stream or fails to parse the stream's content as AR entries.

cpio

In addition to the information stored in ArchiveEntry a CpioArchiveEntry stores various attributes including information about the original owner and permissions.

The cpio package supports the "new portable" as well as the "old" format of CPIO archives in their binary, ASCII and "with CRC" variants.
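
The format is selected when the output stream is created; a sketch that writes the old ASCII variant instead of the default "new portable" format:

CpioArchiveOutputStream cpioOutput =
    new CpioArchiveOutputStream(out, CpioConstants.FORMAT_OLD_ASCII);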

Adding an entry to a cpio archive:

CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
cpioOutput.putArchiveEntry(entry);
cpioOutput.write(contentOfEntry);
cpioOutput.closeArchiveEntry();

Reading entries from a cpio archive:

CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = cpioInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

Traditionally CPIO archives are written in blocks of 512 bytes - the block size is a configuration parameter of the Cpio*Stream constructors. Starting with version 1.5 CpioArchiveInputStream will consume the padding written to fill the current block when the end of the archive is reached. Unfortunately many CPIO implementations use larger block sizes, so there may be more zero-byte padding left inside the original input stream after the archive has been consumed completely.
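
If you know the block size used when the archive was written, you can pass it to the constructor; a sketch assuming 8192-byte blocks:

CpioArchiveInputStream cpioInput = new CpioArchiveInputStream(in, 8192);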

dump

In addition to the information stored in ArchiveEntry a DumpArchiveEntry stores various attributes including information about the original owner and permissions.

As of Commons Compress 1.3 only dump archives using the new-fs format - the most common variant - are supported. Right now this library supports uncompressed and ZLIB compressed archives and cannot write archives at all.

Reading entries from a dump archive:

DumpArchiveEntry entry = dumpInput.getNextDumpEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = dumpInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

Prior to version 1.5 DumpArchiveInputStream would close the original input once it had read the last record. Starting with version 1.5 it will not close the stream implicitly.

tar

The TAR package has a dedicated documentation page.

Adding an entry to a tar archive:

TarArchiveEntry entry = new TarArchiveEntry(name);
entry.setSize(size);
tarOutput.putArchiveEntry(entry);
tarOutput.write(contentOfEntry);
tarOutput.closeArchiveEntry();

Reading entries from a tar archive:

TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = tarInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

zip

The ZIP package has a dedicated documentation page.

Adding an entry to a zip archive:

ZipArchiveEntry entry = new ZipArchiveEntry(name);
entry.setSize(size);
zipOutput.putArchiveEntry(entry);
zipOutput.write(contentOfEntry);
zipOutput.closeArchiveEntry();

ZipArchiveOutputStream can use some internal optimizations exploiting RandomAccessFile if it knows it is writing to a file rather than a non-seekable stream. If you are writing to a file, you should use the constructor that accepts a File argument rather than the one using an OutputStream or the factory method in ArchiveStreamFactory.
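
A sketch of creating the stream from a File so these optimizations can be used:

ZipArchiveOutputStream zipOutput = new ZipArchiveOutputStream(new File("archive.zip"));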

Reading entries from a zip archive:

ZipArchiveEntry entry = zipInput.getNextZipEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = zipInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

Reading entries from a zip archive using the recommended ZipFile class:

ZipArchiveEntry entry = zipFile.getEntry(name);
InputStream content = zipFile.getInputStream(entry);
try {
    byte[] buffer = new byte[8192];
    int n;
    while (-1 != (n = content.read(buffer))) {
        // process the n bytes just read from content
    }
} finally {
    content.close();
}

jar

In general, JAR archives are ZIP files, so the JAR package supports all options provided by the ZIP package.

To be interoperable JAR archives should always be created using the UTF-8 encoding for file names (which is the default).

Archives created using JarArchiveOutputStream will implicitly add a JarMarker extra field to the very first entry of the archive, which will make Solaris recognize them as Java archives and allows them to be used as executables.

Note that ArchiveStreamFactory doesn't distinguish ZIP archives from JAR archives, so if you use the one-argument createArchiveInputStream method on a JAR archive, it will still return the more generic ZipArchiveInputStream.
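
If you need a JarArchiveInputStream you have to ask for the jar format explicitly; a sketch:

ArchiveInputStream jarInput = new ArchiveStreamFactory()
    .createArchiveInputStream(ArchiveStreamFactory.JAR, originalInput);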

The JarArchiveEntry class contains fields for certificates and attributes that are planned to be supported in the future but are not supported as of Compress 1.0.

Adding an entry to a jar archive:

JarArchiveEntry entry = new JarArchiveEntry(name);
entry.setSize(size);
jarOutput.putArchiveEntry(entry);
jarOutput.write(contentOfEntry);
jarOutput.closeArchiveEntry();

Reading entries from a jar archive:

JarArchiveEntry entry = jarInput.getNextJarEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = jarInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

bzip2

Note that BZip2CompressorOutputStream keeps hold of some big data structures in memory. While it is recommended for any stream that you close it as soon as you no longer need it, this is even more important for BZip2CompressorOutputStream.

Uncompressing a given bzip2 compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.bz2");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = bzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
bzIn.close();

gzip

The implementation of this package is provided by the java.util.zip package of the Java class library.

Uncompressing a given gzip compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.gz");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = gzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
gzIn.close();

Pack200

The Pack200 package has a dedicated documentation page.

The implementation of this package is provided by the java.util.jar.Pack200 class of the Java class library.

Uncompressing a given pack200 compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.pack");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.jar");
Pack200CompressorInputStream pIn = new Pack200CompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = pIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
pIn.close();

XZ

The implementation of this package is provided by the public domain XZ for Java library.

Uncompressing a given XZ compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.xz");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
XZCompressorInputStream xzIn = new XZCompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = xzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
xzIn.close();

Z

Uncompressing a given Z compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.Z");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
ZCompressorInputStream zIn = new ZCompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();

lzma

The implementation of this package is provided by the public domain XZ for Java library.

Uncompressing a given lzma compressed file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.lzma");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
LZMACompressorInputStream lzmaIn = new LZMACompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = lzmaIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
lzmaIn.close();

7z

Note that Commons Compress currently only supports a subset of the compression and encryption algorithms used for 7z archives. For writing, only uncompressed entries, LZMA2, BZIP2 and Deflate are supported; reading additionally supports LZMA and AES-256/SHA-256.

Multipart archives are not supported at all.

7z archives can use multiple compression and encryption methods as well as filters, combined as a pipeline of methods, for their entries. Prior to Compress 1.8 you could only specify a single method when creating archives, although reading archives that use more than one method had been possible before. Starting with Compress 1.8 it is possible to configure the full pipeline using the setContentMethods method of SevenZOutputFile. Methods are specified in the order they appear inside the pipeline when creating the archive, and you can also specify certain parameters for some of the methods - see the Javadocs of SevenZMethodConfiguration for details.
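
A sketch of such a pipeline that runs a delta filter before LZMA2 compression (the method choice and the filter distance of 4 are illustrative):

SevenZOutputFile sevenZOutput = new SevenZOutputFile(outputFile);
sevenZOutput.setContentMethods(Arrays.asList(
    new SevenZMethodConfiguration(SevenZMethod.DELTA_FILTER, 4),
    new SevenZMethodConfiguration(SevenZMethod.LZMA2)));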

When reading entries from an archive the getContentMethods method of SevenZArchiveEntry will properly represent the compression/encryption/filter methods but may fail to determine the configuration options used. As of Compress 1.8 only the dictionary size used for LZMA2 can be read.

Currently solid compression - compressing multiple files as a single block to benefit from patterns repeating across files - is only supported when reading archives. This also means the compression ratio will likely be worse when using Commons Compress compared to the native 7z executable.

Adding an entry to a 7z archive:

SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(fileToArchive, name);
sevenZOutput.putArchiveEntry(entry);
sevenZOutput.write(contentOfEntry);
sevenZOutput.closeArchiveEntry();

Uncompressing a given 7z archive (you would certainly add exception handling and make sure all streams get closed properly):

SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = sevenZFile.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of archive
    }
    offset += n;
}

arj

Note that Commons Compress doesn't support compressed, encrypted or multi-volume ARJ archives yet.

Uncompressing a given arj archive (you would certainly add exception handling and make sure all streams get closed properly):

ArjArchiveEntry entry = arjInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = arjInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // unexpected end of stream
    }
    offset += n;
}

Snappy

There are two different "formats" used for Snappy: one only contains the raw compressed data, while the other provides a higher level "framing format". Commons Compress offers two different stream classes for reading either format.
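
For the raw format you would use SnappyCompressorInputStream instead of the framed variant shown below; a sketch:

SnappyCompressorInputStream snIn = new SnappyCompressorInputStream(in);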

Uncompressing a given framed Snappy file (you would certainly add exception handling and make sure all streams get closed properly):

FileInputStream fin = new FileInputStream("archive.tar.sz");
BufferedInputStream in = new BufferedInputStream(fin);
FileOutputStream out = new FileOutputStream("archive.tar");
FramedSnappyCompressorInputStream zIn = new FramedSnappyCompressorInputStream(in);
final byte[] buffer = new byte[8192];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();