Commons Compress

General Notes

Archivers and Compressors

Commons Compress calls all formats that compress a single stream of data compressor formats while all formats that collect multiple entries inside a single (potentially compressed) archive are archiver formats.

The compressor formats supported are gzip, bzip2, XZ, LZMA, Pack200, DEFLATE, Brotli, DEFLATE64, Zstandard and Z; the archiver formats are 7z, ar, arj, cpio, dump, tar and zip. Pack200 is a special case as it can only compress JAR files.

We currently only provide read support for arj, dump, Brotli, DEFLATE64 and Z. arj can only read uncompressed archives; 7z can read archives using many of the compression and encryption algorithms of the 7z format but doesn't support encryption when writing archives.

Buffering

The stream classes all wrap around streams provided by the calling code and work on them directly without any additional buffering. On the other hand, most of them will benefit from buffering, so it is highly recommended that users wrap their stream in a Buffered(In|Out)putStream before using the Commons Compress API.
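
For example, a minimal sketch of wrapping file streams before handing them to Commons Compress (the file names are placeholders):

InputStream in = new BufferedInputStream(
    Files.newInputStream(Paths.get("archive.tar.gz")));
OutputStream out = new BufferedOutputStream(
    Files.newOutputStream(Paths.get("archive.zip")));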

Factories

Compress provides factory methods to create input/output streams based on the names of the compressor or archiver format as well as factory methods that try to guess the format of an input stream.

To create a compressor writing to a given output by using the algorithm name:

CompressorOutputStream gzippedOut = new CompressorStreamFactory()
    .createCompressorOutputStream(CompressorStreamFactory.GZIP, myOutputStream);

Make the factory guess the input format for a given archiver stream:

ArchiveInputStream input = new ArchiveStreamFactory()
    .createArchiveInputStream(originalInput);

Make the factory guess the input format for a given compressor stream:

CompressorInputStream input = new CompressorStreamFactory()
    .createCompressorInputStream(originalInput);

Note that there is no way to detect the LZMA or Brotli formats, so for them only the two-arg version of createCompressorInputStream can be used. Prior to Compress 1.9 the .Z format was not auto-detected either.
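
For LZMA you would therefore name the format explicitly:

CompressorInputStream lzmaIn = new CompressorStreamFactory()
    .createCompressorInputStream(CompressorStreamFactory.LZMA, originalInput);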

Restricting Memory Usage

Starting with Compress 1.14 CompressorStreamFactory has an optional constructor argument that can be used to set an upper limit of memory that may be used while decompressing or compressing a stream. As of 1.14 this setting only affects decompressing Z, XZ and LZMA compressed streams.

Since Compress 1.19 SevenZFile also has an optional constructor to pass an upper memory limit, which is honored when reading LZMA compressed streams. Since Compress 1.21 this setting is also taken into account when reading the metadata of an archive.

For the Snappy and LZ4 formats the amount of memory used during compression is directly proportional to the window size.
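
A minimal sketch of setting the factory limit described above, assuming a buffered input stream bufferedInput and a cap of 100 MiB (the constructor argument is in kibibytes):

// false: do not decompress concatenated streams; limit memory to 100 * 1024 KiB
CompressorStreamFactory factory = new CompressorStreamFactory(false, 100 * 1024);
CompressorInputStream in = factory.createCompressorInputStream(bufferedInput);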

Statistics

Starting with Compress 1.17 most of the CompressorInputStream implementations as well as ZipArchiveInputStream and all streams returned by ZipFile.getInputStream implement the InputStreamStatistics interface. SevenZFile provides statistics for the current entry via the getStatisticsForCurrentEntry method. This interface can be used to track progress while extracting a stream or to detect potential zip bombs when the compression ratio becomes suspiciously large.
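
As a sketch, the interface could be used to guard against zip bombs like this, assuming a ZipFile instance zipFile and an entry (the ratio threshold of 100 is an arbitrary example value):

try (InputStream in = zipFile.getInputStream(entry)) {
    // ... consume the entry ...
    InputStreamStatistics stats = (InputStreamStatistics) in;
    if (stats.getCompressedCount() > 0
            && stats.getUncompressedCount() / stats.getCompressedCount() > 100) {
        throw new IOException("suspicious compression ratio, possible zip bomb");
    }
}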

Archivers

Unsupported Features

Many of the supported formats have developed different dialects and extensions and some formats allow for features (not yet) supported by Commons Compress.

The ArchiveInputStream class provides a method canReadEntryData that will return false if Commons Compress can detect that an archive uses a feature that is not supported by the current implementation. If it returns false you should not try to read the entry but skip over it.

Entry Names

All archive formats provide metadata about the individual archive entries via instances of ArchiveEntry (or rather subclasses of it). When reading from an archive the information provided by the getName method is the raw name as stored inside of the archive. There is no guarantee the name represents a relative file name or even a valid file name on your target operating system at all. You should double-check the outcome when you try to create file names from entry names.

Common Extraction Logic

Apart from 7z, all formats provide a subclass of ArchiveInputStream that can be used to read an archive. For 7z, SevenZFile provides a similar API that does not represent a stream, as our implementation requires random access to the input and cannot be used for general streams. The ZIP implementation can benefit a lot from random access as well, see the zip page for details.

Assuming you want to extract an archive to a target directory you'd call getNextEntry, verify the entry can be read, construct a sane file name from the entry's name, create a File and write all contents to it - here IOUtils.copy may come in handy. You do so for every entry until getNextEntry returns null.

A skeleton might look like:

File targetDir = ...
try (ArchiveInputStream i = ... create the stream for your format, use buffering...) {
    ArchiveEntry entry = null;
    while ((entry = i.getNextEntry()) != null) {
        if (!i.canReadEntryData(entry)) {
            // log something?
            continue;
        }
        String name = fileName(targetDir, entry);
        File f = new File(name);
        if (entry.isDirectory()) {
            if (!f.isDirectory() && !f.mkdirs()) {
                throw new IOException("failed to create directory " + f);
            }
        } else {
            File parent = f.getParentFile();
            if (!parent.isDirectory() && !parent.mkdirs()) {
                throw new IOException("failed to create directory " + parent);
            }
            try (OutputStream o = Files.newOutputStream(f.toPath())) {
                IOUtils.copy(i, o);
            }
        }
    }
}

where the hypothetical fileName method is written by you and provides the absolute name for the file that is going to be written on disk. Here you should perform checks that ensure the resulting file name actually is a valid file name on your operating system or belongs to a file inside of targetDir when using the entry's name as input.
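
A hedged sketch of what such a fileName method might look like, rejecting entry names that resolve outside of targetDir (often called "zip slip"):

private static String fileName(File targetDir, ArchiveEntry entry) throws IOException {
    File f = new File(targetDir, entry.getName());
    String canonical = f.getCanonicalPath();
    // reject names like "../../etc/passwd" that resolve outside of targetDir
    if (!canonical.startsWith(targetDir.getCanonicalPath() + File.separator)) {
        throw new IOException("entry is outside of the target dir: " + entry.getName());
    }
    return canonical;
}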

If you want to combine an archive format with a compression format - like when reading a "tar.gz" file - you wrap the ArchiveInputStream around a CompressorInputStream, for example:

try (InputStream fi = Files.newInputStream(Paths.get("my.tar.gz"));
     InputStream bi = new BufferedInputStream(fi);
     InputStream gzi = new GzipCompressorInputStream(bi);
     ArchiveInputStream o = new TarArchiveInputStream(gzi)) {
}

Common Archival Logic

Apart from 7z, all formats that support writing provide a subclass of ArchiveOutputStream that can be used to create an archive. For 7z, SevenZOutputFile provides a similar API that does not represent a stream, as our implementation requires random access to the output and cannot be used for general streams. The ZipArchiveOutputStream class will benefit from random access as well but can be used for non-seekable streams - though not all features will be available and the archive size might be slightly bigger; see the zip page for details.

Assuming you want to add a collection of files to an archive, you can first use createArchiveEntry for each file. In general this will set a few flags (usually the last modified time, the size and the information whether this is a file or directory) based on the File or Path instance. Alternatively you can create the ArchiveEntry subclass corresponding to your format directly. Often you may want to set additional flags like file permissions or owner information before adding the entry to the archive.

Next you use putArchiveEntry in order to add the entry and then start using write to add the content of the entry - here IOUtils.copy may come in handy. Finally you invoke closeArchiveEntry once you've written all content and before you add the next entry.

Once all entries have been added you'd invoke finish and finally close the stream.

A skeleton might look like:

Collection<File> filesToArchive = ...
try (ArchiveOutputStream o = ... create the stream for your format ...) {
    for (File f : filesToArchive) {
        // maybe skip directories for formats like AR that don't store directories
        ArchiveEntry entry = o.createArchiveEntry(f, entryName(f));
        // potentially add more flags to entry
        o.putArchiveEntry(entry);
        if (f.isFile()) {
            try (InputStream i = Files.newInputStream(f.toPath())) {
                IOUtils.copy(i, o);
            }
        }
        o.closeArchiveEntry();
    }
    o.finish();
}

where the hypothetical entryName method is written by you and provides the name for the entry as it is going to be written to the archive.

If you want to combine an archive format with a compression format - like when creating a "tar.gz" file - you wrap the ArchiveOutputStream around a CompressorOutputStream, for example:

try (OutputStream fo = Files.newOutputStream(Paths.get("my.tar.gz"));
     OutputStream gzo = new GzipCompressorOutputStream(fo);
     ArchiveOutputStream o = new TarArchiveOutputStream(gzo)) {
}

7z

Note that Commons Compress currently only supports a subset of the compression and encryption algorithms used for 7z archives. For writing, only uncompressed entries, LZMA, LZMA2, BZIP2 and Deflate are supported; in addition to those, reading supports AES-256/SHA-256 and DEFLATE64.

Writing multipart archives is not supported at all. Multipart archives can be read by concatenating the parts for example by using MultiReadOnlySeekableByteChannel.
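
For example, reading a multipart archive by concatenating its parts (a sketch; the part file names are placeholders):

SeekableByteChannel channel = MultiReadOnlySeekableByteChannel.forFiles(
    new File("archive.7z.001"), new File("archive.7z.002"));
SevenZFile sevenZFile = new SevenZFile(channel);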

7z archives can use multiple compression and encryption methods as well as filters, combined as a pipeline of methods, for their entries. Prior to Compress 1.8 you could only specify a single method when creating archives - reading archives using more than one method has been possible before. Starting with Compress 1.8 it is possible to configure the full pipeline using the setContentMethods method of SevenZOutputFile. Methods are specified in the order they appear inside the pipeline when creating the archive; you can also specify certain parameters for some of the methods - see the Javadocs of SevenZMethodConfiguration for details.
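
As a sketch, configuring a delta filter followed by LZMA2:

SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
sevenZOutput.setContentMethods(Arrays.asList(
    new SevenZMethodConfiguration(SevenZMethod.DELTA_FILTER),
    new SevenZMethodConfiguration(SevenZMethod.LZMA2)));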

When reading entries from an archive the getContentMethods method of SevenZArchiveEntry will properly represent the compression/encryption/filter methods but may fail to determine the configuration options used. As of Compress 1.8 only the dictionary size used for LZMA2 can be read.

Currently solid compression - compressing multiple files as a single block to benefit from patterns repeating across files - is only supported when reading archives. This also means compression ratio will likely be worse when using Commons Compress compared to the native 7z executable.

Reading or writing requires a SeekableByteChannel that will be obtained transparently when reading from or writing to a file. The class org.apache.commons.compress.utils.SeekableInMemoryByteChannel allows you to read from or write to an in-memory archive.

Some 7z archives don't contain any names for the archive entries. The native 7zip tools derive a default name from the name of the archive itself for such entries. Starting with Compress 1.19 SevenZFile has an option to mimic this behavior, but by default unnamed archive entries will return null from SevenZArchiveEntry#getName.
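
A sketch of enabling that option, assuming the SevenZFileOptions API introduced with Compress 1.19:

SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"),
    SevenZFileOptions.builder().withUseDefaultNameForUnnamedEntries(true).build());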

Adding an entry to a 7z archive:

SevenZOutputFile sevenZOutput = new SevenZOutputFile(file);
SevenZArchiveEntry entry = sevenZOutput.createArchiveEntry(fileToArchive, name);
sevenZOutput.putArchiveEntry(entry);
sevenZOutput.write(contentOfEntry);
sevenZOutput.closeArchiveEntry();

Uncompressing a given 7z archive (you would certainly add exception handling and make sure all streams get closed properly):

SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = sevenZFile.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Uncompressing a given in-memory 7z archive:

byte[] inputData; // 7z archive contents
SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
SevenZFile sevenZFile = new SevenZFile(inMemoryByteChannel);
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
int b = sevenZFile.read();  // reads a single byte of the current entry's data

Encrypted 7z Archives

Currently Compress supports reading but not writing of encrypted archives. When reading an encrypted archive a password has to be provided to one of SevenZFile's constructors. If you try to read an encrypted archive without specifying a password a PasswordRequiredException (a subclass of IOException) will be thrown.

When specifying the password as a byte[] one common mistake is to use the wrong encoding when creating the byte[] from a String. The SevenZFile class expects the bytes to correspond to the UTF-16LE encoding of the password. An example of reading an encrypted archive is

SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".getBytes(StandardCharsets.UTF_16LE));
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = sevenZFile.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Starting with Compress 1.17 new constructors have been added that accept the password as char[] rather than a byte[]. We recommend you use these in order to avoid the problem above.

SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"), "secret".toCharArray());
SevenZArchiveEntry entry = sevenZFile.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = sevenZFile.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Random-Access to 7z Archives

Prior to Compress 1.20 7z archives could only be read sequentially. The getInputStream(SevenZArchiveEntry) method introduced with Compress 1.20 now provides random access but at least when the archive uses solid compression random access will likely be significantly slower than sequential access.
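
A sketch of random access, assuming you know the name of the entry you are looking for:

try (SevenZFile sevenZFile = new SevenZFile(new File("archive.7z"))) {
    for (SevenZArchiveEntry entry : sevenZFile.getEntries()) {
        if ("dir/wanted.txt".equals(entry.getName())) {
            try (InputStream is = sevenZFile.getInputStream(entry)) {
                // consume the entry's content
            }
        }
    }
}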

Recovering from Certain Broken 7z Archives

SevenZFile tries to recover archives that look as if they were part of a multi-volume archive where the first volume has been removed too early.

This option has to be enabled explicitly in SevenZFile.Builder. Recovery works by Compress scanning an archive from the end for something that might look like valid 7z metadata and using it if it can successfully parse the block of data. When doing so Compress may encounter blocks of metadata that look like the metadata of very large archives, which in turn may make Compress allocate a lot of memory. Therefore we strongly recommend you also set a memory limit inside the SevenZFile.Builder if you enable recovery.
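
A sketch of enabling recovery together with a memory limit; the method names assume the SevenZFile.Builder API of recent releases:

SevenZFile sevenZFile = SevenZFile.builder()
    .setFile(new File("truncated.7z"))
    .setTryToRecoverBrokenArchives(true)
    .setMaxMemoryLimitKb(64 * 1024) // cap memory used for metadata parsing
    .get();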

ar

In addition to the information stored in ArchiveEntry an ArArchiveEntry stores information about the owner user and group as well as Unix permissions.

Adding an entry to an ar archive:

ArArchiveEntry entry = new ArArchiveEntry(name, size);
arOutput.putArchiveEntry(entry);
arOutput.write(contentOfEntry);
arOutput.closeArchiveEntry();

Reading entries from an ar archive:

ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = arInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Traditionally the AR format doesn't allow file names longer than 16 characters. There are two variants that circumvent this limitation in different ways: the GNU/SVR4 variant and the BSD variant. Commons Compress 1.0 to 1.2 can only read archives using the GNU/SVR4 variant; support for the BSD variant was added in Commons Compress 1.3. Commons Compress 1.3 also optionally supports writing archives with file names longer than 16 characters using the BSD dialect, writing the GNU/SVR4 dialect is not supported.

Version of Apache Commons Compress | Traditional AR Format | GNU/SVR4 Dialect | BSD Dialect
1.0 to 1.2                         | read/write            | read             | -
1.3 and later                      | read/write            | read             | read/write

It is not possible to detect the end of an AR archive in a reliable way so ArArchiveInputStream will read until it reaches the end of the stream or fails to parse the stream's content as AR entries.

arj

Note that Commons Compress doesn't support compressed, encrypted or multi-volume ARJ archives yet.

Uncompressing a given arj archive (you would certainly add exception handling and make sure all streams get closed properly):

ArjArchiveEntry entry = arjInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = arjInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

cpio

In addition to the information stored in ArchiveEntry a CpioArchiveEntry stores various attributes including information about the original owner and permissions.

The cpio package supports the "new portable" as well as the "old" format of CPIO archives in their binary, ASCII and "with CRC" variants.

Adding an entry to a cpio archive:

CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
cpioOutput.putArchiveEntry(entry);
cpioOutput.write(contentOfEntry);
cpioOutput.closeArchiveEntry();

Reading entries from a cpio archive:

CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = cpioInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Traditionally CPIO archives are written in blocks of 512 bytes - the block size is a configuration parameter of the Cpio*Stream constructors. Starting with version 1.5 CpioArchiveInputStream will consume the padding written to fill the current block when the end of the archive is reached. Unfortunately many CPIO implementations use larger block sizes, so there may be more zero-byte padding left inside the original input stream after the archive has been consumed completely.
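
If you know the block size used by the producing implementation you can pass it explicitly, for example (bufferedInput is a placeholder for your buffered stream):

// 2048-byte blocks instead of the traditional 512-byte default
CpioArchiveInputStream cpioInput = new CpioArchiveInputStream(bufferedInput, 2048);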

jar

In general, JAR archives are ZIP files, so the JAR package supports all options provided by the ZIP package.

To be interoperable JAR archives should always be created using the UTF-8 encoding for file names (which is the default).

Archives created using JarArchiveOutputStream will implicitly add a JarMarker extra field to the very first archive entry, which will make Solaris recognize them as Java archives and allows them to be used as executables.

Note that ArchiveStreamFactory doesn't distinguish ZIP archives from JAR archives, so if you use the one-argument createArchiveInputStream method on a JAR archive, it will still return the more generic ZipArchiveInputStream.

The JarArchiveEntry class contains fields for certificates and attributes that are planned to be supported in the future but are not supported as of Compress 1.0.

Adding an entry to a jar archive:

JarArchiveEntry entry = new JarArchiveEntry(name);
entry.setSize(size);
jarOutput.putArchiveEntry(entry);
jarOutput.write(contentOfEntry);
jarOutput.closeArchiveEntry();

Reading entries from a jar archive:

JarArchiveEntry entry = jarInput.getNextJarEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = jarInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

dump

In addition to the information stored in ArchiveEntry a DumpArchiveEntry stores various attributes including information about the original owner and permissions.

As of Commons Compress 1.3 only dump archives using the new-fs format - this is the most common variant - are supported. Right now this library supports uncompressed and ZLIB compressed archives and cannot write archives at all.

Reading entries from a dump archive:

DumpArchiveEntry entry = dumpInput.getNextDumpEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = dumpInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Prior to version 1.5 DumpArchiveInputStream would close the original input once it had read the last record. Starting with version 1.5 it will not close the stream implicitly.

tar

The TAR package has a dedicated documentation page.

Adding an entry to a tar archive:

TarArchiveEntry entry = new TarArchiveEntry(name);
entry.setSize(size);
tarOutput.putArchiveEntry(entry);
tarOutput.write(contentOfEntry);
tarOutput.closeArchiveEntry();

Reading entries from a tar archive:

TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = tarInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

zip

The ZIP package has a dedicated documentation page.

Adding an entry to a zip archive:

ZipArchiveEntry entry = new ZipArchiveEntry(name);
entry.setSize(size);
zipOutput.putArchiveEntry(entry);
zipOutput.write(contentOfEntry);
zipOutput.closeArchiveEntry();

ZipArchiveOutputStream can use some internal optimizations exploiting SeekableByteChannel if it knows it is writing to a seekable output rather than a non-seekable stream. If you are writing to a file, you should use the constructor that accepts a File or SeekableByteChannel argument rather than the one using an OutputStream or the factory method in ArchiveStreamFactory.
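
For example, preferring the seekable constructor when the target is a file:

// random access enables optimizations not possible with a plain OutputStream
ZipArchiveOutputStream zipOutput = new ZipArchiveOutputStream(new File("archive.zip"));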

Reading entries from a zip archive:

ZipArchiveEntry entry = zipInput.getNextZipEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
int n;
while (offset < content.length
        && (n = zipInput.read(content, offset, content.length - offset)) != -1) {
    offset += n;
}

Reading entries from a zip archive using the recommended ZipFile class:

ZipArchiveEntry entry = zipFile.getEntry(name);
try (InputStream content = zipFile.getInputStream(entry)) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = content.read(buffer)) != -1) {
        // process buffer[0..n)
    }
}

Reading entries from an in-memory zip archive using SeekableInMemoryByteChannel and the ZipFile class:

byte[] inputData; // zip archive contents
SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
ZipFile zipFile = new ZipFile(inMemoryByteChannel);
ZipArchiveEntry archiveEntry = zipFile.getEntry("entryName");
InputStream inputStream = zipFile.getInputStream(archiveEntry);
inputStream.read(); // reads a single byte of the entry's data

Creating a zip file with multiple threads:

A simple implementation to create a zip file might look like this:

public class ScatterSample {

  ParallelScatterZipCreator scatterZipCreator = new ParallelScatterZipCreator();
  ScatterZipOutputStream dirs = ScatterZipOutputStream.fileBased(File.createTempFile("scatter-dirs", "tmp"));

  public ScatterSample() throws IOException {
  }

  public void addEntry(ZipArchiveEntry zipArchiveEntry, InputStreamSupplier streamSupplier) throws IOException {
     if (zipArchiveEntry.isDirectory() && !zipArchiveEntry.isUnixSymlink()) {
        dirs.addArchiveEntry(ZipArchiveEntryRequest.createZipArchiveEntryRequest(zipArchiveEntry, streamSupplier));
     } else {
        scatterZipCreator.addArchiveEntry(zipArchiveEntry, streamSupplier);
     }
  }

  public void writeTo(ZipArchiveOutputStream zipArchiveOutputStream)
  throws IOException, ExecutionException, InterruptedException {
     dirs.writeTo(zipArchiveOutputStream);
     dirs.close();
     scatterZipCreator.writeTo(zipArchiveOutputStream);
  }
}

Compressors

Concatenated Streams

For the bzip2, gzip and XZ formats as well as the framed lz4 format a single compressed file may actually consist of several streams that will be concatenated by the command line utilities when decompressing them. Starting with Commons Compress 1.4 the *CompressorInputStreams for these formats support concatenated streams as well, but they won't do so by default. You must use the two-arg constructor and explicitly enable the support.
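
For example, enabling concatenated-stream support for gzip (the second argument is decompressConcatenated; bufferedInput is a placeholder for your buffered stream):

GzipCompressorInputStream gzIn = new GzipCompressorInputStream(bufferedInput, true);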

Brotli

The implementation of this package is provided by the Google Brotli dec library.

Uncompressing a given Brotli compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.br"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
BrotliCompressorInputStream brIn = new BrotliCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = brIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
brIn.close();

bzip2

Note that BZip2CompressorOutputStream keeps hold of some big data structures in memory. While it is recommended for any stream that you close it as soon as you no longer need it, this is even more important for BZip2CompressorOutputStream.

Uncompressing a given bzip2 compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.bz2"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = bzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
bzIn.close();

Compressing a given file using bzip2 (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.bz2"));
BufferedOutputStream out = new BufferedOutputStream(fout);
BZip2CompressorOutputStream bzOut = new BZip2CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    bzOut.write(buffer, 0, n);
}
bzOut.close();
in.close();

DEFLATE

The implementation of the DEFLATE/INFLATE code used by this package is provided by the java.util.zip package of the Java class library.

Uncompressing a given DEFLATE compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("some-file"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
DeflateCompressorInputStream defIn = new DeflateCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = defIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
defIn.close();

Compressing a given file using DEFLATE (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("some-file"));
BufferedOutputStream out = new BufferedOutputStream(fout);
DeflateCompressorOutputStream defOut = new DeflateCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    defOut.write(buffer, 0, n);
}
defOut.close();
in.close();

DEFLATE64

Uncompressing a given DEFLATE64 compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("some-file"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
Deflate64CompressorInputStream defIn = new Deflate64CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = defIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
defIn.close();

gzip

The implementation of the DEFLATE/INFLATE code used by this package is provided by the java.util.zip package of the Java class library.

Uncompressing a given gzip compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.gz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = gzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
gzIn.close();

Compressing a given file using gzip (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.gz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    gzOut.write(buffer, 0, n);
}
gzOut.close();
in.close();

LZ4

There are two different "formats" used for lz4. The format called "block format" only contains the raw compressed data while the other provides a higher level "frame format" - Commons Compress offers two different stream classes for reading or writing either format.

Uncompressing a given framed LZ4 file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.lz4"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
FramedLZ4CompressorInputStream zIn = new FramedLZ4CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();

Compressing a given file using the LZ4 frame format (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lz4"));
BufferedOutputStream out = new BufferedOutputStream(fout);
FramedLZ4CompressorOutputStream lzOut = new FramedLZ4CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    lzOut.write(buffer, 0, n);
}
lzOut.close();
in.close();

lzma

The implementation of this package is provided by the public domain XZ for Java library.

Uncompressing a given LZMA compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.lzma"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
LZMACompressorInputStream lzmaIn = new LZMACompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = lzmaIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
lzmaIn.close();

Compressing a given file using LZMA (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.lzma"));
BufferedOutputStream out = new BufferedOutputStream(fout);
LZMACompressorOutputStream lzOut = new LZMACompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    lzOut.write(buffer, 0, n);
}
lzOut.close();
in.close();

Pack200

The Pack200 package has a dedicated documentation page.

The implementation of this package used to be provided by the java.util.zip package of the Java class library. Starting with Compress 1.21 the implementation uses a copy of the pack200 code of the now retired Apache Harmony™ project that ships with Compress itself.

Uncompressing a given pack200 compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.pack"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.jar"));
Pack200CompressorInputStream pIn = new Pack200CompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = pIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
pIn.close();

Compressing a given jar using pack200 (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.jar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.pack"));
BufferedOutputStream out = new BufferedOutputStream(fout);
Pack200CompressorOutputStream pOut = new Pack200CompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    pOut.write(buffer, 0, n);
}
pOut.close();
in.close();

Snappy

There are two different "formats" used for Snappy, one only contains the raw compressed data while the other provides a higher level "framing format" - Commons Compress offers two different stream classes for reading either format.

Starting with 1.12 we've added support for different dialects of the framing format that can be specified when constructing the stream. The STANDARD dialect follows the "framing format" specification while the IWORK_ARCHIVE dialect can be used to parse IWA files that are part of Apple's iWork 13 format. If no dialect has been specified, STANDARD is used. Only the STANDARD format can be detected by CompressorStreamFactory.
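
For example, reading an IWA file with the IWORK_ARCHIVE dialect (bufferedInput is a placeholder for your buffered stream):

FramedSnappyCompressorInputStream snIn = new FramedSnappyCompressorInputStream(
    bufferedInput, FramedSnappyDialect.IWORK_ARCHIVE);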

Uncompressing a given framed Snappy file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.sz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
FramedSnappyCompressorInputStream zIn = new FramedSnappyCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();

Compressing a given file using framed Snappy (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.sz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
FramedSnappyCompressorOutputStream snOut = new FramedSnappyCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    snOut.write(buffer, 0, n);
}
snOut.close();
in.close();

XZ

The implementation of this package is provided by the public domain XZ for Java library.

When you try to open an XZ stream for reading using CompressorStreamFactory, Commons Compress will check whether the XZ for Java library is available. Starting with Compress 1.9 the result of this check will be cached unless Compress finds OSGi classes in its classpath. You can use XZUtils#setCacheXZAvailability to override this default behavior.
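
For example, to turn the caching off:

XZUtils.setCacheXZAvailability(false);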

Uncompressing a given XZ compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.xz"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
XZCompressorInputStream xzIn = new XZCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = xzIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
xzIn.close();

Compressing a given file using XZ (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.xz"));
BufferedOutputStream out = new BufferedOutputStream(fout);
XZCompressorOutputStream xzOut = new XZCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    xzOut.write(buffer, 0, n);
}
xzOut.close();
in.close();

Z

Uncompressing a given Z compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.Z"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
ZCompressorInputStream zIn = new ZCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zIn.close();

Zstandard

The implementation of this package is provided by the Zstandard JNI library.

Uncompressing a given Zstandard compressed file (you would certainly add exception handling and make sure all streams get closed properly):

InputStream fin = Files.newInputStream(Paths.get("archive.tar.zstd"));
BufferedInputStream in = new BufferedInputStream(fin);
OutputStream out = Files.newOutputStream(Paths.get("archive.tar"));
ZstdCompressorInputStream zsIn = new ZstdCompressorInputStream(in);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = zsIn.read(buffer))) {
    out.write(buffer, 0, n);
}
out.close();
zsIn.close();

Compressing a given file using the Zstandard format (you would certainly add exception handling and make sure all streams get closed properly):

InputStream in = Files.newInputStream(Paths.get("archive.tar"));
OutputStream fout = Files.newOutputStream(Paths.get("archive.tar.zstd"));
BufferedOutputStream out = new BufferedOutputStream(fout);
ZstdCompressorOutputStream zOut = new ZstdCompressorOutputStream(out);
final byte[] buffer = new byte[buffersize];
int n = 0;
while (-1 != (n = in.read(buffer))) {
    zOut.write(buffer, 0, n);
}
zOut.close();
in.close();

Extending Commons Compress

Starting with release 1.13, it is possible to add compressor and archiver stream implementations using Java's ServiceLoader mechanism.

Extending Commons Compress Compressors

To provide your own compressor, you must make available on the classpath a file called META-INF/services/org.apache.commons.compress.compressors.CompressorStreamProvider.

This file MUST contain one fully-qualified class name per line.

For example:

org.apache.commons.compress.compressors.TestCompressorStreamProvider

This class MUST implement the Commons Compress interface org.apache.commons.compress.compressors.CompressorStreamProvider.
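
A hedged sketch of what such a provider might look like; MyCompressorInputStream, MyCompressorOutputStream and the format name "my-format" are hypothetical placeholders for your own codec:

public class TestCompressorStreamProvider implements CompressorStreamProvider {

    @Override
    public CompressorInputStream createCompressorInputStream(String name,
            InputStream in, boolean decompressUntilEOF) throws CompressorException {
        return new MyCompressorInputStream(in); // hypothetical codec
    }

    @Override
    public CompressorOutputStream createCompressorOutputStream(String name,
            OutputStream out) throws CompressorException {
        return new MyCompressorOutputStream(out); // hypothetical codec
    }

    @Override
    public Set<String> getInputStreamCompressorNames() {
        return Collections.singleton("my-format");
    }

    @Override
    public Set<String> getOutputStreamCompressorNames() {
        return Collections.singleton("my-format");
    }
}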

Extending Commons Compress Archivers

To provide your own archiver, you must make available on the classpath a file called META-INF/services/org.apache.commons.compress.archivers.ArchiveStreamProvider.

This file MUST contain one fully-qualified class name per line.

For example:

org.apache.commons.compress.archivers.TestArchiveStreamProvider

This class MUST implement the Commons Compress interface org.apache.commons.compress.archivers.ArchiveStreamProvider.