In addition to the information stored in ArchiveEntry, a TarArchiveEntry stores various attributes, including information about the original owner and permissions.
There are several different tar formats, and the tar package of Compress 1.4 mostly provides only the functionality common to the existing variants.
The original format (often called "ustar") didn't support file names longer than 100 characters or files larger than 8 GiB. By default, the tar package will fail if you try to write an entry that goes beyond either limit.
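To make the two ustar limits concrete, here is a minimal sketch that checks an entry against the raw header field widths. It assumes a 100-byte name field and a 12-byte size field that holds 11 octal digits plus a terminator, so the largest representable size is 8^11 - 1 bytes, one byte short of 8 GiB. UstarLimits and its methods are hypothetical helpers, not part of Commons Compress, and the library's own boundary checks may differ slightly from this field-width view.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress: checks an entry against
// the two classic ustar limits using the widths of the ustar header fields.
public class UstarLimits {
    // The name field of a ustar header is 100 bytes wide.
    static final int NAME_FIELD_LENGTH = 100;

    // The size field is 12 bytes: 11 octal digits plus a terminator,
    // so the largest representable size is 8^11 - 1 = 8589934591 bytes.
    static final long MAX_OCTAL_SIZE = 077777777777L;

    // The limit applies to the encoded bytes of the name, not to Java chars.
    static boolean nameFits(String name, Charset encoding) {
        return name.getBytes(encoding).length <= NAME_FIELD_LENGTH;
    }

    static boolean sizeFits(long size) {
        return size >= 0 && size <= MAX_OCTAL_SIZE;
    }

    public static void main(String[] args) {
        long eightGiB = 8L * 1024 * 1024 * 1024;
        System.out.println(sizeFits(eightGiB - 1)); // true
        System.out.println(sizeFits(eightGiB));     // false
        System.out.println(nameFits("docs/readme.txt", StandardCharsets.UTF_8)); // true
        System.out.println(nameFits("a".repeat(150), StandardCharsets.UTF_8));  // false
    }
}
```

Because the check runs on encoded bytes, a name made of non-ASCII characters can exceed the field in UTF-8 well before it reaches 100 characters.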
The tar package supports neither the full POSIX tar standard nor the more modern GNU extensions of that standard.
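The longFileMode option of TarArchiveOutputStream controls how files with names longer than 100 characters are handled. The possible choices are TarArchiveOutputStream.LONGFILE_ERROR (an exception is thrown; this is the default), LONGFILE_TRUNCATE (long names are silently truncated), LONGFILE_GNU (the GNU tar "././@LongLink" extension is used, which GNU tar understands but other implementations may not) and LONGFILE_POSIX (a PAX extended header holding the full name is written).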
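The POSIX way of storing a long name is a PAX extended header. Each record in such a header has the form `<length> <key>=<value>\n`, where the length counts the bytes of the whole record, including its own digits. The sketch below builds one record for the standard `path` key, finding the self-referential length by iterating to a fixed point. PaxRecordSketch is a hypothetical helper written for illustration, not the code Commons Compress uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Commons Compress API: builds one record of a PAX
// extended header in the form "<length> <key>=<value>\n".
public class PaxRecordSketch {

    // The length prefix counts the bytes of the whole record, including the
    // length digits themselves, so it is found by iterating to a fixed point.
    static String paxRecord(String key, String value) {
        String body = " " + key + "=" + value + "\n";
        int bodyLength = body.getBytes(StandardCharsets.UTF_8).length;
        int total = bodyLength + 1; // first guess: a one-digit length
        while (total != bodyLength + Integer.toString(total).length()) {
            total = bodyLength + Integer.toString(total).length();
        }
        return total + body;
    }

    public static void main(String[] args) {
        String longName = "dir/".repeat(30) + "file.txt"; // 128 characters
        String record = paxRecord("path", longName);
        System.out.print(record); // starts with "138 path=dir/dir/"
    }
}
```

The loop is needed because adding a digit to the length can push the total past a power of ten. For example, a 98-byte body needs a 3-digit length, giving 101, not 100.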
TarArchiveInputStream will recognize the GNU tar as well as the POSIX extensions (starting with Commons Compress 1.2) for long file names and reads the longer names transparently.
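The bigNumberMode option of TarArchiveOutputStream controls how files larger than 8 GiB, or entries with other big numeric values that can't be encoded in the traditional header fields, are handled. The possible choices are TarArchiveOutputStream.BIGNUMBER_ERROR (an exception is thrown; this is the default), BIGNUMBER_STAR (the star/GNU tar/BSD tar extensions are used to store the number) and BIGNUMBER_POSIX (a PAX extended header holding the value is written).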
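For the star-style extension, our understanding is that a numeric field whose first byte has its high bit set holds a big-endian binary (base-256) value instead of octal digits. The following sketch assumes that layout for the 12-byte size field and switches to base-256 only when the size no longer fits in 11 octal digits. BigNumberSketch is a hypothetical name, and this is not the library's implementation. It also handles non-negative sizes only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Commons Compress API: encodes a size into the
// 12-byte header field, using octal when it fits and base-256 otherwise.
public class BigNumberSketch {
    static final int FIELD_LENGTH = 12;
    static final long MAX_OCTAL = 077777777777L; // 11 octal digits

    static byte[] encodeSize(long size) {
        if (size < 0) throw new IllegalArgumentException("negative size");
        byte[] field = new byte[FIELD_LENGTH];
        if (size <= MAX_OCTAL) {
            // Traditional form: 11 zero-padded octal digits followed by a NUL.
            byte[] digits = String.format("%011o", size).getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(digits, 0, field, 0, 11);
            return field; // field[11] stays 0 (NUL)
        }
        // Base-256 form: value stored big-endian in bytes 1..11,
        // with the high bit of the first byte marking the binary encoding.
        long v = size;
        for (int i = FIELD_LENGTH - 1; i >= 1; i--) {
            field[i] = (byte) (v & 0xFF);
            v >>>= 8;
        }
        field[0] = (byte) 0x80;
        return field;
    }

    static long decodeSize(byte[] field) {
        if ((field[0] & 0x80) != 0) {
            long v = 0;
            for (int i = 1; i < FIELD_LENGTH; i++) {
                v = (v << 8) | (field[i] & 0xFF);
            }
            return v;
        }
        String octal = new String(field, 0, 11, StandardCharsets.US_ASCII).trim();
        return Long.parseLong(octal, 8);
    }

    public static void main(String[] args) {
        long small = 100;
        long big = 10L * 1024 * 1024 * 1024; // 10 GiB, too large for octal
        System.out.println(decodeSize(encodeSize(small)) == small); // true
        System.out.println(decodeSize(encodeSize(big)) == big);     // true
        System.out.printf("first byte: 0x%02X%n", encodeSize(big)[0] & 0xFF); // 0x80
    }
}
```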
Starting with Commons Compress 1.4, TarArchiveInputStream will recognize the star as well as the POSIX extensions for big numeric values and reads them transparently.
The original ustar format only supports 7-bit ASCII file names; later implementations use the platform's default encoding to encode file names. The POSIX standard recommends using PAX extension headers for non-ASCII file names instead.
Commons Compress 1.1 to 1.3 assumed file names would be encoded using ISO-8859-1. Starting with Commons Compress 1.4, you can pass the encoding to expect when reading as a parameter to TarArchiveInputStream, and the encoding to use when writing as a parameter to TarArchiveOutputStream. If no encoding is given, both default to the platform's default encoding.
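The encoding parameter matters because a tar header stores only bytes, not characters. The hypothetical helper below writes a name with one charset and reads it back with another. A UTF-8 name read as ISO-8859-1, which is what Commons Compress 1.1 to 1.3 assumed, comes back garbled. This only illustrates the mismatch and makes no claims about the library's internals.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: a file name written with one encoding and read back
// with another.
public class EncodingMismatchSketch {
    static String roundTrip(String name, Charset writeEncoding, Charset readEncoding) {
        byte[] headerBytes = name.getBytes(writeEncoding); // what ends up in the name field
        return new String(headerBytes, readEncoding);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9.txt"; // "café.txt"
        String same = roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String garbled = roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(same.equals(name));                        // true
        // The two UTF-8 bytes of "é" (0xC3 0xA9) decode as two Latin-1 characters.
        System.out.println(garbled.equals("caf\u00C3\u00A9.txt"));    // true
    }
}
```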
Since Commons Compress 1.4, another optional parameter of TarArchiveOutputStream, addPaxHeadersForNonAsciiNames, controls whether PAX extension headers will be written for non-ASCII file names. By default they are not written, to save space. TarArchiveInputStream will read them transparently if present.
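Whether addPaxHeadersForNonAsciiNames has any effect depends on whether a name contains characters outside 7-bit ASCII. The sketch below uses the hypothetical class NonAsciiNameSketch to show that check and what is lost if such a name is forced into plain ASCII. It does not reproduce how TarArchiveOutputStream actually encodes the name in the regular header.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Commons Compress API.
public class NonAsciiNameSketch {
    // True if every char is 7-bit ASCII, i.e. the name is representable in original ustar.
    static boolean isAscii(String name) {
        return name.chars().allMatch(c -> c < 0x80);
    }

    // What a strictly 7-bit field could hold: Java replaces unmappable chars with '?'.
    static String asAsciiHeaderName(String name) {
        return new String(name.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9.txt";
        if (!isAscii(name)) {
            // Names like this are the ones the PAX option is about; plain
            // ASCII cannot preserve them.
            System.out.println("non-ASCII name; ASCII-only form: " + asAsciiHeaderName(name)); // caf?.txt
        }
    }
}
```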
TarArchiveInputStream will recognize sparse file entries stored using the "oldgnu" format (--sparse-version=0.0 in GNU tar) but is not able to extract them correctly; canReadEntryData will return false for such entries. The other variants of sparse files cannot currently be detected at all.
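Recognizing an "oldgnu" sparse entry comes down to the header's one-byte typeflag field at offset 156, which GNU tar sets to 'S' for such entries. The following is a minimal sketch of that check on a raw 512-byte header. SparseTypeFlagSketch is hypothetical and skips checksum validation and every other header field.

```java
// Hypothetical helper, not Commons Compress API: inspects the typeflag byte
// of a raw 512-byte tar header.
public class SparseTypeFlagSketch {
    // typeflag follows name(100), mode(8), uid(8), gid(8), size(12), mtime(12), chksum(8)
    static final int TYPEFLAG_OFFSET = 156;
    static final byte GNU_SPARSE = 'S'; // GNU sparse entry type

    static boolean isOldGnuSparse(byte[] header) {
        if (header.length < 512) throw new IllegalArgumentException("a tar header is 512 bytes");
        return header[TYPEFLAG_OFFSET] == GNU_SPARSE;
    }

    public static void main(String[] args) {
        byte[] header = new byte[512];
        header[TYPEFLAG_OFFSET] = '0'; // regular file
        System.out.println(isOldGnuSparse(header)); // false
        header[TYPEFLAG_OFFSET] = 'S';
        System.out.println(isOldGnuSparse(header)); // true
    }
}
```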
The end of a tar archive is signalled by two consecutive 512-byte records of all zeros. Unfortunately, not all tar implementations adhere to this, and some write only one record to end the archive. Commons Compress will always write two records but stops reading an archive as soon as it finds one record of all zeros.
Prior to version 1.5, this could leave the second EOF record inside the stream when getNextEntry or getNextTarEntry returned null. Starting with version 1.5, TarArchiveInputStream will try to read a second record as well if present, effectively consuming the archive completely.
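The reading strategy described above can be sketched with plain java.io: stop at the first all-zero record, then consume a second one only if it is also all zeros, otherwise push it back. TarEofSketch is a hypothetical class. Unlike TarArchiveInputStream, it treats every non-zero record as opaque instead of parsing headers and skipping entry data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch only, not Commons Compress code: non-zero records are treated as
// opaque; a real reader would parse headers and skip entry data.
public class TarEofSketch {
    static final int RECORD_SIZE = 512;

    static boolean isZeroRecord(byte[] record) {
        for (byte b : record) {
            if (b != 0) return false;
        }
        return true;
    }

    // Returns a full record, or null if the stream ends before one is complete.
    static byte[] readRecord(InputStream in) {
        try {
            byte[] record = new byte[RECORD_SIZE];
            int n = in.readNBytes(record, 0, RECORD_SIZE);
            return n == RECORD_SIZE ? record : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stops at the first all-zero record, then also consumes a second one if it
    // follows. Returns the number of records consumed.
    static int consumeUntilEof(BufferedInputStream in) {
        int consumed = 0;
        byte[] record;
        while ((record = readRecord(in)) != null) {
            consumed++;
            if (isZeroRecord(record)) {
                in.mark(RECORD_SIZE);
                byte[] second = readRecord(in);
                if (second != null && isZeroRecord(second)) {
                    consumed++;
                } else {
                    resetQuietly(in); // not a second EOF record: leave it unread
                }
                break;
            }
        }
        return consumed;
    }

    private static void resetQuietly(BufferedInputStream in) {
        try {
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for the demo: how many bytes remain after consuming up to EOF.
    static int bytesLeftAfterEof(byte[] archive) {
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(archive));
        consumeUntilEof(in);
        try {
            return in.readAllBytes().length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[RECORD_SIZE * 4];
        data[0] = 1;                // record 0 stands in for a header
        data[RECORD_SIZE * 3] = 42; // record 3 follows both EOF records
        System.out.println(consumeUntilEof(
                new BufferedInputStream(new ByteArrayInputStream(data)))); // 3
        System.out.println(bytesLeftAfterEof(data));                      // 512
    }
}
```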