Commons IO

Best practices

This document presents a number of 'best practices' in the IO area.

java.io.File

Often, you have to deal with files and filenames. There are many things that can go wrong:

  • A class works on Unix but not on Windows (or vice versa)
  • Invalid filenames due to double or missing path separators
  • UNC filenames (on Windows) don't work with my home-grown filename utility function
  • etc. etc.

These are good reasons not to work with filenames as Strings. Using java.io.File instead handles many of the above cases nicely. Thus, our best practice recommendation is to use java.io.File instead of String for filenames to avoid platform dependencies.

Since version 1.1, commons-io includes a dedicated filename handling class, FilenameUtils. It handles many of these filename issues; however, we still recommend that you use java.io.File objects wherever possible.

Let's look at an example.

 public static String getExtension(String filename) {
   int index = filename.lastIndexOf('.');
   if (index == -1) {
     return "";
   } else {
     return filename.substring(index + 1);
   }
 }

Easy enough? Right, but what happens if someone passes in a full path instead of only a filename? Consider the following, perfectly legal path: "C:\Temp\documentation.new\README". The method as defined above would return "new\README" - definitely not what you wanted.
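As a minimal sketch of a File-based alternative (the class name ExtensionSketch is hypothetical and not part of Commons IO), the extension can be taken from File.getName(), which returns only the last path element. Note that java.io.File interprets separators according to the platform it runs on, so the Windows path above is only split into directory and name when the code runs on Windows.

```java
import java.io.File;

// Hypothetical helper for illustration only; not a Commons IO class.
final class ExtensionSketch {
    private ExtensionSketch() {}

    /** Returns the extension of the last path element, or "" if it has none. */
    static String getExtension(File file) {
        // getName() drops all parent directories, so dots in directory
        // names can no longer be mistaken for the extension separator.
        String name = file.getName();
        int index = name.lastIndexOf('.');
        return index == -1 ? "" : name.substring(index + 1);
    }
}
```

For paths that must stay Strings, the FilenameUtils class mentioned above offers a getExtension method as well.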

Please use java.io.File for filenames instead of Strings. The functionality that the class provides is well tested. In FileUtils you will find other useful utility functions around java.io.File.

Instead of:

 String tmpdir = "/var/tmp";
 String tmpfile = tmpdir + System.getProperty("file.separator") + "test.tmp";
 InputStream in = new java.io.FileInputStream(tmpfile);

...write:

 File tmpdir = new File("/var/tmp");
 File tmpfile = new File(tmpdir, "test.tmp");
 InputStream in = new java.io.FileInputStream(tmpfile);
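The following sketch illustrates one reason the File form is more robust: java.io.File normalizes the directory part, so a trailing separator on the directory does not produce a doubled separator in the result. The class and method names (TempPathSketch, inDirectory) are made up for this example.

```java
import java.io.File;

// Hypothetical helper for illustration only.
final class TempPathSketch {
    private TempPathSketch() {}

    /** Builds the path of a file inside a directory without string concatenation. */
    static File inDirectory(String directory, String name) {
        // File takes care of inserting exactly one platform-specific separator.
        return new File(new File(directory), name);
    }
}
```

With this helper, inDirectory("/var/tmp", "test.tmp") and inDirectory("/var/tmp/", "test.tmp") should compare equal, whereas the corresponding concatenated Strings would differ.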

Buffering streams

IO performance depends heavily on the buffering strategy. Reading in blocks of 512 or 1024 bytes is usually quite fast, because these sizes match the block sizes used by hard disks, file systems and file system caches. But if you read only a few bytes at a time, and do so many times, performance drops significantly.

Make sure you buffer streams properly when reading or writing, especially when working with files. Just decorate your FileInputStream with a BufferedInputStream:

 InputStream in = new java.io.FileInputStream(myfile);
 try {
   in = new java.io.BufferedInputStream(in);
   
   in.read(.....
 } finally {
   IOUtils.closeQuietly(in);
 }
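To make the effect of buffering visible, the sketch below counts how many read calls actually reach the underlying stream when a caller reads one byte at a time. It uses an in-memory stream in place of a file, and the names BufferingSketch and ReadCallCounter are hypothetical; it is an illustration under those assumptions, not a benchmark.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical classes for illustration only.
final class BufferingSketch {
    private BufferingSketch() {}

    /** Counts every read call that reaches the wrapped stream. */
    static final class ReadCallCounter extends FilterInputStream {
        int calls;

        ReadCallCounter(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads data byte by byte and returns the number of calls seen by the underlying stream. */
    static int underlyingCalls(byte[] data, boolean buffered) throws IOException {
        ReadCallCounter counter = new ReadCallCounter(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        try {
            while (in.read() != -1) {
                // consume one byte per call, the worst case for an unbuffered stream
            }
        } finally {
            in.close();
        }
        return counter.calls;
    }
}
```

Without the decorator, every single-byte read is passed straight through to the underlying stream; with it, the underlying stream is only asked for a whole block whenever the buffer runs empty.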

Take care not to buffer a stream that is already buffered. Some components, such as XML parsers, may do their own buffering, so decorating the InputStream you pass to the XML parser does nothing but slow down your code. If you use our CopyUtils or IOUtils, you don't need to buffer the streams additionally, because the code there already buffers the copy process. Always check the Javadocs for details. Buffering is also unnecessary when you write to a ByteArrayOutputStream, since you're writing to memory only.
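The reason a copy routine needs no extra decorator can be seen in a block-wise copy loop such as the sketch below. This is not the Commons IO implementation; the class name CopySketch and the 4096-byte buffer size are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not the Commons IO code.
final class CopySketch {
    private CopySketch() {}

    /** Copies all bytes from in to out in blocks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096]; // assumed block size
        long total = 0;
        int n;
        // Each iteration moves a whole block, so the loop itself acts as the buffer.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Because the loop already reads and writes whole blocks, wrapping either stream in a BufferedInputStream or BufferedOutputStream would only add another copy step.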