Java, JDBC and "memory leaks"

Every time Java is discussed on Slashdot, someone says that the overheads of automatic memory management aren’t worth it because Java still has memory leaks.

After further discussion, it generally turns out that they’re not talking about memory leaks; rather, they are talking about failure to free up resources in a timely manner — resource hogging. It’s a subtle distinction. In a memory leak, the system loses track of the memory, so it never gets freed during the life of the program. In the case of Java resource hogging, the Java system is still keeping track of the resources, and will eventually free them — it just doesn’t do it soon enough.

A common situation where resource hogging occurs is JDBC, querying a SQL database from a Java application or servlet environment. The problem is, JDBC query code is surprisingly tricky to get completely correct. It’s easy to write code where an exception causes active JDBC objects to be left unclosed, leading to the application being unreliable, overloading the database server, or using more memory than it needs.

MySQL and PostgreSQL are extremely liberal in what they are prepared to accept. For example, you can generally close a connection and rely on the database to implicitly close everything else, including abandoning any uncommitted transactions. This is not the case with IBM DB2, which will actually refuse to let you close a connection unless you have cleared out everything properly. So it’s not just a resource usage issue — you can also suddenly find yourself having to do a ton of debugging when your data load increases and you need to swap out your development database engine for something more scalable.

So, it pays to get your JDBC code right the first time. To illustrate the painful construction of some hopefully correct JDBC query code, I’m going to discuss the process of writing a simple example program in Eclipse.

Let’s start with the outer skeleton. We put the parameters in constants up top for clarity, open a logger, and for our main() routine we’ll load the JDBC driver and then call a doSomething() method that’s going to perform a simple JDBC query.

public class JDBCExample {

  private static final String JDBC_DRIVER = "org.postgresql.Driver";
  private static final String JDBC_URL = "jdbc:postgresql:example";

  private static final Logger logger = Logger.getLogger("JDBCExample");

  public static void main(String[] args) {
    try {
      Class.forName(JDBC_DRIVER);
      logger.info("Loaded JDBC driver.");
      doSomething();
    } catch (ClassNotFoundException e) {
      logger.severe("Can't load database driver: " + e.getMessage());
    }
  }
}

I’m breaking up the driver loading and the actual query this way mostly so it can look more like real-world code, where your query isn’t going to be inside main(), and should be robust against resource hogging even if called from a long-lived servlet. (Oh, and I’ll assume you can handle using Alt-Shift-M to add the appropriate imports.)

Well, so far, nothing special. There’s no unloading of the driver required, and if we can’t load it we’re not going to get anywhere so we may as well crash out. It’s when we start to write the doSomething() method that things get interesting.

public static void doSomething() {
  Connection connection = DriverManager.getConnection(JDBC_URL);
  // Do stuff
}

Eclipse duly underlines DriverManager.getConnection, pointing out that we have an unhandled exception. So, we click the “quick fix” to add a try...catch block, and properly log the exception we catch. Now we have:

try {
  Connection connection = DriverManager.getConnection(JDBC_URL);
  // Do stuff
} catch (SQLException e) {
  logger.severe("Failed to get JDBC connection: " + e.getMessage());
}

Now, as soon as we have the Connection object, it’s our responsibility to ensure that whatever else happens, the connection gets closed. We do that with a try...finally block:

try {
  Connection connection = DriverManager.getConnection(JDBC_URL);
  try {
    // Do stuff
  } finally {
    connection.close();
  }
} catch (SQLException e) {
  logger.severe("Failed to get JDBC connection: " + e.getMessage());
}

Note that we can’t use the same try block as we use to catch the SQLException, because the connection isn’t in scope there. If we were to define the connection variable earlier to get around that problem, we’d hit a more subtle snag: if the getConnection call failed, the finally block would attempt to close an unopened connection, resulting in a second runtime exception that would quite likely obscure the true problem.

OK, you may say, but what about testing if the connection was made, and only closing it if it was? The problem is, testing if the connection variable is null doesn’t tell you whether the Connection object succeeded in opening a connection. If it failed and you try to close() the object, you’ll get another exception. (Really. I’ve seen it happen, dealing with a flaky database connection at work.)

People who try the if (connection != null) { connection.close(); } approach often try to patch up that problem by putting another try...catch inside the finally blocks. Of course, you still end up hiding the true exception, making debugging harder.

I should also note that there’s also no isOpen() method to test if a non-null Connection object is really open, and that isClosed() only tells you if the connection was closed explicitly after a successful opening. So the only clean way to ensure that the connection is closed if and only if it gets opened is to use a nested try block.

Back to the code. Once you have a connection in JDBC, the next step is to set up a Statement or PreparedStatement. I’m going to pick PreparedStatement because it’s the more generally useful.

As with the Connection object, once we have a PreparedStatement we are responsible for making sure it gets closed no matter what. So, we need to follow our PreparedStatement statement = connection.prepareStatement(...) line with another try...catch block. Once more, we can’t use the same try block that surrounds the line itself. Once more, we can’t assume that just because we have a non-null value in our statement variable, that we must have a PreparedStatement that needs closing. So now we have:

try {
  Connection connection = DriverManager.getConnection(JDBC_URL);
  try {
    PreparedStatement statement = connection.prepareStatement("SELECT FIRST,LAST FROM PEOPLE WHERE LAST = ?");
    try {
      // Do stuff
    } finally {
      statement.close();
    }
  } finally {
    connection.close();
  }
} catch (SQLException e) {
  logger.severe("Failed to get JDBC connection: " + e.getMessage());
}

Almost there. Where we have // Do stuff above, we continue our programming with the next three lines:

statement.setString(1, "Smith");
statement.execute();
ResultSet results = statement.getResultSet();

Oh, look, at this point we have a ResultSet object we’re responsible for.

Now, under normal circumstances, a ResultSet will automatically be closed once the connection is closed; so says the JavaDoc. Unfortunately, unusual circumstances happen more often than you might think. For example, in a servlet environment on Windows you can get illegal access errors if you close a connection without closing the ResultSet. In an environment with connection pooling, the unclosed ResultSet can hang around until the connection slot gets reused. Of course, since the ResultSet holds the actual data retrieved, it can be very large.

So, let’s do the right thing. Let’s use another try...finally block to ensure that whatever happens, we explicitly close the ResultSet when we’re done with it. With that done, we have what I hope is the final correct code (also available from GitHub):

public static void doSomething() {
  try {
    Connection connection = DriverManager.getConnection(JDBC_URL);
    try {
      PreparedStatement statement = connection.prepareStatement("SELECT FIRST,LAST FROM PEOPLE WHERE LAST = ?");
      try {
        statement.setString(1, "Smith");
        statement.execute();
        ResultSet results = statement.getResultSet();
        try {
          while (results.next()) {
            String first = results.getString(1);
            String last = results.getString(2);
            logger.info(first + " " + last);
          }
        } finally {
          results.close();
        }
      } finally {
        statement.close();
      }
    } finally {
      connection.close();
    }
  } catch (SQLException e) {
    logger.severe("Failed to get JDBC connection: " + e.getMessage());
  }
}

There’s just one more catch to note, which occurs in the special case where you use explicit transactions with commit() and rollback(): You need to keep track of whether you have pending transactions on the connection, and make sure you either commit them or roll them back. If you don’t, DB2 will whine at you, saying “The transaction remains active, and the connection cannot be closed.” (To which the obvious retort is “Oh yeah? Watch me.”)

JDBC doesn’t seem to provide any way to query the Connection object, so if you’re doing explicit commits, a boolean flag is probably the way to go. Immediately before executing your first statement, set a flag to say that there’s stuff that needs cleaning up; reset it immediately after the commit. Then in the connection-closing finally block, check the flag, and do an abandon() if it’s still set.

The code above isn’t exactly rocket science (or applied cryptography), but it is a good deal more complicated and fiddly than you might expect. So after writing and debugging a bunch of JDBC code for connecting to a flaky database, I decided to take a look at how common it was for Java programmers to actually get all the steps right.

I started off with Sun’s JDBC tutorial, which glosses over the problem. I downloaded the reference code samples (link at bottom of page), and discovered that none of them provides a correct example.

You might think I’m being unreasonable here; that they’re just command-line examples, so they don’t have to be bulletproof, right? I disagree. I think that getting the code right in examples is particularly important, because tens of thousands of Java programmers are going to be copying and pasting that code as the starting point for their code, or doing the same things step by step in their own software. They’re not going to know that there are missing try...finally blocks, and Eclipse isn’t going to tell them.

The reference code examples are particularly sloppy, in fact. Look at InsertRow.java and you’ll see that if the JDBC driver fails to load, the program eats the exception, and continues to try and open a connection anyway! Anyone using this code as an example of how to do things in a J2EE servlet environment is going to make a system administrator very unhappy.

Speaking of J2EE, I checked my copy of Java for the Web with Servlets, JSP, and EJB from New Riders publishing. Chapter 4 covers JDBC. The example code creates Connection, Statement and ResultSet objects, blithely assumes nothing will go wrong, and then closes them all together at the end of the outer try block that catches the SQLException. The examples also perform the sin of catch (Exception e).

The O’Reilly ONJava example takes much the same approach. Conveniently, they have the appropriate chapter of Java Enterprise in a Nutshell on the web. When I got to the section titled “Handling Errors” I hoped it would redeem itself, but sadly not.

I checked a couple of real open source Java web applications — some wiki software, and a helpdesk system, both highly rated on freshmeat.net. Both had incorrect handling of error conditions in their JDBC code. One took the “assume it all works” approach, the other took the “assume that if the variable isn’t null, then the object must have opened the database successfully” approach.

Searching for correct examples netted me Ben Teese’s correct example, the only one I’ve found. Meanwhile, Sun’s developer network has an article which discusses the importance of closing JDBC resources correctly — but then gets it wrong.

In conclusion: Getting JDBC code to be rock solid and never leak resources is tougher than you might think. Most of the example code out there on the web is wrong, and even widely used Java software gets it wrong.