Monday, April 16, 2012

Java - String.format(), Formatter.format(), StringUtils performance

I’m re-writing a module I wrote a few years ago to improve performance and meet the customer requirements – we’re talking about up to 200.000 XML messages per day, with 90 messages / sec at peak times, to get an idea about how many times these methods are being called, every bit of wasted time / resources is considerably multiplied.

I’ve already done various things like reducing number of operations and waste in objects being created, but now I’m down to lower level inner parts.

Been using TreeMaps to cache some search data and avoid trips to the database, and the first implementation was using multi-keyed objects (containing multiple string keys) and comparator classes; although it was nice using typed objects as map keys it was on the other hand a bit wasteful and slower. I’ve since changed those to plain “packed” string keys, to reduce the number of objects being created, with the disadvantage of loosing the typing in the process, but it does run faster.

I did mention in the previous post that I’ve used recently VisualVM to sample application activity and spotted that these little pack() methods took somewhat an unusual amount of time - still looking in reducing the number of operations, but I said hey let’s see if they can be improved with little effort.


What the code does is fairly straight forward, takes multiple strings (“a” - “b” - “c”) and packs them all together into a padded, upper-cased string key, keeping the field lengths as in the database. That to ensure the resulting keys can be ordered and searched using get() and subMap() methods.

Although been using Apache Commons Lang packages for years now and if using Maven in Eclipse you can always look at the sources to see how exactly they do things, I chose String.format() as it should be faster, right? Well, apparently I wasn’t right. Started writing a simple performance test to compare with Formatter and then I thought lets add a StringUtils implementation and see how it fares…

First, it just seems that String.format() was the slowest and StringUtils was the fastest! And second, String.format() is not so bad in Java 7 (although it might not seem so as the tests seem to run in different order).

Java HotSpot 6
(1.6.0_31 x64)
Win 2008 R2 SP1 x64
Java HotSpot 7
(1.7.0_03 x64)
Win 2008 R2 SP1 x64
String.format() 110 ms 63 ms
Formatter.format() 47 ms 94 ms
StringUtils 15 ms 16 ms

I repeated the tests a few times and although obviously timings change slightly between runs, but StringUtils is consistently a few times faster. Obviously the test doesn’t measure in any way the garbage collection costs and have no idea how it will fare in the application performance tests just yet, but does promise a slight improvement with little code changes.

Posting the code for review, in case I’m missing something – using junit-4.10.jar and commons-lang-2.6.jar (http://search.maven.org/).
import java.util.Formatter;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.time.StopWatch;
import org.junit.BeforeClass;
import org.junit.Test;

public class PerfTest {
  /**
   * Constants
   */
  private int MAX_COUNT = 1000;
  
  /**
   * 
   */
  @BeforeClass
  public static void javaInfo() {
      System.out.println(String.format("java version \"%s\"", System.getProperty("java.version")));
      System.out.println(String.format("%s (build %s)",       System.getProperty("java.vm.name"), System.getProperty("java.vm.version")));
      System.out.println();
  }
  
  /**
   * String.format() test
   */
  @Test
  public void test1() {
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    
    String result = StringUtils.EMPTY;
    for (int i = 0, n = MAX_COUNT; i < n; i++) {
      result = String.format("|%-5S|%-10S|%-15S|", "a", "bb", "ccc");
    }
    
    stopWatch.stop();
    System.out.println("String.format()    = " + result);
    System.out.println("Execution time     = " + stopWatch.toString());
    System.out.println();
  }
  
  /**
   * Formatter.format() test
   */
  @Test
  public void test2() {
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    
    String result = StringUtils.EMPTY;
    for (int i = 0, n = MAX_COUNT; i < n; i++) {
      Formatter formatter = new Formatter();
      result = formatter.format("|%1$-5S|%2$-10S|%3$-15S|", "a", "bb", "ccc").toString();
    }
    stopWatch.stop();
    System.out.println("Formatter.format() = " + result);
    System.out.println("Execution time     = " + stopWatch.toString());
    System.out.println();
  }
  
  /**
   * StringUtils test
   */
  @Test
  public void test3() {
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    
    String result = StringUtils.EMPTY;
    for (int i = 0, n = MAX_COUNT; i < n; i++) {
      result = '|' + StringUtils.rightPad(StringUtils.upperCase("a"),    5)
             + '|' + StringUtils.rightPad(StringUtils.upperCase("bb"),  10)
             + '|' + StringUtils.rightPad(StringUtils.upperCase("ccc"), 15)
             + '|';
    }
    stopWatch.stop();
    System.out.println("StringUtils        = " + result);
    System.out.println("Execution time     = " + stopWatch.toString());
    System.out.println();
  }
}

7 comments :

  1. I just wonder why don't you reuse the Formatter? thread-safe problem?

    ReplyDelete
    Replies
    1. Very good question - at the time the JavaDoc note for Formatter re thread safety caugh my eye ("Formatters are not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class."), but diging into it again brings another issue - Formatter writes into a destination, in this case a StringBuilder, so reusing it is not an option as it appends into that destination.

      Another reason why I prefer StringUtils (and this goes for all Apache Commons classes) and I cannot recommend it enough, if you use Eclipse with Maven or if you manually download the sources JAR yourself, you can actually dig into the sources and see how it implemented. It also has so many useful methods, I recommend reading their JavaDoc pages:
      https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html

      Delete
    2. You could just download the Java source, I think that most IDE's support linking that, though the process is more manual than with maven.

      Delete
    3. You're perfectly right, Caleb!

      You can attach the src.zip (if available) in say Eclipse:
      http://www.mkyong.com/eclipse/eclipse-how-to-attach-jdk-source-code/

      Or you can explore the JDK sources from OpenJDK:
      http://www.oracle.com/technetwork/articles/javase/opensourcejdk-jsp-136417.html

      and it appears String.format() in fact employs Formatter.format() for the actual work...
      http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/ffa98eed5766/src/share/classes/java/lang/String.java

      Delete
  2. Hi there,

    I ran your test and the results are absolutely unreliable!
    Any further method is executed quicker than previous!
    Hence if I interchange methods or even put them in a reverse order (like StringUtils then Formatter.format then String.format) then I'm getting totally different results! A first method is always the slowest and last is always the quickest.

    Could you try changing the method order?

    I'm running on: java version "1.6.0_35"
    Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01)
    Windows 7

    ReplyDelete
    Replies
    1. Hello,

      You're absolutely right, timings go change in between runs, that is why so important to run tests repeatedly to see whether the pattern repeats.

      More importantly, as I described in the post, I wrote the test to get some proof on what possibly might have helped me to sqeeze some performance out of my code with little changes, then go back to make them in my application and see if these changes do have an impact in the real life production application - and in my case they did and were proved in a production environment.

      If I was to need a stonger proof before coding it my application, I would suggest changing the test to make the COUNT_MAX a much larger value and to avoid having to run the tests manually in a repeated manner, change the test code to declare and start the 3 StopWatches outside of a tests then resume and suspend them inside of the testX() methods (see the JavaDoc for commons-lang StopWatch). That way you can have a test method that runs the 3 tests any number of times and in any order you want to make sure order doesn't affect the final conclusion.

      In fact, I actually make those changes before writing this reply using MAX_COUNT=100,000 and calling tests in order 1,2,3,3,2,1,1,2,3,3,2,1 and I could confirm the results - 1.392s, 1.333s, 0.348s. The first two might slighy change in between runs, but the third one was definitely faster. Again, for what I needed at a time - a faster method for my particular scenario - it did the job in proving. I wasn't trying to prove that one Java build-in method was faster than the other nor they will always continue to be, it was just to help me solve a real problem for me.

      Delete
    2. Understood. I treated your article as the 3 represented options performance comparison. Was wrong.

      As for performance analysis, eventually I collated one method and got such performance gauging results per 100 000 000 iterations:

      Total execution time of String.format = 24570
      Average execution time of String.format = 24
      Total execution time of formatter.format = 27045
      Average execution time of formatter.format = 27
      Total execution time StringUtils.rightPad = 4138
      Average execution time StringUtils.rightPad = 4
      Total execution time of usual java concatenation = 1
      Average execution time of usual java concatenation = 0

      Definitely these results will vary depending on hardware, system load, java version etc. But seems like Java concatenation is the fastest
      To re-iterate: I used java 6, iCore i5 2.6 GHz, 16 GB RAM

      Delete