Java Compiler String Addition

Lately I have seen quite a few misconceptions about how string concatenation is handled in the Java world, so I would like to write this short blog entry, with a couple of silly examples, to show the basics of how it is done.

Note: I am focusing here on the bytecode generated by the Java compiler, regardless of any optimization the runtime could apply.

Concatenating constant strings

Imagine we are writing a new class in which we have defined three final string fields, and we want to add a new method which just returns the concatenation of the three:

public class StringConcatenation {

        public final String A = "A", B = "B", C = "C";

        public String concatFinalStrings() {
                return A + B + C;
        }
}

Taking a look at the bytecode generated by the Java compiler, we can see the following (I have included only the relevant parts of the output of the command javap -v for the sake of clarity):

Constant pool:
   ...
   #9 = String             #49            //  ABC
   ...

  public java.lang.String concatFinalStrings();
    descriptor: ()Ljava/lang/String;
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: ldc           #9                  // String ABC
         2: areturn
      LineNumberTable
}

The previous snippet shows that the Java compiler generates a new constant for us (entry #9 of the constant pool) containing the concatenation of the three strings, so the implementation of the method is just an ldc instruction followed by areturn. This matches the JLS, which specifies that compile-time constant expressions of type String are always interned.

For the reader: remove the final modifier from the field declarations and analyze the generated bytecode. Do you see any difference?
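To make the interning visible from plain Java code, here is a minimal sketch. The class name InternedConstantDemo and the method concatAtRuntime are hypothetical names introduced only for this illustration; the fields and concatFinalStrings mirror the example above. It compares references with == on purpose, which is something you would normally avoid for strings.

```java
public class InternedConstantDemo {

    // Same shape as the fields above: final and initialised with literals,
    // so the compiler treats them as constant variables.
    public final String A = "A", B = "B", C = "C";

    // Folded by the compiler into the single constant "ABC".
    public String concatFinalStrings() {
        return A + B + C;
    }

    // Hypothetical helper: the operands are not constants here,
    // so the result is built at run time as a new String object.
    public String concatAtRuntime(String a, String b, String c) {
        return a + b + c;
    }

    public static void main(String[] args) {
        InternedConstantDemo demo = new InternedConstantDemo();
        String folded = demo.concatFinalStrings();
        String built = demo.concatAtRuntime("A", "B", "C");

        // Reference comparisons against the interned literal "ABC".
        System.out.println(folded == "ABC");         // true: same interned instance
        System.out.println(built == "ABC");          // false: freshly created object
        System.out.println(built.equals("ABC"));     // true: same contents
        System.out.println(built.intern() == "ABC"); // true: intern() returns the pooled instance
    }
}
```

The first comparison only holds because A + B + C is a compile-time constant expression; the second method shows what happens as soon as the operands are ordinary values.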

Concatenating non-constant strings

The example shown in the previous section may look very unrealistic to many of you, so let’s see if we can get something more interesting in place. Now we are planning to add a new method which just invokes three other methods and links the results together, like this:

public class StringConcatenation {

        public String concatVariableStrings() {
                return getA() + getB() + getC();
        }

        public String getA() { return "A"; }
        public String getB() { return "B"; }
        public String getC() { return "C"; }
}

Taking a closer look at the generated bytecode (again, I have included only the most important sections):

  public java.lang.String concatVariableStrings();
    descriptor: ()Ljava/lang/String;
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: new           #8                  // class java/lang/StringBuilder
         3: dup
         4: invokespecial #9                  // Method java/lang/StringBuilder."<init>":()V
         7: aload_0
         8: invokevirtual #12                 // Method getA:()Ljava/lang/String;
        11: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        14: aload_0
        15: invokevirtual #13                 // Method getB:()Ljava/lang/String;
        18: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        21: aload_0
        22: invokevirtual #14                 // Method getC:()Ljava/lang/String;
        25: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        28: invokevirtual #11                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        31: areturn
      LineNumberTable:
        line 12: 0

We can see that the Java compiler creates a StringBuilder under the hood, so no intermediate String objects are allocated while composing the final string returned from the method. If you need to squeeze out more performance, you can write the builder yourself and set its initial capacity in the constructor.

Maybe this is nothing new for you (and it should not be), but lately I have received quite a few questions regarding this behaviour.
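The hand-written builder mentioned above could look roughly like this. It is only a sketch: the class name PresizedConcatenation and the method concatWithPresizedBuilder are hypothetical, and it assumes the getters never return null (the + operator would append the text "null", whereas calling length() on null throws).

```java
public class PresizedConcatenation {

    public String getA() { return "A"; }
    public String getB() { return "B"; }
    public String getC() { return "C"; }

    // Hand-written equivalent of getA() + getB() + getC(), with the
    // builder's initial capacity set to the exact length of the result.
    // Assumption: none of the getters returns null.
    public String concatWithPresizedBuilder() {
        String a = getA();
        String b = getB();
        String c = getC();

        // StringBuilder(int) takes the initial capacity in characters.
        StringBuilder builder = new StringBuilder(a.length() + b.length() + c.length());
        return builder.append(a).append(b).append(c).toString();
    }

    public static void main(String[] args) {
        System.out.println(new PresizedConcatenation().concatWithPresizedBuilder()); // prints ABC
    }
}
```

For three one-character strings the default capacity is already more than enough, so presizing only pays off when the pieces are long or numerous enough to force the builder to grow its internal buffer.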

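Environment: I used javac and javap 1.8.0_05 on OS X to compile and inspect the examples. Other compiler versions may emit different bytecode; for instance, javac from JDK 9 onwards compiles string concatenation to an invokedynamic call instead of an explicit StringBuilder.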

For the curious

If you “port” the previous examples to Scala and take a look at the generated bytecode (of the second example), you will see that the Scala compiler uses the same approach shown before: it generates a StringBuilder (scala.collection.mutable.StringBuilder) under the covers.