String - Best Practices


Never, never, never use the String constructor
Whenever you need a String object, do not call its constructor; use a string literal instead. For example:
// slow instantiation
String slow = new String("Yet another string object");

// fast instantiation
String fast = "Yet another string object";
From the Javadoc:
“Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.”
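To make the difference visible, here is a minimal sketch. The class and method names are hypothetical and only serve the illustration. It relies on the fact that identical string literals are interned and share one instance, whereas the constructor always allocates a new object.

```java
// Hypothetical demo class; the names are for illustration only.
final class StringInstantiationDemo {

    // Returns true when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Yet another string object";
        String sameLiteral = "Yet another string object";
        String copy = new String("Yet another string object");

        // Identical literals are interned and share one instance.
        System.out.println(sameObject(literal, sameLiteral)); // true
        // The constructor always allocates a new object.
        System.out.println(sameObject(literal, copy));        // false
        // The content is equal either way.
        System.out.println(literal.equals(copy));             // true
    }
}
```

The extra object brings no benefit here, which is why the literal form is preferred.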

Pattern and regular expressions
For performance reasons, a pattern (java.util.regex.Pattern) should be stored in a static final class field: compiling a regular expression is expensive, and a compiled Pattern is immutable and thread-safe. So do not compile a new Pattern inside your methods; compile each one once and share it.
From the Javadoc:
« All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern. […]Instances of this class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use. »
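As a sketch of this advice, assuming a hypothetical OrderIdParser class and an invented "ORD-123" id format, the pattern is compiled once and each call creates its own Matcher:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the class name and the id format are assumptions.
final class OrderIdParser {

    // Compiled once when the class is loaded; immutable and safe to share.
    private static final Pattern ORDER_ID = Pattern.compile("ORD-(\\d+)");

    // Returns the numeric part of the id, or -1 when the input does not match.
    static long parse(String input) {
        // A Matcher is not thread-safe, so it stays local to the call.
        Matcher matcher = ORDER_ID.matcher(input);
        return matcher.matches() ? Long.parseLong(matcher.group(1)) : -1L;
    }

    public static void main(String[] args) {
        System.out.println(parse("ORD-42")); // 42
        System.out.println(parse("nope"));   // -1
    }
}
```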

String replacement
Avoid string replacement calls such as str.replaceAll("regex", "replacement") in frequently executed code: each call compiles a new Pattern, which costs CPU and memory. Compile the pattern once and reuse it instead.
From the Javadoc:
“An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceAll(repl)”
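Following the equivalence stated in the Javadoc, the replacement can be written against a shared, precompiled pattern. This is only a sketch; the WhitespaceNormalizer class and its regex are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class name and regex are for illustration only.
final class WhitespaceNormalizer {

    // Compiled once and shared by every call.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Same result as str.replaceAll("\\s+", " "), without recompiling the regex.
    static String normalize(String str) {
        return WHITESPACE_RUN.matcher(str).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("a  b\t\nc")); // a b c
    }
}
```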

Use the equalsIgnoreCase() method
To compare strings without regard to case, do not combine equals with toLowerCase / toUpperCase, which creates an unnecessary copy of the string; use equalsIgnoreCase instead. So, do not write:
if ("test".equals("TEST".toLowerCase()))
but:
if ("test".equalsIgnoreCase("TEST"))

String concatenation
Use string concatenation with care. A simple concatenation can degrade the performance of a program. For example, if we concatenate strings with the + operator in a for loop, every iteration creates a new String object and copies the characters accumulated so far, which hurts both memory usage and execution time.
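The loop case can be sketched as follows. The class and method names are hypothetical; both methods return the same result, but the second one reuses a single StringBuilder instead of creating a new String on every iteration.

```java
// Hypothetical demo class; the names are for illustration only.
final class LoopConcatenation {

    // Creates a new String (and copies all previous characters) on every iteration.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Appends into one growing buffer and creates a single String at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithPlus(parts));    // abc
        System.out.println(joinWithBuilder(parts)); // abc
    }
}
```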

Compile time resolution vs runtime resolution
You can concatenate multiple strings using the + operator, String.concat() or StringBuffer/StringBuilder.append(). But which one performs best? Programmers generally assume that StringBuffer/StringBuilder.append() is faster than the + operator or the String.concat() method. But this assumption does not hold under certain conditions.

The Java compiler (javac) does a good job of optimization here. When every operand is a compile-time constant, such as a string literal, it resolves the concatenation at compile time instead of at runtime. This does not apply to strings created with the 'new' keyword, which are always built at runtime.
Before compilation:
String result = "This is " + "testing the " + "difference " + "between " + "String " + "and " + "StringBuffer/StringBuilder";
After compilation:
String result = "This is testing the difference between String and StringBuffer/StringBuilder";

A concatenation of constants is resolved at compile time, whereas a concatenation involving non-constant operands, like explicit StringBuffer/StringBuilder code, is resolved at runtime. Runtime resolution takes place when the value of the string is not known in advance, whereas compile-time resolution happens when it is. Here is an example:
Before compilation:
public String getString(String str1, String str2) {
    return str1 + str2;
}
After compilation (in effect; javac has used the unsynchronized StringBuilder rather than StringBuffer since Java 5, and since Java 9 it delegates to an invokedynamic-based strategy):
    return new StringBuilder().append(str1).append(str2).toString();
This is resolved at runtime, so it takes more time to execute than a concatenation of constants.
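One way to observe the two kinds of resolution is to compare object identity. This is a minimal sketch with a hypothetical class name. It relies on the language rule that a constant expression yields an interned string, while a non-constant concatenation creates a new String object.

```java
// Hypothetical demo class; the names are for illustration only.
final class ConstantFoldingDemo {

    // The operands are not constants, so this is evaluated at runtime.
    static String concat(String str1, String str2) {
        return str1 + str2;
    }

    public static void main(String[] args) {
        String folded = "This is " + "a constant";          // folded by the compiler
        String runtime = concat("This is ", "a constant");  // built at runtime

        // Same interned literal, hence the same object.
        System.out.println(folded == "This is a constant");  // true
        // A new object is created at runtime.
        System.out.println(runtime == "This is a constant"); // false
        // The content is identical in both cases.
        System.out.println(runtime.equals(folded));          // true
    }
}
```

The identity comparison with == is used here only to reveal where the string was built; regular code should compare strings with equals.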

StringBuilder or StringBuffer, but set an initial size to get best performance for String concatenation
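String concatenation should be done with StringBuilder, even in a multithreaded environment, as long as the builder itself is not shared between threads (the usual case for a local variable). StringBuffer's methods are synchronized, which adds locking overhead that is only useful when one buffer is shared between threads.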

In addition, if the approximate or minimum size of the resulting string can be predicted, it should be specified in the constructor to avoid unnecessary memory reallocations. StringBuffer/StringBuilder maintains a character array internally. When you create one with the default constructor, without setting an initial capacity, it is initialized with a capacity of 16 characters. When the StringBuffer/StringBuilder exceeds its capacity, it grows to twice the old capacity plus 2 (2 * old capacity + 2).

If you start with the default capacity and keep adding characters, the capacity grows to 34 (2*16 + 2) when the 17th character is added, and to 70 (2*34 + 2) when the 35th character is added. Each time the capacity is exceeded, a new character array has to be allocated and the existing characters copied into it, which is obviously expensive. So it is always good to initialize the builder with an appropriate capacity.
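The capacity behaviour can be checked with StringBuilder.capacity(). The sketch below uses a hypothetical class name. The default capacity of 16 is documented in the Javadoc; the exact capacity after growth is an implementation detail, so the code prints it rather than assuming a value.

```java
// Hypothetical demo class; the names are for illustration only.
final class BuilderCapacityDemo {

    // Appends n characters to the given builder and returns its capacity afterwards.
    static int capacityAfterAppending(StringBuilder sb, int n) {
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Default constructor: initial capacity of 16 characters.
        System.out.println(new StringBuilder().capacity());

        // The 17th character forces a reallocation; the new capacity is printed as is.
        System.out.println(capacityAfterAppending(new StringBuilder(), 17));

        // A builder presized to 100 holds 100 characters without growing.
        System.out.println(capacityAfterAppending(new StringBuilder(100), 100));
    }
}
```

Presizing pays off mostly for large strings built in hot code paths; for a handful of short appends the default capacity is usually fine.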

Templating
Working with templates is a good idea: it avoids heavy string concatenation and is friendlier to non-technical people, because they can read the templates in files or in a database. Rendering the result with a template engine is better still. StringTemplate is one of them. From the StringTemplate home page:
“StringTemplate is a java template engine for generating source code, web pages, emails, or any other formatted text output.”

For instance:
StringTemplateGroup templateGroup = new StringTemplateGroup("spam group", "templates");
StringTemplate spam419 = templateGroup.getInstanceOf("spam419"); // loads the file spam419.st
spam419.setAttribute("hubLabel", hubLabel);
spam419.setAttribute("senderfullname", sender.getFullName());

No more message.replace(String, String) calls or Pattern instantiation, and a big performance gain!

Using loggers
String building in the arguments of the debug and info methods should always be guarded by isDebugEnabled() or isInfoEnabled(). To log an entry with log4j, we write:
if (logger.isDebugEnabled()) {
    logger.debug("Info : x = " + info.getX() + ", y = " + info.getY() + ", str = " + info.getStr());
}
This kind of code is not pretty and quickly becomes heavy, all the more so since the debug method performs the same level check itself and prints nothing if the level is not enabled. So why do the work twice? SLF4J offers an alternative: parameterized messages with {} placeholders, similar in spirit to Java's printf. The message is only formatted if the level is enabled (the arguments themselves are still evaluated):
logger.debug("Info : x = {}, y = {}, str = {}", new Object[]{info.getX(), info.getY(), info.getStr()});

Better huh?

Just remember that SLF4J is an API that supports various logging frameworks, such as the classic log4j and the better-performing Logback.

Source:
http://download.oracle.com/javase/7/docs/api/java/lang/String.html
http://download.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
http://www.slf4j.org/manual.html
http://www.stringtemplate.org/
https://github.com/SpringSource/greenhouse/tree/master/src/main/java/org/springframework/templating
http://www.precisejava.com/javaperf/j2se/StringAndStringBuffer.htm
