Compact Strings in Java 9 with Examples
Compact Strings are one of the performance enhancements introduced in the JVM as part of JDK 9. Up to JDK 8, every String object is backed internally by a char[] array that holds the characters of the String.
Why do we need Compact Strings?
- Up to JDK 8, Java represents a String object as a char[], and every char in Java takes 2 bytes because Java internally uses UTF-16.
- If a String contains only English text, each of its characters can be represented in a single byte, so 2 bytes per character are not needed. Some characters do require 2 bytes, but most characters that appear in practice fall within the LATIN-1 character set, which needs only 1 byte per character. So there is scope to reduce memory consumption and improve performance.
- Java 9 introduced the concept of compact Strings. When a String object is created and all of its characters can be represented using 1 byte each (the LATIN-1 representation), Java stores them internally in a byte[] with one byte per character. If any character needs more than 1 byte, every character of that String is stored using 2 bytes, i.e. the UTF-16 representation.
- This change to the internal implementation of String, known as Compact Strings, improves the memory consumption and performance of String.
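The choice described above comes down to one question: does every character of the string lie in the LATIN-1 range U+0000 to U+00FF? A minimal sketch of that check follows. The class and method names are hypothetical and the JDK performs its own equivalent check internally; the second method simply cross-checks the answer with the standard ISO-8859-1 encoder from java.nio.charset.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    // Hypothetical helper: true when every char of s is in U+0000..U+00FF,
    // i.e. when s could be stored with one byte per character.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Cross-check using the standard ISO-8859-1 (Latin-1) encoder.
    static boolean fitsInLatin1ViaEncoder(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // "caf\u00e9" ends in e-acute (inside Latin-1); "\u20ac" is the euro sign (outside).
        String[] samples = {"Hello", "caf\u00e9", "price: 5\u20ac"};
        for (String s : samples) {
            System.out.println(s.length() + " chars -> Latin-1? " + fitsInLatin1(s)
                    + " (encoder says " + fitsInLatin1ViaEncoder(s) + ")");
        }
    }
}
```

Both methods agree: plain English text and accented Western European letters fit in one byte per character, while a single euro sign rules the whole string out.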
String class internal implementation before Java 9:
Java 8 or before: String stores its characters in the field declared as private final char value[].
Note: Before Java 9, Java always represents a String object as a char[]. Suppose we create a String object whose characters can all be represented using 1 byte. Instead of storing them in a byte[], Java still creates a char[], which consumes twice the memory actually needed.
JDK developers found that most strings can be represented using only the Latin-1 character set. A Latin-1 character can be stored in one byte, which is exactly half the size of a char, so storing such strings at one byte per character halves the memory used by their character data.
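To make the pre-Java-9 layout concrete, here is a simplified model of it. This is a sketch, not the JDK source: the class LegacyStringModel is hypothetical, and it counts only the character data, ignoring the array header and the other fields of String.

```java
public class LegacyStringModel {

    // Simplified model of the pre-Java-9 layout: every character is a 2-byte char,
    // no matter which characters the string contains.
    private final char[] value;

    LegacyStringModel(String s) {
        this.value = s.toCharArray();
    }

    // Character data in bytes (array header not counted): 2 bytes per char.
    int payloadBytes() {
        return value.length * Character.BYTES;
    }

    // Bytes that would suffice at 1 byte per character,
    // assuming every character is in the Latin-1 range.
    int latin1PayloadBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        LegacyStringModel m = new LegacyStringModel("Hello");
        System.out.println("char[] payload: " + m.payloadBytes() + " bytes");        // 10 bytes
        System.out.println("1 byte per char: " + m.latin1PayloadBytes() + " bytes"); // 5 bytes
    }
}
```

For the all-Latin-1 text "Hello", the char[] spends 10 bytes where 5 would do, which is exactly the factor of two described above.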
String class internal implementation from Java 9:
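Java 9 and after: String stores its characters in the field declared as private final byte[] value, and records how they are encoded in the field declared as private final byte coder.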
Note: Now the question is how String distinguishes between the LATIN-1 and UTF-16 representations. Java developers introduced a final byte field, coder, that records which representation the characters use. The coder field takes one of two values:
static final byte LATIN1 = 0;
static final byte UTF16 = 1;
Thus, the new String implementation known as Compact Strings makes Java 9 better than earlier versions in terms of memory and performance: for a String whose characters are all Latin-1, the character data takes approximately half the heap space it did before JDK 9. A String that contains any non-Latin-1 character still needs 2 bytes per character.
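To see how the coder field ties the two representations together, here is a simplified model of the Java 9 layout. It is a sketch under stated assumptions, not the JDK source: the class CompactStringModel is hypothetical, it stores each UTF-16 char high byte first for simplicity (the real implementation may use a different byte order), and it ignores the hash field and the array header. The constant values match the coder values listed above.

```java
public class CompactStringModel {

    // Same values as the coder constants described above.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value; // encoded characters
    private final byte coder;   // LATIN1 or UTF16

    CompactStringModel(String s) {
        if (fitsInLatin1(s)) {
            // LATIN1: one byte per character.
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
            coder = LATIN1;
        } else {
            // UTF16: two bytes per char, high byte first (a simplification).
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
            coder = UTF16;
        }
    }

    private static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    byte coder() {
        return coder;
    }

    // Bytes per character is 1 << coder, so the length is value.length >> coder.
    int length() {
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF); // undo sign extension of the byte
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    // Size of the character data only (no array header, no other fields).
    int payloadBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactStringModel ascii = new CompactStringModel("Hello");
        CompactStringModel withEuro = new CompactStringModel("Hello\u20ac");
        System.out.println("coder=" + ascii.coder() + " length=" + ascii.length()
                + " bytes=" + ascii.payloadBytes());    // coder=0 length=5 bytes=5
        System.out.println("coder=" + withEuro.coder() + " length=" + withEuro.length()
                + " bytes=" + withEuro.payloadBytes()); // coder=1 length=6 bytes=12
    }
}
```

Using the coder value directly as a shift amount keeps length() branch-free: a LATIN1 string shifts by 0, a UTF16 string by 1.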
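Let's compare the memory used by String objects before Java 9 and from Java 9, using two strings: s1, which has 13 characters that all belong to the LATIN-1 character set, and s2, which has the same 13 characters followed by the euro sign €.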
Key points to note when we are running on Java 8 or earlier:
- Here, the String object s1 has 13 characters, and each of them can be represented using 1 byte, which is nothing but the LATIN-1 representation.
- On JDK 8 or earlier, String always uses UTF-16 internally, so the String is represented as a char[].
- Here we don't need a char[]; we can represent each character with 1 byte only. Instead of a byte[], a char[] is created and 2 bytes of heap memory are assigned to each character: 26 bytes for 13 characters where 13 bytes would suffice. This is nothing but a waste of heap memory.
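The arithmetic above can be checked with the standard library. This sketch assumes the text "Hello, World!" as a stand-in for s1, chosen only because it has 13 LATIN-1 characters; the class and method names are hypothetical, and the counts cover only the character data. The round-trip method shows that one byte per character really is lossless for such a string: String.getBytes replaces any character ISO-8859-1 cannot encode, so decoding would then not reproduce the original.

```java
import java.nio.charset.StandardCharsets;

public class Java8WasteDemo {

    // Pre-Java-9 character data: one 2-byte char per character.
    static int charArrayBytes(String s) {
        return s.length() * Character.BYTES;
    }

    // Character data if the same text is stored one byte per character (Latin-1).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // True if encoding to Latin-1 and decoding back reproduces s exactly.
    static boolean latin1RoundTrips(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(encoded, StandardCharsets.ISO_8859_1).equals(s);
    }

    public static void main(String[] args) {
        String s1 = "Hello, World!"; // assumed 13-character Latin-1 sample
        System.out.println("length:          " + s1.length());          // 13
        System.out.println("char[] payload:  " + charArrayBytes(s1));   // 26
        System.out.println("Latin-1 payload: " + latin1Bytes(s1));      // 13
        System.out.println("lossless:        " + latin1RoundTrips(s1)); // true
    }
}
```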
Key points to note when we are running on Java 9 or later:
- From Java 9, the characters of a String are always stored in a byte[], and the coder decides whether each character takes 1 byte (LATIN1) or 2 bytes (UTF16). Here, String object s1 has 13 characters and object s2 has 14 characters.
- Each character in s1 can be represented using 1 byte only. That's why s1 uses the LATIN1 coder, and its byte[] holds 1 byte per character (13 bytes).
- s2 has one character in addition to those in s1, i.e. €. We can't represent € using the LATIN-1 character set; it needs 2 bytes. That's why Java uses UTF-16 to represent all of the characters inside s2.
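- So s2 uses the UTF16 coder: internally its byte[] holds 2 bytes per character (28 bytes for 14 characters), the same amount of character data that a char[] held before Java 9.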
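- This is how the new String implementation, known as Compact Strings, makes Java 9 better than earlier versions in terms of memory consumption and performance: strings made only of LATIN-1 characters need half the space for their character data, while the character data of other strings takes the same space as before.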
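The comparison in the points above can be tabulated with a short program that applies both storage rules to s1 and s2. This is a sketch under stated assumptions: "Hello, World!" is an assumed sample chosen only because it has 13 LATIN-1 characters, the class and method names are hypothetical, and the byte counts cover only the character data, not the array header or the other String fields.

```java
public class CompactStringComparison {

    // Same values as the coder constants described in this article.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // The coder the compact-strings rule picks: LATIN1 only if every char is <= 0xFF.
    static byte coderFor(String s) {
        return s.chars().allMatch(c -> c <= 0xFF) ? LATIN1 : UTF16;
    }

    // JDK 9+ character data: length << coder, i.e. 1 byte per char (LATIN1) or 2 (UTF16).
    static int java9PayloadBytes(String s) {
        return s.length() << coderFor(s);
    }

    // JDK 8 character data: always one 2-byte char per character.
    static int java8PayloadBytes(String s) {
        return s.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        String s1 = "Hello, World!"; // assumed sample: 13 Latin-1 characters
        String s2 = s1 + "\u20ac";   // s1 plus the euro sign: 14 characters
        String[] names = {"s1", "s2"};
        String[] values = {s1, s2};
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            System.out.printf("%s: length=%d coder=%s JDK 8=%d bytes, JDK 9+=%d bytes%n",
                    names[i], s.length(), coderFor(s) == LATIN1 ? "LATIN1" : "UTF16",
                    java8PayloadBytes(s), java9PayloadBytes(s));
        }
    }
}
```

Running it prints:

s1: length=13 coder=LATIN1 JDK 8=26 bytes, JDK 9+=13 bytes
s2: length=14 coder=UTF16 JDK 8=28 bytes, JDK 9+=28 bytes

The saving applies only to s1; s2 costs the same under both rules because one non-LATIN-1 character switches the whole string to 2 bytes per character.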