In this post we’ll look at five ways in which we can use efficient coding to help our garbage collector spend less CPU time allocating and freeing memory, and reduce GC overhead. Long GCs can often lead to our code being stopped while memory is reclaimed (AKA “stop the world”).
Some background
The GC is built to handle large amounts of allocations of short lived objects (think of something like rendering a web page, where most of the objects allocated become obsolete once the page is served).
The GC does this using what’s called a “young generation” – a heap segment where new objects are allocated. Each object has an “age” (placed in the object’s header bits) which defines how many collections it has “survived” without being reclaimed. Once a certain age is reached, the object is copied into another section in the heap called a “survivor” or “old” generation.
The process, while efficient, still comes at a cost. Being able to reduce the number of temporary allocations can really help us increase throughput, especially in high-scale applications.
Below are five ways we can write everyday code that’s more memory efficient, without having to spend a lot of time on it, or reducing code readability.
1. Avoid implicit Strings
Strings are an integral part of almost every data structure we manage. Being much heavier than other primitive values, they have a much stronger impact on memory usage.
One of the most important things to note is that Strings are immutable. They cannot be modified after allocation. Operators such as “+” for concatenation actually allocate a new String containing the contents of the strings being joined. What’s worse, is there’s an implicit StringBuilder object that’s allocated to actually do the work of combining them.
For example –
The compiler generates comparable code behind the scenes:
But it gets worse.
Let’s look at this example –
In this example we have 3 StringBuilders allocated in the background – one for each plus operation, and two additional Strings – one to hold the result of the second assignment and another to hold the string passed into the print method. That’s 5 additional objects in what would otherwise appear to be a pretty trivial statement.
Think about what happens in real-world code scenarios such as generating a web page, working with XML or reading text from a file. Nested within loop structures, you could be looking at hundreds or thousands of objects that are implicitly allocated. While the VM has mechanisms to deal with this, it comes at a cost – one paid by your users.
The solution: One way of reducing this is being proactive with StringBuilder allocations. The example below achieves the same result as the code above while allocating only one StringBuilder and one String to hold the final result, instead of the original five objects.
By being mindful of the way Strings and StringBuilders are implicitly allocated you can materially reduce the amount of short-term allocations in high-scale code locations.
2. Plan List capacities
Dynamic collections such as ArrayLists are among the most basic structures to hold dynamic length data. ArrayLists and other collections such as HashMaps and TreeMaps are implemented using underlying Object[] arrays. Like Strings (themselves wrappers over char[] arrays), arrays are also immutable. The obvious question then becomes – how can we add/put items in collections if their underlying array’s size is immutable? The answer is obvious as well – by allocating more arrays.
Let’s look at this example –
The value of len determines the ultimate length of items once the loop finishes. This value, however, is unknown to the constructor of the ArrayList which allocates a new Object array with a default size. Whenever the capacity of the internal array is exceeded, it’s replaced with a new array of sufficient length, making the previous array garbage.
If you’re executing the loop thousands of times you may be forcing a new array to be allocated and a previous one to be collected multiple times. For code running in a high-scale environment, these allocations and deallocations are all deducted from your machine’s CPU cycles.
The solution: Whenever possible, allocate lists and maps with an initial capacity, like so:
This ensures that no unnecessary allocations and deallocations of internal arrays occur at runtime as the list now has sufficient capacity to begin with. If you don’t know the exact size, it’s better to go with an estimate (e.g. 1024, 4096) of what an average size would be, and add some buffer to prevent accidental overflows.
3. Use efficient primitive collections
Current versions of the Java compiler support arrays or maps with a primitive key or value type through the use of “boxing” – wrapping the primitive value in a standard object which can be allocated and recycled by the GC.
This can have some negative implications. Java implements most collections using internal arrays. For each key/value entry added to a HashMap an internal objectis allocated to hold both values. This is a necessary evil when dealing with maps, which means an extra allocation and possible deallocation made every time you put an item into a map. There’s also the possible penalty of outgrowing capacity and having to reallocate a new internal array. When dealing with large maps containing thousands or more entries, these internal allocations can have increasing costs for your GC.
A very common case is to hold a map between a primitive value (such as an Id) and an object. Since Java’s HashMap is built to hold object types (vs. primitives), this means that every insertion into the map can potentially allocate yet another object to hold the primitive value (“boxing” it).
The standard Integer.valueOf method caches the values between 0 – 255, but for each number greater than that, a new object will be allocated in addition to the internal key / value entry object. This can potentially more than triple GC overhead for the map. For those coming from a C++ background this can really be troubling news, where STL templates solve this problem very efficiently.
Luckily, this problem is being worked on for next versions of Java. Until then, it’s been dealt with quite efficiently by some great libraries which provide primitive trees, maps and lists for each of Java’s primitive types. I strongly recommend Trove, which I’ve worked with for quite a while and found that can really reduce GC overhead in high-scale code.
4. Use Streams instead of in-memory buffers
Most of the data we manipulate in server applications comes to us in the form of files or data streamed over the network from another web service or a DB. In most cases, the incoming data is in serialized form, and needs to be deserialized into Java objects before we can begin operating on it. This stage is very prone to large implicit allocations.
The easiest thing to do usually is read the data into memory using a ByteArrayInputStream, ByteBuffer and then pass that on to the deserialization code.
This can be a bad move, as you’d need to allocate and later deallocate room for that data in its entirety while constructing new objects out of it . And since the size of the data can be of unknown size, you guessed it – you’ll have to allocate and deallocate internal byte[] arrays to hold the data as it grows beyond the initial buffer’s capacity.
The solution is pretty straightforward. Most persistence libraries such as Java’s native serialization, Google’s Protocol Buffers, etc. are built to deserialize data directly from the incoming file or network stream, without ever having to keep it in memory, and without having to allocate new internal byte arrays to hold the data as it grows. If available, go for that approach vs. loading the data into memory. Your GC will thank you.
5. Aggregate Lists
Immutability is a beautiful thing, but in some high-scale situations it can have some serious drawbacks. One scenario is when passing List objects between methods.
When returning a collection from a function, it’s usually advisable to create the collection object (e.g. ArrayList) within the method, fill it and return it in the form of an immutable Collection interface.
There are some cases where this doesn’t work well. The most noticeable one is when collections are aggregated from multiple method calls into a final collection. While immutability provides more clarity, in high-scale situations it can also mean massive allocation of interim collections.
The solution in this case would be not to return new collections, but instead aggregate values into a single collection that’s passed into those methods as a parameter.
Example 1 (inefficient) –
Example 2 –
Example 2, while disobeying the rules of immutability (which should normally be adhered to) can save N list allocations (along with any interim array allocations). In high-scale situations this can be a boon to your GC.
Additional reading
- String interning – http://plumbr.eu/blog/reducing-memory-usage-with-string-intern
- Efficient wrappers – http://vanillajava.blogspot.co.il/2013/04/low-gc-coding-efficient-listeners.html
- Using Trove – http://java-performance.info/primitive-types-collections-trove-library/
0 comments:
Post a Comment