This article references two posts on the TopTal blog:
- Spring Batch Tutorial: Batch Processing Made Easy with Spring
- Caching in Spring with EhCache Annotations
Motivation
There are a lot of resources on batch processing in Spring, and also on Spring's cache support. This post explores how these two areas combine.
All examples are derived from a real-world use case at a reinsurance company. The company maintains a huge proprietary data store. We developed batch jobs that transformed source data into derivations of it, which were mostly used for more efficient reporting. Reporting departments depended on our work: once a transformation was specified and ordered, our batch jobs were expected to deliver results within hours, ideally on the same working day.
The most important strategy we used was caching. We performed various operations on the input data. Some of them were very cheap and simple calculations. Others, however, were expensive lookups in various dictionaries, mostly in a database. The database was very well optimized and could perform a lookup in a few milliseconds, or even tenths of a millisecond. However, we processed billions of records, so even such fast lookups could lead to execution times of hours, days, or weeks.
It was only natural to avoid repeated lookups by employing caching. Let's explore the problems and strategies we encountered.
Difference from online services
Before we proceed, we need to clarify what we expect from caching.
When developing an online service, users are generally very satisfied with response times of, say, 50 ms. If the result depends on an expensive operation that takes, say, 500 ms to load, then we cache it. We also do not expect the cache to grow very large. We use whatever cache comes out of the box with the framework.
In batch applications, however, we often process billions of records. We generally need to process a record in a fraction of a millisecond, which means the lookup in the cache itself must be just as fast. Due to the volume of processed data, the cache grows quickly and needs to be limited. The cache that comes out of the box usually does not have these properties.
Example batch job
Our example batch job will calculate a checksum over rows of pseudorandom words. For every integer (called a “word number”), we run a simple deterministic algorithm that creates a seemingly random word:
2277   -> popczbuiyyzhndkxynrqohumbzcyndtdtkievvtqruigkptopwuknzjeb
248053 -> jylqesjiwatntiaozbsifogqykayrsavbgkxjycdxuvlceizlgstmclyh
668659 -> cpkltqvxeubxczkkeluwmtcohzbjbgtiobghkbwiwfudkahqdmdvmlocl
These words are generated by a WordGenerator, and the operation represents an expensive lookup in a dictionary.
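The actual implementation is in the GitHub repository; as a rough illustration, a deterministic generator might look like the following sketch. The class name WordGenerator comes from the article, but the word length and the seeding scheme are assumptions:

```java
import java.util.Random;

public class WordGenerator {

    // Word length chosen to match the examples above; an assumption.
    private static final int WORD_LENGTH = 57;

    // Deterministic: seeding with the word number makes the same input
    // always produce the same pseudorandom word. In the real project,
    // this call was an expensive dictionary lookup instead.
    public String getWord(long wordNumber) {
        Random random = new Random(wordNumber);
        StringBuilder word = new StringBuilder(WORD_LENGTH);
        for (int i = 0; i < WORD_LENGTH; i++) {
            word.append((char) ('a' + random.nextInt(26)));
        }
        return word.toString();
    }
}
```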
Next, we will generate pseudorandom rows of word numbers. These represent the flat files that contained the source data in our batch processes. Our example rows have fixed cardinalities: the first column contains only word numbers ranging from 0 to 10, the second column ranges from 0 to 100, and the N-th column ranges from 0 to the N-th power of 10. This setup is similar to flat files we encountered in real production, where some columns contained only a few values (e.g. years from 1960 to 2018) while others contained a broad range of values (e.g. bank transaction IDs).
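A generator with these cardinalities could be sketched as follows; RowGenerator and its API are hypothetical names for illustration, not the repository's actual classes:

```java
import java.util.Random;

public class RowGenerator {

    private final Random random = new Random();
    private final int columns;

    public RowGenerator(int columns) {
        this.columns = columns;
    }

    // The n-th column (1-based) holds word numbers below 10^n, so the
    // first columns repeat very often while the last ones rarely do.
    public int[] nextRow() {
        int[] row = new int[columns];
        int bound = 10;
        for (int i = 0; i < columns; i++) {
            row[i] = random.nextInt(bound);
            bound *= 10;   // would overflow beyond 9 columns; fine for this sketch
        }
        return row;
    }
}
```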
Here is an example of such a flat file:
[1, 65, 961, 5313, 98361, 180915]
[9, 57, 406, 8290, 69353, 333818]
[4, 45, 225, 2156, 73740, 467670]
[8, 29, 769, 558, 44227, 582468]
[3, 69, 587, 4424, 48614, 199967]
[1, 60, 33, 7401, 19605, 352871]
[6, 48, 851, 1267, 40345, 486722]
[0, 32, 43, 9670, 94479, 601521]
[5, 72, 214, 3535, 15218, 219020]
[3, 64, 307, 6513, 69858, 371924]
[9, 52, 478, 378, 90597, 505775]
[2, 88, 670, 5133, 44732, 104221]
[8, 76, 488, 8999, 65471, 238073]
Our batch process will calculate the sum of the hash codes of all the words (not of the word numbers).
Implementation in Spring Batch
The source code of the implementation is located on GitHub.
The configuration of Spring Batch is very simple and consists of only a single type of step.
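A chunk-oriented step could be wired roughly as follows (Spring Batch 3/4 builder style; the chunk size, generics, and bean wiring are assumptions rather than the repository's actual code):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Bean
    public Step checksumStep(StepBuilderFactory steps,
                             ItemReader<int[]> reader,
                             ItemProcessor<int[], Long> processor,
                             ItemWriter<Long> writer,
                             StepExecutionListener listener) {
        return steps.get("Plain")
                .<int[], Long>chunk(1000)   // chunk size is an assumption
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .listener(listener)
                .build();
    }
}
```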
RowItemReader simply generates the flat file with word numbers.
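A minimal sketch, reusing the hypothetical RowGenerator from above:

```java
import org.springframework.batch.item.ItemReader;

public class RowItemReader implements ItemReader<int[]> {

    private final RowGenerator rowGenerator;   // the sketch from above
    private final long totalRows;
    private long rowsRead;

    public RowItemReader(RowGenerator rowGenerator, long totalRows) {
        this.rowGenerator = rowGenerator;
        this.totalRows = totalRows;
    }

    @Override
    public int[] read() {
        // Returning null tells Spring Batch that the input is exhausted.
        return rowsRead++ < totalRows ? rowGenerator.nextRow() : null;
    }
}
```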
ChecksumProcessor converts every word number in a row into a word and calculates the sum of all the word hash codes.
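A sketch of such a processor, assuming the WordGenerator from above:

```java
import org.springframework.batch.item.ItemProcessor;

public class ChecksumProcessor implements ItemProcessor<int[], Long> {

    private final WordGenerator wordGenerator;

    public ChecksumProcessor(WordGenerator wordGenerator) {
        this.wordGenerator = wordGenerator;
    }

    @Override
    public Long process(int[] row) {
        long checksum = 0;
        for (int wordNumber : row) {
            // The expensive part: every word number becomes a word first.
            checksum += wordGenerator.getWord(wordNumber).hashCode();
        }
        return checksum;
    }
}
```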
CheckSumWriter adds up the checksums of each chunk into what will become the checksum of the whole flat file.
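Roughly like this (the accumulator field is an assumption):

```java
import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class CheckSumWriter implements ItemWriter<Long> {

    private long checksum;

    @Override
    public void write(List<? extends Long> chunk) {
        // Fold the per-row checksums of one chunk into the running total.
        for (Long rowChecksum : chunk) {
            checksum += rowChecksum;
        }
    }

    public long getChecksum() {
        return checksum;
    }
}
```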
Finally, a StepExecutionListener collects and prints the results and statistics of the step.
To compare the various strategies, our job will process the same flat file 8 times. This is represented by 8 steps, each configured with a different variation of ChecksumProcessor.
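The job might chain the variants as successive steps, roughly like this (only four of the eight steps shown; all bean names are assumptions, and qualifiers may be needed to disambiguate the Step beans):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobConfiguration {

    @Bean
    public Job checksumJob(JobBuilderFactory jobs,
                           Step plainStep,
                           Step springCachedStep,
                           Step hashMapStep,
                           Step linkedHashMapStep) {
        // Each step processes the same generated flat file with a
        // different ChecksumProcessor variant.
        return jobs.get("checksumJob")
                .start(plainStep)
                .next(springCachedStep)
                .next(hashMapStep)
                .next(linkedHashMapStep)
                // ...four more variants in the full example
                .build();
    }
}
```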
Results
First, we will execute all steps on a Java 8 JVM with 4 GB of heap space.
Plain
This step uses no caching at all. It gives us a baseline showing how fast we can process records without any optimization.
Executing step: [Plain]
Finished step. Word source: Plain generator. Processor: Simple processor
Speed: 67474.11 rows/sec
Spring cached
This step uses a dedicated bean managed by Spring. The bean delegates calls to the plain word generator, but the invocation is cached using the @Cacheable annotation.
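A minimal sketch of such a bean (assuming caching is switched on elsewhere via @EnableCaching; the cache name is an assumption):

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class SpringCachedWordGenerator {

    private final WordGenerator delegate = new WordGenerator();

    // Spring wraps this bean in a proxy; repeated word numbers are
    // served from the "words" cache instead of hitting the delegate.
    @Cacheable("words")
    public String getWord(long wordNumber) {
        return delegate.getWord(wordNumber);
    }
}
```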
Results:
Executing step: [Spring cached]
Finished step. Word source: Cached by Spring. Processor: Simple processor
Word source statistics: No statistics
Speed: 83101.34 rows/sec
We can see only a slight increase in performance. This is because we did not configure the cache at all. First of all, you really need to set a suitable cache size; all caches in the following steps will use a maximum size of 200000 entries, while the default Spring cache might be too small. It is necessary to gain access to the CacheManager and configure the cache there. The CacheManager also provides statistics on how the cache performed.
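For example, with Spring 4's Guava support (GuavaCacheManager was removed in Spring 5, where CaffeineCacheManager plays the same role), a bounded, statistics-recording cache could be configured like this:

```java
import com.google.common.cache.CacheBuilder;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.guava.GuavaCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfiguration {

    @Bean
    public CacheManager cacheManager() {
        GuavaCacheManager cacheManager = new GuavaCacheManager("words");
        // Cap the cache at 200000 entries and record hit/miss statistics.
        cacheManager.setCacheBuilder(
                CacheBuilder.newBuilder().maximumSize(200_000).recordStats());
        return cacheManager;
    }
}
```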
There are other things to look out for. For example, our real-world setup used a version of Spring that created JDK proxies to enable the caching annotations. Time measurements and profiling sessions confirmed that a considerable amount of time was being spent in those JDK proxies. The Spring version used in our example uses cglib instead, which should enable the JVM's JIT compiler to inline the caching advice more effectively.
Another problem with the default caches was key generation: a key consisted of all parameters of the cached method and was constructed via reflection. This was another source of delay.
We can only guess why the default cache did not enable better performance. In a future article, we will explore how Spring's caching can be configured for higher performance.
HashMap
This step uses the JRE's HashMap. The size of the map is unlimited.
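Manual caching needs no annotations at all; a sketch with the kind of hit/miss counters reported below (the class and method names are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapWordSource {

    private final WordGenerator delegate = new WordGenerator();
    private final Map<Long, String> cache = new HashMap<>();
    private long hitCount;
    private long missCount;

    // Direct map access with no proxy in between, so the JIT compiler
    // can inline the whole lookup path.
    public String getWord(long wordNumber) {
        String word = cache.get(wordNumber);
        if (word == null) {
            missCount++;
            word = delegate.getWord(wordNumber);
            cache.put(wordNumber, word);
        } else {
            hitCount++;
        }
        return word;
    }

    public String statistics() {
        return "Hit count: " + hitCount + ", miss count: " + missCount;
    }
}
```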
Executing step: [HashMap]
Finished step. Word source: Cached manually - HashMap. Processor: Simple processor
Word source statistics: Hit count: 11154589, miss count: 845411
Speed: 290824.49 rows/sec
We see a huge improvement in speed. The processor accesses the map directly, without any proxies, which allows the JVM's JIT compiler to fully inline access to the cache.
However, if we ran this step with the JVM option -Xmx128m, we would soon run into an OutOfMemoryError. And that is simply not an option: we observed that an OOME in production would bring down the whole infrastructure. Many components of the batch execution framework simply could not recover, and we had to restart our JVMs and middleware services, which required a lot of bureaucracy and a lot of waiting time.
For this reason, all caches had to be limited in size.
LinkedHashMap, Guava, EhCache
We measured three caches, each with a maximum size fixed at 200000 elements.
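The LimitedLinkedHashMap mentioned in the output below is presumably the classic LRU idiom built on LinkedHashMap's removeEldestEntry hook; a sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LimitedLinkedHashMap<K, V> extends LinkedHashMap<K, V> {

    private final int maxSize;

    public LimitedLinkedHashMap(int maxSize) {
        // accessOrder = true turns iteration order into LRU order.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is hit.
        return size() > maxSize;
    }
}
```

Usage: new LimitedLinkedHashMap<Long, String>(200000) in place of the plain HashMap above.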
Executing step: [LinkedHashMap]
Finished step. Word source: Cached manually - LimitedLinkedHashMap. Processor: Simple processor
Word source statistics: Hit count: 10189804, miss count: 1810196
Speed: 184842.88 rows/sec

Executing step: [Guava]
Finished step. Word source: Cached manually - Guava. Processor: Simple processor
Word source statistics: CacheStats{hitCount=10190217, missCount=1809783}
Speed: 135722.04 rows/sec

Executing step: [EhCache]
Finished step. Word source: Cached manually - EhCache. Processor: Simple processor
Word source statistics: Hits: 10297676 Misses: 1702324
Speed: 117240.17 rows/sec
All of them obviously performed worse than the unlimited map. It would be interesting to see why Guava and EhCache performed worse than LinkedHashMap.
LinkedHashMap(smart)
You might have already observed that the function (word number) -> word is not the ideal candidate for caching. After all, we are not interested in the word itself; we only need its hash code. In light of this, this step again uses a LinkedHashMap, but with a capacity 20 times larger than the other caches. We can afford this size because, instead of long strings, we cache only simple integers.
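A sketch of such a processor, caching only the int hash codes and reusing the LimitedLinkedHashMap sketched above (the 4000000-entry limit is 20 times the 200000 used elsewhere):

```java
import org.springframework.batch.item.ItemProcessor;

public class CachedChecksumProcessor implements ItemProcessor<int[], Long> {

    private final WordGenerator wordGenerator = new WordGenerator();
    // Integers instead of 57-character strings: the same heap budget
    // buys a cache roughly 20 times larger.
    private final LimitedLinkedHashMap<Integer, Integer> hashCache =
            new LimitedLinkedHashMap<>(4_000_000);

    @Override
    public Long process(int[] row) {
        long checksum = 0;
        for (int wordNumber : row) {
            Integer hash = hashCache.get(wordNumber);
            if (hash == null) {
                hash = wordGenerator.getWord(wordNumber).hashCode();
                hashCache.put(wordNumber, hash);
            }
            checksum += hash;
        }
        return checksum;
    }
}
```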
Executing step: [LinkedHashMap(smart)]
Finished step. Word source: Plain generator. Processor: Cached processor
Word source statistics: No statistics
Processor statistics: Hit count: 11154589, miss count: 845411
Speed: 448833.03 rows/sec
As expected, this strategy achieved the highest speed.
HashMap(soft)
This step uses a very unusual cache. It is again a simple unlimited HashMap, but referenced through Java's SoftReference.
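A sketch of this defensive pattern (the class name and the statistics wiring are assumptions):

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftHashMapWordSource {

    private final WordGenerator delegate = new WordGenerator();
    // The GC is free to clear a SoftReference under memory pressure,
    // which discards the whole map at once.
    private SoftReference<Map<Long, String>> cacheRef =
            new SoftReference<>(new HashMap<>());
    private int gcCount;

    public String getWord(long wordNumber) {
        Map<Long, String> cache = cacheRef.get();
        if (cache == null) {
            // The map was garbage collected; start over with an empty one.
            gcCount++;
            System.out.println("Cache has been garbage collected. Creating empty one.");
            cache = new HashMap<>();
            cacheRef = new SoftReference<>(cache);
        }
        return cache.computeIfAbsent(wordNumber, delegate::getWord);
    }
}
```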
Let's execute it using -Xmx128m:
Executing step: [HashMap(soft)]
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
Cache has been garbage collected. Creating empty one.
-----------------------------------------------------------------------
Finished step. Word source: Cached manually - HashMap - soft reference GC count: 7. Processor: Simple processor
Word source statistics: Garbage collected: 7 times. Last cache statistics: Hit count: 360050, miss count: 122464
Speed: 172324.66 rows/sec
The use of SoftReference allowed the JVM to garbage collect the whole HashMap. When we detected this, we simply created a new empty one. This happened 7 times, and we still got slightly better performance than with Guava or EhCache.
We used this strategy frequently, mostly for defensive reasons. As already mentioned, an OOME crashed the whole infrastructure, and this was a way to prevent it. We made educated guesses about how large a cache we could afford, and we adjusted these estimates further at runtime.
You might ask yourself why we did not wrap every value in a separate soft reference. Java comes with a WeakHashMap but not with a SoftHashMap, and Guava's CacheBuilder offers an option for soft values. However, this approach is not recommended for our use case: a huge number of soft references comes with performance penalties. It is therefore preferable to soft-reference whole segments of the cache, or the whole cache itself.
Conclusion
We explored a type of batch application that processes a huge number of very small input records. We showed that the most efficient caching can be achieved by avoiding any kind of AOP proxy and by avoiding reflection-based key generation.
In scenarios where the input data is transformed several times, it is advisable to place caches around the function compositions that have the smallest possible key and value sizes.
If cache sizes cannot be determined exactly enough to prevent OOME errors, then caches referenced via soft references represent a useful defensive mechanism.