In this blog post, the engineering team at Kelley Blue Book – which is part of the Cox Automotive family of brands – provides a behind-the-scenes peek into the technology that powers one of the most trusted resources in the automotive industry.
On the Kelley Blue Book (KBB) engineering team, the performance and availability of our applications are top priorities, because they support our heavily trafficked consumer site.
Fortunately for us, these applications require only a moderate amount of primarily read-only data: hundreds of thousands of data points cover the core functionality across several high-traffic areas, and we rarely need to create new records or update existing ones.
Our needs in a data store
Given our high availability and performance goals, and our desire to keep data consistent across systems, a highly performant distributed data store with strong read capabilities seemed like the best match for our needs. Because we wanted the solution to work for multiple teams with slightly different data requirements, we chose a distributed cache approach, with an independent cache allocated to each application to maintain separation of concerns.
The service we chose
Our applications run on AWS, so we chose ElastiCache, which lets us choose between Memcached and Redis. We built our common codebase to work with either one, giving each team the flexibility to pick what works best for them, though most teams ultimately went with Memcached.
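To give a feel for that backend-agnostic pattern, here is a minimal Python sketch (our production code is not shown here, and the class names, endpoints, and parameters below are illustrative): application code depends on a small cache interface, and either a Memcached or a Redis implementation can be plugged in.

```python
# Illustrative sketch only: a tiny cache interface with two interchangeable backends.
from typing import Optional, Protocol

from pymemcache.client.base import Client as MemcachedClient  # assumes pymemcache is installed
import redis                                                   # assumes redis-py is installed


class CacheClient(Protocol):
    """The small interface our application code depends on."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes, ttl_seconds: int) -> None: ...


class MemcachedCache:
    def __init__(self, host: str, port: int = 11211) -> None:
        self._client = MemcachedClient((host, port))

    def get(self, key: str) -> Optional[bytes]:
        return self._client.get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._client.set(key, value, expire=ttl_seconds)


class RedisCache:
    def __init__(self, host: str, port: int = 6379) -> None:
        self._client = redis.Redis(host=host, port=port)

    def get(self, key: str) -> Optional[bytes]:
        return self._client.get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._client.set(key, value, ex=ttl_seconds)
```

Because both backends satisfy the same narrow interface, the choice between Memcached and Redis stays a configuration decision rather than a code change.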
We have been using ElastiCache as a key-value store, where the values are JSON-encoded objects and the keys are unique, for example matching the API endpoint that returns the data. This key scheme made it easy to write a simple function that implements read-through caching: the cache sits in front of the data store, and we only call the data store when a key is not found in the cache.
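Here is a minimal sketch of that read-through function in Python, assuming a Memcached-backed ElastiCache cluster accessed via pymemcache; the endpoint, TTL constant, and the `fetch` callable are illustrative rather than our production values.

```python
import json
from typing import Any, Callable

from pymemcache.client.base import Client  # assumes pymemcache is installed

# Hypothetical ElastiCache (Memcached) endpoint; real values come from configuration.
cache = Client(("my-cluster.cfg.use1.cache.amazonaws.com", 11211))
CACHE_TTL_SECONDS = 8 * 60 * 60  # 8-hour TTL, as described later in this post


def read_through(key: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached value for `key`, falling back to the data store on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: call the underlying API / data store, then populate the cache.
    value = fetch()
    cache.set(key, json.dumps(value), expire=CACHE_TTL_SECONDS)
    return value


# Example usage: the key mirrors the API endpoint that returns the data.
# vehicle = read_through("/api/vehicles/2024/honda/civic",
#                        lambda: call_vehicle_api("/api/vehicles/2024/honda/civic"))
```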
The benefits
This approach provides several benefits beyond decreasing the response time of the current request: it reduces the strain on our APIs, improves response times for subsequent requests, and keeps previously requested data available to the applications even if the underlying data store becomes unavailable or suffers degraded performance. As the graphic below shows, disabling the cache nearly doubles our average response time.
Pre-loading commonly requested fields
We have a good sense of which data is requested most frequently in our systems. As such, we chose to preload high-level data, such as the details of each basic vehicle configuration. For requests that are unlikely to be repeated, such as the pricing information for a fully configured vehicle in a specific location, we make the request on demand.
We set the TTL (time to live) for items in the cache to 8 hours and "pre-cache" items every 6 hours, or whenever the version of the underlying data changes. We pre-cache anything that is likely to be requested multiple times. We have considered a few architectures for pre-caching, but for a few years now we have been using AWS Step Functions, breaking different categories of data into separate steps. We first run all the base steps, then run the next set in parallel. Within each step, we process items in parallel up to a maximum concurrency so that we stay below throttling limits.
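As a rough illustration of what happens inside one of those steps, the sketch below shows bounded-concurrency pre-caching in Python using a thread pool. In our actual pipeline the orchestration is handled by Step Functions; the endpoint, the `fetch_from_api` helper, and the concurrency limit shown here are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

from pymemcache.client.base import Client  # assumes pymemcache is installed

cache = Client(("my-cluster.cfg.use1.cache.amazonaws.com", 11211))  # hypothetical endpoint

PRE_CACHE_TTL_SECONDS = 8 * 60 * 60  # items live in the cache for 8 hours
MAX_CONCURRENCY = 10                 # illustrative cap so we stay below throttling limits


def fetch_from_api(key: str) -> dict:
    """Hypothetical helper that calls the underlying data store for one item."""
    raise NotImplementedError


def warm_one(key: str) -> None:
    """Fetch a single item and write it to the cache ahead of user requests."""
    value = fetch_from_api(key)
    cache.set(key, json.dumps(value), expire=PRE_CACHE_TTL_SECONDS)


def warm_category(keys: list[str]) -> None:
    """Pre-cache one category of data (conceptually, one step in the workflow)."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        # Wrap in list() so every task completes before the step finishes
        # and any errors surface immediately.
        list(pool.map(warm_one, keys))
```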
By using this approach to store and read our data, our response times have decreased significantly, which helps our visitors search for their vehicle information approximately 50% faster than before. We're really excited about the performance gains we have made so far, and we look forward to sharing more about our next set of improvements in a future blog post.