For years, the Kubernetes community has been on a mission to improve the stability and performance predictability of the API server.
A major focus of this effort has been taming list requests, which have historically been a primary source of high memory usage and heavy load on the etcd datastore.
With each release, we've chipped away at the problem, and today, we're thrilled to announce the final major piece of this puzzle.
The snapshottable API server cache feature has graduated to Beta in Kubernetes v1.34, culminating a multi-release effort to allow virtually all read requests to be served directly from the API server's cache.
The path to the current state involved several key enhancements over recent releases that paved the way for today's announcement.
While the API server has long used a cache for performance, a key milestone was guaranteeing consistent reads of the latest data from it. This v1.31 enhancement allowed the watch cache to be used for strongly-consistent read requests for the first time, a huge win as it enabled filtered collections (e.g. "a list of pods bound to this node") to be safely served from the cache instead of etcd, dramatically reducing its load for common workloads.
Another key improvement was tackling the problem of memory spikes when transmitting large responses. The streaming encoder, introduced in v1.33, allowed the API server to send list items one by one, rather than buffering the entire multi-gigabyte response in memory. This made the memory cost of sending a response predictable and minimal, regardless of its size.
Despite these huge improvements, a critical gap remained. Any request for a historical LIST—most commonly used for paginating through large result sets—still had to bypass the cache and query etcd directly. This meant that the cost of retrieving the data was still unpredictable and could put significant memory pressure on the API server.
The snapshottable API server cache solves this final piece of the puzzle. This feature enhances the watch cache, enabling it to generate efficient, point-in-time snapshots of its state.
Here’s how it works: for each update, the cache creates a lightweight snapshot. These snapshots are "lazy copies," meaning they don't duplicate objects but simply store pointers, making them incredibly memory-efficient.
When a list request for a historical resourceVersion arrives, the API server now finds the corresponding snapshot and serves the response directly from its memory.
This closes the final major gap, allowing paginated requests to be served entirely from the cache.
With this final piece in place, the synergy of these three features ushers in a new era of API server predictability and performance:
The result is a system where the resource cost of read operations is almost fully predictable and much more resiliant to spikes in request load.
This means dramatically reduced memory pressure, a lighter load on etcd, and a more stable, scalable, and reliable control plane for all Kubernetes clusters.
With its graduation to Beta, the SnapshottableCache feature gate is enabled by default in Kubernetes v1.34. There are no actions required to start benefiting from these performance and stability improvements.
Special thanks for designing, implementing, and reviewing these critical features go to:
...and many others in SIG API Machinery. This milestone is a testament to the community's dedication to building a more scalable and robust Kubernetes.