Should You Run Stateful Applications In Kubernetes?

Kubernetes is often approached from the perspective of stateless systems. A stateless application is easy to containerize, distribute, and scale because it doesn't need to store any data outside its environment. It doesn't matter if the container's stopped or moved to a different host - new instances can replace older ones without any repercussions.

Most real applications aren't like this though. All but the simplest systems possess state that's usually stored in a database or a persistent filesystem. Data that configures your service or is created by users must be retained and made accessible to all your containers, irrespective of where they're located.

The challenge of maintaining state across transient environments is encountered by most organizations using containers, orchestration, and cloud native working practices. Stateful workloads can be accommodated by Kubernetes but external alternatives exist too. In this article, you'll learn some of the approaches that make Kubernetes work with stateful apps.

The Problems With State

The term "state" describes the data associated with an application at a particular point in time. It is long-lived information such as database content and user accounts that will need to be retrieved throughout the system's lifetime. The state continually changes as data is created and modified while your service is in use.

Correct application functioning is dependent on each instance being able to access the persistent state. If you distribute four replicas of a component across two physical hosts, both of those machines will need access to your data store. The instances therefore share a dependency on that data, so they can't be treated as interchangeable units that are replaced without regard to where the state lives.

The constraints around stateful services conflict with the Kubernetes model of ephemeral containers that can be replaced at any time. When you're working with a stateful application, you need to make special provision so containers can reliably access the state they need. This requires additional configuration to provide reliable data persistence that remains stable as your application scales.

Running Stateful Services In Kubernetes

Kubernetes support for stateful systems has matured over the past few years, driven by growing community interest. Stateful applications can be assembled from officially supported resources such as StatefulSets and persistent volumes. These offer integrated methods for storing and managing your data.

Persistent volumes provide data storage to your Pods. Files written to a persistent volume are stored independently of the Pod that creates them. The volume's content persists in your cluster after the Pods are destroyed, allowing their replacements to access the stored state.
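The relationship between a Pod and its storage can be sketched as two manifests: a PersistentVolumeClaim that requests storage, and a Pod that mounts the claim. The Python below builds both as plain dictionaries and prints them as JSON, which Kubernetes accepts alongside YAML. It's a minimal sketch rather than a production configuration. The names app-data and app, the nginx image, the 1Gi size, and the /data mount path are all illustrative assumptions.

```python
import json


def build_pvc(name: str, size: str, access_mode: str = "ReadWriteOnce") -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod(name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Return a Pod manifest that mounts the named claim at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Files written under mount_path land on the volume,
                    # not in the container's own writable layer.
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and sizes, chosen only for illustration.
    pvc = build_pvc("app-data", "1Gi")
    pod = build_pod("app", "nginx:latest", "app-data", "/data")
    manifest = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
    print(json.dumps(manifest, indent=2))
```

Because the claim is a separate object from the Pod, deleting the Pod leaves the claim and its data in place for a replacement Pod to mount.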

StatefulSets are API objects that represent stateful applications. They function similarly to Deployments but assign a unique identifier to each Pod they encapsulate. Pods retain their identifiers even if they're restarted or scheduled onto another Node. This allows you to implement procedures where Pod ordering and identity are important. The reliable identifiers let you rematch volumes to Pods after a scheduling event and gracefully roll out application updates in sequence.
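The stable identifiers follow a predictable naming scheme, which is what makes the rematching possible. Pods are named after the StatefulSet plus an ordinal index, and each claim created from a volume claim template is named after the template plus the Pod. The sketch below reproduces that scheme in Python, assuming the default starting ordinal of zero and the default rolling update behavior. The function names and the db and data example values are hypothetical.

```python
from typing import List


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Pod names are <statefulset>-<ordinal>, with ordinals starting at 0."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def claim_names(template: str, statefulset: str, replicas: int) -> List[str]:
    """Claims from a volume claim template are named <template>-<pod name>."""
    return [f"{template}-{pod}" for pod in pod_names(statefulset, replicas)]


def update_order(statefulset: str, replicas: int) -> List[str]:
    """Rolling updates proceed from the highest ordinal down to the lowest."""
    return list(reversed(pod_names(statefulset, replicas)))


if __name__ == "__main__":
    # A hypothetical StatefulSet called "db" with a claim template called "data".
    print(pod_names("db", 3))
    print(claim_names("data", "db", 3))
    print(update_order("db", 3))
```

Since the Pod called db-1 always maps to the claim called data-db-1, a rescheduled Pod is reunited with the same volume regardless of which Node it lands on.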

These features mean it's now possible to run stateful applications inside your Kubernetes cluster. You can write data to persistent volumes and use StatefulSets instead of Deployments when Pods need to remember their identities.

Managing State Outside of Kubernetes

A popular route for running stateful services in Kubernetes is to locate the state outside your cluster. You architect your system so that its components are decoupled from the storage they require. They can access persistent data in separate services over the network.

You can maintain your own database server, connect to existing network file shares, or use a fully managed service from your cloud provider. The applications in your Kubernetes cluster should be configured to interact with your storage systems using their APIs or direct access protocols.

This is a good way of promoting decoupling in your services. Removing persistent filesystem access from your containerized applications makes them more portable across environments. Containers can be launched using stateless deployment models as their storage dependencies are reduced to basic network calls. You can benefit from the flexibility of Kubernetes without incurring the complexity cost of using persistent volumes and stateful sets to store state in your cluster.
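From the application's side, this decoupling usually comes down to reading connection details from the environment instead of assuming a local path. The following sketch shows one way that might look. The DATABASE_HOST, DATABASE_PORT, and DATABASE_NAME variable names, the defaults, and the example hostname are assumptions made for illustration, not a convention defined by Kubernetes.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    name: str


def load_database_config(env: Mapping[str, str] = os.environ) -> DatabaseConfig:
    """Read connection details for an external database from the environment.

    The variable names are hypothetical; use whatever your deployment injects.
    """
    try:
        host = env["DATABASE_HOST"]
    except KeyError:
        raise RuntimeError("DATABASE_HOST must be set") from None
    return DatabaseConfig(
        host=host,
        port=int(env.get("DATABASE_PORT", "5432")),
        name=env.get("DATABASE_NAME", "app"),
    )


if __name__ == "__main__":
    # Simulate the environment a container might receive.
    example_env = {"DATABASE_HOST": "db.example.internal", "DATABASE_PORT": "5432"}
    print(load_database_config(example_env))
```

A container written this way holds no state of its own, so it can be deployed with a regular Deployment and pointed at a different database simply by changing its environment.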

Avoiding Kubernetes for Stateful Services

A third school of thought is to avoid Kubernetes altogether for stateful services. This is usually an overreaction - if you're not comfortable maintaining state in your cluster, you can still use the method outlined above to deploy your applications using an adjacent storage provider.

Nonetheless, there are still some systems that might not make sense in Kubernetes. Extremely filesystem-dependent architectures that work with large numbers of files could be challenging to implement and scale using persistent volumes. An external storage system managed alongside Kubernetes might add unacceptable latency when file interactions are the core function of your service.

In these circumstances, you may have to reach for alternative deployment approaches that give you more control of data storage and I/O operations. However, work is ongoing in the ecosystem to enhance the storage options for containerized systems. Cloud native storage solutions are emerging as higher-level abstractions of concepts like persistent volumes and StatefulSets, implementing distributed filesystems that remain performant when used across multiple nodes. Ceph, MinIO, and Portworx are some of the contenders in this space.

Should You Run Stateful Apps In Kubernetes?

Most stateful applications can be deployed without issues using Kubernetes. The principal decision is whether you keep your persistent data inside your cluster, by using persistent volumes and stateful sets, or interface with an externally managed data store.

Persistent volumes work for most use cases but they do come with some limitations. Not all volume access modes are supported by every storage driver, so it's important to check which modes are available in your Kubernetes distribution.

Relatively few drivers offer the ReadWriteMany mode, which permits the volume to be mounted by several Nodes simultaneously, with each of them able to read and write data. The ReadWriteOnce mode is the most broadly supported. It allows the volume to be mounted as read-write by a single Node at a time, although several Pods running on that Node can share it. These constraints can affect your application's scheduling - a system with several Pods that need to write to a shared volume will need to run them all on a single Node, unless ReadWriteMany is available. This limits your ability to scale your services.
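The scheduling constraint can be expressed as a simple check. The sketch below is a deliberately simplified model that looks at only the two access modes discussed here and ignores the others. It answers one question: can Pods on the given Nodes all write to the same volume? The function name and the Node names are hypothetical.

```python
from typing import Iterable


def can_share_volume(access_modes: Iterable[str], writer_nodes: Iterable[str]) -> bool:
    """Return True if writers on the given Nodes can all mount one volume.

    Simplified model: ReadWriteMany allows writers on any number of Nodes,
    while ReadWriteOnce requires every writer to sit on the same Node.
    """
    distinct_nodes = set(writer_nodes)
    if len(distinct_nodes) <= 1:
        # All writers share a Node (or there are none), so either mode works.
        return True
    return "ReadWriteMany" in set(access_modes)


if __name__ == "__main__":
    # Two writers on one Node versus two writers spread across Nodes.
    print(can_share_volume(["ReadWriteOnce"], ["node-a", "node-a"]))
    print(can_share_volume(["ReadWriteOnce"], ["node-a", "node-b"]))
    print(can_share_volume(["ReadWriteMany"], ["node-a", "node-b"]))
```

With ReadWriteOnce, adding a second Node does nothing for the Pods that share the volume, which is why the access mode ends up dictating how far those Pods can scale out.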

Utilizing an externally managed database or object storage system is an effective way to mitigate these lingering issues while still benefiting from the flexibility of Kubernetes. This does require your application to be fully decoupled from its storage so it might not be an option if you're migrating a legacy service.

Working with older applications presents the strongest case for not running a stateful app in Kubernetes. You can run into roadblocks if you're unable to be intentional about where state is stored and how it's managed. In these situations it's usually best to refactor your system before you try to distribute it across deployment nodes.

Summary

Although stateful applications and Kubernetes aren't quite a natural match, it's possible to accommodate persistent data in your cluster by combining stateful sets and persistent volumes. These provide officially supported methods for orchestrating stateful systems using Kubernetes but you need to remain mindful of the scheduling constraints they impose.

Because in-cluster state management adds complexity, keeping persistent data in an external service is a popular way to streamline your deployments. Managed databases, object storage platforms, and private networks allow you to provision storage outside your cluster then securely consume it from within. You'll need to adapt your application to support external storage interfaces but can then benefit from increased deployment flexibility.

Applications where the state consists of simple config files can utilize ConfigMaps to run in Kubernetes without having to adopt persistent file storage. ConfigMaps are first-class objects that are supplied to the Pods that reference them, either as environment variables or mounted files. They remove the need for persistent volumes when you're only storing a handful of long-lived settings.
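As a rough illustration, the sketch below builds a ConfigMap and a container definition that imports every key from it as an environment variable. It's a minimal example under stated assumptions: the app-settings name, the setting keys, and the nginx image are hypothetical. ConfigMap values must be strings, so the helper converts each setting before storing it.

```python
import json
from typing import Mapping


def build_configmap(name: str, settings: Mapping[str, object]) -> dict:
    """Return a ConfigMap manifest; values are converted to strings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {key: str(value) for key, value in settings.items()},
    }


def container_with_config(name: str, image: str, configmap_name: str) -> dict:
    """Return a container spec that loads every ConfigMap key as an env var."""
    return {
        "name": name,
        "image": image,
        "envFrom": [{"configMapRef": {"name": configmap_name}}],
    }


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    configmap = build_configmap("app-settings", {"LOG_LEVEL": "info", "MAX_ITEMS": 50})
    container = container_with_config("app", "nginx:latest", "app-settings")
    print(json.dumps(configmap, indent=2))
    print(json.dumps(container, indent=2))
```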