Setting resource limits on your Kubernetes pods prevents an errant container from impacting other workloads. Kubernetes lets you cap resources, including CPU and memory consumption. Pods can be terminated when their limits are exceeded, maintaining the overall stability of the cluster.

Resource Units

Before defining limits, it's worth noting how Kubernetes expresses resource availability.

CPU consumption is measured in vCPUs. A limit of 0.5 vCPUs indicates that the pod can consume half of the available time of a single vCPU. A vCPU is what you'll see advertised on cloud providers' hosting pages; when using bare-metal hardware, it's one hyperthread on your processor.
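
CPU quantities can also be written in millicores, where 1000m is one full vCPU. The following two forms are therefore interchangeable:

cpu: "0.5"
# cpu: "500m" would be equivalent, in millicore notation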

Memory is measured in bytes. You can specify it as an integer number of bytes, or as a friendlier quantity such as 512Mi or 1Gi.
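
The binary suffixes (Ki, Mi, Gi) are powers of 1,024, whereas the decimal suffixes (K, M, G) are powers of 1,000:

memory: "512Mi"  # 512 * 1024 * 1024 = 536,870,912 bytes
# memory: "536870912" is the same quantity expressed as plain bytes
# memory: "512M" would instead mean 512 * 1000 * 1000 bytes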

Creating a CPU Limit

To add a CPU limit to pod containers, include the resources:limits field in your container's manifest:

apiVersion: v1
kind: Pod
metadata:
  name: demo
  namespace: demo
spec:
  containers:
    - name: my-container
      image: example/example
      resources:
        limits:
          cpu: "0.5"

The example above limits the container to 0.5 vCPUs. It will be throttled so that it can never consume more than half of one vCPU's time within any 100ms period.

Creating a Memory Limit

Memory limits are created in a similar way. Change the limits:cpu field in the manifest to limits:memory:

limits:
  memory: "512Mi"

The container will be limited to 512Mi of RAM. Unlike CPU, memory isn't throttled: a container that exceeds its memory limit is marked as a candidate for termination and will typically be OOM-killed, then restarted according to the pod's restart policy.
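
CPU and memory limits are frequently set together. Combining the two examples so far, the container's resources block would look like this:

resources:
  limits:
    cpu: "0.5"
    memory: "512Mi"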

Storage Limits

All Kubernetes nodes have an amount of ephemeral storage available. This storage is used by pods to store caches and logs. The ephemeral storage pool is also where each node keeps its container images and the writable layers of running containers.

You can set up limits for a pod's ephemeral storage use. This is a beta feature intended to ensure that a single pod's cache can't consume the entire storage pool. Use the limits:ephemeral-storage container manifest field:

limits:
  ephemeral-storage: "1Gi"

This container would now be limited to using 1Gi of the available ephemeral storage. Pods that try to use more storage will be evicted. When there are multiple containers in a pod, the pod is evicted if the sum of the storage usage from all of the containers exceeds the overall storage limit.
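
As an illustration (the pod and container names here are hypothetical), this two-container pod has an effective pod-level limit of 3Gi and would be evicted once the containers' combined usage exceeds it:

apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
    - name: app
      image: example/example
      resources:
        limits:
          ephemeral-storage: "2Gi"
    - name: log-shipper
      image: example/example
      resources:
        limits:
          ephemeral-storage: "1Gi"  # pod total: 2Gi + 1Gi = 3Gi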

Kubernetes usually tracks storage use by periodically scanning the node's ephemeral storage filesystem. It'll then sum the storage use of each pod and container. There is optional support for OS-level filesystem storage quotas, which enable more accurate monitoring.

You'll need a filesystem that supports project quotas, such as XFS or ext4. Make sure that the filesystem is mounted with project quota tracking enabled, then enable the LocalStorageCapacityIsolationFSQuotaMonitoring feature gate in kubelet. Guidance on configuring this system is provided in the Kubernetes documentation.
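
As a minimal sketch, assuming your kubelet is driven by a KubeletConfiguration file, the gate can be switched on like this (how kubelet config is managed varies between clusters):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  LocalStorageCapacityIsolationFSQuotaMonitoring: true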

Resource Requests

In addition to resource limits, you can set resource requests. These are available for CPU, memory, and ephemeral storage: change the limits field to requests in each of the examples shown above.
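
For example, the memory limit shown earlier becomes a request like so:

requests:
  memory: "512Mi"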

Setting a resource request indicates the amount of that resource that you expect the container will use. Kubernetes takes this information into account when determining which node to schedule the pod to.

Using memory as an example, a request of 512Mi will result in the pod getting scheduled to a node with at least 512Mi of memory available. The availability is calculated by summing the memory requests of all the existing pods on the node and subtracting that from the node's total memory capacity.

A node will be ineligible to host a new container if the sum of the workload requests, including the new container's request, exceeds the available capacity. This remains the case even if the real-time memory use is actually very low. The available capacity has already been allocated to the existing containers to ensure that their requests can be satisfied.
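
As a hypothetical worked example: a node with 8Gi of allocatable memory whose existing pods request a combined 6Gi has 2Gi of schedulable capacity left. A new pod requesting 512Mi fits; one requesting 3Gi does not, even if the existing pods are sitting idle.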

Unlike a limit, Kubernetes always allows containers to exceed their resource request. They can consume any unused resource quantities that other containers have requested but are not currently using.

Using Requests and Limits

The differing behaviors of requests and limits mean that you should carefully consider the values that you use. It's usually best to keep requests low. You then set the limits as high as possible without affecting your workloads' ability to coexist.

Using a low resource request value gives your pods the best chance of getting scheduled to a node. The scheduler has more flexibility when making allocation decisions, as it's more likely that any given node will be able to host the container. The container will be provided with ready access to any excess resources it needs, beyond the request, up to the specified limit.

Each request and limit needs to be balanced in order to achieve the greatest effect. You should look at the requests and limits of the other pods running in your cluster. Make sure that you're aware of the total resource quantities provided by your nodes so that you don't set limits that are either too high (risking stability) or too low (a waste of capacity).
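
Putting that guidance together, a container spec might pair a conservative request with a more generous limit (the values here are illustrative and should be tuned to your workload):

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"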

Conclusion

You should always set up resource limits for your Kubernetes workloads. Effective use of limits helps workloads peacefully coexist without risking the health of your cluster.

This is particularly important in the case of memory. Without limits, a container with an errant process can quickly consume all the memory offered by its node. Such an out-of-memory scenario could take down other pods scheduled to that node, as the OS-level memory manager would begin killing processes to reduce the memory use.

Setting a memory limit allows Kubernetes to terminate the container before it starts impacting other workloads in the cluster, let alone processes elsewhere on the node. You lose your workload, but the overall cluster gains greater stability.