Kubernetes is a distributed system that’s designed to scale replicas of your services across multiple physical environments. In many cases this works well out-of-the-box. The Kubernetes scheduler automatically places your Pods (container instances) onto Nodes (worker machines) that have enough resources to support them.
Despite its best efforts, sometimes the scheduler won’t select a plan you agree with. You might want Pods to be colocated if they’ll be regularly communicating over the network; alternatively, some compute-intensive Pods might be best allocated to separate Nodes wherever possible.
Kubernetes has several mechanisms which let you guide the scheduler’s decision-making process so Pods end up on particular Nodes. In this article, we’ll focus specifically on the “affinity” and “anti-affinity” concepts that give you granular control of scheduling. Affinities define rules that either must or should be met before a Pod can be allocated to a Node.
How Does Affinity Work?
Affinities are used to express Pod scheduling constraints that can match characteristics of candidate Nodes and the Pods that are already running on those Nodes. A Pod that has an “affinity” to a given Node is more likely to be scheduled to it; conversely, an “anti-affinity” makes it less probable it’ll be scheduled. The overall balance of these weights is used to determine the final placement of each Pod.
Affinity assessments can produce either hard or soft outcomes. A “hard” result means the Node must have the characteristics defined by the affinity expression. “Soft” affinities act as a preference, indicating to the scheduler that it should use a Node with the characteristics if one is available. A Node that doesn’t meet the condition will still be selected if necessary.
Types of Affinity Condition
There are currently two different kinds of affinity that you can define:
- Node Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of those Nodes. Node Affinity can only be used to set positive affinities that attract Pods to the Node.
- Inter-Pod Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of the existing Pods already running on each of those Nodes. Inter-Pod Affinity can be either an attracting affinity or a repelling anti-affinity.
In the simplest possible example, a Pod that includes a Node Affinity condition of `label=value` will only be scheduled to Nodes with a `label=value` label. A Pod with the same condition but defined as an Inter-Pod Affinity will be scheduled to a Node that already hosts a Pod with a `label=value` label.
Setting Node Affinities
Node Affinity has two distinct sub-types:
- `requiredDuringSchedulingIgnoredDuringExecution` – This is the "hard" affinity matcher that requires the Node to meet the constraints you define.
- `preferredDuringSchedulingIgnoredDuringExecution` – This is the "soft" matcher used to express a preference that's ignored when it can't be fulfilled.
The `IgnoredDuringExecution` part of these verbose names makes it explicit that affinity is only considered while scheduling Pods. Once a Pod has made it onto a Node, affinity isn't re-evaluated. Changes to the Node won't cause a Pod eviction due to changed affinity values. A future Kubernetes release could add support for this behavior via the reserved `requiredDuringSchedulingRequiredDuringExecution` sub-type.
Node affinities are attached to Pods via their `spec.affinity.nodeAffinity` manifest field:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
      # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
                  - c
          - matchExpressions:
              - key: internal
                operator: Exists
```
This manifest creates a hard affinity rule that schedules the Pod to a Node meeting at least one of the following criteria (each `matchExpressions` item is a separate selector term, and terms are ORed):
- It has a `hardware-class` label with either `a`, `b`, or `c` as the value.
- It has an `internal` label with any value.
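For the first of those terms to match, a Node would need labels like the following. This is a sketch of the relevant part of a Node object; the node name and label values are illustrative:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1        # hypothetical node name
  labels:
    hardware-class: b   # any of a, b, or c satisfies the In expression
    internal: "yes"     # Exists matches any value, so this also satisfies the second term
```

In practice you'd usually apply such labels with `kubectl label node` rather than editing the Node object directly.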
You can attach additional conditions by repeating the `matchExpressions` clause. The supported operators for value comparisons are `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt` (greater than), and `Lt` (less than).
Expressions grouped under a single `matchExpressions` clause are combined with a boolean AND: they all need to match for that term to give the Pod affinity to a particular Node. The `nodeSelectorTerms` list can contain multiple `matchExpressions` items too; these terms are combined as a logical OR, so a Node only needs to satisfy one of them. You can assemble complex scheduling criteria by utilizing both of these structures.
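To illustrate both behaviors together, the sketch below matches a Node that either satisfies both expressions in the first term, or the single expression in the second term. The label keys are illustrative:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # Term 1: hardware-class=a AND an internal label must both be present.
        - matchExpressions:
            - key: hardware-class
              operator: In
              values:
                - a
            - key: internal
              operator: Exists
        # Term 2 (ORed with term 1): a gpu label alone is enough.
        - matchExpressions:
            - key: gpu
              operator: Exists
```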
"Soft" scheduling preferences are set up in a similar way. Use `nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution` instead of, or as well as, `requiredDuringSchedulingIgnoredDuringExecution` to configure these. Define each of your optional constraints as a `matchExpressions` clause within a `preference` field:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
      # ...
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
                  - c
```
Preference-based rules have an additional field called `weight` that accepts an integer from 1 to 100. Each Node that matches a preference has its total affinity weight incremented by the set amount; the Node that ends up with the highest overall weight will be allocated the Pod.
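As a sketch of how weights stack up, the fragment below expresses a strong preference for one label and a much weaker one for another, so a Node matching only the first preference outscores a Node matching only the second. The label keys are illustrative:

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Strong preference: a matching Node gains 100 weight.
      - weight: 100
        preference:
          matchExpressions:
            - key: hardware-class
              operator: In
              values:
                - a
      # Weak preference: a matching Node gains only 10 weight.
      - weight: 10
        preference:
          matchExpressions:
            - key: internal
              operator: Exists
```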
Setting Inter-Pod Affinities
Inter-Pod Affinities work very similarly to Node Affinities but do have some important differences. The "hard" and "soft" modes are indicated using the same `requiredDuringSchedulingIgnoredDuringExecution` and `preferredDuringSchedulingIgnoredDuringExecution` fields. These should be nested under the `spec.affinity.podAffinity` or `spec.affinity.podAntiAffinity` fields depending on whether you want to increase or reduce the Pod's affinity upon a successful match.
Here’s a simple example that demonstrates both affinity and anti-affinity:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
      # ...
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
                  - c
          topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app-component
                  operator: In
                  values:
                    - background-worker
            topologyKey: topology.kubernetes.io/zone
```
The format differs slightly from Node Affinity. Each `matchExpressions` constraint needs to be nested under a `labelSelector`. For soft matches, this in turn should be located within a `podAffinityTerm`. Pod affinities also offer a reduced set of comparison operators: you can use `In`, `NotIn`, `Exists`, and `DoesNotExist`.
Pod affinities need a `topologyKey` field. This is used to limit the overall set of Nodes that are considered eligible for scheduling, before the `matchExpressions` are evaluated. The rules above will schedule the Pod to a Node that has the `topology.kubernetes.io/zone` label and an existing Pod with the `hardware-class` label set to `a`, `b`, or `c`. Nodes that also host a Pod with the `app-component=background-worker` label will be given a reduced affinity.
Inter-Pod affinities are a powerful mechanism for controlling colocation of Pods. However they do have a significant impact on performance: Kubernetes warns against using them in clusters with more than a few hundred Nodes. Each new Pod scheduling request needs to check every other Pod on all the other Nodes to assess compatibility.
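A common use of anti-affinity within that scale limit is spreading a workload's replicas across Nodes. The sketch below gives each replica of a hypothetical Deployment a soft anti-affinity to other Pods carrying the same `app` label within the same hostname, so the scheduler prefers to place them on separate machines; the names, labels, and image are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - web
                # Treat each Node (hostname) as its own topology domain.
                topologyKey: kubernetes.io/hostname
```

Because the rule is "soft", all three replicas can still land on one Node if the cluster has no other capacity.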
Other Scheduling Constraints
While we’ve focused on affinities in this article, Kubernetes provides other scheduler constraint mechanisms too. These are typically simpler but less automated approaches that work well for smaller clusters and deployments.
The most basic constraint is the `nodeSelector` field. It's defined on Pods as a set of label key-value pairs that must exist on Nodes hosting the Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      # ...
  nodeSelector:
    hardware-class: a
    internal: "true"   # label values are strings, so quote boolean-like values
```
This manifest instructs Kubernetes to only schedule the Pod to Nodes with both the `hardware-class: a` and `internal: true` labels.
Node selection with the `nodeSelector` field is a good way to quickly scaffold static configuration based on long-lived attributes of your Nodes. The affinity system is much more flexible when you want to express complex rules and optional preferences.
Affinities and anti-affinities are used to set up versatile Pod scheduling constraints in Kubernetes. Compared to simpler options like `nodeSelector`, affinities are more complex but give you more ways to identify compatible Nodes.
Affinities can act as soft preferences that signal a Pod’s “ideal” environment to Kubernetes even if it can’t be immediately satisfied. The system also has the unique ability of filtering Nodes based on their existing workloads so you can implement Pod colocation rules.
One final point to note is that affinity isn't the end of the scheduling process. A Pod with a strong computed affinity to a Node might still end up elsewhere because of Node taints. This mechanism lets you manage scheduling requests from the perspective of your Nodes: taints actively repel incoming Pods toward other Nodes, effectively the opposite of the magnetic attraction of affinities. Node selectors, affinities, taints, and tolerations are all balanced to determine the final in-cluster location of each new Pod.
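As a sketch of that interaction, a Node tainted with `kubectl taint nodes worker-1 dedicated=gpu:NoSchedule` repels every Pod that lacks a matching toleration, no matter how strong the Pod's affinity to that Node is. The key, value, and names here are illustrative:

```yaml
# Only Pods carrying this toleration can be scheduled to the tainted Node;
# affinity rules are then evaluated among the Nodes that remain eligible.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
      image: nginx:1.25
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
```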