Kubernetes is a distributed system that’s designed to scale replicas of your services across multiple physical environments. In many cases this works well out-of-the-box. The Kubernetes scheduler automatically places your Pods (container instances) onto Nodes (worker machines) that have enough resources to support them.

Despite its best efforts, sometimes the scheduler won’t select a plan you agree with. You might want Pods to be colocated if they’ll be regularly communicating over the network; alternatively, some compute-intensive Pods might be best allocated to separate Nodes wherever possible.

Kubernetes has several mechanisms which let you guide the scheduler’s decision-making process so Pods end up on particular Nodes. In this article, we’ll focus specifically on the “affinity” and “anti-affinity” concepts that give you granular control of scheduling. Affinities define rules that either must or should be met before a Pod can be allocated to a Node.

How Does Affinity Work?

Affinities are used to express Pod scheduling constraints that can match characteristics of candidate Nodes and the Pods that are already running on those Nodes. A Pod that has an “affinity” to a given Node is more likely to be scheduled to it; conversely, an “anti-affinity” makes it less probable it’ll be scheduled. The overall balance of these weights is used to determine the final placement of each Pod.

Affinity assessments can produce either hard or soft outcomes. A “hard” result means the Node must have the characteristics defined by the affinity expression. “Soft” affinities act as a preference, indicating to the scheduler that it should use a Node with the characteristics if one is available. A Node that doesn’t meet the condition will still be selected if necessary.

Types of Affinity Condition

There are currently two different kinds of affinity that you can define:

  • Node Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of those Nodes. There’s no separate anti-affinity field for Nodes; to steer Pods away from Nodes with certain labels, use the NotIn and DoesNotExist operators inside a Node Affinity rule.
  • Inter-Pod Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of the existing Pods already running on each of those Nodes. Inter-Pod Affinity can be either an attracting affinity or a repelling anti-affinity.

In the simplest possible example, a Pod that includes a hard Node Affinity condition of label=value will only be scheduled to Nodes with a label=value label. A Pod with the same condition but defined as an Inter-Pod Affinity will be scheduled to a Node that already hosts a Pod with a label=value label, or to another Node in the same topology domain (such as the same zone), depending on how the rule is scoped.

Setting Node Affinities

Node Affinity has two distinct sub-types:

  • requiredDuringSchedulingIgnoredDuringExecution – This is the “hard” affinity matcher that requires the Node meet the constraints you define.
  • preferredDuringSchedulingIgnoredDuringExecution – This is the “soft” matcher to express a preference that’s ignored when it can’t be fulfilled.

The IgnoredDuringExecution part of these verbose names makes it explicit that affinity is only considered while scheduling Pods. Once a Pod has made it onto a Node, affinity isn’t re-evaluated. Changes to the Node won’t cause a Pod eviction due to changed affinity values. A future Kubernetes release could add support for this behavior via the reserved requiredDuringSchedulingRequiredDuringExecution phrase.

Node affinities are attached to Pods via their spec.affinity.nodeAffinity manifest field:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: hardware-class
              operator: In
              values:
                - a
                - b
                - c
          - matchExpressions:
            - key: internal
              operator: Exists

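This manifest creates a hard affinity rule that schedules the Pod to a Node meeting at least one of the following criteria. Each matchExpressions block under nodeSelectorTerms is a separate term, and terms are treated as alternatives: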

  • It has a hardware-class label with either a, b, or c as the value.
  • It has an internal label with any value.

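You can add further alternative terms by repeating the matchExpressions clause, or require several conditions at once by listing more than one expression inside a single matchExpressions clause. Supported operators for value comparisons are In, NotIn, Exists, DoesNotExist, Gt (greater than), and Lt (less than).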
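As a rough sketch of the less common operators, the manifest below combines NotIn, DoesNotExist, and Gt in a single term. The decommissioning and cpu-count labels, the Pod name, and the nginx image are hypothetical placeholders rather than anything Kubernetes defines for you:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo          # hypothetical name
spec:
  containers:
    - name: demo-container
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # hardware-class must not be "c"
              - key: hardware-class
                operator: NotIn
                values:
                  - c
              # the (hypothetical) decommissioning label must be absent
              - key: decommissioning
                operator: DoesNotExist
              # Gt and Lt take exactly one value, which is read as an integer
              - key: cpu-count
                operator: Gt
                values:
                  - "8"
```

NotIn and DoesNotExist are how you get repelling behavior out of Node Affinity, since there is no dedicated anti-affinity field for Nodes.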

The expressions listed inside a single matchExpressions clause are combined with a boolean AND. They all need to match for a Pod to gain affinity to a particular Node. You can also list multiple matchExpressions clauses under nodeSelectorTerms; these terms are combined as a logical OR, so a Node only has to satisfy one of them. You can assemble complex scheduling criteria by using both of these structures.
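To make the AND/OR distinction concrete, here is a minimal sketch that uses both structures in one rule. It reuses the hardware-class and internal labels from the earlier example; the Pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: and-or-demo            # hypothetical name
spec:
  containers:
    - name: demo-container
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # Term 1: both expressions must match (AND)
          - matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
              - key: internal
                operator: Exists
          # Term 2: an alternative to term 1 (OR)
          - matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - c
```

Read as a boolean expression, this is (hardware-class=a AND internal exists) OR (hardware-class=c).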

“Soft” scheduling preferences are set up in a similar way. Use nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution instead of or as well as requiredDuringSchedulingIgnoredDuringExecution to configure these. Define each of your optional constraints as a matchExpressions clause within a preference field:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware-class
            operator: In
            values:
              - a
              - b
              - c

Preference-based rules have an additional field called weight that accepts an integer from 1 to 100. Each Node that matches a preference has its total affinity weight incremented by the set amount. The scheduler combines this total with the scores from its other prioritization functions, and the Nodes with the highest overall score are the most preferred.
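The following sketch shows how weights might be used alongside a hard rule. The disk-type label and the specific weights are assumptions chosen for illustration, as are the Pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: weighted-demo          # hypothetical name
spec:
  containers:
    - name: demo-container
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      # Hard rule: only Nodes with an internal label are candidates
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: internal
                operator: Exists
      # Soft rules: rank the remaining candidates
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
        - weight: 20
          preference:
            matchExpressions:
              - key: disk-type     # hypothetical label
                operator: In
                values:
                  - ssd
```

Among Nodes that pass the hard rule, one matching both preferences accumulates an affinity weight of 100, one matching only hardware-class=a gets 80, and one matching only disk-type=ssd gets 20. These sums feed into the scheduler's wider scoring, so they influence the final choice rather than guarantee it.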

Setting Inter-Pod Affinities

Inter-Pod Affinities work very similarly to Node Affinities but do have some important differences. The “hard” and “soft” modes are indicated using the same requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields. These should be nested under the spec.affinity.podAffinity or spec.affinity.podAntiAffinity fields depending on whether you want to increase or reduce the Pod’s affinity upon a successful match.

Here’s a simple example that demonstrates both affinity and anti-affinity:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
                  - c
          topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          # podAffinityTerm is a single object that holds both labelSelector and topologyKey
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app-component
                  operator: In
                  values:
                    - background-worker
            topologyKey: topology.kubernetes.io/zone

The format differs slightly from Node Affinity. Each matchExpressions constraint needs to be nested under a labelSelector. For soft matches, this in turn should be located within a podAffinityTerm. Pod affinities also offer a reduced set of comparison operators: you can use In, NotIn, Exists and DoesNotExist.

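Pod affinities need a topologyKey field. It names the Node label that defines a topology domain, such as a zone or an individual host. Nodes that share the same value for that label belong to the same domain, and the rule is evaluated per domain rather than per Node. The rules above will schedule the Pod to a Node in a zone (as identified by the topology.kubernetes.io/zone label) that already contains at least one Pod with the hardware-class label set to a, b, or c. Zones that also contain a Pod with the app-component=background-worker label will be given a reduced affinity. To express rules at the level of individual Nodes, use kubernetes.io/hostname as the topologyKey.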
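A common use of topologyKey is spreading replicas of one workload across separate Nodes. The sketch below assumes a hypothetical Deployment called demo-web whose Pods carry an app: demo-web label; the image is a placeholder:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-web               # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-web
  template:
    metadata:
      labels:
        app: demo-web
    spec:
      containers:
        - name: demo-container
          image: nginx         # placeholder image
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            # Repel Pods that carry this Deployment's own label...
            - labelSelector:
                matchLabels:
                  app: demo-web
              # ...within the domain of a single Node
              topologyKey: kubernetes.io/hostname
```

Because this is a hard rule, a replica stays Pending if there's no Node left without a demo-web Pod. Switching to preferredDuringSchedulingIgnoredDuringExecution would let replicas share a Node when the cluster is too small.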

Inter-Pod affinities are a powerful mechanism for controlling colocation of Pods. However, they have a significant impact on scheduling performance: the Kubernetes documentation recommends against using them in clusters with more than several hundred Nodes. Each new Pod scheduling request needs to check the Pods on all the other Nodes to assess compatibility.

Other Scheduling Constraints

While we’ve focused on affinities in this article, Kubernetes provides other scheduler constraint mechanisms too. These are typically simpler but less automated approaches that work well for smaller clusters and deployments.

The most basic constraint is the nodeSelector field. It’s defined on Pods as a set of label key-value pairs that must exist on Nodes hosting the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      # ...
  nodeSelector:
    hardware-class: a
    internal: "true"   # label values are strings, so a bare YAML boolean must be quoted

This manifest instructs Kubernetes to only schedule the Pod to Nodes with both the hardware-class: a and internal: true labels.

Node selection with the nodeSelector field is a good way to quickly scaffold static configuration based on long-lived attributes of your Nodes. The affinity system is much more flexible when you want to express complex rules and optional preferences.
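For comparison, here is a sketch of roughly the same constraint written as a hard Node Affinity rule instead of a nodeSelector. The labels come from the example above; the image is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      image: nginx             # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # One term with two expressions: both labels are required
          - matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
              - key: internal
                operator: In
                values:
                  - "true"
```

The affinity form is longer, but it's the starting point if you later need extra allowed values, alternative terms, or soft preferences. If a Pod specifies both nodeSelector and nodeAffinity, a Node has to satisfy both to be eligible.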

Conclusion

Affinities and anti-affinities are used to set up versatile Pod scheduling constraints in Kubernetes. Compared to other options like nodeSelector, affinities are complex but give you more ways to identify compatible Nodes.

Affinities can act as soft preferences that signal a Pod’s “ideal” environment to Kubernetes even if it can’t be immediately satisfied. The system also has the unique ability of filtering Nodes based on their existing workloads so you can implement Pod colocation rules.

One final point to note is that affinity isn’t the end of the scheduling process. A Pod with a strong computed affinity to a Node might still end up elsewhere because of Node taints. This mechanism lets you manage scheduling requests from the perspective of your Nodes. Taints repel incoming Pods that don’t declare a matching toleration, effectively the opposite of the magnetic attraction of affinities. Node selectors, affinities, taints, and tolerations are all taken into account to determine the final in-cluster location of each new Pod.
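As a brief illustration of the Pod side of that mechanism, the sketch below tolerates a hypothetical dedicated=batch taint. The taint key, value, Pod name, and image are all assumptions made for the example:

```yaml
# Assumes a Node was tainted beforehand, for example with:
#   kubectl taint nodes <node-name> dedicated=batch:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo        # hypothetical name
spec:
  containers:
    - name: demo-container
      image: nginx             # placeholder image
  tolerations:
    # Allows (but does not force) scheduling onto Nodes carrying the taint
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
```

A toleration only removes the repelling effect. To actively draw the Pod toward the tainted Nodes, you would pair it with a Node Affinity rule that matches a label on them.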

James Walker
James Walker is a contributor to How-To Geek DevOps. He is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience managing complete end-to-end web development workflows, using technologies including Linux, GitLab, Docker, and Kubernetes.