Friday, September 27, 2024

Kubernetes: How to schedule pods on certain nodes

In my previous post, I talked about how you can prevent pods from being scheduled on certain nodes. In this post, I will discuss the opposite: how you can schedule pods on certain nodes. This can be a hard requirement (if it can’t be met, the pod won’t be scheduled) or a soft requirement (the scheduler will try to meet it, but if it can’t, the pod will be scheduled on some other node). Let’s discuss how we can do this.

Use case

There are a couple of use cases where you want some pods to always be scheduled on certain nodes. Some of them are:

  1. Pods have specific hardware/resource requirements and you want them to always be scheduled on nodes which have the supporting resources.
  2. Pods have specific security requirements to align with certain industry standards and should always go to nodes which satisfy those requirements.

There are two ways of assigning a pod to a node:

  1. Node selector, which can do the job but is not very expressive
  2. Node affinity, which is a bit more complex than node selector but gives you more features as well.

We will discuss both these approaches.

First approach, using Node selector

With node selector, you specify in the pod one or more labels (key-value pairs) of the node on which you want the pod to be scheduled. Once you specify the node selector, the pod will be scheduled only on a node which has all of the matching label(s).

Node selector is a field in the pod spec (nodeSelector), defined alongside the containers. Here are the steps to do this.

Step 1: Label the node

You can assign a label to a node using the following command:

kubectl label nodes <node-name> <label-key>=<label-value>

In the above command, you need to replace the placeholders with appropriate values. <node-name> is the name of one of the nodes in your cluster. You can get the list of nodes in your cluster using the following command.

kubectl get nodes

<label-key> and <label-value> can be any valid label key and value of your choice.
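As a hypothetical illustration, suppose you labeled a node named worker-1 with disktype=ssd (both the node name and the label are made up for this example). Running kubectl get node worker-1 -o yaml should then show the label under metadata.labels, next to labels that Kubernetes sets on its own. A trimmed sketch of what that output might look like:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1                        # hypothetical node name
  labels:
    disktype: ssd                       # the label added in step 1 (hypothetical key and value)
    kubernetes.io/hostname: worker-1    # well-known label, set automatically
    kubernetes.io/os: linux             # well-known label, set automatically
```

If you only want a quick look at the labels, kubectl get nodes --show-labels lists them for every node.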

Step 2: Add node selector to pod configuration

See the pod configuration below. After the container specification, the node selector is defined for the pod.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key>: <label-value>

Replace the placeholders with appropriate values. Make sure the key and value in the node selector match the label key and value of the node as set in step 1 above.

Once this pod is created, it will be scheduled on a node having the specified label.

You can verify this by using the following command and checking the NODE column to see the node on which the pod is scheduled.

kubectl get pods -o wide
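To make the template above concrete, here is a minimal sketch with the placeholders filled in. It assumes that at least one node carries the label disktype=ssd; the pod name, the label and the nginx image are only illustrative choices, not something your cluster is guaranteed to have.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-ssd        # hypothetical pod name
spec:
  containers:
  - name: nginx
    image: nginx            # any image works; nginx is used only as an example
  nodeSelector:
    disktype: ssd           # must match a label that exists on some node
```

If no node has the disktype=ssd label, the pod stays in the Pending state, and kubectl describe pod nginx-on-ssd should show a FailedScheduling event explaining why.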

Multiple node selectors

You can specify multiple key-value pairs in the node selector, in which case the scheduler will try to find a node that has all of those key-value pairs. If it can’t find any node with all the labels mentioned in the node selector, the pod will not be scheduled. Here is how you specify multiple node selectors:

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key-1>: <label-value-1>
    <label-key-2>: <label-value-2>

Since node selector is a map, the key must be different for every entry.
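Here is a small sketch of a pod with two node selector entries. The labels disktype=ssd and example.com/gpu=true are hypothetical and assumed to exist on at least one node. One detail worth noting: node selector values are strings, so a value such as true or a number needs to be quoted in YAML, otherwise it is parsed as a boolean or a number and the manifest is rejected.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-selector-demo     # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                # illustrative image
  nodeSelector:
    disktype: ssd               # the node must have this label ...
    example.com/gpu: "true"     # ... AND this one; quoted so YAML keeps it a string
```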

Second approach, using Node affinity

We can use node selector for scheduling pods on certain nodes, but node selector is not very expressive. Instead of node selector we can use node affinity, which is more expressive. Here are a few differences between them:

  1. Node selector only supports exact matches on labels, combined with AND. Node affinity additionally supports operators such as In, NotIn, Exists, DoesNotExist, Gt and Lt.
  2. With node affinity, you can specify a requirement as either a hard or a soft rule. We will discuss shortly what hard and soft rules are.

With flexibility comes complexity. Node affinity rules are complex when compared to node selector rules, but once you understand them you will be able to write them fairly easily.

Hard and soft rules

Hard rules are conditions which a node must satisfy before the scheduler will place a pod on it; if no node satisfies them, the pod will not be scheduled. Hard requirements are specified with requiredDuringSchedulingIgnoredDuringExecution.

Soft rules are conditions which the scheduler will try to match before scheduling a pod on a node, but if it can’t find a matching node, it will still schedule the pod on some other node. Soft requirements are specified with preferredDuringSchedulingIgnoredDuringExecution.

Using node affinity

NodeAffinity has two fields, as mentioned above:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

If you look carefully, one starts with “required” and the other starts with “preferred”; everything else is the same in both fields. Both field names refer to two phases of the pod lifecycle: scheduling and execution. The first half of each name says how strictly the scheduler applies the conditions when scheduling the pod, and the second half says what happens to those conditions while the pod is running on a node. “IgnoredDuringExecution” means that if the labels on a node change after a pod has been scheduled on it, the pod won’t be affected and will continue to run.

Using hard rules

Like nodeSelector, node affinity is also specified in the pod spec, under affinity.nodeAffinity. Here is how you can use it. Replace the placeholders with appropriate values.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>
            - <label-value-2>

  containers:
  - name: <pod-name>
    image: <image-name>

For a pod definition with the above affinity rules, the scheduler will try to find a node that has a label with key “<label-key>” and a value of either “<label-value-1>” or “<label-value-2>”. Since we have specified a hard rule, if no node exists with either of these label values, the pod will not be scheduled.
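The In operator is only one of the available operators. The sketch below shows a hard rule that uses Exists, NotIn and Gt instead. All label keys and values here (example.com/gpu, disktype, example.com/cpu-cores) are hypothetical and assumed to have been applied to your nodes beforehand.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-operators-demo     # hypothetical pod name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Exists: the label key must be present, whatever its value.
          # No values list is given for Exists or DoesNotExist.
          - key: example.com/gpu
            operator: Exists
          # NotIn: the label value must not be any of the listed values.
          - key: disktype
            operator: NotIn
            values:
            - hdd
          # Gt: the label value, read as an integer, must be greater than 8.
          # Gt and Lt take exactly one value, written as a string.
          - key: example.com/cpu-cores
            operator: Gt
            values:
            - "8"
  containers:
  - name: app
    image: nginx                    # illustrative image
```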

Multiple nodeSelectorTerms vs matchExpressions

If you look carefully, both nodeSelectorTerms and matchExpressions are lists. So, how do you decide whether to specify multiple nodeSelectorTerms or multiple matchExpressions? Multiple nodeSelectorTerms are combined with “or”, i.e. if any one of the nodeSelectorTerms matches a node, the pod can be scheduled on that node. Multiple matchExpressions within a single term are combined with “and”, i.e. all of the expressions in that term must match for a node to be eligible to host the pod.
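A sketch of the or/and behaviour, using hypothetical labels: the pod below can land on a node that has disktype=ssd and is in zone-a (first term, both expressions must match), or on any node that has an example.com/gpu label (second term). The zone value and the gpu label are assumptions made up for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: terms-vs-expressions-demo   # hypothetical pod name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: both expressions must match (AND)
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a                # hypothetical zone name
        # Term 2: an alternative to term 1 (OR)
        - matchExpressions:
          - key: example.com/gpu
            operator: Exists
  containers:
  - name: app
    image: nginx                    # illustrative image
```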

Using soft rules

To use soft rules, we need to use the preferredDuringSchedulingIgnoredDuringExecution field of node affinity. Here is how you can use it. Replace the placeholders with appropriate values.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>

  containers:
  - name: <pod-name>
    image: <image-name>

For a pod definition with the above affinity rules, the scheduler will try to find a node that has a label with key “<label-key>” and value “<label-value-1>”. Since we have specified a soft rule, the scheduler will try to schedule the pod on a node with the matching label; otherwise it will schedule the pod on some other node.

When multiple nodes match the preferred criteria, the weight field influences which node the pod will be scheduled on. Weight is in the range 1–100. For each node that meets all of the scheduling requirements (resource requests, requiredDuringScheduling affinity expressions, etc.), the scheduler computes a sum by iterating through the elements of preferredDuringSchedulingIgnoredDuringExecution and adding “weight” to the sum if the node matches the corresponding matchExpressions. This score is then combined with the scores of other priority functions for the node. The node(s) with the highest total score are the most preferred.
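To see how the weights add up, consider this sketch with two preferences. The labels and the zone name are hypothetical. The sums in the comments cover only the node affinity part of the score; the final choice also depends on the scheduler's other scoring functions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: weighted-preference-demo    # hypothetical pod name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                  # strong preference for SSD nodes
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      - weight: 20                  # weaker preference for zone-a
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a                # hypothetical zone name
  containers:
  - name: app
    image: nginx                    # illustrative image

# Node affinity sums for this pod:
#   node with disktype=ssd and in zone-a -> 80 + 20 = 100
#   node with disktype=ssd only          -> 80
#   node in zone-a only                  -> 20
#   node with neither label              -> 0 (still eligible, just least preferred)
```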

Using NodeSelector and NodeAffinity together

If you specify both node selector and node affinity on a pod, then a node must satisfy both the node selector and the required node affinity rules for the pod to be scheduled on it.
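A minimal sketch of using both together, again with hypothetical labels: the node must have disktype=ssd (node selector) and must also be in zone-a or zone-b (hard node affinity rule). A node that satisfies only one of the two is not eligible.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-and-affinity-demo  # hypothetical pod name
spec:
  nodeSelector:
    disktype: ssd                   # condition 1: exact label match
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In            # condition 2: zone is one of the listed values
            values:
            - zone-a                # hypothetical zone names
            - zone-b
  containers:
  - name: app
    image: nginx                    # illustrative image
```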

Conclusion

Pods with specific requirements can be assigned to certain nodes based on the labels attached to those nodes. You can specify either node selector or node affinity, or both, to assign a pod to certain nodes. Node selector is simple to use but is not very expressive and is limited in scope. Node affinity rules are a bit more complex, but they let you choose between hard and soft rules and give you several operators to express your requirements.
