In my previous post, I talked about how you can keep pods from being scheduled
on certain nodes. In this post, I will discuss the opposite: how you can
schedule pods on certain nodes. This can be a hard requirement (if it is not
met, the pod won’t be scheduled) or a soft requirement (the scheduler will try
to meet the requirement, but if it can’t, the pod will be scheduled on some
other node). Let’s discuss how we can do this.
Use case
There are a couple of use cases where you want some pods to always be
scheduled on certain nodes. Some of them are:
- Pods have specific hardware/resource requirements and you want those pods to
always be scheduled on nodes which have the supporting resources.
- Pods have specific security requirements to align with certain industry
standards, and they should always go to nodes which satisfy those requirements.
There are 2 ways of assigning a pod to a node:
- Node selector, which can do the job but is not very expressive.
- Node affinity, which is a bit more complex than node selector but gives you
more features as well.
We will discuss both these approaches.
First approach, using Node selector
With node selector, you decorate the pod with a label (key-value pair) of
the node on which you want this pod to be scheduled. Once you specify the node
selector, the pod will only be scheduled on a node which has the matching
label(s).
Node selector is a field in the pod spec. You set it under spec, at the same
level as the containers. Here are the steps to do this.
Step 1: Label the node
You can assign a label to a node using the following command (note that the
key and value are separated by an equals sign):
kubectl label nodes <node-name> <label-key>=<label-value>
In the above command, replace the placeholders with appropriate values.
<node-name> is the name of one of the nodes in your cluster. You can get the
list of nodes in your cluster using the following command:
kubectl get nodes
<label-key> and <label-value> can be any key and value you like, as long as
they follow the Kubernetes label syntax rules.
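As a concrete illustration, suppose a node named worker-1 was given the label disktype=ssd. Both names are made up for this example and do not come from a real cluster. The label would then appear under the node’s metadata, roughly like this excerpt of the Node object:

```yaml
# Illustrative excerpt of a Node object after labeling.
# "worker-1" and "disktype: ssd" are assumed example values.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    disktype: ssd                      # the label added in step 1
    kubernetes.io/hostname: worker-1   # a well-known label Kubernetes sets itself
```

You can list the labels on your own nodes with kubectl get nodes --show-labels.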
Step 2: Add node selector to pod configuration
See the pod configuration below. The node selector is defined inside the pod
spec, at the same level as containers.
apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key>: <label-value>
Replace the placeholders with appropriate values. Make sure the key and value
in the node selector match the label key and value of the node from step 1
above.
Once this pod is created, it will be scheduled on a node having the specified
label.
You can verify this with the following command and check the node on which the
pod is scheduled:
kubectl get pods -o wide
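To make the template concrete, here is a minimal sketch with the placeholders filled in. The pod name nginx-on-ssd and the label disktype=ssd are assumptions chosen for illustration; use whatever label you actually applied to your node.

```yaml
# Minimal sketch: a pod that may only run on nodes labeled disktype=ssd.
# The pod name, image and label are example values, not requirements.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-ssd
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd   # must match the node label exactly (key and value)
```

If no node carries that label, the pod stays in the Pending state until a matching node appears.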
Multiple node selectors
You can specify multiple key-value pairs in the node selector. In that case
the scheduler will look for a node which has all of the key-value pairs. If it
can’t find any node with all the labels mentioned in the node selector, the pod
will not be scheduled. Here is how you specify multiple node selectors:
apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key-1>: <label-value>
    <label-key-2>: <label-value>
Since the node selector is a map, the key must be different for every entry.
Second approach, using Node affinity
We can use node selector for scheduling pods on certain nodes, but node
selector is not very expressive. Instead of node selector we can use node
affinity, which is more expressive. Here are a few differences between them:
- Node selector only supports exact matches on label values, with all entries
combined using AND. Node affinity also lets each expression use an operator:
In, NotIn, Exists, DoesNotExist, Gt or Lt.
- You can specify a requirement as either a hard or a soft rule. We will
discuss shortly what hard and soft rules are.
With flexibility comes complexity. Node affinity rules are more complex than
node selector rules, but once you understand them you will be able to write
them fairly easily.
Hard and soft rules
Hard rules are conditions which a node must satisfy before the scheduler will
place a pod on it. If no node satisfies them, the pod will not be scheduled.
Hard requirements are specified with
requiredDuringSchedulingIgnoredDuringExecution.
Soft rules are conditions which the scheduler will try to match before
scheduling a pod on a node, but if it can’t find a matching node, it will still
schedule the pod on the most preferred node available. Soft requirements are
specified with preferredDuringSchedulingIgnoredDuringExecution.
Using node affinity
NodeAffinity has 2 fields as mentioned above:
- requiredDuringSchedulingIgnoredDuringExecution
- preferredDuringSchedulingIgnoredDuringExecution
If you look carefully, one starts with required and the other starts with
preferred, and everything else is the same in both fields. Both field names
refer to 2 phases of the pod lifecycle: scheduling and execution. The first
half of each name tells how the scheduler treats the conditions before
scheduling the pod, and the second half tells what happens to those conditions
while the pod is running on a node. “IgnoredDuringExecution” means that if the
labels on a node change after a pod has been scheduled on it, the pod won’t be
affected and will continue to run.
Using hard rules
Like nodeSelector, node affinity is also configured in the pod spec, under
affinity.nodeAffinity. Here is how you can use it. Replace the placeholders
with appropriate values.
apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>
            - <label-value-2>
  containers:
  - name: <pod-name>
    image: <image-name>
For a Pod definition with the above affinity rules, the scheduler will try to
find a node with label key “<label-key>” and label value either
“<label-value-1>” or “<label-value-2>”. Since we have specified a hard rule, if
no node exists with either of the label values, the pod will not be scheduled.
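In is only one of the operators that matchExpressions accepts. As a rough sketch of two others, the hard rule below uses Exists and Gt. The label keys gpu and cpu-count are hypothetical labels you would have to apply to your nodes yourself; they are not built into Kubernetes.

```yaml
# Sketch: hard rule using the Exists and Gt operators.
# "gpu" and "cpu-count" are assumed, user-defined node labels.
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu          # Exists: node must have this label key, any value
            operator: Exists
          - key: cpu-count    # Gt: label value, read as an integer, must be > 8
            operator: Gt
            values:
            - "8"
  containers:
  - name: app
    image: nginx
```

Exists and DoesNotExist take no values list, while Gt and Lt take exactly one value, which is interpreted as an integer.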
Multiple nodeSelectorTerms vs matchExpressions
If you look carefully, both nodeSelectorTerms and matchExpressions are lists.
So how do you decide whether to specify multiple nodeSelectorTerms or multiple
matchExpressions? Multiple nodeSelectorTerms are combined with “or”, i.e. if
any one of the nodeSelectorTerms matches a node, the pod can be scheduled on
that node. Multiple matchExpressions within a single term are combined with
“and”, i.e. all of the expressions in a matchExpressions list must match for a
node to be eligible to host the pod.
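The difference is easier to see in a manifest. The sketch below uses assumed labels (disktype, zone and dedicated are example keys, not required names) and combines two terms, the first of which has two expressions:

```yaml
# Sketch: OR across nodeSelectorTerms, AND within one matchExpressions list.
# All label keys and values here are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: terms-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: disktype=ssd AND zone is zone-a or zone-b
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
          - key: zone
            operator: In
            values:
            - zone-a
            - zone-b
        # Term 2 (OR-ed with term 1): dedicated=batch
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - batch
  containers:
  - name: app
    image: nginx
```

With these rules, a node labeled only dedicated=batch is eligible through term 2. A node labeled disktype=ssd and zone=zone-c is not eligible, because it fails the zone expression in term 1 and has no dedicated label for term 2.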
Using soft rules
To use soft rules, we need to use the “preferred…” field of node affinity.
Here is how you can use it. Replace the placeholders with appropriate values.
apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>
  containers:
  - name: <pod-name>
    image: <image-name>
For a Pod definition with the above affinity rules, the scheduler will try to
find a node with label key “<label-key>” and label value “<label-value-1>”.
Since we have specified a soft rule, the scheduler will try to schedule the pod
on a node with the appropriate label; if it can’t, it will schedule the pod on
some other node.
When several preferences are listed, or multiple nodes match the preferred
criteria, the weight field influences which node the pod will be scheduled on.
Weight is in the range 1–100. For each node that meets all of the scheduling
requirements (resource requests, requiredDuringScheduling affinity expressions,
etc.), the scheduler computes a sum by iterating through the elements of this
field and adding “weight” to the sum if the node matches the corresponding
matchExpressions. This score is then combined with the scores of other priority
functions for the node. The node(s) with the highest total score are the most
preferred.
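As a worked illustration of the weight sum, consider the sketch below with two preferences. The labels disktype and zone and the weights 80 and 20 are assumptions picked to make the arithmetic easy to follow.

```yaml
# Sketch: two weighted preferences. Labels and weights are example values.
apiVersion: v1
kind: Pod
metadata:
  name: weights-demo
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80              # strongly prefer SSD nodes
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      - weight: 20              # mildly prefer zone-a
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - zone-a
  containers:
  - name: app
    image: nginx
```

For this field alone, a node with both labels gets a sum of 80 + 20 = 100, a node with only disktype=ssd gets 80, a node with only zone=zone-a gets 20, and a node with neither gets 0. Because this sum is combined with the scheduler’s other scores, the node with the highest sum here is favored but not guaranteed to be chosen.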
Using NodeSelector and NodeAffinity together
If you specify both node selector and node affinity on a pod, then a node must
satisfy both the node selector and the node affinity rules for the pod to be
scheduled on it.
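A minimal sketch of the two used together, again with assumed labels (disktype and zone are example keys):

```yaml
# Sketch: nodeSelector and nodeAffinity on the same pod.
# A node must satisfy BOTH to be eligible. Labels are example values.
apiVersion: v1
kind: Pod
metadata:
  name: combined-demo
spec:
  nodeSelector:
    disktype: ssd               # exact match required
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - zone-a
            - zone-b
  containers:
  - name: app
    image: nginx
```

Here a node labeled disktype=ssd and zone=zone-a is eligible, while a node labeled disktype=ssd and zone=zone-c is not, even though it satisfies the node selector.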
Conclusion
Pods with specific requirements can be assigned to certain nodes based on the
labels attached to those nodes. You can specify node selector, node affinity,
or both to assign a pod to certain nodes. Node selector is simple to use but is
not very expressive and is limited in scope. Node affinity rules are a bit more
complex but give you more options: you can choose between hard and soft rules,
and you have several operators to pick from.