Friday, September 27, 2024

Kubernetes: How to schedule pods on certain nodes

In my previous post, I talked about how you can prevent pods from being scheduled on certain nodes. In this post, I will discuss the opposite: how you can make pods get scheduled on certain nodes. This can be a hard requirement (if it is not met, the pod won't be scheduled) or a soft requirement (the scheduler will try to meet it, but if it can't, the pod will be scheduled on some other node). Let's discuss how we can do this.

Use case

There are a couple of use cases where you want some pods to always be scheduled on certain nodes. Some of them are:

  1. Pods have specific hardware/resource requirements and you want those pods to always be scheduled on nodes which have the supporting resources.
  2. Pods have specific security requirements to align with certain industry standards and should always go to nodes which satisfy those requirements.

There are 2 ways of assigning a pod to a node:

  1. Node selector, which can do the job but is not very expressive.
  2. Node affinity, which is a bit more complex than node selector but gives you more features as well.

We will discuss both these approaches.

First approach, using Node selector

With node selector, you decorate the pod with a label (key-value pair) of the node on which you want this pod to be scheduled. Once you specify the node selector, the pod will be scheduled on a node which has the corresponding matching label(s).

Node selector is a field in the pod spec, specified as part of the pod definition. Here are the steps to do this.

Step 1: Label the node

You can assign a label to a node using the following command:

kubectl label nodes <node-name> <label-key>=<label-value>

In the above command, you need to replace the placeholders with appropriate values. <node-name> is the name of one of the nodes in your cluster. You can get the list of nodes in your cluster using the following command.

kubectl get nodes 

<label-key> and <label-value> can be any arbitrary key and value.
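For example, assuming your cluster has a node called worker-1 (a hypothetical name), you could label it and then confirm the label like this; the last command filters the node list to only nodes carrying that label:

kubectl label nodes worker-1 disktype=ssd
kubectl get nodes --show-labels
kubectl get nodes -l disktype=ssd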

Step 2: Add node selector to pod configuration

See the pod configuration below. The node selector is defined in the pod spec, after the container definition.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key>: <label-value>

Replace the placeholders with appropriate values. Make sure the key and value in the node selector match the label key and value of the node as mentioned in step 1 above.

Once this pod is created, it will be scheduled on a node having the specified label.
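As a concrete illustration (the pod name, nginx image and disktype=ssd label are assumptions made for this example, matching the label from step 1):

apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  containers:
  - name: web
    image: nginx
  nodeSelector:
    disktype: ssd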

You can verify this by using the following command and checking the node on which the pod is scheduled.

kubectl get pods -o wide

Multiple node selectors

You can specify multiple key-value pairs in the node selector. In that case, the scheduler will try to find a node with all of the key-value pairs. If it can't find any node with all the labels mentioned in the node selector, the pod will not be scheduled. Here is how you specify multiple labels in the node selector:

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  containers:
  - name: <pod-name>
    image: <image-name>
  nodeSelector:
    <label-key-1>: <label-value>
    <label-key-2>: <label-value>

Since node selector is a map, the key must be unique for every entry.

Second approach, using Node affinity

We can use node selector for scheduling a pod on certain nodes, but node selector is not very expressive. Instead, we can use node affinity, which is more expressive. Here are a few differences between them:

  1. Node selector only supports exact label matches ANDed together, whereas node affinity also supports operators such as In, NotIn, Exists, DoesNotExist, Gt and Lt.
  2. With node affinity, you can specify a requirement as either a hard or a soft rule. We will discuss shortly what hard and soft rules are.

With flexibility comes complexity. Node affinity rules are complex compared to node selector rules, but once you understand them you will be able to write them fairly easily.

Hard and soft rules

Hard rules are conditions which the scheduler will look for before scheduling a pod on a node; if they are not met, the pod will not be scheduled. Hard requirements are specified with requiredDuringSchedulingIgnoredDuringExecution.

Soft rules are conditions which the scheduler will try to match before scheduling a pod on a node, but if it can't find a matching node, it will still schedule the pod on the most preferred node available. Soft requirements are specified with preferredDuringSchedulingIgnoredDuringExecution.

Using node affinity

Node affinity has 2 fields, as mentioned above:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

If you look carefully, one starts with required and the other starts with preferred; everything else is the same in both fields. Both fields refer to 2 phases of the pod lifecycle: scheduling and execution. The first half of each field describes the conditions the scheduler should look for before scheduling the pod, and the second half describes what should happen during execution of the pod on a node. "IgnoredDuringExecution" means that if the labels on a node change after a pod has been scheduled on it, the pod won't be affected and will continue to run.
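You can see this behaviour for yourself. Assuming a pod already scheduled on node worker-1 via a required affinity rule on disktype=ssd (both names are assumptions), removing the label does not evict the running pod:

kubectl label nodes worker-1 disktype-
kubectl get pods -o wide

The pod keeps running where it is, but new pods with the same hard rule will no longer be scheduled on that node.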

Using hard rules

Like nodeSelector, nodeAffinity is also a field of pod spec. Here is how you can use it. Replace the placeholders with appropriate values.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>
            - <label-value-2>

  containers:
  - name: <pod-name>
    image: <image-name>

For a pod definition with the above affinity rules, the scheduler will try to find a node with label key "<label-key>" and a label value of either "<label-value-1>" or "<label-value-2>". Since we have specified a hard rule, if no node exists with either of the label values, the pod will not be scheduled.

Multiple nodeSelectorTerms vs matchExpressions

If you look carefully, both nodeSelectorTerms and matchExpressions are lists. So how do you decide whether to specify multiple nodeSelectorTerms or multiple matchExpressions? Between multiple nodeSelectorTerms, an OR operation is used, i.e. if any of the nodeSelectorTerms matches a node, the pod can be scheduled on that node. For multiple matchExpressions, an AND operation is used, i.e. all of the expressions must match for a node to be eligible to host the pod.
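Here is a sketch of the difference, using assumed labels disktype=ssd and zone=us-east-1a; both fragments go under spec.affinity.nodeAffinity:

# OR: the node qualifies if either nodeSelectorTerm matches
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
  - matchExpressions:
    - key: zone
      operator: In
      values:
      - us-east-1a

# AND: both expressions must match on the same node
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
    - key: zone
      operator: In
      values:
      - us-east-1a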

Using soft rules

For using soft rules, we need to use “preferred…” field of node affinity. Here is how you can use it. Replace the placeholders with appropriate values.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: <label-key>
            operator: In
            values:
            - <label-value-1>

  containers:
  - name: <pod-name>
    image: <image-name>

For a pod definition with the above affinity rules, the scheduler will try to find a node with label key "<label-key>" and label value "<label-value-1>". Since we have specified a soft rule, the scheduler will try to schedule the pod on a node with the matching label, but if it can't, it will schedule the pod on some other node.

When multiple nodes match the preferred criteria, it is the weight field which decides which node the pod will be scheduled on. Weight is in the range 1–100. For each node that meets all of the scheduling requirements (resource request, RequiredDuringScheduling affinity expressions, etc.), the scheduler will compute a sum by iterating through the elements of this field and adding “weight” to the sum if the node matches the corresponding MatchExpressions. This score is then combined with the scores of other priority functions for the node. The node(s) with the highest total score are the most preferred.
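For example, here is a sketch with two weighted preferences (the labels, values and weights are assumptions). A node carrying both labels scores 80, a node with only disktype=ssd scores 50, and the node with the highest total score is preferred:

preferredDuringSchedulingIgnoredDuringExecution:
- weight: 50
  preference:
    matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
- weight: 30
  preference:
    matchExpressions:
    - key: zone
      operator: In
      values:
      - us-east-1a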

Using NodeSelector and NodeAffinity together

If you specify both a node selector and node affinity on a pod, the target node must satisfy both the node selector and the node affinity rules for the pod to be scheduled on it.
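A minimal sketch of both together (the pod name, image and labels are assumptions); a candidate node must carry disktype=ssd and also satisfy the affinity expression on zone:

apiVersion: v1
kind: Pod
metadata:
  name: combined-example
spec:
  nodeSelector:
    disktype: ssd
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
  containers:
  - name: web
    image: nginx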

Conclusion

Pods with specific requirements can be assigned to certain nodes based on the labels attached to a node. You can specify a node selector, node affinity, or both to assign a pod to certain nodes. Node selector is simple to use but is not very expressive and is limited in scope. Node affinity rules are a bit more complex, but they let you choose between hard and soft rules and give you a wider set of operators to express your requirements.

Monday, September 16, 2024

Kubernetes: How to avoid scheduling pods on certain nodes

When new pods are created in a cluster (either due to failure of existing pods or to scale the system horizontally), these pods are placed on some node. If an existing node has capacity in line with the resource requirements of the pod, the pod is scheduled on that node; otherwise the pod stays pending until capacity becomes available (for example, a new node is added).

Generally, an application will have multiple pods and every pod will have different resource requirements. Depending upon the resource requirements of the pods, you decide the node size. But if one of your pods has very different resource requirements compared to the other pods (e.g. a database), you might want a different node configuration for that particular pod. In such scenarios you want to make sure that only certain pods (e.g. the database) land on that node and no others, so that the node has sufficient capacity for the target pod. We will see how we can prevent other pods from landing on these nodes.

 

We can do this by using taints and tolerations.

Taints and Tolerations

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes.

Taint

Taints are applied to nodes. They allow a node to repel a set of pods.

Tolerations

Tolerations are applied to pods, and allow (but do not require) the pods to be scheduled onto nodes with matching taints.

Core concept

The idea is: you taint a node on which you want only certain pods to be scheduled, and the pods which should be scheduled on that node must have a toleration for the taint applied to the node.

Gate security analogy

It is the same as when you want to restrict entry to some premises to certain people only: you put security on the gate of the premises, which will not allow anyone to enter. Only those who have the required pass will be allowed in by security. So, taints are like security on nodes and tolerations are the gate pass for that security.

Put security on gate: Apply taint on node

To restrict a node to accepting only certain types of pods, we need to apply a taint on the node. You can apply the taint using kubectl taint.

kubectl taint nodes <node-name> type=db:NoSchedule

You need to replace the <node-name> placeholder with the name of a node. The above command places a taint on node <node-name>. The taint has key "type", value "db", and taint effect "NoSchedule". This means that no pod will be scheduled on node <node-name> unless it has a matching toleration. We will shortly see what a taint effect is and what the different types of effects are. You can use any key and value; in my case, I chose "type" as the key and "db" as the value.

List taints on a node

You can list the taints applied on a node using kubectl describe and a filter. The command below lists all the taints applied on the specified node; replace the <node-name> placeholder with the actual name of a node in your cluster.

kubectl describe node <node-name> | grep Taints

Give pass to some people: Apply tolerations to pods

In the following pod definition, notice tolerations under spec. This toleration is for a taint and hence acts as a gate pass for the security (we are referring to the taint as security). Since tolerations is a list, you can apply multiple tolerations to a pod.

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
  labels:
    app: taint-test
spec:
  containers:
  - name: <container-name>
    image: <image-name>
  tolerations:
  - key: "type"
    operator: "Equal"
    value: "db"
    effect: "NoSchedule"

An important thing to note while applying a toleration is that it should be absolutely identical to the taint you are trying to address. Notice in the above toleration that the key, value and effect are exactly the same as in the taint. The Equal operator tells the scheduler to match the value for the key.

Once the above toleration is applied, this pod can be scheduled on the node with the matching taint. So effectively, this pod has the gate pass to get onto the node. Any pod which does not have this toleration can't be scheduled on this node.

It is important to understand that applying a toleration to a pod means the pod can be scheduled on a node with the matching taint, but it does not mean the pod can't go to any other node in the cluster. The pod can still go to any other node; the taint only ensures that the tainted node accepts nothing but pods with a matching toleration.
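If you do want to pin such pods to the tainted nodes, the usual approach is to combine the toleration with a node label and a node selector (or node affinity). A minimal sketch, assuming you have also labelled the node with kubectl label nodes <node-name> type=db (the taint itself does not add any label):

spec:
  nodeSelector:
    type: db
  tolerations:
  - key: "type"
    operator: "Equal"
    value: "db"
    effect: "NoSchedule"

With this combination, the tainted node only accepts these pods, and these pods only go to the tainted node.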

Before we move further, let's discuss the various taint effects and the operators available in tolerations.

Taint effects

There are 3 taint effects: NoSchedule, PreferNoSchedule and NoExecute (example commands for each are shown after the list below).

  • NoSchedule: Pods that don't tolerate this taint are not scheduled on the node.
  • PreferNoSchedule: Kubernetes tries to avoid scheduling pods that don't tolerate this taint, but may schedule them if there are no other options.
  • NoExecute: Pods that don't tolerate this taint are evicted immediately. It prevents new pods from being scheduled on the node and also removes the existing ones.
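For reference, the effect is simply the last part of the taint passed to kubectl; using the same key and value as above:

kubectl taint nodes <node-name> type=db:NoSchedule
kubectl taint nodes <node-name> type=db:PreferNoSchedule
kubectl taint nodes <node-name> type=db:NoExecute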

Toleration operators

There are 2 operators for tolerations in pods:

  • Equal: This will match both key and value to make sure they match the ones specified in the taint.
  • Exists: This will only check that a taint with the given key exists on the node and does not care about the value. The value in the taint can be anything (see the example below).
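A minimal sketch of a toleration using the Exists operator, reusing the taint key from above; note that no value field is given:

tolerations:
- key: "type"
  operator: "Exists"
  effect: "NoSchedule"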

Taints and master node/control plane

If you have noticed, in a multi-node cluster, pods are not scheduled on the master node. How is this controlled? Well, you guessed it right: using a taint on the master node.

The master node has the following taint applied to it:

node-role.kubernetes.io/master:NoSchedule

(On newer Kubernetes versions, the control plane node is tainted with node-role.kubernetes.io/control-plane:NoSchedule instead.) You can check this by describing the node and filtering for Taints as mentioned above.

Since no pod has a toleration for this taint, no pod is scheduled on the master node. You can schedule pods on the master node by removing the taint from the node, as described in the following section.

Remove taint from node

To remove the taint added by the command above, you can run:

kubectl taint nodes <node-name> type=db:NoSchedule-

It is exactly the same command which is used to apply the taint, followed by a "-" at the end.

That's all about how you can avoid pods being scheduled on certain nodes.

Conclusion

Due to the special resource requirements of some pods, you may launch nodes with a higher configuration and want to make sure those nodes don't accept just any pod coming their way; rather, you want to restrict scheduling on that node to certain pods. You do this by applying a taint on the node. A taint on a node will prevent any pod from being scheduled on that node unless the pod has a toleration for the taint. Pods with the appropriate toleration can be scheduled on that node.

So, it is a 2 step process:

  1. Apply taint on node
  2. Mention toleration on pod for the taint

I hope this helps.

 


Wednesday, September 4, 2024

Migrating Virtual box VM to Hyper-V

I started using VirtualBox around 4–5 years back and got so comfortable with it that I refused to use any other hypervisor, be it VMware or Hyper-V. Recently, when I was trying to set up a minikube (Kubernetes on the local machine) cluster on my local machine using VirtualBox, I ran into many problems, and after spending a good amount of time I still didn't have any success.

So, I decided to use Hyper-V for virtualization to set up minikube, but I knew that once I enabled Hyper-V, I wouldn't be able to use VirtualBox anymore, and then what about all my data on the VMs in VirtualBox? So, I was wondering how I could migrate my VMs from VirtualBox to Hyper-V. After a lot of searching, I was able to successfully migrate my VirtualBox VM to Hyper-V. In this post, I will describe all the steps required to do so.

At a high level, you need to perform 5 steps to migrate a VM from VirtualBox to Hyper-V:

  1. Export the VM from VirtualBox into VHD format
  2. Convert VHD to VHDX
  3. Create a new VM in Hyper-V without disk
  4. Attach Hard Disk with Virtual machine
  5. Set the Boot options

1. Export the VM from VirtualBox into VHD format

Though VirtualBox has an option for this in the UI, that didn't work for me. I was using VirtualBox version 6.1.4. You may want to try with your version of VirtualBox, but for me it generated a VDI file instead of a VHD file. So, I decided to use the following command and it worked like a charm for me.

Format of command

VBoxManage clonehd <absolute-path-of-vdi-file> <vhd-destination> --format vhd

Example:

VBoxManage clonehd C:\users\myuser\VirtualBoxVMs\ubuntu.vdi D:\virtualbox-export\ubuntu.vhd --format vhd


Note 1: If your absolute path contains spaces, you will have to wrap the path in double quotes.

Note 2: VBoxManage might not be in your PATH, so you will need to navigate to the folder where VirtualBox is installed. On my machine, this happens to be the following location: C:\Program Files\Oracle\VirtualBox


For those who want to try with the VirtualBox UI, here are the steps:


Select the VDI you want to convert, right-click and select Copy.



Select the location where you want to save the VHD and enter a name. In Disk image file type, select VHD and click Copy. This should generate the VHD.



2. Convert VHD to VHDX

  1. Launch Hyper-V Manager and select the server in the left pane
  2. Under Actions, select Edit Disk…
  3. Click Next on the 'Before You Begin' screen
  4. Browse for the copied file. The file will have a file extension of VHD. Select the file and click the Next button.
  5. On the Choose Action window, select Convert and click the Next button



6. Select VHDX format and click Next



7. Select Dynamically expanding and click Next

8. Select a name and location for the file and click Next

9. Click Finish and wait for the conversion process to complete
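If you prefer the command line, the same conversion can also be done with the Convert-VHD cmdlet from the Hyper-V PowerShell module; a minimal sketch, assuming the example paths used earlier:

Convert-VHD -Path "D:\virtualbox-export\ubuntu.vhd" -DestinationPath "D:\virtualbox-export\ubuntu.vhdx" -VHDType Dynamic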

3. Create a new VM in Hyper-V without disk

From Actions in the top menu, select New → Virtual Machine and follow the wizard. Select the default values during the different steps of the wizard, or change the values as per your convenience.




There is one screen in the wizard which you need to take care of: the Connect Virtual Hard Disk screen. On this screen, select "Attach a virtual hard disk later".




4. Attach Hard Disk with Virtual machine

In Hyper-V Manager, right-click the VM and select Settings. In the left pane select IDE Controller 0, select Hard Drive in the right pane and click Add.




Browse to the path of the VHDX file created in the previous step and select it. Click Apply.


5. Set the Boot options

Select BIOS in the left pane, and in the right pane move the IDE option to the top and click OK.
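Steps 3 to 5 can also be scripted with the Hyper-V PowerShell cmdlets. A rough sketch, assuming a Generation 1 VM named "ubuntu", 2 GB of startup memory and the VHDX path used earlier:

# Create a Generation 1 VM without a disk
New-VM -Name "ubuntu" -Generation 1 -MemoryStartupBytes 2GB -NoVHD

# Attach the converted disk to IDE controller 0
Add-VMHardDiskDrive -VMName "ubuntu" -ControllerType IDE -ControllerNumber 0 -Path "D:\virtualbox-export\ubuntu.vhdx"

# Boot from IDE first
Set-VMBios -VMName "ubuntu" -StartupOrder @("IDE", "CD", "LegacyNetworkAdapter", "Floppy")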



After going through all the above steps, start the VM and connect to it, and you will be able to use the VM through Hyper-V. This VM will have all the packages and data which you installed on it while working with VirtualBox.

Hope this helps. 

Kubernetes: How to deploy different pods close to each other (same node or zone etc.)

In my last two posts, I discussed how to avoid and enforce pod scheduling on specific nodes. You can read them here: Avoid scheduling pods ...