Monday, October 14, 2024

Kubernetes: How to deploy different pods close to each other (same node or zone etc.)

In my last two posts, I discussed how to avoid and how to enforce pod scheduling on specific nodes. You can read them here:

Avoid scheduling pods on certain nodes

Schedule pods on certain nodes

In this post, I will talk about how to schedule a pod near another pod, in other words, onto a node that already has a particular pod running on it. I will reuse some terminology from my earlier post about scheduling pods on certain nodes, so please read that post first if you have not already.

The idea is that when your pod is scheduled, it should land on a node that already runs some other pod, or, in the opposite case, on a node that does not run it. First, let’s talk about possible use cases where you might need this.

Use cases

  • An application can have pods that talk to each other frequently, so you want them placed on the same node to reduce latency. For example, in an e-commerce application, the order service needs to check inventory before accepting an order, so every order placement invokes the inventory service. You might therefore want a node that runs an order service pod to also run an inventory service pod.
  • Another example is placing a cache pod near a web application pod for faster access to the cached contents.
  • You need to make sure that no more than one pod of a given type is scheduled on a node, so that your pods are as distributed as possible.

Pod Affinity: Schedule pods closer to already running pods

In my previous post, I talked about node affinity, which schedules pods on a specific set of nodes. In this post, I will talk about another kind of affinity: pod affinity. Here is how pod affinity and anti-affinity work:

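Schedule a pod (or don’t schedule it, in case of anti-affinity) on a node in topology domain X, if a pod matching Y is already running in that domain. Here X is a node label key (the topologyKey), and all nodes that share the same value for that key form one domain. Y is a label selector that matches the already running pod. When every node has a unique value for the key, as with kubernetes.io/hostname, the domain is a single node.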

Don’t panic if you could not make much sense of the above statement. We will discuss it in detail.

Like node affinity, pod affinity comes in two types:

  • requiredDuringSchedulingIgnoredDuringExecution: This setting means that the specified affinity or anti-affinity rule must be met for the pod to be scheduled onto a node. If the criteria cannot be fulfilled, the pod will remain unscheduled, waiting for the conditions to be satisfied. This can be useful for critical applications that require specific placement for performance or regulatory reasons but might lead to unscheduled pods if the cluster does not have the capacity or the right node configuration.
  • preferredDuringSchedulingIgnoredDuringExecution: This setting tells the scheduler that it should try to enforce the rules, but if it can't, the pod can still be scheduled onto a node that doesn't meet the criteria. This approach provides more flexibility and is generally recommended for use in larger or more dynamic environments where strict adherence to the rules may not always be possible or optimal.

The choice between required and preferred rules will depend on the specific needs of your applications and the desired balance between strict scheduling requirements and the flexibility of pod placement.

To understand pod affinity, let’s discuss a scenario from one of the use cases above, where you want to place a web-app pod on the same node where a cache pod is already running (or vice versa, depending on your application).

Here is the YAML definition to achieve this.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - web-cache
        # Node label key that defines "together": nodes sharing one value for this key form a group
        topologyKey: my-node-label
  containers:
  - name: web-app
    image: <container-image>

This YAML ensures that the web-app pod is scheduled only where a pod carrying the label name=web-cache is already running. The topologyKey names the node label that defines "where": the new pod must land on a node whose value for that label key matches the value on the node running the cache pod.
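
The selector above matches on pod labels, not on the pod's name, so the cache pod has to carry the label name=web-cache for the rule to find it. A minimal sketch of such a cache pod follows; the image placeholder is hypothetical and stands in for whatever cache you actually run.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-cache
  labels:
    name: web-cache            # this label is what the web-app labelSelector matches
spec:
  containers:
  - name: web-cache
    image: <cache-container-image>   # hypothetical placeholder, not from the original post
```

If the cache pod is created without this label, the web-app pod's required affinity rule has nothing to match.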

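Let's talk about some key elements used in the above YAML:

podAffinity: This holds the pod affinity rules that the scheduler applies when placing this pod.

topologyKey: This is a label key present on the target nodes. Remember, a label has a key and a value, and here we specify only the key. The scheduler treats all nodes with the same value for this key as one group (a topology domain) and places the pod in the group that already runs the matching pod. It can be any node label key, whether set by you or by your cloud vendor.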
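
To make the effect of topologyKey concrete, here is a sketch of the same rule using two well-known node labels instead of a custom one. It is a fragment of a pod spec, not a complete manifest, and it assumes your nodes carry these standard labels (most clusters set kubernetes.io/hostname automatically, while the zone label depends on your provider).

```yaml
# Fragment of a pod spec: goes under spec.affinity
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        name: web-cache
    # Each node has a unique hostname value, so this means "same node as the cache pod".
    topologyKey: kubernetes.io/hostname
    # To relax the rule to "same zone as the cache pod", use this key instead:
    # topologyKey: topology.kubernetes.io/zone
```

The wider the domain the key describes, the more freedom the scheduler has, at the cost of a looser guarantee about how close the pods end up.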

With the above pod definition file, we are deploying a pod named “web-app” containing a web application, and we want to place it alongside a cache pod labeled name=web-cache that is already running. Under podAffinity, we tell the scheduler to consider only nodes that carry a label with key “my-node-label” and to choose one whose value for that key matches a node where a “web-cache” pod is already running.

While looking for a suitable node, the scheduler takes the above requirement into account, and whether the pod gets scheduled depends on the podAffinity type used. In our example, we used requiredDuringSchedulingIgnoredDuringExecution, so if no node is running a cache pod, your web pod will not be scheduled and will stay in the Pending state until a suitable node appears. So, it is important to know how your application works and which pods will be scheduled before others, and to choose the podAffinity type accordingly.
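
If a Pending web pod is not acceptable, the same rule can be expressed as a preference. The sketch below is an illustration rather than a recommendation: it uses preferredDuringSchedulingIgnoredDuringExecution, where each entry takes a weight (1 to 100) and wraps the rule in podAffinityTerm. The weight of 100 and the kubernetes.io/hostname key are assumptions chosen for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                      # 1-100; a higher weight is a stronger preference
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: name
              operator: In
              values:
              - web-cache
          topologyKey: kubernetes.io/hostname   # prefer the node that runs the cache pod
  containers:
  - name: web-app
    image: <container-image>
```

With this variant the scheduler favors nodes that run a web-cache pod but still places web-app elsewhere when no such node is available.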

Pod Anti Affinity: Avoid scheduling pods on nodes which are already running certain pods

To understand pod anti-affinity, let’s discuss another scenario from one of the use cases above, where we want to make sure no more than one pod of a type is running on a node. The YAML definition below achieves this.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-app
        # Pods matching the selector are kept apart across groups of nodes sharing one value for this key
        topologyKey: my-node-label
  containers:
  - name: web-app
    image: <container-image>

The only new element is podAntiAffinity, which is the opposite of podAffinity. podAffinity attracts a pod to where matching pods are running, while podAntiAffinity keeps it away from them.

In the above YAML definition, we are deploying a pod named “web-app”, and the goal is to make sure no more than one “web-app” pod is running on a node.

Under podAntiAffinity, we tell the scheduler not to place this pod on a node if a pod labeled app=web-app is already running on any node with the same “my-node-label” value. So, if a pod with that label is already running there, the new pod will be scheduled on some other node. For a strict one-per-node guarantee, each node needs a distinct value for the key, as is the case with kubernetes.io/hostname.
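
In practice, this rule is usually attached to a workload with several replicas rather than to a single pod. The following is a sketch under assumptions: the Deployment wrapper, the replica count of 3 and the kubernetes.io/hostname key are not from the example above and are only there to show replicas being spread one per node.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3                    # assumed count for illustration
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app             # the label the anti-affinity rule matches on
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: kubernetes.io/hostname   # at most one web-app pod per node
      containers:
      - name: web-app
        image: <container-image>
```

Because the rule is required, replicas beyond the number of eligible nodes stay Pending; switching to the preferred form trades that guarantee for availability.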

A word of caution

Although podAffinity and podAntiAffinity let you control the placement of pods, they come with a downside. Pod affinity and anti-affinity require a substantial amount of processing, which can slow down scheduling significantly in large clusters. They are not recommended for clusters larger than several hundred nodes.

Million-dollar question: why not place the containers in the same pod?

Why should we go to this much trouble to place two pods together? Why not place multiple containers in a single pod? After all, a pod is a wrapper around one or more containers. So, rather than going through all the complexity above, should we just place multiple containers in a pod? The short answer is: it depends. For the longer answer, keep the following in mind:

  1. No one can stop you from putting multiple containers in a pod, but it all depends on your application requirements. If you think your business model and the way your application operates would benefit from it, go ahead.
  2. If a problem in one of your containers brings the pod down, you also lose the other application, which might not be at fault at all.
  3. When scaling horizontally, you will scale both applications together, even if only one needed scaling. That means you will pay for more resources.
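
For comparison, this is roughly what the single-pod alternative looks like. It is a minimal sketch: the pod name and the cache image placeholder are hypothetical. Containers in one pod always land on the same node and share a network namespace, so the web application can reach the cache over localhost.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-cache       # hypothetical name
spec:
  containers:
  - name: web-app
    image: <container-image>
  - name: web-cache
    image: <cache-container-image>   # hypothetical placeholder
```

No affinity rule is needed here, but the two containers are now scheduled, scaled and removed as one unit, which is exactly the trade-off described in the points above.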

Conclusion

Pod affinity and anti-affinity are powerful tools for controlling pod placement in Kubernetes. However, in large clusters, be mindful of the performance impact. Always balance the need for specific pod placement with scalability and performance concerns.

Please leave a comment in the comments section if you have any questions or feedback.
