☸️ Kubernetes Pod Scheduling
Filtering, Scoring & Binding — How Pods Find Their Home
When you create a Pod, the kube-scheduler decides which node to place it on. The scheduler runs a pipeline: first it filters out nodes that cannot run the Pod (insufficient resources, untolerated taints, affinity mismatches), then it scores the remaining candidates (less loaded nodes, balanced allocation, topology spread), and finally it binds the Pod to the highest-scoring node. If no node passes filtering, the Pod stays Pending unless the scheduler can make room by preempting lower-priority Pods.
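The filter, score, bind flow can be sketched in a few lines of Python. This is a toy model, not the kube-scheduler's code: the `Node` and `Pod` classes, their field names, and the `schedule` function are hypothetical, and ties are broken by list order here rather than randomly.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    name: str
    free_cpu_m: int    # unreserved CPU in millicores (hypothetical field)
    free_mem_mi: int   # unreserved memory in MiB (hypothetical field)

@dataclass
class Pod:
    name: str
    cpu_m: int         # CPU request in millicores
    mem_mi: int        # memory request in MiB

Filter = Callable[[Pod, Node], bool]
Scorer = Callable[[Pod, Node], float]

def schedule(pod: Pod, nodes: list[Node], filters: list[Filter],
             scorers: list[tuple[Scorer, float]]) -> Optional[str]:
    # Phase 1, filtering: a node survives only if every filter accepts it.
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # the Pod would stay Pending (preemption is not modelled)
    # Phase 2, scoring: weighted sum of every scorer's result.
    def total(node: Node) -> float:
        return sum(weight * scorer(pod, node) for scorer, weight in scorers)
    # Phase 3, binding: pick the highest-scoring node.
    return max(feasible, key=total).name

def fits(pod: Pod, node: Node) -> bool:
    return node.free_cpu_m >= pod.cpu_m and node.free_mem_mi >= pod.mem_mi

def free_cpu(pod: Pod, node: Node) -> float:
    # Unnormalised toy score; real scorers map their result onto 0-100.
    return node.free_cpu_m - pod.cpu_m

nodes = [Node("node-a", 500, 1024), Node("node-b", 4000, 8192), Node("node-c", 2000, 4096)]
pod = Pod("web", 1000, 2048)
print(schedule(pod, nodes, [fits], [(free_cpu, 1.0)]))  # node-b
```

Here node-a is filtered out for lack of CPU, and node-b beats node-c on score.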
🔀 Scheduler Pipeline
The three phases every Pod goes through before it runs.
🔍 Filtering (Predicates)
Nodes that fail any predicate are eliminated. No exceptions.
📦 Resource Fit
The node must have enough allocatable CPU and memory to satisfy the Pod's
requests. Note: requests reserve capacity, while limits cap usage.
A node can be overcommitted on limits but never on requests.
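As a rough illustration of the resource-fit check, the sketch below compares summed requests against allocatable capacity. The function name and the dictionary keys (`cpu_m`, `mem_mi`) are assumptions made for this example, not Kubernetes identifiers.

```python
def resource_fit(allocatable: dict, already_requested: dict, pod_request: dict) -> bool:
    """Return True if the Pod's requests fit into what is still unreserved.

    All arguments map a resource name to an integer amount,
    e.g. {"cpu_m": 4000, "mem_mi": 8192}. Limits play no part here.
    """
    return all(
        already_requested.get(res, 0) + amount <= allocatable.get(res, 0)
        for res, amount in pod_request.items()
    )

allocatable = {"cpu_m": 4000, "mem_mi": 8192}
requested = {"cpu_m": 3500, "mem_mi": 4096}

print(resource_fit(allocatable, requested, {"cpu_m": 500, "mem_mi": 1024}))  # True
print(resource_fit(allocatable, requested, {"cpu_m": 600, "mem_mi": 1024}))  # False
```

The second Pod is rejected because 3500m + 600m exceeds the 4000m of allocatable CPU, even though memory would fit.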
🏷️ Taints & Tolerations
Nodes can have taints (e.g., dedicated=ml:NoSchedule).
For NoSchedule and NoExecute taints, a Pod must have a matching toleration or the node is filtered out; a PreferNoSchedule taint is only a soft preference.
This keeps specialized nodes reserved for specific workloads.
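The matching rule can be sketched as follows. The `Taint` and `Toleration` classes are simplified stand-ins for the corresponding API objects, and the sketch assumes that an empty toleration key or effect acts as a wildcard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"

@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key matches any taint key (with Exists)
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any taint effect

def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value

def passes_taint_filter(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    # Only hard taints filter a node; PreferNoSchedule is left to scoring.
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)

ml_taint = Taint("dedicated", "ml", "NoSchedule")
ml_toleration = Toleration("dedicated", "Equal", "ml", "NoSchedule")

print(passes_taint_filter([ml_taint], []))               # False
print(passes_taint_filter([ml_taint], [ml_toleration]))  # True
```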
📍 Node Affinity
requiredDuringSchedulingIgnoredDuringExecution rules act as hard filters: the Pod must
land on a node matching the label selector. preferredDuringSchedulingIgnoredDuringExecution
rules are soft and only affect scoring.
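The difference between hard and soft rules can be shown with a small sketch. It assumes a reduced selector form, where each label key maps to a set of allowed values (roughly the `In` operator), and the function names are hypothetical.

```python
from typing import Optional

def matches(node_labels: dict, selector: dict) -> bool:
    # selector maps a label key to the set of values it accepts
    return all(node_labels.get(key) in allowed for key, allowed in selector.items())

def evaluate_node_affinity(node_labels: dict,
                           required: Optional[dict],
                           preferred: list[tuple[int, dict]]) -> Optional[int]:
    """Return None if the node is filtered out, else the summed preferred weights."""
    if required is not None and not matches(node_labels, required):
        return None  # hard rule failed: the node is eliminated
    # soft rules: every matching term contributes its weight to the score
    return sum(weight for weight, selector in preferred if matches(node_labels, selector))

required = {"disktype": {"ssd"}}
preferred = [(50, {"zone": {"zone-b"}}), (10, {"zone": {"zone-a"}})]

print(evaluate_node_affinity({"disktype": "ssd", "zone": "zone-a"}, required, preferred))  # 10
print(evaluate_node_affinity({"disktype": "hdd", "zone": "zone-b"}, required, preferred))  # None
```

The second node matches the heavier preferred term but never reaches scoring, because the required rule already removed it.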
🔄 Pod Anti-Affinity
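requiredDuringSchedulingIgnoredDuringExecution anti-affinity prevents co-locating Pods within the same topology domain (set by topologyKey, e.g. kubernetes.io/hostname).
For example, two replicas of the same service shouldn't land on the same node, so that one node failure cannot take both down.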
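A minimal sketch of the anti-affinity filter, assuming the topology domain is the individual node and that selectors are plain label equality. The function name and data layout are made up for the example.

```python
def anti_affinity_feasible(pods_by_node: dict, selector: dict) -> list[str]:
    """Return the nodes that do not yet host a Pod matching the selector.

    pods_by_node maps a node name to the label dicts of the Pods running there.
    """
    def selected(pod_labels: dict) -> bool:
        return all(pod_labels.get(k) == v for k, v in selector.items())

    return [node for node, pods in pods_by_node.items()
            if not any(selected(p) for p in pods)]

cluster = {
    "node-a": [{"app": "web"}],
    "node-b": [{"app": "db"}],
    "node-c": [],
}
# A second "web" replica with anti-affinity against app=web:
print(anti_affinity_feasible(cluster, {"app": "web"}))  # ['node-b', 'node-c']
```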
📊 Scoring (Priorities)
Surviving nodes are scored 0–100 on each priority, then weighted and summed.
⚖️ LeastRequestedPriority
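Prefers nodes with the most unrequested resources. For each resource, score =
(allocatable - requested) / allocatable × 100, and the per-resource scores are averaged. It works from requests, not live usage, and spreads load across the cluster. Current Kubernetes releases implement this as the LeastAllocated strategy of the NodeResourcesFit plugin.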
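The formula is easy to check by hand. The sketch below averages the per-resource score over CPU and memory with equal weights, which is an assumption of this example; the function and key names are hypothetical.

```python
def least_requested_score(capacity: dict, requested: dict) -> float:
    """Average of (capacity - requested) / capacity * 100 over all resources.

    `requested` should already include the Pod being scheduled.
    """
    per_resource = [
        (capacity[res] - requested[res]) * 100 / capacity[res]
        for res in capacity
    ]
    return sum(per_resource) / len(per_resource)

capacity = {"cpu_m": 4000, "mem_mi": 8000}
print(least_requested_score(capacity, {"cpu_m": 1000, "mem_mi": 4000}))  # 62.5
print(least_requested_score(capacity, {"cpu_m": 3000, "mem_mi": 6000}))  # 25.0
```

The first node scores 75 on CPU and 50 on memory, so 62.5 overall; the busier node scores 25 and loses.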
🎯 BalancedResourceAllocation
Prefers nodes where the requested fractions of CPU and memory are close to each other. Avoids nodes that are 90% requested on CPU but only 10% on memory, because the leftover memory is stranded once CPU runs out.
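One simple way to express "balanced" is to penalise the gap between the two requested fractions. The formula below is a simplified stand-in chosen for illustration, not necessarily the exact formula the scheduler uses; the function name is hypothetical.

```python
def balanced_allocation_score(cpu_fraction: float, mem_fraction: float) -> float:
    """Score 100 when CPU and memory are equally requested, lower as they diverge.

    Both fractions are requested / allocatable, in the range 0.0 to 1.0.
    """
    return (1 - abs(cpu_fraction - mem_fraction)) * 100

lopsided = balanced_allocation_score(0.9, 0.1)   # roughly 20
even = balanced_allocation_score(0.5, 0.5)       # 100.0
print(lopsided, even)
```

The 90%/10% node from the text scores about 20, while an evenly loaded node scores 100.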
🧲 InterPodAffinity
Scores higher when preferredDuringScheduling affinity rules are satisfied.
Useful for co-locating tightly coupled services (e.g., app + cache).
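A rough sketch of how soft affinity could turn into a score: each preferred term adds its weight when a matching Pod already runs on the node, and the raw sums are then rescaled to 0-100 across nodes. The min-max rescaling and all names here are assumptions of this example.

```python
def inter_pod_affinity_scores(pods_by_node: dict,
                              preferred_terms: list[tuple[int, dict]]) -> dict:
    """pods_by_node: node name -> list of Pod label dicts.
    preferred_terms: list of (weight, label selector) pairs."""
    def selected(selector: dict, labels: dict) -> bool:
        return all(labels.get(k) == v for k, v in selector.items())

    raw = {
        node: sum(weight for weight, sel in preferred_terms
                  if any(selected(sel, p) for p in pods))
        for node, pods in pods_by_node.items()
    }
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {node: 0.0 for node in raw}  # no node stands out
    return {node: 100 * (value - lo) / (hi - lo) for node, value in raw.items()}

cluster = {"node-a": [{"app": "cache"}], "node-b": [{"app": "db"}], "node-c": []}
terms = [(80, {"app": "cache"}), (20, {"app": "db"})]
print(inter_pod_affinity_scores(cluster, terms))
# {'node-a': 100.0, 'node-b': 25.0, 'node-c': 0.0}
```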
⚡ Preemption
When no node passes filtering, the scheduler may evict lower-priority Pods.
How It Works
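Each Pod can reference a PriorityClass, whose value is a 32-bit integer. User-defined classes can go up to 1,000,000,000; higher values are reserved for system-critical Pods. When a high-priority Pod
can't be scheduled, the scheduler finds a node where evicting lower-priority Pods would
free enough resources. The evicted Pods get their graceful termination period (terminationGracePeriodSeconds),
then are deleted. The node is recorded as the pending Pod's nominated node, and the Pod is normally scheduled there once the resources are free.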
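Victim selection on a single node can be sketched greedily: evict the lowest-priority Pods first until the pending Pod fits. This is a simplification for illustration (CPU only, hypothetical names), and the real scheduler's selection logic is more involved.

```python
from typing import Optional

def pick_victims(free_cpu_m: int, running: list[tuple[str, int, int]],
                 pod_cpu_m: int, pod_priority: int) -> Optional[list[str]]:
    """running holds (name, priority, cpu_request_m) for Pods on the node.

    Returns the names to evict, an empty list if the Pod already fits,
    or None if even evicting every lower-priority Pod is not enough.
    """
    victims = []
    # Only Pods with strictly lower priority may be preempted.
    candidates = sorted((p for p in running if p[1] < pod_priority), key=lambda p: p[1])
    for name, _priority, cpu_m in candidates:
        if free_cpu_m >= pod_cpu_m:
            break
        victims.append(name)
        free_cpu_m += cpu_m
    return victims if free_cpu_m >= pod_cpu_m else None

running = [("batch", 0, 1000), ("web", 1000, 500), ("db", 5000, 2000)]
print(pick_victims(200, running, pod_cpu_m=1000, pod_priority=2000))  # ['batch']
print(pick_victims(200, running, pod_cpu_m=1500, pod_priority=2000))  # ['batch', 'web']
print(pick_victims(200, running, pod_cpu_m=1500, pod_priority=500))   # None
```

In the last call only `batch` is eligible, and freeing its 1000m still leaves the node 300m short, so this node is not a preemption candidate.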
🌐 Topology Spread Constraints
Distribute Pods evenly across failure domains (zones, racks, nodes).
💡 maxSkew
maxSkew defines the maximum difference in Pod count between any two topology
domains. With maxSkew: 1 and 3 zones, scheduling 6 replicas gives 2 per zone.
If a zone already has 3 and another has 1, the scheduler places the next Pod in the zone
with fewer Pods. whenUnsatisfiable: DoNotSchedule makes this a hard constraint;
ScheduleAnyway makes it a soft preference.
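The skew check can be sketched as: placing a Pod in a domain is allowed if that domain's new count exceeds the lowest current count by at most maxSkew. The function names are hypothetical, and the sketch treats the constraint as hard (DoNotSchedule).

```python
def allowed_domains(counts: dict, max_skew: int) -> list[str]:
    """counts maps a topology domain (e.g. a zone) to its matching Pod count."""
    lowest = min(counts.values())
    return [domain for domain, count in counts.items()
            if count + 1 - lowest <= max_skew]

def place_replicas(domains: list[str], replicas: int, max_skew: int) -> dict:
    counts = {d: 0 for d in domains}
    for _ in range(replicas):
        # among allowed domains, pick the one with the fewest Pods
        target = min(allowed_domains(counts, max_skew), key=lambda d: counts[d])
        counts[target] += 1
    return counts

print(allowed_domains({"zone-a": 3, "zone-b": 1, "zone-c": 2}, max_skew=1))  # ['zone-b']
print(place_replicas(["zone-a", "zone-b", "zone-c"], replicas=6, max_skew=1))
# {'zone-a': 2, 'zone-b': 2, 'zone-c': 2}
```

With counts of 3, 1 and 2, only zone-b can take the next Pod; adding it to zone-a or zone-c would push the skew above 1.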
📏 Requests vs Limits
📋 Requests
The guaranteed amount of resources. The scheduler uses requests to decide placement. A node's allocatable capacity minus all Pod requests = free capacity. Pods are guaranteed their requested resources.
🚀 Limits
The maximum a container can use. A container that exceeds its memory limit is OOM-killed; one that hits its CPU limit is throttled rather than killed. A node's total limits can exceed its capacity (overcommitted), which is fine until Pods actually try to use it all simultaneously.
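A quick calculation shows how a node can be fully valid on requests while overcommitted on limits. The numbers and the `commitment` helper are made up for the example.

```python
def commitment(allocatable_mi: int, pods: list[tuple[int, int]]) -> dict:
    """pods holds (memory_request_mi, memory_limit_mi) pairs for one node."""
    total_requests = sum(request for request, _ in pods)
    total_limits = sum(limit for _, limit in pods)
    return {
        "requests_pct": 100 * total_requests / allocatable_mi,
        "limits_pct": 100 * total_limits / allocatable_mi,
    }

pods = [(2000, 4000), (2000, 4000), (3000, 6000)]
print(commitment(8000, pods))  # {'requests_pct': 87.5, 'limits_pct': 175.0}
```

Requests add up to 87.5% of the node, so every Pod was schedulable, but limits add up to 175%: if all three Pods grow toward their limits at once, the node runs out of memory.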
⚠️ QoS Classes
Guaranteed: every container sets CPU and memory requests and limits, with requests == limits.
Burstable: at least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria.
BestEffort: no requests or limits set; first to be evicted under pressure.
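The classification can be sketched as a small function. It assumes containers are plain dicts with optional `requests` and `limits` maps, that a missing request defaults to the limit, and that quantities are compared as literal strings (so "500m" and "0.5" would not count as equal here).

```python
def qos_class(containers: list[dict]) -> str:
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # a request that is not set falls back to the limit
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

same = {"cpu": "500m", "memory": "1Gi"}
print(qos_class([{"requests": same, "limits": same}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "250m"}}]))       # Burstable
print(qos_class([{}]))                                  # BestEffort
```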