Resource requests, limits and scheduling basics in Kubernetes

Resource requests and limits look like small YAML fields. They are not small in effect. They influence where Pods land, whether a cluster autoscaler adds nodes, which workloads get evicted under pressure, and whether a container slows down or dies when it uses too much.

For beginners, the confusing part is that the same resources block participates in two different stories. One story happens before the Pod runs: the scheduler decides whether a node has room. The other happens after the container runs: the kubelet and the operating system enforce boundaries.

If those stories are mixed together, Kubernetes feels arbitrary. A Pod is Pending even though the dashboard shows free CPU. A container is OOMKilled even though the node still has memory. A service gets slow, but no Pod restarts. All three can make sense once requests, limits, and scheduling are separated.

The resources block

A typical container resource configuration looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: nginx:1.27
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

250m CPU means 250 millicores, or one quarter of a CPU core. 500m means half a core. 1 means one full core. CPU in Kubernetes is compressible: if a container wants more CPU than it can get, it may run slower.

256Mi memory means 256 mebibytes. Memory is not compressible in the same forgiving way. If a process needs memory and cannot get it, something has to give. With a container memory limit, that often means the process is killed and Kubernetes reports OOMKilled.

This difference is the first important mental model: CPU pressure usually causes waiting and throttling. Memory pressure can cause termination.

Requests: what the scheduler believes

A request is the amount of a resource that Kubernetes reserves for a container when scheduling. The scheduler is not a prediction engine. It does not watch your application for a week and then decide where to put it. It reads the Pod spec.

If a container requests 250m CPU and 256Mi memory, the scheduler looks for a node that still has at least that much allocatable capacity left after subtracting the requests of the Pods already placed there, and that also satisfies every other constraint. For a Pod with multiple containers, the requests are added together. Init containers have special rules, but the beginner version is: Kubernetes needs enough room for what the Pod declares.
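As a minimal sketch of how those numbers combine, consider a Pod with two regular containers and one init container. The Pod name, container names, the busybox image, and every value below are assumptions chosen for illustration. Init containers finish before the regular containers start, so (ignoring Pod overhead and restartable sidecar-style init containers) the effective request is the larger of the regular containers' total and the biggest single init container request.

```yaml
# Hypothetical Pod; names, images, and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar
spec:
  initContainers:
    - name: prepare              # runs to completion before the others start
      image: busybox:1.36
      command: ["sh", "-c", "echo preparing"]
      resources:
        requests:
          cpu: "500m"
          memory: "128Mi"
  containers:
    - name: api
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
    - name: log-agent            # hypothetical sidecar
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
# Regular containers: 250m + 100m = 350m CPU, 256Mi + 64Mi = 320Mi memory.
# Init container:     500m CPU, 128Mi memory.
# Effective request:  max(350m, 500m) = 500m CPU, max(320Mi, 128Mi) = 320Mi memory.
```

In this sketch the init container, not the long-running containers, decides how much CPU the node must have free.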

Check node capacity with:

kubectl describe node <node-name>

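Near the bottom, the output lists allocated resources: the summed requests and limits of the Pods on that node, compared with the node's allocatable capacity. This is often more useful than a live CPU graph when debugging scheduling: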

Allocated resources:
  Resource           Requests      Limits
  cpu                1750m (43%)   4200m (105%)
  memory             6Gi (50%)     10Gi (83%)

The scheduler mostly cares about requests. A node can show low current CPU usage and still reject a new Pod because requested CPU is already full. That is not Kubernetes being stubborn. It is Kubernetes honoring the promises already written into other manifests.

This is also why bad requests create bad cluster behavior. Requests that are too low make the cluster look emptier than it is. Requests that are too high make it look full too early.

Limits: what happens at runtime

A limit is a boundary applied while the container is running.

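CPU limits are enforced through CPU quotas. On Linux the kernel gives the container a budget of CPU time per scheduling period; with the default 100ms period, a 500m limit allows 50ms of CPU time per period. If the container tries to use more CPU than the limit allows, it is throttled until the next period. It does not usually die. It just waits more. For web services, that can mean higher latency without an obvious restart event.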

Memory limits are enforced more brutally. If the process exceeds the memory limit, the kernel can kill it. Kubernetes then shows a terminated container with reason OOMKilled.

Useful checks:

kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState}'
kubectl top pod <pod-name>

kubectl top requires metrics-server or a similar metrics pipeline. It shows current usage, not the historical shape of the workload. It is a good flashlight, not a sizing strategy by itself.

When a container restarts, look at the logs from its previous instance:

kubectl logs <pod-name> --previous

If the last state says OOMKilled, do not immediately assume the application is broken. It may be leaking memory. It may also have been given an unrealistic limit.

Quality of Service classes

Kubernetes assigns a QoS class to Pods based on resource settings.

Guaranteed means every container has CPU and memory requests and limits, and each request equals its matching limit.

Burstable means at least one request or limit is set, but the Pod does not meet the strict Guaranteed rules.

BestEffort means no CPU or memory requests or limits are set.
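The three definitions can be sketched as three small Pods. The Pod names and values are hypothetical, and the sketch assumes the namespace has no LimitRange that injects defaults (otherwise the last Pod would not stay BestEffort).

```yaml
# Hypothetical Pods; only the resources blocks matter here.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"        # equal to the request
          memory: "256Mi"    # equal to the request -> Guaranteed
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          memory: "512Mi"    # above the request, and no CPU limit -> Burstable
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort
spec:
  containers:
    - name: app
      image: nginx:1.27      # no resources block at all -> BestEffort
```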

Check it with:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}{"\n"}'

QoS matters during node pressure. If a node runs short on memory or disk, the kubelet has to decide what to evict. BestEffort Pods are easiest to evict. Burstable Pods are in the middle, and are more exposed when they use more than they requested. Guaranteed Pods get the strongest protection, although they are not immortal.

Beginners sometimes hear “Guaranteed is best” and set every request equal to every limit. That can be right for some workloads, but it is not free. Equal CPU request and limit can cause avoidable throttling. Equal memory request and limit can leave no room for brief spikes. The right setting depends on the workload and the node environment, not on the name of the QoS class.

How scheduling really works

The scheduler does more than CPU and memory math. It filters nodes, scores the remaining nodes, and binds the Pod to one of them.

The filter step asks: Which nodes are even possible?

Resources matter here. So do node selectors, node affinity, taints and tolerations, topology spread constraints, volume zone rules, and policy. A Pod may have enough CPU available somewhere, but still be unschedulable because it requires a label no node has.
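To illustrate two of those non-resource filters, here is a minimal sketch of a Pod that combines a node selector with a toleration. The label pool=batch and the taint key dedicated are hypothetical; substitute whatever labels and taints your nodes actually carry.

```yaml
# Hypothetical Pod; the label and taint names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: api-on-batch-pool
spec:
  nodeSelector:
    pool: batch               # only nodes labeled pool=batch pass the filter
  tolerations:
    - key: "dedicated"        # lets the Pod onto nodes tainted dedicated=batch:NoSchedule
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
  containers:
    - name: api
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
```

If no node carries the pool=batch label, this Pod stays Pending no matter how much CPU is free, and the scheduling event points at the selector rather than at CPU or memory.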

The score step asks: Among possible nodes, which one is best?

Kubernetes may prefer more balanced resource usage, respect spreading rules, or follow other scheduler plugins. You usually do not need to know every scoring detail as a beginner. You do need to know that “the node with the most free CPU” is not the whole algorithm.

When scheduling fails, the most important command is:

kubectl describe pod <pod-name>

Look for events like:

Warning  FailedScheduling  default-scheduler  0/3 nodes are available: 2 Insufficient memory, 1 node(s) had untolerated taint.

This message is gold. It tells you the scheduler’s reason. Do not skip it and guess.

A simple Pending Pod example

Imagine a three-node cluster. Each node has 2 CPU cores allocatable. Existing Pods already request 1800m CPU on each node. Current CPU usage is low because the services are idle.

Now you deploy a Pod that requests 500m CPU.

The dashboard may show plenty of live CPU. The scheduler still rejects the Pod because each node has only 200m of CPU left unrequested (2000m allocatable minus 1800m already requested). The new Pod needs 500m. Result: Pending.

The fix is not always “lower the request.” The request might be honest. Maybe the cluster needs another node. Maybe the workload belongs in a different node pool. Maybe old workloads have inflated requests. The event only tells you the immediate scheduling blocker. Human judgment still has to decide whether capacity, configuration, or workload sizing is wrong.

Useful commands:

kubectl get pod <pod-name> -o wide
kubectl describe pod <pod-name>
kubectl describe node <node-name>
kubectl get events --sort-by=.lastTimestamp

If a cluster autoscaler is installed, a pending Pod with reasonable requests may trigger a new node. If the request is impossible for any node type, the autoscaler may not help. A Pod requesting 64Gi memory will not fit a node group whose largest node has 32Gi allocatable.

Namespaces, quotas, and defaults

Production clusters often add guardrails. Two common objects are ResourceQuota and LimitRange.

A ResourceQuota can limit total CPU, memory, or object counts in a namespace. A team may have free nodes in the cluster but still be blocked because its namespace quota is full.

kubectl get resourcequota
kubectl describe resourcequota <quota-name>
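For orientation, a ResourceQuota might look like the sketch below. The namespace team-a, the object name, and all numbers are invented for the example.

```yaml
# Hypothetical quota; namespace, name, and values are illustrative only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"        # sum of CPU requests across the namespace
    requests.memory: 8Gi     # sum of memory requests
    limits.cpu: "8"          # sum of CPU limits
    limits.memory: 16Gi      # sum of memory limits
    pods: "20"               # object count
```

Once a quota like this covers CPU or memory requests or limits, Pods that leave those values unset are rejected at admission unless a LimitRange supplies defaults.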

A LimitRange can set defaults or minimum and maximum values for container requests and limits.

kubectl get limitrange
kubectl describe limitrange <limit-range-name>

This explains another beginner surprise: “I did not set a limit, but my Pod has one.” A LimitRange may have defaulted it. That can be helpful, but it can also create invisible behavior if teams do not know the defaults.
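A LimitRange that produces this kind of defaulting could look like the following sketch. The namespace, name, and values are assumptions for illustration, not recommended settings.

```yaml
# Hypothetical LimitRange; namespace, name, and values are illustrative only.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:        # request applied when a container sets none
        cpu: "100m"
        memory: "128Mi"
      default:               # limit applied when a container sets none
        cpu: "500m"
        memory: "256Mi"
      min:                   # smallest values a container may declare
        cpu: "50m"
        memory: "64Mi"
      max:                   # largest values a container may declare
        cpu: "2"
        memory: "1Gi"
```

With this object in place, a container deployed without a resources block would end up with a 100m CPU request and a 256Mi memory limit, even though its manifest never mentions them.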

When a Pod is rejected before it even becomes Pending, check:

kubectl describe replicaset <replicaset-name>
kubectl get events --sort-by=.lastTimestamp

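Because the Pod was never created, admission failures often show up as events on the controlling object, such as the ReplicaSet, or in the namespace's event list rather than on a Pod.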

Practical starting points

There is no universal perfect number, but there are reasonable habits.

For CPU, start with a request that reflects normal sustained usage plus some margin. Be careful with CPU limits on latency-sensitive services. A tight CPU limit may make the service slower under load without creating an obvious crash.

For memory, start from observed working-set memory and peak behavior. Leave headroom for runtime overhead, caches, and spikes. A memory limit that is barely above idle usage is an invitation to OOMKilled.
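As one possible reading of these two habits, the fragment below sets a CPU request without a CPU limit and a memory limit with headroom above the request. Every number is invented and stands in for values you would derive from your own metrics. Whether to omit the CPU limit is a judgment call, and some clusters require one through policy.

```yaml
# Fragment of a container spec; all values are hypothetical.
resources:
  requests:
    cpu: "300m"        # assumed: normal sustained usage of about 250m plus margin
    memory: "400Mi"    # assumed: observed working set
  limits:
    memory: "600Mi"    # assumed: observed peak plus headroom for caches and spikes
    # no cpu limit here: the container may use idle CPU instead of being throttled
```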

For batch jobs, be honest about peak resource needs. A job that requests too little may run next to too many neighbors and make the node unhealthy. A job that requests too much may sit Pending forever.

For sidecars, include them. Service meshes, log agents, and secret-injection agents consume resources too. The Pod’s total request includes all containers.

For Java, Node.js, and similar runtimes, understand the runtime’s memory behavior. Container limits and heap settings should agree. If the JVM thinks it can use more memory than the container limit, the kernel will eventually correct that misunderstanding.
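For a JVM, one way to keep heap settings and the container limit in agreement is to size the heap as a percentage of the limit. The sketch below assumes a reasonably recent JDK that honors container limits; the image name is a placeholder, and the 75% figure is an example rather than a recommendation.

```yaml
# Fragment of a Pod template; image name and values are hypothetical.
containers:
  - name: orders
    image: registry.example.com/orders:1.0   # placeholder image running a JVM app
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"    # cap the heap at 75% of the memory limit
    resources:
      requests:
        memory: "768Mi"
      limits:
        memory: "1Gi"    # the remaining quarter is left for metaspace, threads, and native buffers
```

The point of the percentage is that raising the memory limit later also raises the heap ceiling, so the two values cannot silently drift apart.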

What to check during troubleshooting

For a Pending Pod:

kubectl describe pod <pod-name>
kubectl get nodes
kubectl describe node <node-name>
kubectl get resourcequota
kubectl get limitrange

Read the scheduling event. Look for Insufficient cpu, Insufficient memory, taints, node affinity, or volume binding messages.

For an OOMKilled container:

kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous
kubectl top pod <pod-name>

Compare the memory limit with observed usage and application behavior. If possible, look at historical metrics, not only the current value.

For suspected CPU throttling:

kubectl top pod <pod-name>
kubectl describe pod <pod-name>

Kubernetes does not always make throttling obvious through basic commands. In production, Prometheus metrics such as container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total, exposed by cAdvisor, are better. Still, a tight CPU limit next to latency spikes is a useful clue.

A small lab exercise

Create a Deployment with a tiny memory limit and run something that allocates memory. Watch it restart. Then raise the limit and observe the difference. Create another Pod with an intentionally huge CPU request and watch it stay Pending in a small cluster. Read the events before changing the YAML.
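A minimal sketch of the two lab workloads follows. To stay short it uses bare Pods rather than a Deployment. It assumes the polinux/stress image, which the Kubernetes documentation uses in its memory tutorial; the Pod names and sizes are arbitrary.

```yaml
# Lab manifests; names and sizes are arbitrary.
apiVersion: v1
kind: Pod
metadata:
  name: memory-hog
spec:
  containers:
    - name: stress
      image: polinux/stress
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]  # tries to allocate about 250M
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"      # well below the allocation, so expect OOMKilled
---
apiVersion: v1
kind: Pod
metadata:
  name: cpu-giant
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "100"           # 100 cores, larger than any node in a small cluster
```

After applying these, kubectl describe pod memory-hog should show the kill reason under the container's last state, while kubectl describe pod cpu-giant should show a FailedScheduling event. Raising the memory limit above the allocation size is the second half of the first experiment.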

The point is not to memorize failure messages. The point is to build the habit of asking which phase failed.

Did admission reject the Pod because of quota or policy?

Did scheduling fail because no node matched the request and constraints?

Did runtime enforcement kill or throttle the container?

Those are different problems with different fixes. Once you separate them, resources.requests and resources.limits stop being mysterious YAML decorations. They become the language your workload uses to negotiate with the cluster.

That negotiation should be honest. Not perfect, not overconfident, but honest enough that the scheduler can place Pods sensibly and the kubelet can protect the node when reality gets noisy.