StatefulSets for beginners

Deployments are the default answer for running applications in Kubernetes. For a stateless API, a web frontend, or a worker that reads from a queue, that default is correct. Pods can come and go. Names change. IP addresses change. As long as something healthy is behind a Service, users usually do not care which Pod handled a request.

Stateful workloads break that assumption.

A database replica cares which data directory is mounted. A Kafka broker may expect a stable host identity so other brokers know how to reach it. A distributed system may require members to start in order so that node 0 initializes before node 1 joins. An application may write local files that must survive Pod restarts but must stay attached to the same logical instance.

If you run those patterns with a plain Deployment and a shared PVC, or with Pods that get random suffix names and no stable network identity, things fail in subtle ways. Data ends up on the wrong disk. Cluster members cannot find each other. Scaling creates duplicate identities instead of new ones.

StatefulSets exist for that class of problem. They are not “Deployments but fancier.” They are a controller with different guarantees.

When Deployments fail for state

A Deployment creates Pods with random suffix names: web-7d4f8c9b6-xk2lm. That is fine when every Pod is interchangeable. It is painful when Pod web-2 must always mean the same replica with the same disk.

Common failure modes with Deployments for stateful apps:

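Shared storage by mistake. Two Pods reference the same ReadWriteOnce claim. A ReadWriteOnce volume can be attached to only one node at a time, so a second Pod scheduled onto a different node cannot mount it and never starts. If both Pods happen to land on the same node, they share one disk, which most databases do not tolerate.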

Local identity assumptions. Software expects a hostname like db-2 or a persistent node ID stored on disk. Deployment Pod names change on every recreate, so peer lists and config drift.

Unordered startup. Every replica starts at once. Some clustered databases tolerate that. Many do not. Replica 1 may try to join before replica 0 has initialized storage.

Scale-down data surprises. Deployment scale-down removes Pods without guaranteeing which one disappears. If each replica should keep its own disk, you need per-replica storage binding.

None of this means “never use Deployments.” It means Deployments optimize for replaceability. Stateful apps often need identity and ordering instead.

What a StatefulSet guarantees

A StatefulSet provides:

Stable Pod names. Pods are named <statefulset-name>-0, <statefulset-name>-1, and so on. If app-db-1 is deleted and recreated, the replacement is still app-db-1.

Stable DNS names. With a headless Service, each Pod gets a predictable DNS record such as app-db-1.app-db.default.svc.cluster.local.

Ordered deployment and scaling. By default, Pods are created in order 0, 1, 2 and terminated in reverse order. Rolling updates replace one Pod at a time, starting from the highest ordinal.

Stable storage per Pod. With volumeClaimTemplates, each Pod gets its own PersistentVolumeClaim. The claim name includes the Pod identity and survives Pod recreation.

These guarantees cost flexibility. StatefulSets are slower to roll out and harder to reason about than Deployments. Use them when the guarantees matter, not because the app has “some state.”

A minimal StatefulSet

Here is a small example with three replicas:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app-db
  namespace: demo
spec:
  serviceName: app-db
  replicas: 3
  selector:
    matchLabels:
      app: app-db
  template:
    metadata:
      labels:
        app: app-db
    spec:
      containers:
        - name: db
          image: my-registry.example/fake-db:1.0.0
          ports:
            - containerPort: 5432
              name: db
          volumeMounts:
            - name: data
              mountPath: /var/lib/db
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard

Apply and inspect:

kubectl apply -f statefulset.yaml
kubectl get statefulset app-db -n demo
kubectl get pods -n demo -l app=app-db
kubectl get pvc -n demo

You should see Pods app-db-0, app-db-1, and app-db-2 appear in order. You should also see PVCs like data-app-db-0, data-app-db-1, and data-app-db-2.
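The names you just saw follow a fixed pattern, so they can be predicted before anything is created. The sketch below is only an illustration: the function name and its arguments are made up for this article, and cluster.local is the default cluster domain, which your cluster may override.

```shell
# Print the Pod name, per-Pod DNS name, and PVC name for every ordinal.
# All arguments are illustrative. cluster.local is an assumption: it is the
# default cluster domain and may differ in your cluster.
statefulset_identities() {
  sts=$1        # StatefulSet name, e.g. app-db
  svc=$2        # headless Service name (spec.serviceName)
  ns=$3         # namespace
  claim=$4      # volumeClaimTemplates[].metadata.name
  replicas=$5
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${sts}-${i} ${sts}-${i}.${svc}.${ns}.svc.cluster.local ${claim}-${sts}-${i}"
    i=$((i + 1))
  done
}

statefulset_identities app-db app-db demo data 3
# prints:
# app-db-0 app-db-0.app-db.demo.svc.cluster.local data-app-db-0
# app-db-1 app-db-1.app-db.demo.svc.cluster.local data-app-db-1
# app-db-2 app-db-2.app-db.demo.svc.cluster.local data-app-db-2
```

Comparing this output with kubectl get pods and kubectl get pvc is a quick way to spot a claim or Pod that does not belong to the set.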

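serviceName must match the name of a headless Service. The StatefulSet does not create that Service for you; you define it yourself, as shown in the Headless Services section below. The StatefulSet controller uses it to build the stable DNS name of each Pod.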

Stable identity in practice

Stable identity shows up in three places beginners should connect:

Pod name. Other members refer to app-db-0, not a random hash.

DNS. Clients inside the cluster can resolve individual Pods through the headless Service.

PVC name. Storage follows the Pod ordinal. Recreate app-db-0 and it reattaches data-app-db-0.

That last point is why StatefulSets pair naturally with ReadWriteOnce storage. Each replica gets its own disk. Scaling up creates new claims. Scaling down removes Pods but does not automatically delete PVCs. Kubernetes avoids deleting data just because replica count changed.

Check DNS from another Pod:

kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup app-db-1.app-db.demo.svc.cluster.local

If that lookup fails, check whether the headless Service is missing or misnamed before you debug the database itself. A Pod that is not Ready can also be missing from DNS.

Headless Services

A normal ClusterIP Service load-balances across Pods. A headless Service (clusterIP: None) does not provide a virtual IP for load balancing. Instead, it publishes DNS A or AAAA records for the backing Pods.

apiVersion: v1
kind: Service
metadata:
  name: app-db
  namespace: demo
spec:
  clusterIP: None
  selector:
    app: app-db
  ports:
    - port: 5432
      name: db

Clients that need “any healthy replica” may still use a regular Service or connect through an application-level router. Clients that need a specific replica use the Pod DNS name.

For many clustered products, both patterns appear:

headless Service for peer discovery and stable member addresses
regular Service for client traffic to the current primary or any ready replica

Do not skip the headless Service because “the app already has a Service.” If serviceName on the StatefulSet does not point to a headless Service, stable per-Pod DNS will not work as intended.
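To make the two-Service pattern concrete, here is a minimal sketch of a regular ClusterIP Service that sits next to the headless one. The name app-db-client and the helper function are hypothetical; the selector and port simply mirror the headless Service above. As written it sends traffic to any ready replica. Routing only to a primary would need an extra label that the application or an operator maintains, which is not shown here.

```shell
# Print a regular (non-headless) Service manifest for client traffic.
# The Service name app-db-client is hypothetical.
client_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: app-db-client
  namespace: demo
spec:
  type: ClusterIP
  selector:
    app: app-db
  ports:
    - port: 5432
      name: db
EOF
}

client_service_manifest
```

To create it, pipe the output into kubectl: client_service_manifest | kubectl apply -f -. The StatefulSet keeps pointing serviceName at the headless app-db Service; this second Service is only for clients.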

Ordered rollout and updates

By default (podManagementPolicy: OrderedReady), StatefulSets create Pods sequentially. Pod 1 is not created until Pod 0 is Running and Ready. That reduces race conditions during first bootstrap.

Scale up:

kubectl scale statefulset app-db --replicas=4 -n demo
kubectl get pods -n demo -l app=app-db -w

Watch app-db-3 wait until earlier ordinals are ready.

Scale down removes highest ordinals first. If you scale from 4 to 2, Pod 3 is terminated first, then Pod 2. Pods 1 and 0 keep running. That is safer when higher ordinals are expansions rather than the founding members.
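The effect of a scale-down can be worked out on paper before running it. The sketch below lists the Pods that would be removed, in removal order, together with the PVCs that stay behind (PVCs are kept by default, as noted earlier). The function name and arguments are illustrative only.

```shell
# Show what a scale-down removes and what it leaves behind.
# Arguments are illustrative: claim template name, StatefulSet name,
# current replica count, target replica count.
scale_down_plan() {
  claim=$1; sts=$2; from=$3; to=$4
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do
    echo "remove Pod ${sts}-${i}, keep PVC ${claim}-${sts}-${i}"
    i=$((i - 1))
  done
}

scale_down_plan data app-db 4 2
# prints:
# remove Pod app-db-3, keep PVC data-app-db-3
# remove Pod app-db-2, keep PVC data-app-db-2
```

If the leftover claims are no longer needed, each one can be removed with kubectl delete pvc followed by its name and -n demo. Whether the underlying disk is deleted too depends on the reclaim policy of the volume.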

Updates use a rolling strategy controlled by updateStrategy:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0

With partition: 0, all Pods update. With partition: 2, only Pods with ordinal greater than or equal to 2 update. With three replicas, that means only app-db-2 picks up the new spec, which is useful for canary-style testing on the highest replica before rolling the rest.
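The partition rule is easy to get backwards, so here is a small sketch that lists which ordinals a rolling update touches and in what order. It encodes two rules: only ordinals at or above the partition update, and a StatefulSet rolling update proceeds from the highest ordinal down. The function name is made up for illustration.

```shell
# List the ordinals a rolling update will touch, in the order it touches them.
# Only ordinals >= partition update; updates run from the highest ordinal down.
ordinals_to_update() {
  replicas=$1
  partition=$2
  i=$((replicas - 1))
  out=""
  while [ "$i" -ge "$partition" ] && [ "$i" -ge 0 ]; do
    out="${out}${out:+ }${i}"
    i=$((i - 1))
  done
  echo "$out"
}

ordinals_to_update 3 0   # prints: 2 1 0
ordinals_to_update 3 2   # prints: 2
ordinals_to_update 3 3   # prints an empty line: nothing updates
```

A partition equal to or larger than the replica count therefore freezes the rollout entirely, which some teams use to stage a spec change before releasing it ordinal by ordinal.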

Inspect rollout status:

kubectl rollout status statefulset/app-db -n demo
kubectl get pods -n demo -l app=app-db
kubectl describe statefulset app-db -n demo

If Pod 1 stays pending while Pod 0 is fine, read events on Pod 1 first. Ordered startup means one stuck lower ordinal blocks everything above it.

volumeClaimTemplates

volumeClaimTemplates are like PVC manifests embedded in the StatefulSet. Kubernetes creates one claim per Pod from the template.

Important behaviors:

Each claim has a unique name derived from the template name and Pod ordinal.

Recreating a Pod reuses the same claim.

Deleting a StatefulSet does not automatically delete the claims unless you manage that separately.

Scaling down leaves unused PVCs behind by design.
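The last two behaviors are defaults, and Kubernetes offers an opt-in field that changes them. The sketch below prints a spec fragment that keeps claims when the StatefulSet is deleted but removes a claim when its ordinal is scaled away. Treat it as an assumption-laden example: the field is spec.persistentVolumeClaimRetentionPolicy, it is not available on older clusters, and the helper function name is made up. Check the StatefulSet documentation for your Kubernetes version before relying on it.

```shell
# Print a StatefulSet spec fragment that opts in to PVC cleanup on scale-down.
# Assumption: the cluster's Kubernetes version supports
# spec.persistentVolumeClaimRetentionPolicy; older clusters do not.
retention_policy_fragment() {
  cat <<'EOF'
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep claims if the StatefulSet itself is deleted
    whenScaled: Delete    # delete a claim when its ordinal is scaled away
EOF
}

retention_policy_fragment
```

To use it, merge the fragment under spec in the StatefulSet manifest and re-apply. whenScaled: Delete destroys data on every scale-down, so leaving both values at Retain (the default) is the safer choice until backups are tested.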

Inspect storage:

kubectl get pvc -n demo
kubectl describe pvc data-app-db-0 -n demo
kubectl get pod app-db-0 -n demo -o yaml | grep -A10 volumeMounts

Choose access modes and storage class deliberately. Most StatefulSet per-replica disks use ReadWriteOnce. ReadWriteMany is possible only when the storage class and application support shared filesystem semantics.

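Common misconception: “I can mount one PVC in a Deployment and scale to three.” With typical ReadWriteOnce block storage, that fails: replicas on other nodes cannot attach the volume, and replicas that do share it write into the same data directory. StatefulSets exist partly because each ordinal needs its own claim.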

StatefulSet vs Deployment

Use a Deployment when:

Pods are interchangeable
no per-Pod persistent identity is required
you want fast rollouts and simple scaling
clients reach the app through a Service that load-balances to any ready Pod

Use a StatefulSet when:

each replica needs stable network identity
each replica needs its own persistent volume
startup order matters
the software expects predictable hostnames or ordinals

Use something else when:

you run a managed database operator that hides cluster formation
your app stores state only in external object storage or a remote database
you need elastic scale with no identity at all

Many teams run state in managed services and keep Deployments for the app tier. That is valid. StatefulSets are for when the clustered software runs inside the cluster and expects Kubernetes to preserve identity.

Debugging StatefulSet problems

When a StatefulSet misbehaves, I check identity, order, and storage before blaming the application image.

Step 1: StatefulSet status and events

kubectl get statefulset app-db -n demo
kubectl describe statefulset app-db -n demo
kubectl get pods -n demo -l app=app-db

Look for how many replicas are ready versus desired. A common pattern is 2/3 because one ordinal is stuck.

Step 2: The lowest not-ready Pod

kubectl describe pod app-db-1 -n demo
kubectl logs app-db-1 -n demo
kubectl logs app-db-1 -n demo --previous

Because of ordering, fixing the lowest broken ordinal often unblocks the rest.
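Finding the lowest broken ordinal by eye gets tedious with many replicas. The helper below is a rough sketch that reads lines shaped like kubectl get pods --no-headers output on standard input and prints the not-ready Pod with the lowest ordinal. The function name is hypothetical, and the sample lines in the demo are made up rather than captured from a real cluster.

```shell
# Read `kubectl get pods --no-headers` style lines on stdin and print the
# name of the not-ready Pod with the lowest ordinal (nothing if all are ready).
lowest_not_ready() {
  awk '
    {
      split($2, r, "/")              # READY column looks like "1/1"
      if (r[1] != r[2]) {
        n = $1; sub(/.*-/, "", n)    # ordinal = text after the last hyphen
        if (best == "" || n + 0 < best + 0) { best = n; name = $1 }
      }
    }
    END { if (name != "") print name }
  '
}

# Made-up sample lines, only to show the input shape.
printf '%s\n' \
  'app-db-0 1/1 Running 0 5m' \
  'app-db-1 0/1 Pending 0 2m' | lowest_not_ready   # prints: app-db-1
```

Against a real cluster, the input would come from kubectl get pods -n demo -l app=app-db --no-headers piped into lowest_not_ready. Pods that have not been created yet do not appear in that output, so an empty result with fewer Pods than replicas still deserves a look at the StatefulSet events.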

Step 3: PVC and mount status

kubectl get pvc -n demo
kubectl describe pvc data-app-db-1 -n demo
kubectl get events -n demo --sort-by=.lastTimestamp

Pending PVCs, attach errors, and wrong storage class names show up here.

Step 4: Headless Service and DNS

kubectl get svc app-db -n demo
kubectl describe svc app-db -n demo

Confirm ClusterIP is None and selectors match Pod labels. Peer discovery failures often trace back to this layer.

Step 5: Update strategy and partition

If only some Pods updated, check updateStrategy.rollingUpdate.partition. A non-zero partition leaves lower ordinals on an old spec on purpose.

Practical cautions

StatefulSets make data durable, which also makes mistakes durable.

Scaling down does not delete PVCs. Old disks cost money and can confuse a future scale-up if the app expects empty disks.

Image updates roll one Pod at a time by default, but application-level readiness still matters. A Pod can be Ready from Kubernetes’ point of view while cluster membership is broken.

Backups remain your responsibility. Persistent volumes survive Pod restarts. They do not replace backup and restore testing.

For local labs, StatefulSets are excellent learning tools. For production clustered databases, many teams prefer operators or managed services unless there is a strong reason to run the full cluster themselves.

Final thought

StatefulSets are Kubernetes’ answer for workloads where replaceable Pods are the wrong abstraction.

Deployments optimize for “keep N copies running.” StatefulSets optimize for “keep N identifiable members running, each with its own name, DNS, and disk, started in a predictable order.”

If the app treats every instance as anonymous, stay with Deployments. If the app cares whether it is node 0 or node 2, if peers discover each other by stable hostname, or if each replica must keep its own volume across restarts, a StatefulSet is the right controller to learn next.

Once stable identity, headless Services, and volumeClaimTemplates fit together in your mental model, StatefulSets stop feeling like mysterious YAML and start reading like infrastructure with explicit guarantees.