When we talk about Kubernetes, we often jump straight to YAML, Helm charts and kubectl apply. But the more time I spend with real systems, the more I feel that the deployment strategy matters just as much as the manifests themselves.

Rolling, Blue-Green and Canary are not just different words for “zero-downtime deploy” — they are different ways of dealing with risk, cost and uncertainty.

What a Deployment actually does

A Kubernetes Deployment manages ReplicaSets, makes sure the right number of Pods is running and supports declarative updates.

With strategy.type, maxUnavailable and maxSurge you control how aggressively Kubernetes replaces old Pods with new ones.

In other words: the Deployment object is the engine, but the strategy is how you choose to drive it.

Most teams default to Rolling updates, even when the change they are shipping has very different risk characteristics.

Rolling deployments — the default

With RollingUpdate, Kubernetes gradually replaces old Pods with new ones while trying to keep the service available.

It works well when you ship small, compatible changes and you have decent tests and probes in place.

But there are two gotchas I see repeatedly:

  • You get a mixed state where some users hit the old version and some hit the new one, which makes debugging harder.
  • Any backwards-incompatible change (DB schema, contracts, assumptions) can break either the old or the new Pods while both are live.

For many everyday releases Rolling is still my default, but only if we design changes to be compatible and we keep maxUnavailable and maxSurge conservative for critical services.

Blue-Green — two production worlds

Blue-Green means you run two production environments side by side: Blue is live, Green gets the new version.

You deploy, test and verify on Green, and when you are confident, you flip the traffic over — and if things go wrong, you flip back to Blue in seconds.

This is powerful when:

  • you have big, risky releases
  • or only a few core services that absolutely must not fail.

The trade-off is obvious: higher infrastructure cost and more complexity around state and databases.

Because of that, I see Blue-Green as something you reserve for truly high-impact changes, not for every small feature.

Canary — a few users first

Canary deployments send only a small fraction of traffic (or a specific user segment) to the new version first.

Canary deployments in Kubernetes: small traffic share to the new version, metrics on the control plane, gradual rollout steps, and rollback when errors spike

Start with a small share on the canary, watch error rate and latency, ramp up gradually — or hit rollback when it goes wrong. Scroll the image horizontally if needed.

You start with 1–5%, watch metrics and error rates, and gradually ramp up if things look good.

This shines when:

  • you deploy frequently
  • you have good observability (metrics, logs, traces, SLOs)
  • and you can define clear thresholds for “this is fine” vs. “roll back now”.

The downside is that Canary is not “just a Deployment spec” — you usually need a service mesh or smart ingress, plus solid monitoring and automation to really benefit from it.

A simple way to choose

The way I currently think about it looks roughly like this:

Situation Risk Infra budget Strategy I'd pick
Small, well-tested change, low budget medium low Rolling
Large, high-risk release on a critical service high high Blue-Green
Frequent releases on a large user base with good metrics high medium Canary

It’s not a perfect rule, but it forces a conscious decision instead of “we always do Rolling”.

Bringing the pilot mindset in

From flying I learned that you don’t use the same checklist for every phase of flight — takeoff, cruise and landing have different procedures and different risk profiles.

I think deployment strategies are similar: Rolling, Blue-Green and Canary are just different checklists for different kinds of changes.

For me, the most important shift is to treat the choice of strategy as a deliberate decision, not an afterthought hidden in YAML. If we discuss risk first and then choose the strategy, production incidents get a lot less exciting.