GitOps — trust, but verify

Reagan’s phrase was about treaties. In ops it fits GitOps better than most vendor slides admit: yes, Git is the contract — and yes, you still look at the cluster before you sleep.

GitOps promises reconciliation: declare desired state in Git, let a controller make reality match, detect drift, audit through commits. Argo CD became the face of that story for many teams. I use it. I like parts of it. I have also watched it auto-sync a change I would have paused if a human still held the apply button.

This post is not anti-GitOps. It is pro-clarity: when declarative sync helps, when drift is a symptom, not a sin, and when forcing everything through Git creates new failure modes.

What GitOps is good at

Audit trail. Who changed what, when, with review comments. Beats SSH history and tribal memory.

Repeatable environments. Same manifests, different clusters — if your overlays are disciplined.

Recovery narrative. “Revert commit and sync” is a rollback story executives understand.

Reduced kubectl heroics. Fewer one-off applies from laptops — when people actually stop applying from laptops.

Visibility in one place. Argo CD’s UI showing OutOfSync resources helped me onboard faster than grepping through twelve repos.

For platform teams with many services, these wins are real. I am not arguing we should go back to snowflake clusters maintained by three people with root and a prayer.

What “trust but verify” means in practice

Trust: merged main reflects intent; the controller should converge safely.

Verify: before and after sync, a human or automated check confirms reality matches expectations and customer impact is acceptable.

Verification is not optional because:

Git does not know your cluster’s history. Manual hotfixes, incident patches, and failed partial applies leave landmines.

Controllers do not understand business context. Healthy sync can deploy a broken image if the image tag in Git is wrong but syntactically valid.

Prune and sync options delete things. Verify means reading what will be pruned, not only what will be added.

Dependencies exist outside Git. DNS, certificates, external databases, feature flags, quota — Git is not the whole system.

My minimal verify habit:

Read the diff in Git.
Read the diff in Argo CD (or argocd app diff) including hooks and prune.
Watch error rate and saturation during the rollout window.
Confirm no unexpected OutOfSync resources after sync — drift might be telling you something.
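Parts of that habit can be scripted. A minimal sketch, assuming you have already turned the planned sync into a list of resource changes; the PlannedChange shape and its action names are hypothetical, not an Argo CD output format. It reports every prune and hook that nobody has acknowledged by name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    """One resource change in a planned sync (hypothetical shape)."""
    kind: str
    name: str
    action: str  # assumed values: "create", "update", "prune"
    is_hook: bool = False


def needs_ack(change: PlannedChange) -> bool:
    """Prunes and hooks are the changes a Git diff most easily hides."""
    return change.action == "prune" or change.is_hook


def unacknowledged(plan: list[PlannedChange], acked: set[str]) -> list[str]:
    """Return 'Kind/name' for each risky change nobody has signed off on."""
    return [
        f"{c.kind}/{c.name}"
        for c in plan
        if needs_ack(c) and f"{c.kind}/{c.name}" not in acked
    ]


plan = [
    PlannedChange("Deployment", "api", "update"),
    PlannedChange("Job", "db-migrate", "create", is_hook=True),
    PlannedChange("Namespace", "legacy-tools", "prune"),
]
print(unacknowledged(plan, acked={"Job/db-migrate"}))  # ['Namespace/legacy-tools']
```

The value is the forcing function: a prune you had to type out by name is a prune you actually read.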

Trust the pipeline. Verify the outcome.

Drift is not always the enemy

GitOps culture sometimes treats drift as a moral failure. Any manual kubectl is heresy; OutOfSync badges shame teams.

Reality is messier.

Incident hotfix. Production was down. Someone scaled manually or patched a ConfigMap. Git caught up later, or has not caught up yet. The cluster was right to diverge temporarily.

Controller-generated fields. Mutating webhooks, defaults, HPAs adjusting replicas — live state differs from manifest without malice.

Secrets and external operators. SealedSecrets, External Secrets, cloud IAM — what Git stores is not what runs.

Exploration in non-prod. Engineers learn by poking. Strict auto-sync in dev can slow learning if every experiment needs a PR.

Drift is a signal. Ask why before you sync it away.

Questions I ask when Argo shows OutOfSync:

Did someone patch during an incident? Document it, then backport the change to Git or revert it live.
Is the diff harmless metadata?
Is the live state actually correct and Git wrong? Fix Git; do not just sync over the live fix.
Will sync cause downtime (Pod restart, CRD replace, ingress swap)?

Blind sync is how I once watched a Service type flip and drop external traffic: Git still held an old value that someone had “fixed” directly in the cluster weeks earlier.
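A small diff helper makes those questions faster to answer. A minimal sketch, assuming you have the desired and live manifests as plain dicts (for example, parsed YAML). The noise and risk lists are illustrative assumptions, not Argo CD's built-in normalization:

```python
# Paths treated as harmless noise (an assumption; tune it for your cluster).
NOISE_PREFIXES = (
    "metadata.managedFields",
    "metadata.resourceVersion",
    "metadata.generation",
    "metadata.uid",
    "metadata.creationTimestamp",
    "status",
)

# Paths where pushing Git over live state can drop traffic or fight
# another controller (also an assumption, not an exhaustive list).
RISKY_PREFIXES = ("spec.type", "spec.selector", "spec.replicas")


def flatten(obj, prefix=""):
    """Flatten nested dicts into {'a.b.c': value}; lists stay as leaf values."""
    if not isinstance(obj, dict):
        return {prefix: obj}
    out = {}
    for key, value in obj.items():
        out.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    return out


def under(path, prefixes):
    """True if path equals one of the prefixes or sits below it."""
    return any(path == p or path.startswith(p + ".") for p in prefixes)


def triage(desired, live):
    """Sort differing fields into 'risky' and 'review', dropping known noise."""
    want, have = flatten(desired), flatten(live)
    report = {"risky": [], "review": []}
    for path in sorted(want.keys() | have.keys()):
        if under(path, NOISE_PREFIXES) or want.get(path) == have.get(path):
            continue
        bucket = "risky" if under(path, RISKY_PREFIXES) else "review"
        report[bucket].append((path, want.get(path), have.get(path)))
    return report


desired = {"kind": "Service",
           "metadata": {"name": "web", "resourceVersion": "1"},
           "spec": {"type": "ClusterIP", "ports": [{"port": 80}]}}
live = {"kind": "Service",
        "metadata": {"name": "web", "resourceVersion": "98231",
                     "labels": {"team": "payments"}},
        "spec": {"type": "LoadBalancer", "ports": [{"port": 80}]}}
print(triage(desired, live))
```

Here the risky bucket holds exactly the kind of Service type flip described above: live says LoadBalancer, Git says ClusterIP, and a blind sync would pick Git.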

When GitOps helps most

Patterns where I advocate strongly:

Multi-cluster standardization. Platform baseline in Git; overlays per env. Drift detection finds teams who pinned exceptions.

Regulated change control. PR review plus merge gates plus sync audit satisfies auditors better than shared kubeconfig.

Frequent deploys with small diffs. Controllers roll out incrementally; Git history correlates with incidents.

Teams already living in PR culture. GitOps meets them where they work.

Disaster recovery drills. Reapply from Git to a fresh cluster — if you actually test it quarterly.

In these cases GitOps reduces variance. Variance is where overnight pages come from.

When GitOps hurts

Not every pain is “you implemented it wrong.” Some of it is structural.

Stateful systems with complex migrations. One-way schema changes do not reconcile cleanly. You need runbooks, not only revert commits.

Heavy Helm or Kustomize indirection. Reviewers cannot see rendered YAML without tooling. Bad changes hide in templates.

Auto-sync on production without guardrails. Speed without pause windows removes the human verify beat.

Org boundaries that do not match repo boundaries. Three teams, one cluster, one monorepo: merge contention and blame games.

The temptation to put secrets in Git. Even when they are encrypted, rotation and leak response get harder. Verify includes secret hygiene, not only kubectl.

Platform not ready. No CI render tests, no policy checks (OPA, Kyverno), no staging cluster — GitOps amplifies whatever you already did poorly.
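A render-and-check step does not need a full policy engine on day one. The function below is a toy stand-in for the kind of rule OPA or Kyverno would enforce, assuming the rendered manifest (for example, helm template output) has already been parsed into a dict. Both rules are examples only, and neither would catch a tag that is valid but points at the wrong build:

```python
def check_containers(manifest):
    """Return policy violations for a rendered workload manifest (dict).

    A toy stand-in for OPA or Kyverno rules; both rules are examples.
    """
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for c in pod_spec.get("containers", []):
        cname = f"{name}/{c.get('name', '?')}"
        image = c.get("image", "")
        # Rule 1: no floating tags. No tag at all means ':latest'.
        # Look only at the last path segment so 'host:5000/app' is not
        # mistaken for a tagged image. A digest ('@sha256:...') always passes.
        last = image.rsplit("/", 1)[-1]
        tagged = "@" in image or (":" in last and not last.endswith(":latest"))
        if not tagged:
            problems.append(f"{cname}: floating image tag '{image}'")
        # Rule 2: every container declares resource limits.
        if not c.get("resources", {}).get("limits"):
            problems.append(f"{cname}: missing resource limits")
    return problems


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "registry.example.com/api:latest"},
        {"name": "proxy", "image": "registry.example.com/proxy:1.4.2",
         "resources": {"limits": {"cpu": "500m", "memory": "128Mi"}}},
    ]}}},
}
for problem in check_containers(deployment):
    print(problem)
```

Run in CI against the rendered output, a check like this fails the PR before anything reaches Argo CD.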

Learning environments. Mandatory GitOps for students exploring Pods adds friction with little safety gain.

I have seen GitOps adopted because leadership read a blog, not because the team struggled with the problems GitOps solves. That order creates resentment and workarounds — secret kubectl, duplicate manifests, “emergency” namespaces outside Argo.

Argo CD specifics I have learned the hard way

Argo is excellent and opinionated. A few practical notes without pretending to be exhaustive docs.

Application vs AppProject boundaries. Misconfigured projects leak cluster-admin paths. Review AppProjects as seriously as Applications.
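A quick review aid, sketched under the assumption that the AppProject has been parsed into a dict. The sourceRepos, destinations, and clusterResourceWhitelist fields are real AppProject spec fields, but which wildcards count as too broad is a judgment call, and the rules below are only examples:

```python
def lint_app_project(project):
    """Flag wildcard grants in an Argo CD AppProject parsed into a dict.

    Only exact '*' values are checked; glob patterns such as
    'https://git.example.com/*' are left to human review.
    """
    spec = project.get("spec", {})
    findings = []
    if "*" in spec.get("sourceRepos", []):
        findings.append("sourceRepos allows any repository")
    for dest in spec.get("destinations", []):
        if dest.get("namespace") == "*":
            target = dest.get("server") or dest.get("name") or "<unknown>"
            findings.append(f"destination {target} allows every namespace")
    for res in spec.get("clusterResourceWhitelist", []):
        if res.get("group") == "*" and res.get("kind") == "*":
            findings.append("clusterResourceWhitelist allows every cluster-scoped kind")
    return findings


project = {
    "kind": "AppProject",
    "metadata": {"name": "payments"},
    "spec": {
        "sourceRepos": ["https://git.example.com/payments/*"],
        "destinations": [{"server": "https://kubernetes.default.svc", "namespace": "*"}],
        "clusterResourceWhitelist": [{"group": "*", "kind": "*"}],
    },
}
for finding in lint_app_project(project):
    print(finding)
```

A project like this one can create cluster-scoped resources in any namespace; with a typically privileged Argo CD service account, that comes close to cluster-admin for anyone allowed to create Applications in it.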

Sync waves and hooks. Hook Jobs can run PreSync, before the rest of the sync, and hook resources are deleted according to their hook delete policy. Read the sync plan, not only the Git diff.
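To read the sync plan rather than the Git diff, it helps to know the order. A simplified sketch using the real argocd.argoproj.io/hook and argocd.argoproj.io/sync-wave annotations: resources are ordered by phase, then wave, then name. Real Argo CD also orders by kind within a wave and applies hook delete policies; this sketch models neither:

```python
HOOK = "argocd.argoproj.io/hook"
WAVE = "argocd.argoproj.io/sync-wave"
PHASES = ["PreSync", "Sync", "PostSync"]


def sync_plan(manifests):
    """List manifests roughly in the order a sync would apply them.

    Simplified: phase, then wave, then name. Real Argo CD also orders by
    kind inside a wave and handles hook delete policies; this does not.
    """
    planned = []
    for m in manifests:
        meta = m.get("metadata", {})
        ann = meta.get("annotations", {})
        phase = ann.get(HOOK, "Sync")  # no hook annotation: normal Sync phase
        if phase not in PHASES:
            continue  # e.g. SyncFail or Skip, outside a normal successful run
        wave = int(ann.get(WAVE, "0"))  # waves default to 0 and may be negative
        planned.append((PHASES.index(phase), wave, meta["name"], m["kind"]))
    planned.sort()
    return [f"{PHASES[p]} wave {w}: {kind}/{name}" for p, w, name, kind in planned]


manifests = [
    {"kind": "Deployment", "metadata": {"name": "api"}},
    {"kind": "Job", "metadata": {"name": "db-migrate", "annotations": {HOOK: "PreSync"}}},
    {"kind": "ConfigMap", "metadata": {"name": "api-config", "annotations": {WAVE: "-1"}}},
    {"kind": "Job", "metadata": {"name": "smoke-test", "annotations": {HOOK: "PostSync"}}},
]
for step in sync_plan(manifests):
    print(step)
```

The PreSync migration Job runs before the Deployment changes at all, which a diff of the Deployment alone will never show.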

Replace vs apply. Some resources need replace; accidental replace deletes and recreates. Exciting for StatefulSets.

Prune propagation. A --prune on the last sync removed a Namespace someone thought was out of scope. Prune settings belong in the review checklist.

Health assessment lag. Argo says Healthy; your app is not. Use metrics and probes, not only the green UI.
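Argo's Healthy means the resources look ready to Argo; whether customers are fine is a separate question. A minimal sketch of a post-sync gate, assuming you can pull error-ratio samples from your metrics system (the query itself is out of scope, and every threshold here is a placeholder):

```python
def rollout_ok(baseline, samples, max_ratio=2.0, floor=0.001, window=3):
    """Judge a rollout from error-rate samples instead of a green icon.

    baseline: error ratio (errors / requests) before the sync.
    samples:  error ratios observed during the rollout window, oldest first.
    max_ratio, floor, window: placeholder thresholds, not recommendations.
    """
    if len(samples) < window:
        return False  # too little data is not evidence of health
    # The floor stops a near-zero baseline from making any error look fatal.
    limit = max(baseline, floor) * max_ratio
    # Require the most recent samples to stay under the limit, not just one.
    return all(rate <= limit for rate in samples[-window:])


print(rollout_ok(0.002, [0.002, 0.003, 0.0035, 0.003]))  # True
print(rollout_ok(0.002, [0.002, 0.009, 0.012]))          # False
```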

Multi-source apps. Powerful; diffs harder. Slow down.

CLI and RBAC. Who can argocd app sync production? If the answer is “everyone with SSO,” verification is weakened.

Ignore differences. ignoreDifferences fixes noise until it hides real problems. Revisit periodically.

None of this is argument against Argo. It is argument for treating sync like a production change — because it is one.

Reconciliation loop mental model

Useful picture:

Git (desired) --> Controller --> Cluster (actual)
                     ^                |
                     |                v
                     +----- drift ----+

The loop runs continuously. Your job is not only to push Git. Your job is to ensure the loop’s assumptions hold: correct context, compatible live state, safe sync options, observability during convergence.

When the loop fights you — sync thrashing, repeated OutOfSync — stop syncing and diagnose. Thrashing is often a sign of competing controllers (HPA vs Deployment replicas in Git), webhook mutations, or someone still kubectl-ing the same object.
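A toy model of the loop makes the HPA thrashing pattern easy to see. This is an illustration under stated assumptions (auto-sync with self-heal enabled, one step per tick), not how Argo CD or the HPA are implemented; real controllers have sync intervals and stabilization windows that this ignores:

```python
def simulate(ticks, git_replicas=3, hpa_target=5):
    """Toy model of an auto-syncing, self-healing controller vs an HPA.

    Each tick the HPA scales to its target, then the GitOps controller sees
    drift and resets replicas to the value in Git. Returns every observed
    replica count plus the number of resyncs.
    """
    history, resyncs = [], 0
    for _ in range(ticks):
        live = hpa_target          # HPA reacts to load
        history.append(live)
        if live != git_replicas:   # controller reports OutOfSync...
            live = git_replicas    # ...and self-heal puts Git's value back
            history.append(live)
            resyncs += 1
    return history, resyncs


print(simulate(3))                # ([5, 3, 5, 3, 5, 3], 3)
print(simulate(3, hpa_target=3))  # ([3, 3, 3], 0)
```

Every resync is a scale-down under load. The robust fix is to stop declaring spec.replicas in Git for HPA-managed workloads so only one controller owns the field; ignoreDifferences alone hides the diff, but unless the RespectIgnoreDifferences=true sync option is set, a later sync can still write Git's value back.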

Pairing GitOps with imperative habits

Hybrid is not failure. Mature teams I respect:

GitOps for app deploys and platform baseline.
Runbooks for break-glass kubectl with mandatory follow-up PR.
Drift tickets instead of drift shame.
Freeze windows during high-risk business periods — sync paused, comms clear.

“Everything through Git” sounds pure. Purity breaks at 3 a.m. when the fix is one field and the PR pipeline takes forty minutes. Honest break-glass with an audit trail beats a culture of secret kubectl.

After break-glass, verify twice: cluster state and Git alignment. Permanent drift is debt.

Metrics that matter beyond sync status

Argo metrics are necessary, not sufficient.

Deployment success rate. Rollouts completed vs failed.

MTTR after a bad merge. How fast revert plus sync restores service.

OutOfSync age. Chronic drift indicates a process gap.

Manual sync count. A high count usually means auto-sync keeps failing, so templates or cluster preconditions are wrong.

Incident correlation with Git merges. If every page follows a platform PR, review depth, not tool brand, is the issue.

A green sync icon is not an SLO.
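A minimal sketch of computing three of these from your own records. The SyncRecord shape is hypothetical, since where the data comes from (Argo CD notifications, CI logs, an incident tracker) varies by team:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SyncRecord:
    """One production sync, from wherever you log them (hypothetical shape)."""
    app: str
    succeeded: bool
    manual: bool


def sync_metrics(records):
    """Deployment success rate and manual sync count."""
    if not records:
        return {"success_rate": None, "manual_syncs": 0}
    ok = sum(r.succeeded for r in records)
    return {
        "success_rate": ok / len(records),
        "manual_syncs": sum(r.manual for r in records),
    }


def mttr(incidents):
    """Mean time from bad merge to restored service.

    incidents: list of (bad_merge_time, service_restored_time) pairs.
    """
    if not incidents:
        return None
    total = sum((restored - start for start, restored in incidents), timedelta())
    return total / len(incidents)


records = [
    SyncRecord("api", True, False),
    SyncRecord("api", True, False),
    SyncRecord("web", False, False),
    SyncRecord("web", True, True),
]
t0 = datetime(2024, 3, 1, 9, 0)
t1 = datetime(2024, 3, 8, 14, 0)
print(sync_metrics(records))  # {'success_rate': 0.75, 'manual_syncs': 1}
print(mttr([(t0, t0 + timedelta(minutes=10)), (t1, t1 + timedelta(minutes=30))]))  # 0:20:00
```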

What I recommend teams try

Pragmatic middle path:

GitOps on staging first. Learn diff, hooks, prune pain without customer blast radius.
Manual or semi-auto sync on prod until review and render tooling mature.
Required helm template / kustomize build in CI with artifact attached to PR.
Drift report weekly — OutOfSync list with owners, not only auto-heal.
Break-glass doc — who may kubectl, how to backport, SLA to fix Git.
Verify checklist on prod sync — same cross-check mindset as kubectl apply posts I write elsewhere.
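The weekly drift report and the break-glass SLA can share one small script. A sketch, assuming you can export the OutOfSync Applications with the time each one drifted (from the Argo CD API or your own notifications) and keep an app-to-team mapping somewhere; the seven-day SLA is a placeholder:

```python
from datetime import datetime, timedelta


def drift_report(out_of_sync, owners, now, sla=timedelta(days=7)):
    """Group OutOfSync apps by owning team and flag those past the fix-Git SLA.

    out_of_sync: {app_name: datetime the app went OutOfSync}
    owners:      {app_name: team}; apps without an owner go under "unowned"
    sla:         placeholder; the real value belongs in your break-glass doc
    """
    report = {}
    for app, since in sorted(out_of_sync.items()):
        age = now - since
        entry = {"app": app, "age_days": age.days, "overdue": age > sla}
        report.setdefault(owners.get(app, "unowned"), []).append(entry)
    return report


now = datetime(2024, 6, 10)
drifted = {
    "checkout": datetime(2024, 6, 8),
    "search": datetime(2024, 5, 20),
    "legacy-cron": datetime(2024, 6, 1),
}
owners = {"checkout": "payments", "search": "discovery"}
for team, apps in drift_report(drifted, owners, now).items():
    print(team, apps)
```

An "unowned" entry is a finding in itself: drift nobody owns is drift nobody will backport.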

No silver bullet. Iteration.

Personal stance

I trust GitOps enough to use it daily. I verify enough to sleep on call nights after platform merges. I do not sell it as morality. I sell it as a tool with boundaries.

Aviation parallel, once, without poster vibes: flight plans are filed and followed — and you still look out the window. Instruments lie occasionally. Controllers on the ground miss your revised clearance. Trust the plan, verify against reality.

Git is the plan. The cluster is reality. Argo CD is the autopilot coupling them — useful, not infallible.

Closing

GitOps helps when variance, audit, and repeatability were your problems. GitOps hurts when it is adopted as religion, auto-sync removes human judgment at the wrong time, or drift is treated as crime instead of conversation.

Trust your pipeline. Verify your outcomes. Fix Git when live was right. Fix live when Git was right. Teach your team the difference.

If you run Argo CD today, pick one production Application and walk through the last sync: what would have broken if prune had been on? What is OutOfSync right now and why? Five minutes of verify beats another hour of “trust me.”

See also

Cross-checking config before you apply

OpenShift day-two operations for application teams
