If you're using Kubernetes in production, it's critical that you're validating the configuration for each of your workloads. The smallest changes or omissions can lead to downtime, cost overruns, or worse, a security breach. So what do you need to be looking for when it comes to Kubernetes configuration validation?
Specifically, you should be checking for, at minimum:

- Security settings, such as containers running as root or with unnecessary privileges
- Resource requests and limits, which keep workloads efficient and help prevent cost overruns
- Liveness and readiness probes, which Kubernetes needs in order to keep your application healthy
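As a rough illustration, here is what a Deployment that passes all three of those checks might look like. The image name, ports, and resource numbers are placeholders, not recommendations:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:1.2.3        # illustrative image
          securityContext:
            runAsNonRoot: true                # security: don't run as root
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          resources:
            requests:                         # efficiency: right-size the workload
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          livenessProbe:                      # reliability: restart hung containers
            httpGet:
              path: /healthz
              port: 8080
          readinessProbe:                     # reliability: only route traffic when ready
            httpGet:
              path: /readyz
              port: 8080
```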
But it's not enough to look for these issues only during code review; you should be checking them automatically at every step of the deployment lifecycle. Without a thorough program for validating Kubernetes configuration, mistakes are bound to fall through the cracks.
To keep your cluster healthy, you'll need to ensure configuration is checked:

- In CI, before changes are merged
- By an Admission Controller, before resources reach the cluster
- In the cluster itself, by continuously scanning resources that are already running
If your company has implemented a mature DevOps program, you're storing all your configuration as Infrastructure as Code (IaC). So every change to your infrastructure is (ideally) tracked in Git, and goes through a code review process.
Code review is a great way to catch high-level issues, like a change that won't actually accomplish the business goal, or that will introduce a subtle security issue. But there are lots of common issues that are tedious to look for and easy to miss, like a misconfigured security setting or a missing health probe.
Having automated validation in CI is a great way to supplement code review and ensure that new changes adhere to a consistent level of quality. By automating the most rote tasks on a code reviewer's plate, you give them space to dig into the logic and think deeply about its impact.
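As a minimal sketch, a CI job could run an open source checker such as Fairwinds Polaris against your manifests and fail the build on dangerous findings. The example below uses GitHub Actions, but any CI system works; the download URL and the `./manifests` path are assumptions about your setup:

```yaml
# .github/workflows/validate-k8s.yaml (illustrative)
name: validate-kubernetes-config
on: [pull_request]

jobs:
  polaris-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Polaris CLI
        run: |
          # Release asset name is illustrative; check the Polaris releases page for your platform
          curl -sSL https://github.com/FairwindsOps/polaris/releases/latest/download/polaris_linux_amd64.tar.gz \
            | tar -xz polaris
          sudo mv polaris /usr/local/bin/
      - name: Audit manifests
        run: |
          # Fail the build if any danger-level check fails
          polaris audit --audit-path ./manifests --set-exit-code-on-danger
```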
Once a pull request (PR) has been approved and the tests are passing, you should be safe to deploy. But what if the deployment does something subtly different from what was tested in CI? Or worse, what if someone has circumvented the review process, either by force-pushing to Git, or by interacting directly with your Kubernetes cluster?
Adding an Admission Controller is an important safeguard to keep your cluster healthy. It works like a bouncer at the front door of your cluster — anything that doesn't adhere to your policies won't get in.
Organizations will often configure their Admission Controller to be less strict than CI, so that exceptions to non-critical rules can be granted during code review. The Admission Controller then enforces only the highest-severity policies.
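As a sketch of what one of those high-severity policies can look like, here is a native ValidatingAdmissionPolicy (GA in Kubernetes 1.30; tools like OPA Gatekeeper and Kyverno offer similar capabilities) that rejects Deployments whose containers don't explicitly run as non-root. The policy and binding names are illustrative:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: disallow-root-containers        # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: >-
        object.spec.template.spec.containers.all(c,
          has(c.securityContext) &&
          has(c.securityContext.runAsNonRoot) &&
          c.securityContext.runAsNonRoot)
      message: "All containers must set securityContext.runAsNonRoot: true"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: disallow-root-containers-binding
spec:
  policyName: disallow-root-containers
  validationActions: ["Deny"]
```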
In a perfect world, issues would always be caught earlier in the development cycle, before they ever get sent to the Kubernetes API. But even the tightest DevOps programs have holes that an Admission Controller can plug.
So your workload passed the CI process, and the Admission Controller let it through. Your app is running in production and everyone is happy. You should be done at this point, right?
Not quite. There are a few cases where a previously healthy deployment can start to rot:

- New vulnerabilities are discovered in images you're already running
- Your organization's policies evolve, and workloads that were once compliant no longer are
- Someone changes a resource directly in the cluster, bypassing CI and code review
Scanning resources that are already in your Kubernetes environment is an important step in ensuring the long-term health of your cluster. It is the closest you'll get to a true sense of your cluster's health, security posture, and policy compliance.
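As a minimal sketch of live scanning, the command below uses kubectl and jq (both assumed to be installed, with read access to the cluster) to flag running containers that are missing resource limits. In practice you'd run an in-cluster scanner such as Polaris on a schedule rather than ad-hoc commands:

```bash
# List every Deployment container in the cluster that has no resource limits set
kubectl get deployments --all-namespaces -o json \
  | jq -r '.items[]
      | .metadata.namespace as $ns
      | .metadata.name as $name
      | .spec.template.spec.containers[]
      | select(.resources.limits == null)
      | "\($ns)/\($name): container \(.name) has no resource limits"'
```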
Putting these sorts of guardrails in place might seem like a chore, or a risk to productivity. We all know the pain of having our work rejected by an overzealous CI system or security policy.
But it's much easier to move fast when you know you won't break things. Changing resources in your Kubernetes cluster is scary, and simple mistakes can lead to angry users, lost revenue, and reputational damage. The right guardrails will help you ship confidently and sleep peacefully.
If you're looking for help implementing policy and best practices throughout the Kubernetes lifecycle, get in touch! Fairwinds Insights can help you run the best open source validation tools for Kubernetes in CI, Admission Control, and Live Scanning.
Fairwinds Insights is available to use for free. You can sign up here.