Most organizations start out with Kubernetes by piloting it with a single application. Once they’ve gone through a successful pilot and embraced Kubernetes, companies may build dozens of clusters to support dozens of teams. For a mid-size to large company deploying multiple applications on Kubernetes, that means development and operations teams are adopting it too, frequently in a self-service model. When many users are building and deploying across many different clusters, it becomes challenging to ensure applications are deployed consistently, securely, and with proper resource requirements.
Configuration drift refers to an environment in which the running clusters in an infrastructure become increasingly different over time, usually due to manual changes and updates on individual clusters. Setting up a consistent, automated cluster provisioning process helps ensure that clusters are consistent when created, but it does not prevent later changes on that cluster or on the other clusters. Changes to configuration parameters might be made by the dev team, the ops team, or the DevOps team.
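To make the idea concrete, here is a minimal sketch of drift detection: compare each cluster's settings against the baseline it was provisioned with and report what has changed. The setting names, cluster names, and values are hypothetical and exist only for illustration; a real check would read them from your clusters rather than from a hard-coded dictionary.

```python
# Hypothetical baseline that every cluster had when it was provisioned.
BASELINE = {
    "kubernetes_version": "1.29",
    "ingress_controller": "1.10.0",
    "pod_security": "restricted",
}

def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    # Look at keys from both sides so added and removed settings also show up.
    for key in sorted(set(baseline) | set(actual)):
        expected = baseline.get(key)
        found = actual.get(key)
        if expected != found:
            drift[key] = (expected, found)
    return drift

# Illustrative snapshot of two clusters some time after provisioning.
clusters = {
    "team-a": {"kubernetes_version": "1.29", "ingress_controller": "1.10.0", "pod_security": "restricted"},
    "team-b": {"kubernetes_version": "1.27", "ingress_controller": "1.10.0", "pod_security": "baseline"},
}

for name, config in clusters.items():
    print(name, find_drift(BASELINE, config))
```

In this sketch, `team-a` still matches the baseline, while `team-b` has drifted on two settings even though both clusters started out identical.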
When you start operating a large number of clusters that have been manually deployed and inconsistently configured, you’ll almost certainly have discrepancies in your configurations across your containers and clusters. That makes it quite difficult to identify inconsistencies and correct them, and configuration drift carries significant negative consequences.
Trying to manually track configuration drift and fix misconfigurations is extremely error-prone, and it quickly leads to operations teams spending too much time tracking down issues.
To keep track of basic metrics, most companies need tooling that gives them visibility into which version of Kubernetes they are running, as well as the versions of the critical system software that powers Kubernetes, such as ingress controllers, certificate management, DNS, and so on. Being able to find and see all of that version information helps your organization keep its software upgraded to the latest stable versions, which helps you avoid technical debt. You don't want to be running an old version of Kubernetes, because older versions of Kubernetes and of the add-ons you run alongside it may be insecure, increasing your risk of cyberattack.
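As a rough sketch of what that visibility might look like, the example below takes a version inventory per cluster and flags every component that is missing or older than a minimum you have chosen. The cluster names, component names, and version numbers are placeholders, not recommendations, and the inventory is hard-coded here where real tooling would collect it from each cluster.

```python
def parse_version(text):
    """Turn '1.26.5' or 'v1.26.5' into (1, 26, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))

# Hypothetical minimum versions your organization has decided to allow.
MINIMUM = {"kubernetes": "1.28.0", "ingress-controller": "1.9.0", "cert-manager": "1.14.0"}

# Hypothetical inventory of what each cluster is actually running.
inventory = {
    "prod-east": {"kubernetes": "1.29.2", "ingress-controller": "1.10.0", "cert-manager": "1.14.4"},
    "prod-west": {"kubernetes": "1.26.5", "ingress-controller": "1.10.0", "cert-manager": "1.12.1"},
}

def outdated_components(inventory, minimum):
    """List (cluster, component, installed, minimum) for anything below the floor."""
    findings = []
    for cluster, components in sorted(inventory.items()):
        for component, floor in sorted(minimum.items()):
            installed = components.get(component)
            # A missing component is reported too, with installed set to None.
            if installed is None or parse_version(installed) < parse_version(floor):
                findings.append((cluster, component, installed, floor))
    return findings

for cluster, component, installed, floor in outdated_components(inventory, MINIMUM):
    print(f"{cluster}: {component} is {installed}, minimum is {floor}")
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "1.10.0" sorts before "1.9.0", which would wrongly flag the newer ingress controller.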
Configuration drift can also lead to a lot of inconsistency, which may not seem so bad, but it has a significant impact on your upgrade process. Inconsistently configured clusters make running Kubernetes more expensive over time, because you need to research each upgrade path separately from the others. That can add a lot of time to your upgrade process and waste considerable operations resources. With a consistent infrastructure, you can research your upgrade and patching scenario once and apply it uniformly across multiple environments.
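One way to see the cost is to count how many distinct configurations you are actually running, since each one is a separate upgrade path to research. The sketch below groups clusters by an identical set of versions; the cluster names and versions are made up for illustration.

```python
from collections import defaultdict

def upgrade_paths(clusters):
    """Group cluster names by identical configuration; each group is one upgrade path."""
    groups = defaultdict(list)
    for name, config in clusters.items():
        # Sorting the items makes the fingerprint independent of key order.
        fingerprint = tuple(sorted(config.items()))
        groups[fingerprint].append(name)
    return dict(groups)

# Hypothetical fleet that has drifted: three different combinations in four clusters.
drifted = {
    "dev": {"kubernetes": "1.29", "ingress": "1.10"},
    "staging": {"kubernetes": "1.28", "ingress": "1.10"},
    "prod-east": {"kubernetes": "1.27", "ingress": "1.9"},
    "prod-west": {"kubernetes": "1.27", "ingress": "1.9"},
}

# The same fleet kept consistent: a single combination everywhere.
consistent = {name: {"kubernetes": "1.29", "ingress": "1.10"} for name in drifted}

print("drifted fleet:", len(upgrade_paths(drifted)), "upgrade paths to research")
print("consistent fleet:", len(upgrade_paths(consistent)), "upgrade path to research")
```

With the drifted fleet you would research three upgrade paths for four clusters; with the consistent fleet you research one and apply it everywhere.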
Larger companies are starting to consider multi-cloud scenarios, which enable them to take advantage of the benefits of different cloud providers. This isn’t problematic in terms of Kubernetes, because Kubernetes is available on multiple clouds. The benefit of Kubernetes is that it provides a consistent API for running infrastructure across all of those clouds. The challenge comes when you’re trying to consistently apply policy and gather information about the state of your clusters from these different cloud providers in a single location. It’s extremely challenging for DevOps teams to manually manage and get insight into configuration drift across multiple clouds and clusters.
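The sketch below illustrates the "single location" idea in miniature: the same policy checks run against every cluster regardless of which cloud it lives on, and the failures land in one report. Everything in it is an assumption made for illustration, including the `Cluster` type, the policy names, and the settings keys; it is not how any particular product implements this.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    provider: str   # for example "aws", "gcp", or "azure"
    settings: dict  # hypothetical summary of the cluster's configuration

# Hypothetical policies: each maps a name to a check that returns True when satisfied.
POLICIES = {
    "resource-limits-required": lambda s: s.get("resource_limits") is True,
    "no-privileged-containers": lambda s: s.get("privileged_containers") is False,
}

def evaluate(clusters, policies):
    """Apply every policy to every cluster and return the failures as one sorted list."""
    failures = []
    for cluster in clusters:
        for policy_name, check in policies.items():
            if not check(cluster.settings):
                failures.append((cluster.provider, cluster.name, policy_name))
    return sorted(failures)

fleet = [
    Cluster("payments", "aws", {"resource_limits": True, "privileged_containers": False}),
    Cluster("search", "gcp", {"resource_limits": False, "privileged_containers": False}),
    Cluster("batch", "azure", {"resource_limits": True, "privileged_containers": True}),
]

for provider, name, policy in evaluate(fleet, POLICIES):
    print(f"{provider}/{name}: fails {policy}")
```

Because the checks only look at the cluster's settings and never at the provider, the same policy means the same thing on every cloud, which is the property that gets lost when each environment is managed by hand.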
One of the important capabilities of Fairwinds Insights is that it supports the multi-cloud use case. That helps engineering and DevOps teams manage multi-user, multi-cluster, and multi-cloud environments more effectively and efficiently, because it enables the multi-cloud deployments so many organizations are considering without giving up the ability to manage configuration globally. To maintain consistency in deployments, even when you deploy across multiple clouds, it’s important to have all of the security, efficiency, and reliability information rolled up into a single location so the operations and security teams can manage all configurations from a single view and reduce the risks inherent in configuration drift.