Kubernetes governance is the set of policies and procedures organizations adopt to define how Kubernetes is managed and maintained, and it is an essential part of how enterprises become production-ready at scale. Kubernetes governance includes management of Kubernetes resources, scheduling, upgrades, and role-based access control. It also includes the process for making decisions about Kubernetes, such as how to manage security issues, bug fixes, and feature requests.
Kubernetes offers a portable open source platform that enables organizations to manage containerized workloads and services. It makes it easier for companies of all sizes to deliver scalable, extensible cloud-native applications, and for development teams to take on Kubernetes service ownership. Using Kubernetes, internal teams can develop systems and deploy applications and services more easily. While Kubernetes offers many important benefits, it also introduces new technologies and new complexities for development, operations, and security teams. Without a governance framework, issues multiply as the size and scope of a Kubernetes deployment grows. Managing a single Kubernetes cluster involves a significant learning curve, but once an organization begins managing multiple clusters, particularly on hybrid infrastructure or across different cloud providers, the challenge becomes much greater. Some of the challenges include:
Lack of visibility into Kubernetes cluster activity and growth
Managing and troubleshooting multiple software versions across the organization
Defining and tracking user roles, responsibilities, and privileges across multiple teams and environments
Identifying role violations, performing compliance checks, and assessing governance risks in a timely manner
Enforcing policies and procedures consistently across teams
Kubernetes governance ensures that your organization is creating processes, clarifying tasks, and setting priorities to implement and run Kubernetes successfully. Kubernetes governance initiatives help you ensure that this complex environment meets your organization’s policy requirements, adheres to best practices, and satisfies the regulatory requirements of your industry.
A Kubernetes governance framework offers organizations many benefits. It helps ensure that Kubernetes environments are secure, compliant, and optimized; it helps teams adhere to best practices and comply with corporate policies and regulations; and it creates an environment where teams can easily collaborate and manage their workloads. Together, these keep the environment stable, secure, and efficient.
Running Kubernetes infrastructure at scale nearly guarantees a wide range of components and configurations, as well as multiple teams within your organization with slightly divergent infrastructure needs. Kubernetes offers many capabilities for spinning up and configuring components through both command-line and user interfaces. Multiple people and teams adjusting individual components to meet those needs inevitably results in a lack of consistency that makes Kubernetes even more difficult to manage.
Configuration drift occurs when the Kubernetes clusters in an infrastructure become increasingly different from one another over time, typically because of manual changes and updates to individual clusters. Ultimately, the difference is between what is stored in git and what is running in production. Development teams sometimes “copy and paste” example configuration from other teams to get up and running quickly, which can also cause them to inherit bad practices. Configuration drift means not only differences in underlying configuration, but potentially different standards applied to workloads.
If you set up a process that ensures consistent, automated provisioning, your clusters will be consistent when created. Unfortunately, changes can happen after provisioning on that cluster or on other clusters as the development, operations, or DevOps teams update configuration parameters. When your organization operates many clusters that were manually deployed and are now inconsistently configured, there will inevitably be discrepancies in configuration across your Docker containers and Kubernetes clusters. These discrepancies can be difficult to identify and correct. A few negative consequences of configuration drift include:
Security vulnerabilities: Misconfiguration may result in privilege escalation, vulnerable images, images from untrusted repositories, or containers running as root or privileged.
Inefficient resource utilization: Costs may slowly increase when workloads are over-provisioned or stale workloads are not reviewed and corrected.
Downtime and reliability risks: Not scaling applications or services enough or scaling too frequently may cause downtime or reduce reliability.
Manually tracking configuration drift and fixing misconfigurations is a challenging, error-prone, and ongoing task. It increases costs for the operations team because tracking down and resolving issues takes considerable time. Configuration drift also breeds further inconsistency, which complicates the upgrade process: each upgrade path requires unique research to confirm that everything will still work afterward, wasting both time and operational resources. A consistent infrastructure lets you research an upgrade or patching scenario once and then apply it across multiple environments.
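One way to keep git and production from diverging is a GitOps controller that continuously reconciles the two. Below is a minimal sketch using Argo CD, one popular open source option; the repository URL, paths, and names are hypothetical, and the pattern assumes Argo CD is already installed in the cluster:

```yaml
# Hypothetical Argo CD Application that continuously reconciles a git
# directory to the cluster, so git remains the source of truth.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-service        # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/k8s-manifests.git  # hypothetical repo
    targetRevision: main
    path: payments/overlays/production                  # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from git
      selfHeal: true   # revert manual changes made directly on the cluster
```

With selfHeal enabled, manual edits made directly on the cluster are reverted to match git, so drift surfaces as a sync event rather than a silent divergence. For lighter-weight spot checks, `kubectl diff -f <manifest>` compares a file in git against the live cluster state.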
In the Kubernetes and software development sphere, service ownership is the model in which development teams take responsibility for supporting the products and services they deliver throughout the service life cycle. It gives development teams far more control over how their software and services run in production environments, and it allows operations teams to focus on maintaining and improving core infrastructure rather than tracking down bugs and trying to optimize applications.
In a self-service model, DevOps and infrastructure leaders allow many users to develop and deploy across many different Kubernetes clusters. As more teams deploy Kubernetes to production environments, it becomes increasingly difficult for DevOps teams to manually write or review each Dockerfile and Kubernetes manifest being deployed to their clusters. Creating a set of Kubernetes guardrails and enforcing them automatically allows developers to self-service and prevents the accidental introduction of security risks, inefficient cloud resource usage, and application performance problems.
Using these guardrails and putting those policies in place at the platform level enables consistency to be applied across the organization, reassuring developers that they will not unintentionally deploy applications or services that could put their company — and maybe even their job — at risk. The development team also is empowered to build apps and services and spin up clusters without worrying about breaking something else. In this way, they can focus more on writing and deploying rather than worrying about making mistakes.
Service owners take responsibility for developing, shipping, and owning their services, which is far more accountability than they had in the past. They must now ensure that the application is reliable and performs well, deliver new features, patch bugs and vulnerabilities in their code, resolve misconfigurations in their Docker containers, maintain documentation, and more. The right open source tooling, typically available on GitHub, can support these service owners as they take on new responsibilities and help them deliver more cost-effective and secure applications and services.
Service ownership also allows the owners to ship applications faster by enabling self-service and empowering developers to develop, test, and deploy faster than ever. When DevOps teams establish policies around who takes service ownership of distinct parts of an application and have sufficient visibility into those owners, they can monitor what is happening in a Kubernetes cluster and have insight into who is patching which vulnerability, where, and when, rather than wondering who is responsible for taking that action.
A Kubernetes governance model improves Kubernetes security by establishing a strategy to ensure greater visibility into the Kubernetes environment, better understanding of the misconfigurations that could cause security and compliance risks, and reduced time required for vulnerability management. There are several important aspects of Kubernetes security that must be incorporated into any Kubernetes governance model.
Many Kubernetes workload settings are actually “insecure by default”: they grant applications permissions they may not need. The trade-off is that developers can get apps up and running quickly, but without security best practices. For example, by default each container is mounted with a writable root filesystem, which can give an attacker the permissions necessary to replace system binaries or modify configurations. Kubernetes does offer a number of built-in security features, such as Kubernetes role-based access control (RBAC), network policies, and admission controllers. Kubernetes governance must take these default configurations into account and include a security-first mindset for Kubernetes deployments. Reviewing security policies is a crucial step for any organization adopting Kubernetes.
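As an illustration, a few securityContext fields reverse the permissive defaults described above. This is a minimal sketch rather than a universal baseline; the pod name and image are hypothetical, and some workloads will need documented exceptions (for example, a writable emptyDir volume for temp files):

```yaml
# Sketch of a pod-level hardening baseline countering insecure defaults.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.4.2  # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true     # counter the writable-root default
        runAsNonRoot: true               # refuse to run as UID 0
        allowPrivilegeEscalation: false  # block setuid-style escalation
        capabilities:
          drop: ["ALL"]                  # start from zero Linux capabilities
```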
In DevOps, the shift-left approach takes tasks that formerly happened later in the software development lifecycle (SDLC) and moves them “left” into earlier stages. Shifting testing earlier in the SDLC identifies design or code issues early in the development process, speeding up development and release cycles. In Kubernetes, applications and services deploy far more rapidly than in traditional environments, so adopting tooling that enables and enforces a shift-left approach is critical to deploying apps and services as quickly and securely as possible.
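As a sketch of what shift-left tooling can look like in practice, the hypothetical CI job below validates Kubernetes manifests on every pull request, before anything reaches a cluster. It uses kubeconform, an open source schema validator, as a stand-in for whichever scanner your organization adopts; the workflow name and the ./manifests path are illustrative:

```yaml
# Hypothetical GitHub Actions workflow: fail pull requests whose
# Kubernetes manifests do not validate against the API schemas.
name: shift-left-checks
on: [pull_request]
jobs:
  validate-manifests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate manifests against Kubernetes schemas
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
          ./kubeconform -strict -summary ./manifests/  # hypothetical path
```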
Cloud-native technologies, such as containers and Kubernetes, present new compliance challenges. When deploying containerized workloads, development teams must shift both security and compliance as far left as possible, starting in the design and development phase. Compliance also requires visibility into and control over apps and services from development through to production. Putting guardrails in place that map to SOC 2 controls, HIPAA, ISO 27001, and other regulations can help organizations gain the visibility into compliance status necessary to operate in an increasingly compliance-driven landscape.
The following five principles provide an excellent starting point for building your Kubernetes governance model:
Alignment with business objectives — Kubernetes strategy should be an integral part of the overall business and IT strategy, and all Kubernetes systems and policies must demonstrably support business goals.
Collaboration — there must be clear agreements between Kubernetes infrastructure owners and other stakeholders to ensure that Kubernetes is being used appropriately and effectively.
Change management — all changes to Kubernetes environments must be implemented consistently and according to Kubernetes best practices, aligning with the appropriate organizational controls.
Dynamic response — Kubernetes governance should rely on monitoring, tooling, and policy automation to manage the Kubernetes environment effectively.
Policy and standards compliance — Kubernetes usage standards must align with the relevant regulations and compliance standards used within your organization and across your industry.
Kubernetes governance is the process of setting policies, standards, and procedures that function as guardrails to ensure the security, compliance, and cost-effective use of Kubernetes within an organization.
To design and implement a Kubernetes governance project, get started by:
Creating a Kubernetes Governance Team: Assemble a team of experts who understand Kubernetes and the organization’s needs, including members from the development, operations, security, and compliance teams.
Establishing Policies and Standards: Establish policies and standards for the use of Kubernetes, particularly security, compliance, and cost-effectiveness.
Developing Processes and Guidelines: Develop processes and guidelines to verify that the governance policies and standards have been successfully implemented, including approval and communication processes.
Monitoring and Enforcing Policies: Monitor and enforce the policies and standards to make certain that they are being followed; implementing tooling and automation to enforce Kubernetes guardrails reduces the challenges inherent in complex Kubernetes environments.
Reviewing and Updating: Review and update the policies, standards, and processes regularly to confirm that they still meet the organization’s needs as they change over time.
Kubernetes enables organizations to increase the usability and productivity of containers and build cloud-native applications. To maximize the benefits of a Kubernetes implementation, it is essential to follow these five Kubernetes best practices:
Security configurations are not set by default in Kubernetes; these are settings that your security team must establish and then enforce, ideally through automation and robust policies.
Cost optimization requires you to set resource requests and limits on workloads to maximize infrastructure utilization while ensuring optimal application performance.
Reliability is a complex undertaking in Kubernetes, but infrastructure as code (IaC) makes it easier by reducing human error, increasing consistency, improving auditability, and simplifying disaster recovery.
Policy enforcement is critical once Kubernetes deployment increases beyond a single application. Enforcing policy through tools and automation helps prevent common misconfigurations, enables IT compliance, and promotes a service ownership model, because users know that guardrails are in place to enforce the policies. The Open Policy Agent (OPA) is an open source tool for cloud-native environments that offers policy-based controls; see the sketch after this list.
Monitoring and alerting help ensure that your infrastructure and applications are running as expected. This requires tooling that streamlines monitoring, identification of what needs to be monitored and why, and discovery of misconfigurations prior to deployment.
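To make the OPA example concrete, below is a sketch of a Gatekeeper ConstraintTemplate (Gatekeeper packages OPA as a Kubernetes admission controller). It defines a reusable rule, written in Rego and embedded in the YAML, that flags resources missing required labels; this is essentially the canonical Gatekeeper example rather than a policy recommendation:

```yaml
# Sketch of a reusable Gatekeeper policy: "these labels must be present."
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing required label: %v", [required])
        }
```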
In general, organizations may deploy both cluster-wide and namespace-specific (or application-specific) policies. Cluster-wide policies tend to apply to all workloads and may cover the security, efficiency, and reliability categories. These are general rules of thumb, and exceptions may be granted on an instance-by-instance basis between the Platform and Security teams.
Below are a few examples:
Memory requests should be set
CPU requests should be set
Liveness probes should be set
Readiness probes should be set
Image pull policy should be “Always”
Container should not have dangerous capabilities
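A minimal sketch of a workload spec that would pass the checks above; the name, image, registry, port, and resource values are all illustrative and should be tuned to the actual workload:

```yaml
# Sketch of a Deployment satisfying the cluster-wide checks listed above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  replicas: 2
  selector:
    matchLabels: { app: example-api }
  template:
    metadata:
      labels: { app: example-api }
    spec:
      containers:
        - name: api
          image: registry.example.com/example-api:1.4.2  # hypothetical image, pinned tag
          imagePullPolicy: Always
          resources:
            requests: { cpu: 100m, memory: 128Mi }   # CPU/memory requests set
            limits: { cpu: 500m, memory: 256Mi }
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }  # illustrative endpoints
            initialDelaySeconds: 10
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }
            initialDelaySeconds: 5
```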
Namespace-specific policies enforce standards for specific app teams or services where an increased level of security, efficiency, or reliability is needed. For example, you may use namespaces to create different ‘tenants’ within a shared Kubernetes cluster for different teams. You want these teams to adhere to a common set of best practices that avoids disruption to other cluster tenants, such as resource exhaustion or security violations.
Some examples may include:
Container should not have insecure capabilities
Host IPC should not be configured
Host PID should not be configured
Privilege escalation should not be allowed
Should not be running as privileged
Image tag should be specified
Namespace quotas should be set
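For the quota item above, a per-tenant ResourceQuota caps what a namespace can consume so one tenant cannot exhaust shared cluster resources. A sketch, with a hypothetical namespace and illustrative values:

```yaml
# Sketch of a per-tenant quota backing the "namespace quotas" check.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "4"      # total CPU the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"             # cap on concurrently scheduled pods
```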
The enforcement of these policies can happen in multiple stages. Ultimately, the “enforcement” function of Kubernetes governance is all about delivering feedback to engineers in the tools they use, at the time they need it.
If you have an existing cluster running production workloads, you may first choose to do a baseline audit of your cluster against these cluster-wide and namespace policies. This helps bring visibility to issues, enabling greater service ownership.
Next, you may integrate these policies in both the CI/CD phase (using shift-left tooling) and at the time of deployment (using admission controllers). You may choose to have these policies “warn” users without actually blocking pipelines or deployments, bringing visibility earlier in the process and catching issues before they are released.
Finally, the last phase of rolling out policy enforcement is actual enforcement. This is when policies are “active,” blocking pipelines or deployments when issues are found. This step usually comes when teams are comfortable with the types of configurations that need to be fixed.
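If you use an admission controller such as Gatekeeper (sketched earlier), these phases map to a single field. The hypothetical Constraint below instantiates the earlier ConstraintTemplate in warn mode; flipping enforcementAction to deny moves the policy from the visibility phase to active enforcement:

```yaml
# Sketch of a Constraint instantiating the K8sRequiredLabels template.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-owner-label
spec:
  enforcementAction: warn   # change to "deny" when teams are ready
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]       # hypothetical required label
```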
Ideally, organizations migrating containers to Kubernetes for the first time should integrate enforcement from the start. This raises the bar on Kubernetes governance, and can enable teams to migrate workloads faster by ensuring workloads run correctly from the start.
Kubernetes spend increases in direct proportion to the number of clusters, where apps and services are deployed, and how they are configured. Reporting the costs of multi-tenant clusters and allocating costs to the correct owners can be challenging: Kubernetes deploys transient workloads that rely on both shared and external resources, usage is measured in seconds, and tracking the Kubernetes resources used by a workload is difficult. As the Kubernetes environment grows, so does the number of clusters and nodes. Spending must be monitored to avoid wasting Kubernetes resources; platform engineering teams must therefore be able to allocate costs and provide showback in a business-relevant context and create feedback loops with engineering to enable a culture of service ownership.
Mapping costs to a Kubernetes construct, such as a namespace or label, helps to properly allocate costs to individual business units. The Kubernetes Vertical Pod Autoscaler (VPA) uses the historical memory and CPU usage of workloads, together with the current usage of pods, to generate recommendations for resource requests and limits. Node labels let you attach identifying metadata to your nodes, and you can configure pods with a “nodeSelector” that matches specific node labels, determining which nodes a pod can be scheduled onto. Using appropriately labeled instance groups of different instance types allows you to match the underlying hardware available from your chosen public or private cloud provider with your Kubernetes workloads. Without resource labeling, it is difficult to allocate costs appropriately and make informed decisions about costs, optimizations, and cloud spend.
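A sketch of how labels and a nodeSelector work together for cost allocation and scheduling; the team name, cost-center code, image, and node label are all hypothetical. The node label itself would be applied with something like `kubectl label nodes <node-name> workload-class=general`:

```yaml
# Sketch: cost-allocation labels plus a nodeSelector pinning the workload
# to nodes labeled for a given hardware class.
apiVersion: v1
kind: Pod
metadata:
  name: billing-worker
  labels:
    team: payments          # used to roll costs up to a business unit
    cost-center: cc-1234    # hypothetical cost-center code
spec:
  nodeSelector:
    workload-class: general # schedules only onto nodes with this label
  containers:
    - name: worker
      image: registry.example.com/billing-worker:2.1.0  # hypothetical image
      resources:
        requests: { cpu: 250m, memory: 256Mi }
```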
Cost avoidance, per the FinOps Foundation, means reducing usage and optimizing costs to get a better cloud rate. Most of the actions required for cost avoidance depend on engineers. Platform engineering teams can avoid costs and increase optimization in Kubernetes by shipping applications more quickly, optimizing cloud usage, and reducing risk through Kubernetes security features. Kubernetes governance supports this by automatically enforcing policies built with security and cost in mind.
Infrastructure as code (IaC) enables the use of a configuration language to provision and manage infrastructure, applying the repeatability, transparency, and testing of modern software development to infrastructure management. The primary goal of IaC is to reduce error and configuration drift, allowing engineers to focus on higher-value tasks. Making use of IaC provides significant benefits for Kubernetes users, such as reduced human error, repeatability and consistency, improved change tracking, and disaster recovery. Infrastructure as code scanning is the ability to scan IaC files against a set of policies and Kubernetes best practices, which helps an organization ensure alignment with Kubernetes governance goals, such as application security, reliability, and cost.
Containers enable development teams to build, package, and deploy applications and services to diverse environments and deployment targets, so securing and protecting the integrity of containers is critical for organizations building and deploying in cloud-native environments. Containers are built on a base image, which is the starting point for Linux containers, and a vulnerability in your base image will exist in every container built from it. Finding a trusted source for your base image, staying up to date with patches, controlling permissions, and limiting the use of additional images that are not required for production can help increase the security of your container images. It remains critical to scan images and detect misconfigurations, because Common Vulnerabilities and Exposures (CVEs) can be introduced at any time. Scanning containers and all their components to identify security vulnerabilities is a critical component of container security, and therefore an important aspect of Kubernetes governance.
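As one example of automated image scanning, the hypothetical CI job below runs Trivy, a popular open source scanner, against a built image and fails the pipeline on high-severity CVEs; the image reference is illustrative:

```yaml
# Hypothetical GitHub Actions workflow: block the pipeline when the image
# contains known CRITICAL or HIGH vulnerabilities.
name: image-scan
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.example.com/example-api:1.4.2  # hypothetical image
          severity: CRITICAL,HIGH
          exit-code: '1'   # non-zero exit fails the pipeline on findings
```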
Over time, Kubernetes deprecates different versions of APIs to reduce maintenance on older APIs and ensure that organizations are using the more secure, up-to-date versions. If your application or service includes deprecated or removed API versions, it is important to find and update them to the latest stable version.
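For example, the PodDisruptionBudget API graduated from policy/v1beta1 (removed in Kubernetes 1.25) to the stable policy/v1, so affected manifests need their apiVersion updated, provided the fields in use still exist in the new version:

```yaml
# Example of updating a deprecated API version.
# Before:
#   apiVersion: policy/v1beta1
#   kind: PodDisruptionBudget
# After:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: example-api   # illustrative workload label
```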
Add-ons extend Kubernetes functionality. Like other software, add-ons require upgrades, but first you must verify that the latest version is compatible with your cluster. This process can be quite time-consuming, particularly for teams managing more than a few clusters. Monitoring for updates and applying upgrades is slow and difficult without a tool that detects add-on changes automatically, yet neglecting upgrades can negatively impact service.
In organizations with multiple Kubernetes clusters, teams of developers, and a service ownership model in place, it is essential to have defined policies and an automated method of enforcing them. Guardrails and policies give development, security, operations, and leadership teams peace of mind, because automated enforcement prevents last-minute changes that could break something in production, allow a data breach, or introduce configurations that do not scale properly. Kubernetes policy enforcement enables automation, monitoring, and enforcement of guardrails and best practices for Kubernetes.
The shift to cloud-native technologies, such as Docker containers and Kubernetes, creates new challenges related to compliance. Most organizations need to comply with security and privacy regulations, including SOC 2, HIPAA, ISO 27001, GDPR, PIPEDA, and many more. Defining Kubernetes compliance policies and enforcing them automatically across all clusters is essential, as is the ability to automate compliance analysis to keep pace with changing requirements.
Traditional governance models slow teams down and act as a blocker to delivering on goals. Kubernetes governance offers a new model aligned with a cloud-native computing strategy. Instead of increasing bureaucracy, organizations can apply guardrails automatically to implement and enforce policies. By creating processes, clarifying tasks, and setting priorities at the outset, you can create policies that help you implement and run Kubernetes successfully. Most importantly, Kubernetes governance helps you maximize your investment in the platform while meeting your organization’s policy requirements, adhering to best practices, and satisfying relevant regulatory requirements, all without slowing down the development and deployment of applications and services.