How old are your Kubernetes nodes? Most people can’t answer that question precisely; at best, they know that “most” of their nodes are a certain age and some are newer. Knowing the age of your nodes matters for maintaining a healthy and efficient cluster, because node age can affect your cluster’s performance, security, and stability. Let’s look at the main reasons to know the age of your nodes and to keep them up to date.
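To make the question concrete, here is a minimal sketch of one way to answer it. It assumes you have the JSON that `kubectl get nodes -o json` produces and reads each node's `metadata.creationTimestamp`. The function name `node_ages_in_days` and the sample data are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def node_ages_in_days(node_list, now=None):
    """Return {node_name: age_in_days} for a parsed node list document."""
    now = now or datetime.now(timezone.utc)
    ages = {}
    for node in node_list.get("items", []):
        meta = node["metadata"]
        # creationTimestamp is a UTC timestamp such as "2024-01-15T10:30:00Z".
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        ages[meta["name"]] = (now - created).total_seconds() / 86400
    return ages


# Hypothetical sample standing in for real `kubectl get nodes -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "node-a", "creationTimestamp": "2024-01-01T00:00:00Z"}},
        {"metadata": {"name": "node-b", "creationTimestamp": "2024-03-01T00:00:00Z"}},
    ]
}
fixed_now = datetime(2024, 3, 31, tzinfo=timezone.utc)
ages = node_ages_in_days(sample, fixed_now)

# Print the oldest nodes first.
for name in sorted(ages, key=ages.get, reverse=True):
    print(f"{name}: {ages[name]:.0f} days old")
```

Against a real cluster, you could pipe the `kubectl` output into a script and load it with `json.load(sys.stdin)` instead of using the sample dictionary.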
As nodes age, they are more likely to be susceptible to security vulnerabilities. Older nodes may not have the latest security patches or updates, leaving them (and your infrastructure) exposed to potential threats. Outdated nodes can even create a false sense of security: one node group has been updated, but not all of them, so a vulnerability goes unpatched. Another possibility, one I have seen in the wild, is that the needed image or security tool updates are applied in code, so repo scanning tools report that all vulnerabilities have been addressed, while a stale node that was never rolled correctly keeps the vulnerability alive in the cluster. Regularly updating and replacing all nodes in the cluster, no matter their age, helps ensure that your cluster remains protected against known vulnerabilities.
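One way to spot the partially updated cluster described above is to group nodes by what they actually report running, not by what the repo says they should run. The sketch below assumes the same parsed `kubectl get nodes -o json` document and uses the `status.nodeInfo.kubeletVersion` and `status.nodeInfo.osImage` fields. The function names, node names, and image strings are hypothetical.

```python
from collections import defaultdict


def group_nodes_by_version(node_list):
    """Map (kubeletVersion, osImage) to the names of nodes reporting that pair."""
    groups = defaultdict(list)
    for node in node_list.get("items", []):
        info = node["status"]["nodeInfo"]
        key = (info["kubeletVersion"], info["osImage"])
        groups[key].append(node["metadata"]["name"])
    return dict(groups)


def has_drift(node_list):
    """True when nodes report more than one version/image combination."""
    return len(group_nodes_by_version(node_list)) > 1


# Hypothetical sample: one node was never rolled onto the patched image.
sample = {
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.29.4", "osImage": "Example OS 2.1"}}},
        {"metadata": {"name": "node-b"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.29.4", "osImage": "Example OS 2.1"}}},
        {"metadata": {"name": "node-c"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.29.1", "osImage": "Example OS 2.0"}}},
    ]
}

groups = group_nodes_by_version(sample)
for (kubelet, image), names in groups.items():
    print(f"{kubelet} / {image}: {', '.join(names)}")
```

Clusters that intentionally run several node groups on different images will always show more than one group, so treat the result as a prompt to investigate, not as proof of a missed roll.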
Over time, nodes can experience performance degradation due to software inefficiencies, such as memory leaks, disks filling up, or cloud provider hardware management deprioritizing virtual machines (VMs) over a certain age. Older nodes may not perform as well as newer ones, potentially leading to slower response times and an unexplained decrease in application performance.
Kubernetes is an actively developed platform with frequent updates and releases. Staying on the latest patch version of Kubernetes becomes easy if you already have a regular node lifecycle policy. So, keep the nodes rolling: you stay in a stable and secure position and automatically pick up the bug fixes and feature releases included in the latest Kubernetes patch versions.
Node age can impact resource allocation and optimization. Depending on your autoscaling solution, older nodes may not manage resources as efficiently as newer ones, potentially resulting in suboptimal use of CPU, memory, and storage. Keeping track of node age can help you make informed decisions about when to rotate older nodes and keep your workloads properly right-sized.
Have you ever shipped a feature and discovered a nasty surprise only when you performed your next cluster upgrade? Issues like that can often be addressed ahead of time, during a window when there are fewer changes in flight and more operational support is available. Older nodes are also more likely to experience reliability issues, because cloud providers are constantly updating their preferred physical location for new VMs, and the only way to benefit from those changes is to keep your nodes fresh. Regularly replacing your node infrastructure reduces the risk of unexpected downtime and helps ensure you are running on supported and reliable software and hardware, contributing to the overall stability and reliability of your Kubernetes cluster.
At Fairwinds, we monitor multiple open source release cycles and build in tests and monitoring to ensure the process will progress smoothly. We also automate node and image updates on a rolling basis to ensure our cluster nodes are running the latest stable releases available.
By using tools such as Ansible, Atlantis, Kured, Terraform, Karpenter, and the Bottlerocket Update Operator, we are able to handle a wide variety of cases, each with different pressures to update and refresh node configuration.
For some clients, reliability is key. Automated tooling that rotates nodes on a fixed schedule, and potentially disrupts critical client processes during the workday, is not what they want. In these instances, Ansible, Terraform, and Karpenter might be the better choices, all with the ability to update node tooling and configuration with light human intervention or GitOps-based processes.
For others, security is paramount. For them, a completely hands-off tool, such as Kured, Karpenter, or the Bottlerocket Update Operator, that constantly watches for and applies pending patches, regardless of the time of day, might be the best solution.
Additionally, some of our clients publish their own custom images and do not want image rollouts to require any intervention on our part. They may prefer a GitOps process in which they perform the testing and PR review themselves before the change is applied to our managed clusters. We architect our infrastructure to be flexible for all these possibilities and more.
By performing your node upgrades on a regular and rolling basis, you can more easily isolate any issues, trace them back to their origin, and ensure that if there is a problem, the impact occurs only during an acceptable window, avoiding late night calls and unexpected outages when you can least afford them. Some teams prefer to do overnight node upgrades, but this isn’t really the ideal time. No one wants to be woken up by a page, and even if a page is triggered, the impact might be overlooked in the middle of the night, so the problem drags on into the early morning, when client traffic is higher and an outage hurts more. To avoid these outcomes, keep an eye on the age of your nodes and don’t let them get too old. It’ll save you from 2 AM pages and from walking into a P1 on an early weekday morning, before you have finished your coffee.
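The idea of rotating only during an acceptable window can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular tool: the window (Monday to Thursday, 10:00 to 15:00 local time), the 30-day age limit, the batch size of one, and both function names are hypothetical values you would replace with your own policy.

```python
from datetime import datetime, time


def in_maintenance_window(moment, start=time(10, 0), end=time(15, 0),
                          weekdays=(0, 1, 2, 3)):
    """True when `moment` is on an allowed weekday (0 = Monday) within [start, end)."""
    return moment.weekday() in weekdays and start <= moment.time() < end


def nodes_to_rotate(ages_in_days, moment, max_age_days=30, batch_size=1):
    """Pick the oldest nodes over the age limit, but only inside the window."""
    if not in_maintenance_window(moment):
        return []
    stale = [name for name, age in ages_in_days.items() if age > max_age_days]
    # Oldest first, and only a small batch at a time to limit the blast radius.
    stale.sort(key=lambda name: ages_in_days[name], reverse=True)
    return stale[:batch_size]


# Hypothetical node ages in days.
ages = {"node-a": 90, "node-b": 30, "node-c": 45}

tuesday_late_morning = datetime(2024, 3, 26, 11, 0)
tuesday_2am = datetime(2024, 3, 26, 2, 0)

print(nodes_to_rotate(ages, tuesday_late_morning))
print(nodes_to_rotate(ages, tuesday_2am))
```

Keeping the batch small means that if a freshly rolled node exposes a problem, it surfaces while people are at their desks and only a fraction of capacity is affected.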
Looking for help managing your K8s infra? Fairwinds can help. Reach out to learn how we can keep your infrastructure secure, resilient, and cost efficient.