When running a production Kubernetes cluster, one of the most important (and burdensome) tasks is keeping everything up-to-date. Kubernetes itself cuts quarterly releases, which quickly fall out of support—if you fail to upgrade, you could be left open to security holes and buggy features. Fortunately, the quarterly cadence at least makes these releases predictable: you can even set a reminder to go through the process every three months.
Source: https://gravitational.com/blog/kubernetes-release-cycle/
But on top of Kubernetes, you probably have half a dozen add-ons running in your cluster, most likely installed via Helm. Supplementary components like nginx-ingress, cert-manager and linkerd are often critical to running a production-grade cluster. You might even be using k8s-specific deployments for common applications like PostgreSQL or WordPress. Each of these tools has its own release cadence, and some updates may come with critical security patches.
Installing a new patch typically isn’t that hard: usually it just means rerunning a helm install command, or updating a single line in your Infrastructure-as-Code repository. But how do you know when it’s time to update? Unlike Kubernetes itself, Helm chart updates are incredibly hard to monitor and predict.
Enter Nova, an open source tool by Fairwinds.
Nova is a command-line interface for cross-checking the Helm charts running in your cluster with the latest versions available. Nova will let you know if you’re running a chart that’s out-of-date or deprecated, so you can make sure you’re always aware of updates.
Release Name      Installed    Latest     Old      Deprecated
cert-manager      v0.11.0      v0.15.2    true     false
insights-agent    0.21.1       0.21.1     false    false
grafana           2.1.3        3.1.1      true     false
metrics-server    2.8.8        2.11.1     true     false
nginx-ingress     1.25.0       1.40.3     true     false
This makes it easy to see how far behind you are on any installed Helm charts. In the example above, we’ve got the latest version of insights-agent, but there are minor updates available for some core infrastructure, including cert-manager, nginx-ingress, and metrics-server. The grafana chart also has a new major version available, so we might be missing out on some cool new features!
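The Old column boils down to a version comparison. As a rough sketch (not Nova's actual code), the Go below parses plain major.minor.patch strings with an optional leading v and classifies each row of the table above as a major, minor, or patch update. The parser is deliberately simplified and rejects pre-release suffixes such as 1.0.0-rc.1; a real tool would use a full semver library. Comparing numerically matters: as plain strings, 2.11.1 would sort before 2.8.8.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds the numeric parts of a major.minor.patch chart version.
type version struct{ major, minor, patch int }

// parseVersion accepts "v0.15.2" or "2.11.1". Pre-release and build
// suffixes (e.g. "1.0.0-rc.1") are rejected by this simplified parser.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("unexpected version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("bad component %q in %q: %w", p, s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// less compares versions numerically, component by component.
func (a version) less(b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	return a.patch < b.patch
}

// updateKind returns "major", "minor" or "patch" when latest is newer than
// installed, and "none" otherwise.
func updateKind(installed, latest string) (string, error) {
	a, err := parseVersion(installed)
	if err != nil {
		return "", err
	}
	b, err := parseVersion(latest)
	if err != nil {
		return "", err
	}
	switch {
	case !a.less(b):
		return "none", nil
	case b.major > a.major:
		return "major", nil
	case b.minor > a.minor:
		return "minor", nil
	default:
		return "patch", nil
	}
}

func main() {
	// Installed and latest versions copied from the Nova output above.
	rows := [][3]string{
		{"cert-manager", "v0.11.0", "v0.15.2"},
		{"insights-agent", "0.21.1", "0.21.1"},
		{"grafana", "2.1.3", "3.1.1"},
		{"metrics-server", "2.8.8", "2.11.1"},
		{"nginx-ingress", "1.25.0", "1.40.3"},
	}
	for _, r := range rows {
		kind, err := updateKind(r[1], r[2])
		if err != nil {
			fmt.Println(r[0], "error:", err)
			continue
		}
		fmt.Printf("%-15s %s\n", r[0], kind)
	}
}
```

Keep in mind that for 0.x charts such as cert-manager, semantic versioning allows a minor bump to contain breaking changes, so a "minor" label is not a safety guarantee.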
Building Nova turned out to be a harder problem than we expected.
The first issue was in supporting both Helm 2 and Helm 3. Helm 2 stored chart metadata using ConfigMaps, which were fairly easy to parse. But Helm 3 uses Secrets, along with some byzantine encoding practices. With some help from the Pluto team, we built support for both versions, as well as the ability to auto-detect charts from both versions. You can use this feature by specifying --helm-version=auto.
Even more challenging was matching the installed chart to the upstream source. Unfortunately, Helm doesn’t store info about the upstream repository in the chart metadata, making an exact match impossible. All we have to go on is the name, which might be duplicated across different repositories. For example, many of the Bitnami charts were recently migrated out of the core charts repo, but kept the same names.
To find a match, we have to look in all known repositories for charts with a matching name, then use heuristics like the chart’s version, home, description, and maintainers to find a good match. So far this strategy has a 100% success rate, but it feels like a bit of a hack. If you know of a better solution for finding the upstream repository for an installed Helm chart, let us know!
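The matching step can be sketched as a scoring function. This is a hypothetical illustration of the heuristic described above, not Nova's implementation: the weights (4 when the installed version is published in the repository, 2 for an identical home URL, 1 for an identical description, 1 per shared maintainer), the struct types, and the repository data are all made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// installedChart is what can be read from the release in the cluster.
type installedChart struct {
	Name, Version, Home, Description string
	Maintainers                      []string
}

// candidateChart is a chart with the same name found in some repository index.
type candidateChart struct {
	Repo, Name, Home, Description string
	Versions                      []string
	Maintainers                   []string
}

// score rates how likely c is the upstream of in. Weights are hypothetical.
func score(in installedChart, c candidateChart) int {
	s := 0
	for _, v := range c.Versions {
		if v == in.Version {
			s += 4 // the exact installed version was published here
			break
		}
	}
	if in.Home != "" && c.Home == in.Home {
		s += 2
	}
	if in.Description != "" && c.Description == in.Description {
		s++
	}
	for _, cm := range c.Maintainers {
		for _, im := range in.Maintainers {
			if strings.EqualFold(cm, im) {
				s++ // one point per shared maintainer
				break
			}
		}
	}
	return s
}

// bestMatch returns the highest-scoring candidate whose name matches; ties go
// to the earlier candidate. ok is false when no repository has the name.
func bestMatch(in installedChart, candidates []candidateChart) (best candidateChart, ok bool) {
	bestScore := 0
	for _, c := range candidates {
		if c.Name != in.Name {
			continue
		}
		if s := score(in, c); !ok || s > bestScore {
			best, bestScore, ok = c, s, true
		}
	}
	return best, ok
}

func main() {
	// Hypothetical data: the same chart name published in two repositories.
	in := installedChart{Name: "redis", Version: "10.5.7",
		Home: "https://example.com/bitnami/redis", Maintainers: []string{"Bitnami"}}
	candidates := []candidateChart{
		{Repo: "stable", Name: "redis", Versions: []string{"10.5.7"},
			Home: "https://example.com/stable/redis"},
		{Repo: "bitnami", Name: "redis", Versions: []string{"10.5.7", "11.0.0"},
			Home: "https://example.com/bitnami/redis", Maintainers: []string{"bitnami"}},
	}
	if m, ok := bestMatch(in, candidates); ok {
		fmt.Println("best match:", m.Repo)
	}
}
```

Because the name is the only field guaranteed to agree, a name-only candidate still counts as a match here; the other fields just break ties between repositories that publish the same name.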
Nova saves a lot of time and effort for anyone looking to keep their charts up-to-date. Instead of having to monitor each individual chart for updates, operators can now run a single CLI to detect old versions running in their cluster. But you still have to remember to run the CLI!
When you manage hundreds of clusters, forgetting to run Nova on a particular cluster, misreading its output, or failing to act on the results becomes all but certain. Furthermore, some organizations need to stay on an older version of a given tool (cert-manager, in particular, has regularly introduced breaking changes). So we needed a way to operationalize Nova at scale.
The easiest way to do this is using Fairwinds Insights. Insights can run Nova (along with other Kubernetes auditing tools) automatically on a regular schedule, every hour by default, and use the results to create Action Items. These Action Items serve as a reminder to update your Helm chart, and can be piped to Slack, Datadog, or anywhere else your engineers live.
Interested in using Fairwinds Insights? It’s available for free!
Internally, we also maintain a YAML file that includes a set of standards: the particular version of each add-on that we expect our clients to run. We use Nova to make sure that YAML file stays up-to-date with the latest versions, and to keep our Infrastructure-as-Code in line with those standards.
Automatically checking Helm charts for updates solves a big problem for us at Fairwinds, and for the Kubernetes community as a whole. But the ecosystem moves fast, and there are many more pieces of software and infrastructure that we need to stay on top of. Other things we’d like to build automated checks for are:
Whether these checks fit into Nova, or will justify creating a separate project, remains to be seen.
In any domain, staying on top of the latest releases is a big, scary problem. We hope Nova helps the Kubernetes community keep their clusters reliable and secure.