Demo: Monitoring Kubernetes Workloads with Astro

Written by Luke Reed | Sep 16, 2020 5:49:22 PM

I am a Site Reliability Engineer at Fairwinds, where we provide both software and service offerings that help our clients succeed in the world of Kubernetes. In our highest-touch offering, we run clusters for you and monitor that infrastructure 24/7. Working with so many clusters and customers with varying needs, we need tools that make us (and our clients) more efficient in our day-to-day work. We open source much of the software we create to give back to the open source community that we rely on so often. Astro is one of these open source tools, and it is the focus today.

First, I want to tell you a little about my monitoring journey. Ten years ago, my first position was at a large company that used bare metal servers alongside a few VMs, but the VMs were mostly used for dev and QA. Most of the monitoring covered the bare metal machines. My experience with monitoring and alerts at that point revolved around statically monitoring machines that didn’t change often. We’re talking hundreds of hours of uptime for some of these machines and applications.

The shift from bare metal hosts into the world of containers and Kubernetes requires us to rethink some things around monitoring. In a traditional bare metal environment, the number of machines you need to monitor is usually a known quantity with a finite number of processes on each host. Easy! You set up your monitors once when you’re provisioning the machine and rely on them from that point forward.

When using Kubernetes, development teams are empowered to create and destroy services as necessary, which means you need to monitor an unknown number of services. Manually defined, static monitoring struggles to keep up with that churn. Depending on the SLA and SLO of an application, you will likely want to be alerted if it is misbehaving. On the flip side, you don’t need to keep monitoring a service once it is removed from the cluster. This may vary depending on how you or your organization manage deployments, but either way it’s easier to let software manage this process.

Monitoring Kubernetes Workloads with Astro

Datadog’s service is wonderful and removes a lot of toil from our daily lives as Kubernetes operators. Most things “just work” when it comes to metrics gathering after you install the datadog-agent. We built Astro to complement the Datadog service so that monitoring workloads is automatic in the dynamic world of Kubernetes. Astro is a workload that runs inside your Kubernetes cluster and creates a set of defined monitors based on annotations placed on either your workloads or a namespace that contains multiple workloads. Astro provides three key elements that greatly simplify monitor management:

  1. Automated management of the lifecycle of Datadog monitors for workloads running in Kubernetes: Given configuration parameters, the utility will automatically manage defined monitors for all relevant objects within the Kubernetes cluster. As objects change, monitors are updated to reflect that state.
  2. Correlation between logically bound objects: Astro has the ability to manage monitors for all objects within a given namespace. This ensures greater consistency across monitor configurations.
  3. Templating of values from Kubernetes objects into managed monitors: Any data from a managed Kubernetes object can be inserted into a managed monitor. This makes alerts more informative and monitors more context-specific.
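
To make the third element concrete, here is a minimal Python sketch of templating values from a Kubernetes object into a monitor definition. It is not Astro's implementation or its configuration format: the `render_monitor` helper, the `$name` and `$namespace` placeholders, the monitor fields, and the `web` namespace are all assumptions made for illustration.

```python
from string import Template

# Hypothetical monitor template. The placeholder syntax and field names are
# illustrative only and are not taken from Astro's configuration format.
MONITOR_TEMPLATE = {
    "name": "Deployment replica alert - $namespace/$name",
    "message": "Deployment $name in namespace $namespace has unavailable replicas.",
}


def render_monitor(template: dict, obj: dict) -> dict:
    """Fill each templated field with values read from the object's metadata."""
    meta = obj["metadata"]
    values = {"name": meta["name"], "namespace": meta["namespace"]}
    return {field: Template(text).substitute(values) for field, text in template.items()}


# A pared-down Deployment; the "web" namespace is made up for this example.
deployment = {"kind": "Deployment", "metadata": {"name": "nginx", "namespace": "web"}}
monitor = render_monitor(MONITOR_TEMPLATE, deployment)
print(monitor["name"])
```

Because the monitor text is rendered from the object itself, one template can serve every workload, and each alert still names the workload it is about.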

Watch a Demo of Astro

Transcript

In this demo, I already have Astro running, installed with our provided Helm chart. We are tailing the Astro logs in my top pane and also showing the monitors in the Datadog frontend. We have configured Astro in this demo to create a set of monitors for any Deployment with the annotation astro/owner = astro. This key-value combination can be configured to something else if you desire. You can see here that I’ve added this annotation to our nginx deployment. We see a log message saying that the monitor should be created. If we refresh the Datadog frontend, we can see this is true! Now let’s remove the annotation and verify that Astro removes the monitor.
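
The behavior in this part of the demo can be sketched in a few lines of Python. This is only an illustration of the decision being made, not Astro's code: `wants_monitors` and `plan_action` are hypothetical helpers, and the objects are plain dictionaries rather than real Kubernetes API objects. Only the annotation key and value come from the demo.

```python
OWNER_KEY = "astro/owner"  # annotation key used in the demo
OWNER_VALUE = "astro"      # annotation value used in the demo


def wants_monitors(obj: dict, key: str = OWNER_KEY, value: str = OWNER_VALUE) -> bool:
    """True when the object carries the configured annotation key and value."""
    annotations = obj.get("metadata", {}).get("annotations") or {}
    return annotations.get(key) == value


def plan_action(obj: dict, monitor_exists: bool) -> str:
    """Decide what to do for one object: 'create', 'delete', or 'noop'."""
    wanted = wants_monitors(obj)
    if wanted and not monitor_exists:
        return "create"
    if monitor_exists and not wanted:
        return "delete"
    return "noop"


annotated = {
    "kind": "Deployment",
    "metadata": {"name": "nginx", "annotations": {"astro/owner": "astro"}},
}
unannotated = {"kind": "Deployment", "metadata": {"name": "nginx", "annotations": {}}}

print(plan_action(annotated, monitor_exists=False))   # create
print(plan_action(unannotated, monitor_exists=True))  # delete
```

Adding the annotation maps to the "create" case and removing it maps to the "delete" case, which are the two transitions shown in the demo.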

One other feature of note is the idea of a binding monitor. In a binding monitor, you configure a set of “bound” resource types, such as Deployment, and then apply the configured annotation to a namespace, which in this case is astro/admin-bound = astro. Once the annotation is applied, every resource in that namespace that matches a bound type gets the set of monitors. This gives you the flexibility to monitor every Deployment in a given namespace. At this time, Deployment is the only resource type available; we plan to add DaemonSets and StatefulSets.

If we remove the annotation from the namespace, the monitors will go away.
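
As a rough sketch of how a binding monitor selects its targets, the Python below picks out every object of a bound kind that lives in an annotated namespace. It is a simplification built on assumptions: `bound_objects` is a hypothetical helper, the namespace and workload names are made up, and the objects are plain dictionaries. The annotation key and value and the Deployment-only restriction come from the demo.

```python
BOUND_KEY = "astro/admin-bound"  # namespace annotation key from the demo
BOUND_VALUE = "astro"            # namespace annotation value from the demo
BOUND_KINDS = {"Deployment"}     # the only bound resource type at the time of writing


def bound_objects(namespaces: list, objects: list) -> list:
    """Return objects of a bound kind that live in an annotated namespace."""
    annotated = {
        ns["metadata"]["name"]
        for ns in namespaces
        if (ns["metadata"].get("annotations") or {}).get(BOUND_KEY) == BOUND_VALUE
    }
    return [
        obj
        for obj in objects
        if obj["kind"] in BOUND_KINDS and obj["metadata"]["namespace"] in annotated
    ]


# Made-up cluster state for illustration.
namespaces = [
    {"metadata": {"name": "web", "annotations": {"astro/admin-bound": "astro"}}},
    {"metadata": {"name": "batch", "annotations": {}}},
]
objects = [
    {"kind": "Deployment", "metadata": {"name": "nginx", "namespace": "web"}},
    {"kind": "StatefulSet", "metadata": {"name": "db", "namespace": "web"}},
    {"kind": "Deployment", "metadata": {"name": "worker", "namespace": "batch"}},
]

targets = [obj["metadata"]["name"] for obj in bound_objects(namespaces, objects)]
print(targets)  # ['nginx']
```

Removing the annotation from the `web` namespace empties the result, which corresponds to the monitors going away.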

The Astro project is still young and growing, and the community is growing with it. We hope to see you in our community Slack, and maybe even opening a PR to Astro to help us make this project as useful as it can be.