If you are familiar with Kubernetes, you already know that it offers many configuration options, whether for scaling or for better performance. In the past, most organizations used the Kubernetes cluster autoscaler to automatically scale EKS cluster node pools based on workload demand. The cluster autoscaler adds nodes when demand is high and scales back toward a minimum size when demand is low, based on the minimum and maximum sizes that you specify for the node pool. AWS Karpenter is an open-source cluster autoscaler built by Amazon Web Services (AWS) to help improve both application availability and cluster efficiency based on the application workload in AWS environments.
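To make the minimum and maximum bounds concrete, here is a minimal Python sketch of how a node count might be derived from demand and then clamped to the pool's limits. It is not the cluster autoscaler's actual algorithm; the function name, the CPU-only demand model, and the numbers are assumptions chosen for illustration.

```python
import math


def desired_node_count(requested_cpu: float, cpu_per_node: float,
                       min_size: int, max_size: int) -> int:
    """Return a node count that fits the requested CPU, clamped to the pool bounds.

    Hypothetical helper: real autoscalers also consider memory, pod
    scheduling constraints, and cooldown periods.
    """
    if cpu_per_node <= 0:
        raise ValueError("cpu_per_node must be positive")
    needed = math.ceil(requested_cpu / cpu_per_node)
    # Never go below the configured minimum or above the configured maximum.
    return max(min_size, min(max_size, needed))


# Illustrative pool: 4 CPUs per node, between 2 and 10 nodes.
for demand in (1.0, 30.0, 100.0):
    print(demand, desired_node_count(demand, 4.0, min_size=2, max_size=10))
```

Low demand still leaves the pool at its minimum size, and very high demand stops at the maximum, which is why choosing those two numbers well matters so much with this style of autoscaling.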
Cluster autoscaler and Karpenter are similar in that both add nodes when more capacity is needed. Cluster autoscaler, however, relies on users to make more of the decisions involved in dynamically adjusting the compute capacity of their clusters. When you work with cluster autoscaler, you need to evaluate what type of capacity you need in different clusters and nodes and decide which instance types to provision. Those decisions are hard to make up front because you rarely know the answers in advance, and you need to spend a great deal of time and effort making them and keeping them up to date.
As an autoscaler, Karpenter scales up and down easily and gives you flexibility through annotations, node selectors, and other configuration options. One thing to be aware of is that Karpenter can consolidate down to fewer nodes if it sees an opportunity to create a better node configuration based on price, number of nodes, or other considerations. This means that Karpenter may be removing and replacing nodes frequently.
You want Karpenter to be able to consolidate your pods, so avoid setting annotations that prevent an app from being evicted from a node just because you are worried the app will crash if it loses pods. If Karpenter detects that you have too many nodes, it starts evicting pods to move them off a node that it determines you no longer need. But if a pod on that node has the do-not-evict annotation set to true (karpenter.sh/do-not-evict: "true", renamed karpenter.sh/do-not-disrupt in newer Karpenter releases), Karpenter is blocked from scaling down that node.
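As a rough illustration of how such annotations can be audited, the following Python sketch scans pod manifests (already loaded as dictionaries) and lists the pods that would block a scale-down. The annotation key assumes an older Karpenter release that uses karpenter.sh/do-not-evict; the function name and the sample pods are hypothetical.

```python
# Assumed annotation key; newer Karpenter releases use "karpenter.sh/do-not-disrupt".
DO_NOT_EVICT = "karpenter.sh/do-not-evict"


def blocking_pods(pods):
    """Return the names of pods whose annotation would block node scale-down."""
    blocked = []
    for pod in pods:
        meta = pod.get("metadata") or {}
        annotations = meta.get("annotations") or {}
        # Annotation values are strings in Kubernetes, so compare against "true".
        if annotations.get(DO_NOT_EVICT) == "true":
            blocked.append(meta.get("name", "<unnamed>"))
    return blocked


# Hypothetical pods, shaped like parsed Kubernetes manifests.
sample_pods = [
    {"metadata": {"name": "api", "annotations": {DO_NOT_EVICT: "true"}}},
    {"metadata": {"name": "worker", "annotations": {}}},
    {"metadata": {"name": "cron"}},
]
print(blocking_pods(sample_pods))
```

Running a check like this across a cluster shows how many nodes are effectively pinned by a single annotated pod.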
You can use an open source tool, such as Polaris, to check that you’re following configuration best practices in Kubernetes and Karpenter, so that the settings you apply don’t interfere with Karpenter’s ability to consolidate nodes efficiently.
Using Karpenter, you can indicate that you only want to use a particular node type for an application. That may change the topology of your cluster, because if you have multiple apps running in the cluster and one app requires one node type while the other app requires a different type, Karpenter must spin up both types. If you do that across ten different teams, it may become a problem. Using your provisioner, you can restrict the number of node types that you want to allow, or you can use Polaris to disallow specific node type selection.
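The idea of restricting node type selection can be sketched as a simple policy check. The Python below is not Polaris or Karpenter code; it assumes pods request a node type through the well-known node.kubernetes.io/instance-type label in their nodeSelector, and the allowed list and sample specs are made up for illustration.

```python
# Well-known Kubernetes node label for the instance type.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def disallowed_instance_types(pod_specs, allowed):
    """Return the sorted instance types requested by pods but not in the allowed set."""
    requested = set()
    for spec in pod_specs:
        selector = spec.get("nodeSelector") or {}
        instance_type = selector.get(INSTANCE_TYPE_LABEL)
        if instance_type is not None:
            requested.add(instance_type)
    return sorted(requested - set(allowed))


# Hypothetical pod specs from three teams.
sample_specs = [
    {"nodeSelector": {INSTANCE_TYPE_LABEL: "m5.large"}},
    {"nodeSelector": {INSTANCE_TYPE_LABEL: "c5.4xlarge"}},
    {},  # no preference: Karpenter is free to choose
]
print(disallowed_instance_types(sample_specs, allowed={"m5.large", "m5.xlarge"}))
```

Pods that express no preference leave Karpenter the most freedom, which is usually what you want.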
If your organization reserves capacity or buys a savings plan from a cloud service provider, you want to make sure that the money you commit is being used efficiently. Karpenter can help you make sure that you are using resources well based on decisions the operations team made for your organization. Karpenter is Kubernetes-native, which makes it much easier for your users to:
Define exactly what they wish for within those constraints
Save money
Make their clusters more efficient
You want Karpenter to be efficient and build your cluster with the right mix of nodes. However, because you are still relying on the underlying mechanisms of Kubernetes scheduling to do that, you need to set resource requests and limits. Polaris marks anything that does not have resource requests and limits as a problem that you need to address. When you also set limits on total CPU and memory for the cluster as a whole (for example, in your provisioner), you cap the cluster at a maximum size, so you know that it will never exceed a certain cost.
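A check for missing requests and limits is simple to picture. The Python sketch below is only an approximation of that kind of check, not Polaris's implementation; the function name and the sample pod spec are assumptions.

```python
def missing_resources(pod_spec):
    """List every container setting (requests/limits for cpu/memory) that is absent."""
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            for resource in ("cpu", "memory"):
                if resource not in values:
                    problems.append(f"{name}: missing {section}.{resource}")
    return problems


# Hypothetical pod spec: one fully specified container, one with gaps.
sample_spec = {
    "containers": [
        {"name": "web", "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"},
        }},
        {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
    ]
}
for problem in missing_resources(sample_spec):
    print(problem)
```

Without these values the scheduler, and therefore Karpenter, has no reliable basis for deciding how many pods fit on a node.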
Goldilocks is an open source tool that provides a base level of resource requests and limit recommendations using the Vertical Pod Autoscaler's recommendation engine. It reviews the usage of your pods and makes suggestions for your resource requests and limits settings. This is an excellent way to begin if you are not sure what to set these requests and limits to (or if you are not setting them at all yet). Over time, you can reevaluate them on a regular basis and update your resource requests and limits accordingly.
Karpenter respects your pod disruption budgets when it scales nodes down and evicts pods. If you do not have pod disruption budgets for all your apps, you may lose a large number of your pods at once. That becomes a problem when your service cannot handle its standard level of traffic with that many pods removed.
You need to be aware of those constraints, which is another thing Polaris helps with. One of the default checks in Polaris is to make sure that a pod disruption budget is attached to your deployments. Pod disruption budgets are a standard best practice for all Kubernetes clusters, and when you are introducing node churn in the form of a slightly more aggressive or more intelligent autoscaler like Karpenter, you need to make sure they are in place.
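The arithmetic behind a pod disruption budget is worth seeing once. This Python sketch models a budget expressed as minAvailable and shows how many pods may be evicted right now; it is a simplified model (real budgets can also use maxUnavailable or percentages), and the function names and numbers are illustrative assumptions.

```python
from typing import Optional


def allowed_disruptions(healthy_pods: int, min_available: Optional[int]) -> int:
    """Return how many pods may be voluntarily disrupted right now.

    With no budget (min_available is None), nothing limits evictions.
    """
    if min_available is None:
        return healthy_pods
    return max(0, healthy_pods - min_available)


def evictable_now(pods_on_node: int, healthy_pods: int,
                  min_available: Optional[int]) -> int:
    """Return how many of the pods on a node can be evicted immediately."""
    return min(pods_on_node, allowed_disruptions(healthy_pods, min_available))


# An app with 6 healthy replicas, 4 of them on the node being removed.
print(evictable_now(4, 6, min_available=5))     # budget allows one eviction at a time
print(evictable_now(4, 6, min_available=None))  # no budget: all four can go at once
```

With the budget in place, the remaining pods on the node are evicted only as replacements become healthy elsewhere, so the service keeps its minimum capacity throughout the consolidation.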
If your pod is serving traffic and you are scaling pods up and down often, you need to be resilient to node churn. A pod must start up, connect to a database, and perform several other tasks before it is ready to serve traffic. You need liveness and readiness probes in place if you want to be resilient through the node churn introduced by Karpenter. Polaris includes a check for liveness and readiness probes, and even allows you to write policies that require a specific range of resource requests and limits.
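A probe check can be sketched in a few lines as well. The Python below only approximates the kind of check described above and is not Polaris's code; the function name and sample spec are hypothetical, while livenessProbe and readinessProbe are the standard container fields.

```python
def missing_probes(pod_spec):
    """Map each container name to the probes it does not define."""
    missing = {}
    for container in pod_spec.get("containers", []):
        absent = [probe for probe in ("livenessProbe", "readinessProbe")
                  if probe not in container]
        if absent:
            missing[container.get("name", "<unnamed>")] = absent
    return missing


# Hypothetical pod spec: "web" is fully probed, "worker" has no readiness probe.
sample_spec = {
    "containers": [
        {"name": "web",
         "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
         "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}}},
        {"name": "worker",
         "livenessProbe": {"exec": {"command": ["true"]}}},
    ]
}
print(missing_probes(sample_spec))
```

A container without a readiness probe can receive traffic before it has finished starting up, which is exactly the failure that frequent rescheduling makes more likely.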
Karpenter is only as good as the information you give it. The more information you can give Karpenter and the fewer constraints you impose, the more scalable and flexible Karpenter will be for your organization. Kubernetes itself includes lots of knobs to tweak, and adding Karpenter adds even more. This leads to more potential complexity, so you need a policy or governance solution to manage all those knobs correctly. Polaris collects tried and true Kubernetes best practices that can increase your Karpenter-readiness by helping you set up checks and alerts aligned to Karpenter, so you can take advantage of the cloud through fast and simple compute provisioning for K8s clusters.