Knowing where to start with Kubernetes seems like it should be easy, but adoption comes with significant hurdles, and even explaining what Kubernetes is can feel overwhelming.
Here are five things to consider before diving in with both feet.
1. Learn What You Don’t Know
Kubernetes is one of the largest open source projects in history. The ecosystem is vast, the project itself is remarkably complex, and it is surrounded by add-ons, third-party contributions, and managed services. Which pieces are essential? Which pieces are cruft? And whose advice should you trust to find out?
In a 2018 CNCF survey, 40% of respondents named complexity as one of the biggest challenges in using and deploying containers. Kubernetes can make things easier, but if you don’t know where to start, it can also add considerable complexity.
Similarly, The New Stack found that 78% of companies will attempt Kubernetes adoption on their own, and 38% of respondents indicated that adoption has taken longer than originally expected. You need to budget carefully, train your team or hire outside talent, and make sure your people have enough knowledge and confidence to sustain a sane on-call rotation. Some solutions, like Fairwinds ClusterOps, offer advisory services to help you make sound architectural decisions along the way.
2. Build and Maintain a Production Grade Stack
Once you’ve navigated the initial hurdles, how will you build a stack that is ready for production? Getting Kubernetes up and running is different from having something hardened for production. There are several choices here, and they all depend on where you choose to deploy.
If you choose to deploy to the public cloud, the three main cloud providers all have a managed Kubernetes service offering. These services focus on providing you with a highly available control plane, meaning that you do not need to manage the underlying nodes or compute that the control plane runs on. The managed Kubernetes services will also give you an intuitive “point-and-click” UI for creating clusters. Unfortunately, each service handles patches, upgrades, and even autoscaling differently. Make sure you have a plan for when that first CVE hits, or for that spike in traffic you’re expecting on Black Friday.
Next, what add-ons are you planning to implement? Which add-ons make your infrastructure more production-worthy, and which open you up to new and more complex security problems? Most add-ons are open source, but they still require a significant time investment and internal proofs of concept before you push them to production.
Invest in training. Invest in outside help. Invest in a third party to audit what you’ve built and make sure you have the right systems in place, or it will come back and bite you. Kubernetes is an incredible tool, but it can also be dangerous to navigate without seasoned veterans.
3. Don’t Forget Infrastructure as Code
Between GKE on GCP, EKS on AWS, AKS on Azure, or even Rancher elsewhere, you can log in to a GUI, click some buttons, and get a Kubernetes cluster. But if your production infrastructure was built by pointing and clicking, you don’t have a repeatable model for building and modifying it in the future.
In a worst-case scenario, you will want every setting and configuration decision documented as code so you can quickly rebuild everything that has fallen over. This is why infrastructure as code is a great foundation for your disaster recovery strategy.
Infrastructure as code is also handy for SaaS companies or e-commerce companies. Business opportunities like international expansion or white-labeling may require services to run in the local country for reasons of performance or data sovereignty. Having your entire infrastructure defined as code allows you to spin up an exact replica of your environment with relative ease.
Infrastructure as code is not just convenient for certain use cases; it’s also a best practice in operations. It gives you consistency, version control, testability, transparency, and auditability, and it makes your infrastructure reproducible.
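To make the idea concrete, here is a minimal sketch in Python, not tied to any particular tool. The `ClusterSpec` type, its fields, and the example values are all hypothetical and chosen only for illustration. In practice you would use a dedicated infrastructure-as-code tool, but the principle is the same: the cluster definition lives in a file, renders deterministically, and can be diffed and replicated.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical cluster definition; field names are illustrative only."""
    name: str
    region: str
    kubernetes_version: str
    node_count: int
    machine_type: str


def render(spec: ClusterSpec) -> str:
    """Serialize the spec deterministically so it diffs cleanly in version control."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)


def changed_fields(old: ClusterSpec, new: ClusterSpec) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}


# Assumed example values, not recommendations.
us_prod = ClusterSpec("prod-us", "us-east-1", "1.29", 3, "m5.large")

# A replica for another country changes only what must differ.
eu_prod = replace(us_prod, name="prod-eu", region="eu-central-1")

print(render(eu_prod))
print(changed_fields(us_prod, eu_prod))
```

Because the replica is derived from the original definition, every setting you did not deliberately change is guaranteed to match, and the difference between the two environments is something you can review rather than remember.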
4. Implement 24/7 Operations and Monitoring
It may at times feel like your infrastructure is tangential to your business, but when it goes down and your clients or customers no longer have access to anything, you realize how business-critical it really is. Implementing quality monitoring has a significant impact on your revenue and your engineering team.
Monitoring well is hard, and your team needs to wake up when things break so they can fix them. Unfortunately, just handing someone a pager isn’t sufficient. Quality monitors need to be in place so you actually know when an outage is occurring or about to occur, and your team needs to be equipped to respond when something goes wrong.
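One ingredient of a quality monitor is that it pages a human only when a problem is sustained, not on every momentary blip. The following is a minimal sketch of that idea in Python; the function name, the threshold, and the sample values are assumptions for illustration, and a real setup would express the same rule in whatever monitoring system you use.

```python
from typing import Sequence


def should_page(error_rates: Sequence[float], threshold: float, sustained: int) -> bool:
    """Return True if the most recent `sustained` samples all exceed `threshold`."""
    if sustained <= 0 or len(error_rates) < sustained:
        return False  # not enough data to call it a sustained problem
    return all(rate > threshold for rate in error_rates[-sustained:])


# Hypothetical per-minute error rates (fraction of failed requests).
blip = [0.01, 0.20, 0.01]
outage = [0.01, 0.08, 0.09, 0.12]

print(should_page(blip, threshold=0.05, sustained=3))
print(should_page(outage, threshold=0.05, sustained=3))
```

With these sample values, the single spike does not wake anyone up, while three consecutive bad minutes do. Tuning the threshold and the window is a judgment call that depends on your traffic and your tolerance for delay.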
Downtime will directly impact your company’s revenue.
A well-built infrastructure can respond in an automated way to many of the common problems that used to require manual intervention. Kubernetes has many of these fail-safe mechanisms built in, but they need to be configured properly.
Engineering morale and retention are dramatically affected by poor monitoring.
Poor monitoring will directly impact your engineers’ happiness. Be sure to build a reasonable on-call schedule that allows engineers to take vacation without constant stress over unexpected production incidents.
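Liveness and readiness probes are one example of such a built-in mechanism: when a liveness probe keeps failing, the kubelet restarts the container, and when a readiness probe fails, the pod stops receiving traffic from its Services. The sketch below builds a Pod manifest as a Python dictionary and prints it as JSON. The image name, endpoint paths, port, and timing values are assumptions for illustration, not recommendations, and `http_probe` is a hypothetical helper.

```python
import json


def http_probe(path: str, port: int, period_seconds: int = 10,
               failure_threshold: int = 3) -> dict:
    """Build an HTTP probe definition using standard Kubernetes probe fields."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


# Hypothetical workload; names and paths are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "example.com/web:1.4.2",
                "ports": [{"containerPort": 8080}],
                # Restart the container if it stops answering.
                "livenessProbe": http_probe("/healthz", 8080),
                # Withhold traffic until the app reports it is ready.
                "readinessProbe": http_probe("/ready", 8080, period_seconds=5),
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```

The mechanism only helps if the endpoints behind the probes reflect real health. A probe that always returns success is configured, but not configured properly.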
5. Validate Your Deployments
If you’ve made it to a working production-grade infrastructure entirely documented and implemented as code, and it’s well monitored and your team is handling the on-call rotation…congratulations, you’re now 90% of the way there!
From here, your engineering team is more empowered than ever to work with Kubernetes, and you have CI/CD in place. Now it’s time to validate each of the workloads being deployed to your cluster so you can be sure you stay aligned with best practices. These best practices can be dictated by an ops team, or they can be encoded and enforced with a tool like Fairwinds Polaris.
Now your developers can directly see the health of the workloads they’re deploying before those workloads reach the cluster. And your ops team can see where workloads are being configured insecurely or unreliably, and how these issues can be fixed.
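To illustrate what encoding best practices as checks can look like, here is a minimal sketch in Python. It is not how Polaris is implemented, and the three rules shown (pinned image tags, CPU and memory requests and limits, and `runAsNonRoot`) are an assumed sample of common checks rather than a complete policy. The function names and the example manifest are hypothetical.

```python
from typing import List


def is_pinned(image: str) -> bool:
    """Return True if the image has a digest or a tag other than 'latest'."""
    if "@" in image:  # digest reference such as name@sha256:...
        return True
    last = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    _, sep, tag = last.partition(":")
    return bool(sep) and tag not in ("", "latest")


def check_container(container: dict) -> List[str]:
    """Return a list of human-readable problems for one container spec."""
    problems = []
    name = container.get("name", "<unnamed>")
    if not is_pinned(container.get("image", "")):
        problems.append(f"{name}: image tag is missing or 'latest'")
    resources = container.get("resources", {})
    for kind in ("requests", "limits"):
        for resource in ("cpu", "memory"):
            if resource not in resources.get(kind, {}):
                problems.append(f"{name}: no {resource} {kind}")
    if not container.get("securityContext", {}).get("runAsNonRoot", False):
        problems.append(f"{name}: runAsNonRoot is not set to true")
    return problems


def check_deployment(manifest: dict) -> List[str]:
    """Run the container checks over every container in a Deployment."""
    containers = manifest["spec"]["template"]["spec"]["containers"]
    return [p for c in containers for p in check_container(c)]


# Hypothetical Deployment with several issues.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {"template": {"spec": {"containers": [{
        "name": "api",
        "image": "example.com/api:latest",
        "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    }]}}},
}

for problem in check_deployment(deployment):
    print(problem)
```

Run as a CI step, a check like this can fail the build whenever the list of problems is non-empty, which is what lets developers see issues before a workload reaches the cluster.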
Conclusion
With a bit of outside help, some training, and time taken to consider the things that will trip you up if you get them wrong, you can get to Kubernetes Zen. If you decide you want to hire someone to get you there faster, Fairwinds has a variety of offerings that accelerate Kubernetes adoption (shortening the time to weeks instead of months or longer).