We’re all familiar with the most common threat vector: a hacker or other bad actor gains access to your cloud infrastructure to exploit system vulnerabilities. A less-discussed threat vector is the purposeful or accidental deletion of data in house. Either way, the threat is sure to become a reality sooner or later. So how can you be prepared when that time comes?
To build high availability and disaster recovery capabilities that improve manageability and enable business continuity, first identify what you want to protect: static infrastructure, dynamic infrastructure, persistent data, and dependencies. Then create a disaster recovery plan that covers that critical infrastructure and data.
Static infrastructure comprises the pieces of your environment that rarely change: networking infrastructure, firewall and VPN appliances, and platform state files. In the Kubernetes world, for example, static infrastructure might include cluster configuration and master/minion user data.
Tools like Terraform or AWS CloudFormation can be used for static infrastructure configuration management (CM). Make sure to hold regular code and engineering reviews and to keep a version history of changes. Also consider applying your static infrastructure configuration to multiple regions on your cloud platform, which gives you a secondary disaster recovery site without much extra cost.
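As a rough illustration of that multi-region approach, the Python/boto3 sketch below creates the same CloudFormation stack in a primary and a secondary region. The template file, stack name, and region names are placeholders; Terraform users can achieve the same result with per-region provider configurations.

```python
import boto3

# Hypothetical template describing static infrastructure (VPC, subnets, firewall rules, ...).
with open("network-stack.yaml") as f:
    template_body = f.read()

REGIONS = ["us-east-1", "us-west-2"]  # primary region + assumed DR region

for region in REGIONS:
    cfn = boto3.client("cloudformation", region_name=region)
    # Create an identical stack in each region so the DR site mirrors production.
    cfn.create_stack(
        StackName="core-network",          # hypothetical stack name
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
```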
Dynamic infrastructure includes pieces of the application stack that may change over your application’s lifecycle, such as server configuration, security groups, launch configurations, autoscaling groups, and load balancers.
Ansible is one commonly used DevOps tool for automating application deployment and orchestrating infrastructure changes. Regardless of which tool you use, you need to track all dynamic infrastructure changes. CM, versioning, and engineering reviews are critical, as is the ability to completely rebuild the application stack if it’s destroyed or compromised in any way.
Say a bad actor gains access to your infrastructure. If you consistently maintain and monitor a single source of truth for your security policies, instance configuration, and resource provisioning, your application stack can be up and running quickly in a different region.
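As one sketch of what monitoring that single source of truth might look like, the snippet below compares live security group rules against a rules file kept under version control. The file name, its structure, and the region are assumptions, and the comparison is deliberately naive; a real drift check would normalize rule ordering and cover far more resource types.

```python
import json
import boto3

# Assumed source of truth: a version-controlled file mapping security group IDs
# to their expected ingress rules, e.g. {"sg-0123...": [{"IpProtocol": "tcp", ...}]}.
with open("security_groups.json") as f:
    expected = json.load(f)

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

for sg_id, expected_rules in expected.items():
    live = ec2.describe_security_groups(GroupIds=[sg_id])["SecurityGroups"][0]
    # Naive comparison: any difference between live rules and the committed rules is drift.
    if live["IpPermissions"] != expected_rules:
        print(f"Drift detected in {sg_id}: live rules differ from the source of truth")
```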
Persistent data is your company’s lifeblood: it’s where user information, application configuration, and application logic are stored. Most cloud storage services let you snapshot your data regularly for backup purposes. To prevent loss, make sure those snapshots are mirrored to a secondary region or account.
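On AWS, for example, the mirroring step might look roughly like the boto3 sketch below, which copies this account’s EBS snapshots from the primary region into a DR region. The region names are assumptions, and a production job would also filter by snapshot state and tags and skip snapshots it has already mirrored.

```python
import boto3

SOURCE_REGION = "us-east-1"  # primary region (assumed)
DR_REGION = "us-west-2"      # secondary/DR region (assumed)

src = boto3.client("ec2", region_name=SOURCE_REGION)
dst = boto3.client("ec2", region_name=DR_REGION)

# List snapshots owned by this account (first page only; paginate for real workloads).
snapshots = src.describe_snapshots(OwnerIds=["self"])["Snapshots"]

for snap in snapshots:
    # copy_snapshot is called against the destination region's client.
    dst.copy_snapshot(
        SourceRegion=SOURCE_REGION,
        SourceSnapshotId=snap["SnapshotId"],
        Description=f"DR copy of {snap['SnapshotId']}",
    )
```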
Consider maintaining a separate cloud account used only for backups. For example, create one AWS account strictly for backups, configure its storage for “write once, delete never” retention so you keep a permanent backup history, sync persistent data to it on a regular basis, and limit the number of users who can access that data. Then, if something happens to your main AWS account, you’ll still have your backup account.
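One way to approximate “write once, delete never” in that backup account is S3 Object Lock in compliance mode, as in the sketch below. The bucket name, region, and retention period are placeholders; note that Object Lock must be enabled when the bucket is created.

```python
import boto3

BACKUP_BUCKET = "example-backup-archive"  # hypothetical bucket in the backup account
REGION = "us-west-2"                      # region is an assumption

s3 = boto3.client("s3", region_name=REGION)

# Object Lock can only be turned on at bucket creation time.
s3.create_bucket(
    Bucket=BACKUP_BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
    ObjectLockEnabledForBucket=True,
)

# COMPLIANCE mode: no user, including root, can delete or overwrite locked
# object versions until the retention period expires.
s3.put_object_lock_configuration(
    Bucket=BACKUP_BUCKET,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 3650}},
    },
)
```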
Application- and infrastructure-level dependencies also exist within your workloads. If something happens to those dependencies, you might not be able to rebuild your infrastructure after a disaster, so track them and keep copies you control.
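As a simple illustration, the sketch below mirrors a couple of external build artifacts into an S3 bucket you own, so a rebuild never depends on an upstream host being reachable. The URLs and bucket name are purely hypothetical.

```python
import urllib.request

import boto3

# Hypothetical external artifacts your rebuild depends on.
DEPENDENCIES = [
    "https://example.com/artifacts/base-image.tar.gz",
    "https://example.com/artifacts/app-requirements.txt",
]

MIRROR_BUCKET = "example-dependency-mirror"  # hypothetical bucket you control
s3 = boto3.client("s3")

for url in DEPENDENCIES:
    filename = url.rsplit("/", 1)[-1]
    data = urllib.request.urlopen(url).read()
    # Keep a private copy so the rebuild works even if the upstream source disappears.
    s3.put_object(Bucket=MIRROR_BUCKET, Key=f"mirror/{filename}", Body=data)
```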
How can you make sure the cloud doesn’t bring you down? By keeping close tabs on what information has changed, who changed it, and when.
What’s most important is that you understand your infrastructure backup needs, then put the right custom backup and recovery strategy in place for your business.