Obviously, nobody plans for downtime. But problems are inevitable, and if you don't have a plan in place to deal with them immediately and automatically, you're going to lose revenue when your services go down. High Availability will help you plan for the worst-case scenarios.

What Is High Availability?

High Availability (HA) is the practice of minimizing all server downtime, ideally down to zero. It incorporates many techniques, such as auto-scaling, real-time monitoring, and automated blue/green update deployments.

The core concept is pretty simple: one server is no server. Two servers are one server. The more redundancy you plan for, the more highly available your service will be. Your service should not experience interruptions even if one of your components goes up in flames.

This can be achieved with something as simple as an auto-scaling group, which cloud services like AWS support very well. If a server has a problem, such as a sudden crash, the load balancer's health checks will detect it as not responding. Traffic is then diverted away from the crashed server to the other servers in the cluster, and the auto-scaling group can spin up a replacement instance if the capacity is needed.
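
As a rough sketch of what that looks like in practice, here's how you might create such a group with boto3, AWS's Python SDK. The launch template name, subnet IDs, and target group ARN below are placeholders for your own resources:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Create a cluster of 3 servers behind a load balancer. The launch template
# name, subnets, and target group ARN are hypothetical placeholders.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-cluster",
    LaunchTemplate={"LaunchTemplateName": "web-server-template"},
    MinSize=2,
    MaxSize=6,
    DesiredCapacity=3,
    VPCZoneIdentifier="subnet-11111111,subnet-22222222",
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123"],
    # Use the load balancer's health checks, so instances that stop
    # responding are terminated and replaced automatically.
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
)
```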

This redundant philosophy applies to all levels of your component hierarchy. If you have a microservice to handle the image processing of user-uploaded media, for example, it wouldn't be a great idea to just run that in the background on one of your machines. If that machine has problems, users might not be able to upload, which counts as partial downtime of your service and can be frustrating for the end user.

Sometimes, you need to guarantee availability to clients. If you guarantee 99.999% availability in a service-level agreement (SLA), that means your service can't be down for more than about five minutes a year. This makes HA necessary from the get-go for many large companies.
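
The math behind those "nines" is simple enough to check yourself:

```python
# Downtime budget per year for common availability targets.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("99%", 0.99), ("99.9%", 0.999),
                            ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    allowed_downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label} uptime allows {allowed_downtime:.1f} minutes of downtime per year")
```

Five nines leaves you roughly 5.3 minutes a year, which isn't even enough time for a manual restart. That's why the response has to be automatic.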

For example, services like AWS S3 are designed for 99.999999999% (eleven 9s) of data durability. This basically means that all of your data is replicated across multiple devices and facilities, making it safe from everything except the giant-meteor-impacting-your-data-warehouse scenario. Even then, with physical separation, it might be safe from small meteors, or at the very least, safe from the much more realistic warehouse fire or power outage situation.

Components of Good HA Systems

What leads to downtime? Barring acts of God, downtime is usually caused by human error or random failure.

Random failures can't really be planned for, but they can be planned around with redundant systems. They can also be caught as they happen by good monitoring systems that alert you to problems in your network.

Human error can be planned for. First and foremost, you can minimize the number of errors with careful testing environments. But everyone makes mistakes, even big companies, so you must have a plan in place for when mistakes happen.

Auto-Scaling & Redundancy

Auto-scaling is the process of automatically adjusting the number of servers you run: usually scaling up during the day to meet peak load, but also in response to unexpected spikes in demand.

One of the primary ways that services go down is the "hug of death," when thousands of users flock to the site en masse, or traffic spikes in some other way. Without auto-scaling, you're stuck: your existing servers can't absorb the load, and all you can do is wait for it to subside or scramble to spin up new instances manually.

Auto-scaling means that you'll never really have to deal with this issue (though you'll pay for the extra server time you use). This is part of the reason why services like serverless databases and AWS Lambda functions are so great: they scale extremely well out of the box.
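
On AWS, for example, you can attach a target-tracking policy to an auto-scaling group so capacity follows demand on its own. Here's a minimal sketch with boto3, assuming the hypothetical "web-cluster" group from earlier:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep average CPU across the group at roughly 50%. AWS adds instances
# when load pushes above the target and removes them when it falls below.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-cluster",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```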

However, it goes beyond just auto-scaling your primary servers: if you have other components or services in your network, those must be able to scale as well. For example, you may spin up additional web servers to meet traffic demands, but if your database server is overwhelmed, you'll still have a problem.

If you'd like to learn more, you can read our article on getting started with AWS auto-scaling.

24/7 Monitoring

Monitoring involves tracking logs and metrics on your services in real time. Pairing this with automated alarms means you can learn about problems in your network while they're happening, rather than after they've affected users.

For example, you could set an alarm to go off when your server hits 90% memory usage, which could indicate a memory leak or an overloaded application.

Then you could configure this alarm to tell your auto-scaling group to add another instance or to replace the current instance with a new one.
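
Here's what that alarm might look like with boto3. Note that the memory metric comes from the CloudWatch agent (stock EC2 instances don't report memory usage on their own), and the instance ID and notification topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average memory usage stays above 90% for two 5-minute periods.
# The mem_used_percent metric is published by the CloudWatch agent.
cloudwatch.put_metric_alarm(
    AlarmName="high-memory-usage",
    Namespace="CWAgent",
    MetricName="mem_used_percent",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=90.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # Notify an SNS topic here; the same field can instead point at an
    # auto-scaling policy ARN to add or replace instances automatically.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```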

Automated Blue/Green Updates

One of the most common sources of errors is a botched update: your code changes and breaks an unforeseen part of your application. You can plan for this with blue/green deployments.

A blue/green deployment is a slow, gradual process that deploys your code changes in stages rather than all at once. For example, imagine that you have 10 servers running the same bit of software behind a load balancer.

A regular deployment might simply update all of them immediately when new changes are pushed, or at least update them one at a time to prevent downtime.

A blue/green deployment would instead fire up an 11th server in your auto-scaling group and install the new code changes on it. Then, once it was "green," or accepting requests and ready to go, it would immediately replace one of the existing "blue" servers in your group. You'd rinse and repeat for each server in the cluster. Even if you only had one server, this method of updating would result in no downtime.
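
Stripped of any particular cloud's tooling, the core loop looks something like the sketch below. Every helper here (launch_server, is_healthy, the load_balancer object, terminate_server) is a hypothetical stand-in for whatever your platform actually provides:

```python
import time

# launch_server, is_healthy, load_balancer, and terminate_server are
# hypothetical stand-ins for your platform's own APIs.
def blue_green_deploy(blue_servers, new_version):
    """Replace each old "blue" server with a new "green" one, one at a time."""
    for blue in blue_servers:
        # Launch a fresh server running the new code alongside the cluster.
        green = launch_server(version=new_version)

        # Wait until it passes health checks and is accepting requests.
        while not is_healthy(green):
            time.sleep(10)

        # Swap it into the load balancer, then retire the old server.
        load_balancer.register(green)
        load_balancer.deregister(blue)
        terminate_server(blue)
```

A real pipeline would also bail out and roll back if a green server never turns healthy, which is exactly where the monitoring and alarms from the previous section plug in.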

Better yet, you can immediately revert the changes back to the blue servers if problems are detected with your monitoring systems and alarms. This means that even a completely botched update will not take down your service for more than a few minutes, ideally not at all if you have multiple servers and are able to deploy the update slowly. Blue/green deployments can be configured to only update 10% of your servers every five minutes, for example, slowly rolling out the update over the hour.