Auto scaling is very simple in concept---when your servers start to become overloaded with traffic, AWS's auto-scaling systems will spin up new servers to help meet demands. This can help you both cut costs and scale quickly.

Auto Scaling Saves You Money

Auto scaling allows you to scale up to meet traffic needs, but also fixes an issue with traditional server hosting; you must build your servers around peak load, but that server may remain mostly idle during non-peak hours. You'll still be paying that server's hourly price, however, even if you aren't using it. This is bad for your wallet, and also bad for AWS, as they could be selling that extra capacity to someone else.

Say your application requires 16 vCPUs worth of power during peak load. You could accomplish this with a c5.4xlarge instance, which costs around $500 per month. You can bring the effective cost down to around $200 per month if you buy reserved instances upfront on a 3-year contract, but you'll still be paying around the clock for an instance sized for your peak capacity. And if your needs change within the contract period, you'll be stuck with that instance until the contract is up.

But if your application load changes throughout the day, auto scaling can help optimize costs. You could instead use multiple c5.xlarge instances with 4 vCPUs each, and spin up new ones when you need to meet demand. With EC2 Spot Instances, you can also have your auto-scaling group purchase spare compute capacity at huge discounts.
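To make the savings concrete, here is a rough back-of-the-envelope sketch. The $500 monthly figure comes from the example above; the c5.xlarge price (a quarter of that) and the daily load profile are assumptions made up for illustration, so plug in your own numbers.

```python
import math

# Assumed monthly On-Demand prices. The article quotes roughly $500 for a
# c5.4xlarge; the c5.xlarge figure assumes price scales with vCPU count.
HOURS_PER_MONTH = 730
BIG_MONTHLY = 500.0              # one c5.4xlarge (16 vCPUs)
SMALL_MONTHLY = BIG_MONTHLY / 4  # one c5.xlarge (4 vCPUs), assumed
SMALL_VCPUS = 4

# Hypothetical load profile: vCPUs needed in each hour of the day.
# 8 hours at peak (16 vCPUs) and 16 hours at a quarter of that.
DEMAND_BY_HOUR = [4] * 8 + [16] * 8 + [4] * 8


def instances_needed(vcpus, per_instance=SMALL_VCPUS):
    """Smallest number of instances that covers the requested vCPUs."""
    return max(1, math.ceil(vcpus / per_instance))


def monthly_cost_scaled(demand_by_hour):
    """Monthly cost if the fleet is resized every hour to match demand."""
    hourly_price = SMALL_MONTHLY / HOURS_PER_MONTH
    instance_hours_per_day = sum(instances_needed(v) for v in demand_by_hour)
    days_per_month = HOURS_PER_MONTH / 24
    return instance_hours_per_day * days_per_month * hourly_price


scaled = monthly_cost_scaled(DEMAND_BY_HOUR)
print(f"fixed: ${BIG_MONTHLY:.0f}/month, auto scaled: ${scaled:.0f}/month")
```

With this made-up profile, the scaled fleet costs about half as much as the single large instance. Real savings depend entirely on how spiky your traffic is, and this ignores Spot discounts and the time instances take to start.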

AWS has multiple auto-scaling services for different products; you can auto scale Aurora read replicas and DynamoDB throughput capacity, and auto scale Amazon's Elastic Container Service (ECS). For this article, we'll discuss EC2 Auto Scaling, as it's what you'll likely want to scale anyway.

Building Your Infrastructure Around Automation

To make auto scaling work, you must automate your server's whole lifecycle. The process of creating a server, installing all the dependencies your app needs to run, installing your code, running your code at startup---everything must be handled properly for auto scaling to make sense.

There are two easy ways to do this, and both have different use cases.

The first is Amazon Machine Images, or AMIs. Your EC2 server is probably already running on an AMI, such as Amazon Linux 2. But the AMI is more than an OS; AMIs are images that contain the OS, programs, user data, and configuration, all in one image. You can create your own custom AMI that contains all of your programs (such as Nginx, WordPress, PHP, etc.) and the associated configuration, and spin up a carbon copy of your existing server.

This method is very useful if you're simply reaching the limits of a single server and want to scale up, or if you simply want to cut costs by scaling your servers throughout the day. The main issue is that version management is a pain; you'll have to create a new AMI every time you want to make changes, or automate some way of pulling updated code and configuration from a tool like git.

The second method is to use containers. Containers are a Unix concept that allows applications to be bundled up and run in an isolated environment, while still maintaining the speed benefits of running on bare metal. You can think of it like having all the stuff your application needs to run on a CD; you could burn multiple copies of that CD and run them on multiple servers.

Every time you need to make an update, you simply update the CD and redistribute the updated version. With the way Docker works, this makes version management quite simple. But moving an existing application to Docker may require more initial setup than you're comfortable doing, as it requires a significant shift in how you develop and operate your systems.

We'll cover the AMI method in this article, as it's far simpler; but, if you do go down the container route, you'll be better off using Amazon's managed container services rather than EC2 auto scaling. You can read our guide on getting started with AWS ECS to learn more.

How to Get Started

You'll need a few things to get started. First is the custom AMI. They're relatively simple to create; from the EC2 Management Console, right-click your current server and select Image > Create Image. This will open a dialog that will make a snapshot of your server and create an AMI from that snapshot; give it a name and description and select "Create Image."

Once the AMI is created (it may take a few minutes), scroll down to the bottom of the EC2 sidebar and select "Launch Configuration" under the "Auto Scaling" tab. Create a new launch configuration and select your custom AMI as the base.

Choose your AMI under the "My AMIs" tab.

You should choose the instance type you want to use as your increment. For example, if you'd like to scale up in 2 vCPU increments, choose a 2 vCPU instance. You'll be doing more scaling, but your costs may be better optimized.

Next, you'll configure the launch details. You'll want to make sure to request Spot Instances, especially if you're planning on scaling up during the day and scaling down at night. Keep in mind that a Spot Instance only runs while spare capacity is available, and AWS can reclaim it with a two-minute warning. You'll have to specify a max price; you can set this to the hourly cost of the On-Demand version of the instance, so that price alone never prevents it from running.

You can also specify a setup script here, under the advanced settings. You can paste this in as text or as a file to run.

Next, you'll add storage, select a security group, and select a key pair, as you usually would when creating an EC2 instance (though this is simply a template).
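If you'd rather script this step than click through the console, the same choices map onto the parameters of the Auto Scaling API's CreateLaunchConfiguration call. The sketch below only builds the request as a plain dictionary. The helper name, AMI ID, key pair, security group, and price are all hypothetical placeholders, and it assumes you would pass the result to something like boto3's create_launch_configuration.

```python
def launch_configuration_params(name, ami_id, instance_type, key_name,
                                security_group_ids, max_hourly_price,
                                setup_script=""):
    """Build keyword arguments for a CreateLaunchConfiguration request."""
    params = {
        "LaunchConfigurationName": name,
        "ImageId": ami_id,                  # your custom AMI
        "InstanceType": instance_type,      # the scaling increment
        "KeyName": key_name,
        "SecurityGroups": list(security_group_ids),
        # Setting SpotPrice requests Spot Instances; it is the maximum
        # hourly price and is sent as a string.
        "SpotPrice": f"{max_hourly_price:.4f}",
    }
    if setup_script:
        # Optional setup script that runs when the instance first boots.
        params["UserData"] = setup_script
    return params


# Hypothetical values for illustration only.
params = launch_configuration_params(
    name="web-v1",
    ami_id="ami-0123456789abcdef0",
    instance_type="c5.xlarge",
    key_name="my-key-pair",
    security_group_ids=["sg-0123456789abcdef0"],
    max_hourly_price=0.17,   # assumed On-Demand price for this type
    setup_script="#!/bin/bash\nsystemctl start nginx\n",
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("autoscaling").create_launch_configuration(**params)
print(params)
```

Keeping the request as data makes it easy to review or store in version control before anything is created in your account.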

At the end, choose to create an auto-scaling group with the newly created launch configuration. Set a name for the group, initial size, and select your subnet.

Next, you'll configure your scaling policies. You'll want to choose a range to scale between, and a metric to use to scale the instances, such as average CPU utilization or average network traffic. You can also set up CloudWatch alarms to scale instances based on other metrics.

You'll also need to specify the time in seconds that instances need to warm up; if you're using AMIs, this time will be much lower, but you'll still need to do testing to figure out how long it takes.
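To build some intuition for how a policy based on average CPU utilization behaves, here is a simplified model. It is not AWS's actual algorithm, only a proportional approximation: resize the group by the ratio of the observed metric to the target, then clamp to the range you configured. The function name and all numbers are made up for illustration.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Simplified target tracking: resize in proportion to metric/target,
    then clamp the result to the group's configured range."""
    if current <= 0:
        return min_size
    proposed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, proposed))


# Two instances at 90% average CPU with a 50% target: scale out to 4.
print(desired_capacity(current=2, metric=90, target=50, min_size=1, max_size=6))
# Four instances at 20% average CPU: scale in to 2.
print(desired_capacity(current=4, metric=20, target=50, min_size=1, max_size=6))
# Demand beyond the configured range is capped at max_size.
print(desired_capacity(current=4, metric=100, target=50, min_size=1, max_size=6))
```

The warm-up time matters because a freshly launched instance isn't absorbing traffic yet; if the group reacted again before it was ready, a model like this would keep adding servers it doesn't need.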

Next, you can configure notifications and tags, and review your configuration before launch. Note that creating this auto-scaling group will provision servers for you, so be prepared to pay for them.

From the "Auto Scaling Groups" tab in the EC2 Console, you can view the activity of your group, such as the current running instances or launch failures. Your group should now scale up and down, depending on load. You'll want to keep a close eye on its behavior for the first few days, to make sure everything is in order.

When you need to update your servers, you'll have to create a new launch configuration with a new AMI, and select the new configuration as the config for your auto-scaling group.
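Because a launch configuration can't be edited once it's created, an update boils down to two requests: create a copy that points at the new AMI, then switch the group over to it. A minimal sketch of that, assuming a simple hypothetical naming scheme of group name plus version number (the helper and all IDs are placeholders):

```python
def rollout_requests(group_name, base_params, new_ami_id, version):
    """Return the two requests needed to move a group to a new AMI:
    a new launch configuration, and the group update that selects it."""
    new_config = dict(base_params)           # copy; leave the old one intact
    new_config["ImageId"] = new_ami_id
    new_config["LaunchConfigurationName"] = f"{group_name}-v{version}"
    update_group = {
        "AutoScalingGroupName": group_name,
        "LaunchConfigurationName": new_config["LaunchConfigurationName"],
    }
    return new_config, update_group


# Hypothetical existing configuration and new AMI ID.
old_config = {
    "LaunchConfigurationName": "web-v1",
    "ImageId": "ami-0123456789abcdef0",
    "InstanceType": "c5.xlarge",
}
new_config, update_group = rollout_requests(
    "web", old_config, "ami-0fedcba9876543210", version=2)
# With boto3, these could be sent in order as:
#   client.create_launch_configuration(**new_config)
#   client.update_auto_scaling_group(**update_group)
print(new_config)
print(update_group)
```

Instances that are already running keep the old configuration until they're replaced, so expect a mix of old and new servers for a while after switching.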