
A common use case for EC2 On-Demand and Spot Instances is running powerful machines for short-term, one-off tasks. However, if you accidentally leave one of these machines running, you may end up with a very large bill. Luckily, AWS has tools to prevent that.

Preventing Wallet Overflow

This is a case of "cloud overflow": some of the more scalable services you run can be dangerous to your wallet and, if not set up properly, can end up costing you orders of magnitude more than you expected.

For example, say you need to do a short-term, time-sensitive task on an extremely powerful machine, like running an intensive codebase build on a 64-core worker or doing 3D rendering on a machine with multiple GPUs. Either way, these machines are expensive: left running around the clock, they cost on the order of thousands of dollars a month. Some of AWS's accelerated computing instances cost around $25k to run continuously for 750 hours, which is roughly a month.

However, running them for only a few hours is quite cost-effective for some workloads, and AWS's on-demand pricing makes that possible. The only problem is remembering to turn the machine off every time, because if you don't, you'll keep paying for it.
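To put numbers on that, here is a rough back-of-the-envelope comparison in Python. It assumes the figure quoted above (about $25,000 for 750 hours) as an illustrative hourly rate rather than a current AWS price, and the 64-hour "left on over a weekend" scenario is hypothetical.

```python
# Illustrative rate derived from the article's figure (~$25k per 750 hours).
# Not a current AWS price; check EC2 pricing for your actual instance type.
MONTHLY_COST_USD = 25_000
MONTHLY_HOURS = 750

hourly_rate = MONTHLY_COST_USD / MONTHLY_HOURS  # roughly $33 per hour


def cost(hours: float) -> float:
    """Cost in USD of running the instance for `hours` hours, rounded to cents."""
    return round(hourly_rate * hours, 2)


planned = cost(3)         # a three-hour build or render
forgotten = cost(3 + 64)  # hypothetical: same job, then left on Friday evening to Monday morning

print(f"hourly rate: ${hourly_rate:.2f}")
print(f"planned 3-hour job: ${planned:,.2f}")
print(f"forgotten over the weekend: ${forgotten:,.2f}")
```

The job itself costs about a hundred dollars; forgetting to stop the machine multiplies that more than twentyfold.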

AWS doesn't have a built-in option for this, but it does provide CloudWatch alarms: configurable rules that continuously watch a metric from your instances and take action when it crosses a threshold. They can monitor CPU, network, and disk usage, and can automatically stop or reboot an instance. You can even hook them up to an SNS topic to send notifications to other systems.

Setting Up An Alarm

For this use case, a simple alarm that stops inactive instances works fine. While there's no alarm for "has been running for too long," you can approximate one using CPU usage. If the task you're running generally loads all the cores, CPU usage should sit near 100 percent while it runs and drop close to zero once the machine goes idle, so a low threshold separates the two cleanly.
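The idea boils down to "average CPU has stayed below a threshold for N periods in a row." The Python sketch below models only that rule; it is a simplification, since CloudWatch's real evaluation also supports "M out of N" datapoints and configurable missing-data handling. The 10 percent threshold and 12 periods are example values, and `is_idle` is a hypothetical helper, not an AWS API.

```python
from typing import Sequence


def is_idle(cpu_averages: Sequence[float], threshold: float = 10.0, periods: int = 12) -> bool:
    """Return True if the last `periods` CPU averages (percent) are all below `threshold`.

    Simplified model of a "consecutive periods below threshold" alarm;
    it ignores CloudWatch's M-out-of-N and missing-data options.
    """
    if len(cpu_averages) < periods:
        return False  # not enough data yet to decide either way
    return all(value < threshold for value in cpu_averages[-periods:])


busy = [95.0] * 12                        # job still running on all cores
finished = [97.0] * 6 + [1.5] * 12        # job done, machine idle for 12 periods
brief_dip = [98.0] * 10 + [2.0] * 3 + [96.0]  # short pause, then busy again

print(is_idle(busy), is_idle(finished), is_idle(brief_dip))
```

Requiring every period in the window to be low is what keeps a short pause between build steps from shutting the machine down.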

Your mileage may vary, though, so check your instance's CloudWatch statistics to make sure its idle periods will actually be detected. The alarm configuration screen also shows a graph of the metric that you can compare your threshold against.
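If you'd rather inspect the numbers than eyeball the graph, a minimal sketch using boto3's CloudWatch `get_metric_statistics` call could look like the following. It assumes boto3 is installed and AWS credentials are configured; `fetch_cpu_datapoints` and `summarize_cpu` are hypothetical helper names, and the instance ID in the usage note is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def summarize_cpu(datapoints: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sort CloudWatch datapoints by time and report min, max, and latest Average."""
    if not datapoints:
        raise ValueError("no datapoints returned for this time range")
    ordered = sorted(datapoints, key=lambda d: d["Timestamp"])
    values = [d["Average"] for d in ordered]
    return {"min": min(values), "max": max(values), "latest": values[-1]}


def fetch_cpu_datapoints(instance_id: str, hours: int = 24, period: int = 300) -> List[Dict[str, Any]]:
    """Fetch average CPUUtilization for one instance (requires boto3 and AWS credentials)."""
    import boto3  # deferred import so summarize_cpu works without boto3 installed

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,
        Statistics=["Average"],
    )
    return response["Datapoints"]  # returned in no particular order
```

For example, `summarize_cpu(fetch_cpu_datapoints("i-0123456789abcdef0"))` (placeholder ID) would show whether the busy and idle stretches of the last day fall clearly on either side of your planned threshold.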

From the EC2 console, right-click an instance and choose "Monitoring" > "Manage CloudWatch Alarms," or, if the instance has no alarms yet, click the + icon next to its "Alarm Status" in the instance list.

Here, you can set up the alarm threshold. Generally, you'll want to set the grouping to "Average," choose "CPU Utilization," and have the alarm trigger when usage stays below 10 percent or so for an hour. Note that the period is multiplied by the number in "Consecutive Periods," so twelve consecutive 5-minute periods work just as well as a single 1-hour period.
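The arithmetic is simply window = period × consecutive periods. The short Python sketch below lists a few combinations that all cover one hour. One caveat worth keeping in mind: with EC2 basic monitoring, CPU metrics are published every 5 minutes, so the 1-minute row assumes detailed monitoring is enabled. `periods_needed` is a hypothetical helper for illustration.

```python
WINDOW_SECONDS = 60 * 60  # the one-hour idle window used in the example above


def periods_needed(period_seconds: int, window_seconds: int = WINDOW_SECONDS) -> int:
    """Number of consecutive periods of `period_seconds` that span `window_seconds`."""
    if window_seconds % period_seconds != 0:
        raise ValueError("window must be a whole number of periods")
    return window_seconds // period_seconds


# 60-second periods assume detailed monitoring; basic monitoring reports every 5 minutes.
for period in (60, 300, 900, 3600):
    print(f"{period // 60:>2}-minute periods x {periods_needed(period):>2} = 60 minutes")
```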

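Then, set the alarm action to stop the instance. Stopping, rather than terminating, keeps the instance and its EBS volumes around so you can start it again later; you still pay for the attached storage, but no longer for compute time.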
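If you prefer scripting the same alarm instead of clicking through the console, here is a sketch using boto3's `put_metric_alarm`, which creates or updates an alarm by name. It assumes boto3 and credentials for the actual call; the alarm naming scheme, the helper names, and the example instance ID and region are hypothetical. The stop action uses EC2's built-in `arn:aws:automate:<region>:ec2:stop` action ARN.

```python
from typing import Any, Dict


def idle_stop_alarm_params(
    instance_id: str,
    region: str,
    threshold_percent: float = 10.0,
    period_seconds: int = 300,
    evaluation_periods: int = 12,
) -> Dict[str, Any]:
    """Build put_metric_alarm arguments that stop an instance after sustained low CPU."""
    return {
        "AlarmName": f"stop-idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,  # 300 s x 12 = one hour by default
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 alarm action: stop (not terminate) the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def create_idle_stop_alarm(instance_id: str, region: str, **overrides: Any) -> None:
    """Create or update the alarm (requires boto3 and AWS credentials)."""
    import boto3  # deferred so the parameter builder can be used without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**idle_stop_alarm_params(instance_id, region, **overrides))
```

Calling `create_idle_stop_alarm("i-0123456789abcdef0", "us-east-1")` (placeholder values) would set up the one-hour, 10 percent alarm; keeping the parameters in a plain dictionary makes it easy to review them before anything touches your account.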

That should be all you need; the alarm starts evaluating as soon as it's created. If you'd like to test your setup first, you can apply the same alarm configuration to a smaller, cheaper instance.

To be safe, you should also enable account-wide AWS billing alerts. These warn you early if you're exceeding your target budget, letting you fix the problem manually before it runs away and empties your wallet.
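A billing alert is itself a CloudWatch alarm, this time on the `EstimatedCharges` metric in the `AWS/Billing` namespace. The sketch below assumes billing alerts have already been enabled in your Billing console preferences, that an SNS topic exists to receive the notification, and that boto3 and credentials are available for the actual call. Billing metrics are published only in us-east-1, so the client is pinned to that region. The helper names, alarm name, and example topic ARN are hypothetical.

```python
from typing import Any, Dict


def billing_alarm_params(limit_usd: float, sns_topic_arn: str) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for an account-wide estimated-charges alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{int(limit_usd)}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours; billing data only updates a few times a day
        "EvaluationPeriods": 1,
        "Threshold": limit_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. a topic with an email subscription
    }


def create_billing_alarm(limit_usd: float, sns_topic_arn: str) -> None:
    """Create or update the billing alarm (requires boto3 and AWS credentials)."""
    import boto3  # deferred so the parameter builder can be used without boto3

    # Billing metrics live only in us-east-1, regardless of where your instances run.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(limit_usd, sns_topic_arn))
```

For instance, `create_billing_alarm(500, "arn:aws:sns:us-east-1:123456789012:billing-alerts")` (placeholder account and topic) would notify you once the month's estimated charges pass $500, which is a useful backstop if an idle-instance alarm was never set up.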