
Data migration can be a slow and arduous process. Even over a gigabit link, it can take months to move huge datasets. AWS Snowball provides a physical device you can fill up on-site and ship back to AWS for import into S3.

It's Pretty Much a Box of Hard Drives

It's not uncommon to see gigantic datasets, as successful startups usually collect a lot of data. While 100 TB might not be that unwieldy to store on your own servers, you'll eventually reach a point where you don't have the compute capacity to work with such a large dataset. At that point, moving it to the cloud (AWS S3 in this case) has a lot of advantages, but getting it there becomes a logistical problem of its own.

The internet is very slow compared to the speeds achievable on private networks. At best, you may get gigabit speed from a regular connection, but you're unlikely to sustain that in practice, and a dedicated link such as AWS Direct Connect has to be provisioned and paid for separately. Either way, you wouldn't want to saturate that whole connection for months while you move your dataset. Meanwhile, it's common for intranet connections to reach 10 to 40 gigabit speeds with the right hardware and modern cables, and hard drive speeds are fairly fast, especially in RAID arrays or when read in parallel.

So, AWS's crazy solution is to load the data onto hard drives and mail those drives to AWS. For huge datasets, the total time to load the data, ship it cross-country, and have Amazon read it into S3 is far lower than pushing it through the internet pipes. AWS has actually offered this as a service since 2009, when you could send in your own hard drives for import. However, the logistical issues involved were less than ideal.

Snowball is the next iteration of this service. The box contains multiple hard drives and comes in 50 TB and 80 TB capacities. On the front of the box, you'll find a Kindle E Ink display, which serves as both the shipping label and the interface for the device. On the back, you'll find the power plug as well as RJ45 and optical ports for connecting it to your network. All of this is contained in a ruggedized plastic container that acts as its own shipping box:

The Snowball in its shipping box.

The Snowball can reach a max data transfer rate of 250 to 400 MBps, which is equivalent to roughly a 2- to 3.2-gigabit connection. You may also be limited by read speeds on your own network, but you can run multiple import tasks in parallel, so you should be able to saturate it. You can also order multiple Snowballs to transfer datasets larger than a single device can hold.
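To get a feel for these numbers, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and a sustained transfer rate, which real loads rarely hold. The function names are illustrative and not part of any AWS tool.

```python
import math


def mbytes_to_gbits(rate_mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return rate_mb_per_s * 8 / 1000


def load_days(dataset_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to copy dataset_tb terabytes at a sustained rate."""
    total_mb = dataset_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    seconds = total_mb / rate_mb_per_s
    return seconds / 86_400


def snowballs_needed(dataset_tb: float, capacity_tb: float = 50) -> int:
    """Number of devices required, rounding up to whole Snowballs."""
    return math.ceil(dataset_tb / capacity_tb)


if __name__ == "__main__":
    for rate in (250, 400):
        print(f"{rate} MBps = {mbytes_to_gbits(rate):.1f} Gbps, "
              f"50 TB loads in {load_days(50, rate):.1f} days")
    print("Snowballs for 120 TB:", snowballs_needed(120))
```

Under these assumptions, filling a 50 TB device takes somewhere between one and a half and two and a half days, before any shipping time is counted.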

The Snowball costs $200 per transfer, or $250 for the 80 TB version. This is actually really cheap. Compared to transferring that data into S3 over the internet pipes (which you are charged for), it's about one fifth of the cost. Compared to buying and sending in your own hard drives, you'd be hard pressed to find 50 TB of storage for $200 (a single 8 TB drive can cost that much).

This positions the Snowball as a reasonable data transfer option for companies that want to make the switch to the cloud and bring their dataset with them. Keep in mind, though, that the major cost won't be the Snowball itself, but rather the cost of storing that data in the cloud in the first place. A 50 TB Snowball full of data would cost about $1,150 per month under S3's standard tier, but this can be brought down to around $750 or so by using Intelligent-Tiering. If your data isn't accessed that often, you can use S3 Glacier, which can bring the cost down to about $200 per month, or just $50 per month with Glacier Deep Archive. Either way, the cost of the Snowball is negligible if you're seriously considering moving your data to the cloud.
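The monthly figures above can be turned into per-GB rates and reused for other dataset sizes. The sketch below does that in Python, assuming the quoted costs refer to a 50 TB device and decimal units (1 TB = 1,000 GB). The rates it derives are implied by the article's numbers and are not live AWS pricing, and the function names are hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Monthly cost of storing 50 TB, as quoted in the text (not live AWS pricing).
QUOTED_MONTHLY_COST_50TB = {
    "S3 Standard": 1150,
    "S3 Intelligent-Tiering": 750,
    "S3 Glacier": 200,
    "S3 Glacier Deep Archive": 50,
}


def implied_rate_per_gb(monthly_cost: float, capacity_tb: float = 50) -> float:
    """Per-GB monthly rate implied by a quoted monthly bill."""
    return monthly_cost / (capacity_tb * GB_PER_TB)


def monthly_cost(dataset_tb: float, rate_per_gb: float) -> float:
    """Monthly storage bill for dataset_tb terabytes at rate_per_gb."""
    return dataset_tb * GB_PER_TB * rate_per_gb


if __name__ == "__main__":
    for tier, cost in QUOTED_MONTHLY_COST_50TB.items():
        rate = implied_rate_per_gb(cost)
        print(f"{tier}: ${rate:.3f} per GB-month, "
              f"80 TB would cost ${monthly_cost(80, rate):,.0f} per month")
```

Even at the cheapest tier, a single month of storage costs about as much as the Snowball fee itself, which is the point the paragraph above makes.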

How to Use a Snowball

As far as actually using the thing goes, once it arrives you'll simply plug it into power and connect it to a computer on your network with the RJ45, SFP+ Copper, or SFP+ Optical jacks. You'll want to make sure the workstation you're connecting it to is fairly powerful, as it's usually the bottleneck. It will also be ideal if this workstation is connected directly to your data source with nothing else clogging up the network.

Using the display on the front, you can assign the Snowball a static IP address:

Assigning the Snowball a static IP address.

Once the networking is hooked up, you can begin the transfer. You'll want to use the Snowball S3 Adapter, a software package that acts as an S3 endpoint and uses the same API you would use to transfer to the cloud. There's also a regular client, which will enable you to simply drag and drop files onto the Snowball, but it's much slower than the S3 adapter.
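Because the adapter speaks the S3 API, standard S3 tooling can in principle be pointed at it instead of the cloud endpoint. The Python sketch below only builds (it does not run) an AWS CLI copy command of that shape. `--recursive` and `--endpoint-url` are standard AWS CLI options, but the address, port, bucket name, and source path are placeholders; check the adapter's documentation for the endpoint it actually exposes in your setup.

```python
import shlex


def build_adapter_copy_command(source_dir: str, bucket: str,
                               adapter_endpoint: str) -> list[str]:
    """Build an `aws s3 cp` command that targets a local S3-compatible endpoint.

    adapter_endpoint is a hypothetical URL for the machine running the
    Snowball S3 Adapter; it is not a value taken from AWS documentation.
    """
    return [
        "aws", "s3", "cp", source_dir, f"s3://{bucket}/",
        "--recursive",                       # copy the whole directory tree
        "--endpoint-url", adapter_endpoint,  # send requests to the adapter, not the cloud
    ]


if __name__ == "__main__":
    cmd = build_adapter_copy_command(
        "/data/export",             # placeholder source directory
        "example-import-bucket",    # placeholder bucket name
        "http://192.0.2.10:8080",   # placeholder adapter address and port
    )
    print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later, or to launch several copies in parallel over different source directories.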

Once the Snowball is loaded up, you ship it back to AWS, where the data is imported into S3 for use with the rest of the AWS ecosystem.

You Can Also Order a Whole Truck of Them

If you're the kind of person who is reading this and wondering how you're going to get hundreds of Snowballs connected to your datacenter to transfer dozens of petabytes, AWS has a service for you. Taking the Snowball concept to the extreme, some crazy engineer created the AWS Snowmobile, and it's exactly what you think: a trailer truck pulling a gigantic hard drive stored in a 45-foot-long shipping container. It can move 100 petabytes of data, equivalent to about 1,250 of the 80 TB Snowballs.

Naturally, it's fairly expensive. You're charged $0.005 per GB, per month, which is a nice way of saying you're charged $5,000 per petabyte for each month that the transfer takes. But, at this scale, this really is the cheapest solution. A network-based solution would take decades to transfer hundreds of petabytes to the cloud, making it realistically impossible, even putting aside the millions of dollars in investment and transfer costs. An AWS Snowmobile job will at most cost a few hundred thousand dollars per month if you fill it up completely, and will be done in a few weeks.
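The two claims here, the per-petabyte price and the "decades" estimate, are easy to check with a few lines of Python. This is a rough sketch assuming decimal units (1 PB = 1,000,000 GB), a fully saturated gigabit link, and 365-day years; the function names are illustrative.

```python
GB_PER_PB = 1_000_000            # assumption: decimal units
SECONDS_PER_YEAR = 365 * 86_400  # assumption: ignore leap years


def snowmobile_cost(petabytes: float, months: float,
                    rate_per_gb_month: float = 0.005) -> float:
    """Cost in dollars at the per-GB, per-month rate quoted in the text."""
    return petabytes * GB_PER_PB * rate_per_gb_month * months


def network_years(petabytes: float, link_gbps: float = 1.0) -> float:
    """Years needed to push the data over a fully saturated link."""
    bits = petabytes * 1e15 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"1 PB for one month: ${snowmobile_cost(1, 1):,.0f}")
    print(f"100 PB for one month: ${snowmobile_cost(100, 1):,.0f}")
    print(f"100 PB over 1 Gbps: {network_years(100):.1f} years")
```

Under these assumptions, a full 100 PB load works out to $500,000 for each month the job runs, while the same data over a saturated gigabit link would take roughly 25 years.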