Cloud server providers will often advertise their instances as having a certain number of "vCPUs," short for virtual CPU. How much performance can you expect from this compared to a regular CPU?

The Difference Between Cores and Threads

It's important to understand the distinction between a processing thread and a CPU core. CPUs have a set number of cores that handle the execution of programs. But even heavily intensive tasks aren't utilizing 100% of the CPU all the time; programs must often wait for memory reads from L3 cache, RAM, and disk, and will often go to sleep while waiting for the data to arrive. During this time, the CPU core is inactive.

The solution to this problem is called "hyperthreading" or "simultaneous multithreading." Rather than running one set of tasks at a time, the CPU can handle multiple threads. Currently, nearly every high-end CPU from Intel or AMD supports two threads per core.

Depending on the application, hyperthreading can give a theoretical speedup of up to 100%, if both threads spend their time waiting on memory reads and don't conflict with each other. In most cases, hyperthreading gives about a 30% speed gain over running without it. In some cases, though, when two threads are pinned at 100% and running on the same core, it can cause slowdowns as they battle for the CPU's resources.
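To get a feel for why a waiting thread leaves the core free for other work, here's a minimal Python sketch. It uses ordinary OS threads with time.sleep standing in for a memory or disk stall, so it's an analogy at the operating-system level rather than SMT itself: two tasks that each wait half a second overlap almost completely.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_for_data():
    # Stand-in for a thread stalled on a slow read (RAM, disk, etc.)
    time.sleep(0.5)
    return "data"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda _: wait_for_data(), range(2)))
elapsed = time.monotonic() - start

# Both waits overlap, so the pair finishes in about 0.5 s, not 1.0 s
print(f"{elapsed:.2f}s")
```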

What Makes a vCPU?

A vCPU is roughly comparable to a single processing thread, but this isn't exactly a fair comparison.

Say you rent a c5.large instance from AWS with 2 vCPUs. Your application will run alongside many others on one large server. You can actually rent the whole server with an AWS Bare Metal instance, which gives you direct access to the processor. If you're renting anything smaller than that, your access is managed through AWS Nitro.

Nitro is a hypervisor that handles the creation and management of the virtual machines running on the server itself. This is why you're renting a "virtual server" and not rack space in a datacenter. Nitro is what makes EC2 tick; it's powered in part by dedicated hardware, so the slowdown from running in a virtualized environment should be minimal.

Nitro decides which threads to assign your VM to based on how much processing power is needed, much like a task scheduler does in a normal desktop environment. With 2 vCPUs, the worst case is that your application runs on a single core, and is given the two threads of that core. If you're really maxing out your instance, your threads may conflict and cause minor slowdowns. It's hard to say exactly how AWS's hypervisor works, but it's probably safe to assume that this scenario is largely mitigated with good thread management on Nitro's part.

So, all in all, you can probably expect performance comparable to a normal CPU thread, if not a bit better. The distinction doesn't matter much anyway, since most EC2 instances come with multiples of 2 vCPUs. Just remember that a 4 vCPU instance is not a 4 core server; it's really emulating a 2 core server running 4 processing threads.
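That vCPU-to-core arithmetic can be sketched in a few lines of Python. The two-threads-per-core figure is an assumption matching the SMT-2 designs mentioned earlier, not something AWS guarantees for every instance type.

```python
THREADS_PER_CORE = 2  # assumption: SMT-2, typical of current Intel/AMD server CPUs

def physical_cores(vcpus: int) -> int:
    """Rough number of physical cores behind a vCPU allocation."""
    return vcpus // THREADS_PER_CORE

print(physical_cores(4))  # a 4 vCPU instance emulates ~2 physical cores
print(physical_cores(2))  # a 2 vCPU instance maps to ~1 physical core
```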

The processing speed of a vCPU depends more on the actual hardware it's running on. Most server CPUs will be Intel Xeons, as they make up the majority of the market. Lower-end servers may run older hardware that is a bit dated by today's standards. AWS's T3a instances use high-core-count AMD EPYC CPUs, which run a bit slower but cost less, since the hardware is much cheaper per core.

Burstable Instances

AWS's T2 and T3 instances are "burstable," making them better suited to applications that don't need to be running at 100% all of the time.

For instance, the t3.micro instance has 2 vCPUs, but its base speed is 10% of a normal vCPU. In reality, the t3.micro really only has 0.2 vCPU, which is actually how Google Cloud Platform advertises its f1-micro instances.

But the t3.micro isn't just 90% slower overall; it's allowed to burst beyond the base speed for short periods of time, much like how turbo frequency works on a regular computer. The limiting factor here isn't thermals, though, but how much you're willing to pay.

For each hour the instance runs below base speed, you accumulate CPU credits; each credit lets one vCPU run at 100% for one minute. The t3.micro in particular accumulates 6 CPU credits per hour that it runs below base speed. When processing power is needed, those credits are consumed to run beyond the base speed.
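The credit bookkeeping boils down to simple arithmetic. Here's a hypothetical sketch using the t3.micro figures above (6 credits earned per below-base hour, 1 credit = 1 vCPU-minute at 100%); it's a model of the idea, not AWS's actual accounting.

```python
CREDITS_PER_HOUR = 6  # t3.micro earn rate while running below base speed

def credit_balance(hours_below_base: float, burst_vcpu_minutes: float) -> float:
    """Credits remaining after earning below base speed and spending on bursts."""
    earned = hours_below_base * CREDITS_PER_HOUR
    spent = burst_vcpu_minutes  # 1 credit per vCPU-minute at 100%
    return earned - spent

# Ten quiet hours bank 60 credits: enough to run one vCPU flat out for an hour
print(credit_balance(10, 60))  # -> 0.0
```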

This is well suited to microservice-based applications, which must respond to requests when they happen but sit idle until the next user requests something. Services that must be crunching numbers all the time are better suited to traditional servers.

This allows AWS to run more T2 instances per server than the hardware would otherwise support, which helps keep costs down. For instance, a rack in their datacenter may contain a 48 core system with 96 processing threads. That could host 96 vCPUs' worth of C5 instances, but because T2 instances share cores and run at less than 20% of the base core speed, AWS can run many more of them off the same server.
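As a back-of-the-envelope illustration: the 96-thread server and the 20% base speed figure come from the paragraph above, while the packing arithmetic below is a simplification, not AWS's real placement logic.

```python
threads = 96          # processing threads on the hypothetical 48 core server
base_fraction = 0.20  # a burstable vCPU is guaranteed at most ~20% of a thread

full_speed_vcpus = threads                      # C5-style: one vCPU per thread
burstable_vcpus = int(threads / base_fraction)  # if every instance sat at its base speed

print(full_speed_vcpus, burstable_vcpus)  # 96 vs 480
```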