Why Are Progress Bars So Inaccurate?

At first thought, it seems that generating an accurate estimation of time should be fairly easy. After all, the algorithm producing the progress bar knows all the tasks it needs to do ahead of time... right?

For the most part, it is true that the source algorithm does know what it needs to do ahead of time. However, pinning down the time it will take to perform each step is a very difficult, if not virtually impossible, task.

All Tasks Are Not Created Equal

The simplest way to implement a progress bar is to use a graphical representation of a task counter, where the percent complete is simply calculated as Completed Tasks / Total Number of Tasks. While this makes logical sense at first thought, it is important to remember that (obviously) some tasks take longer to complete than others.

Consider the following tasks performed by an installer:

1. Create folder structure.
2. Decompress and copy 1 GB worth of files.
3. Create registry entries.
4. Create start menu entries.

In this example, steps 1, 3, and 4 would complete very quickly while step 2 would take some time. So a progress bar working on a simple count would jump to 25% very quickly, stall for a bit while step 2 is working, and then jump to 100% almost immediately.
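As a rough illustration of the count-based approach, here is a minimal Python sketch. The function name `count_based_progress` and the list of step names are made up for this example and are not taken from any real installer.

```python
def count_based_progress(completed_tasks: int, total_tasks: int) -> float:
    """Percent complete as Completed Tasks / Total Number of Tasks."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return 100.0 * completed_tasks / total_tasks


# The four installer steps from the example (names are illustrative only).
installer_steps = [
    "Create folder structure",
    "Decompress and copy 1 GB worth of files",
    "Create registry entries",
    "Create start menu entries",
]

# Progress reading before any step, then after each step completes.
count_based_readings = [
    count_based_progress(done, len(installer_steps))
    for done in range(len(installer_steps) + 1)
]
print(count_based_readings)
```

Every step moves the bar by the same 25%, no matter how long it actually takes, which is exactly why the bar appears to stall during the large file copy.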

This type of implementation is actually quite common among progress bars because, as stated above, it is easy to implement. However, as you can see, it is subject to disproportionate tasks skewing the actual progress percentage as it relates to time remaining.

To work around this, some progress bars might use implementations where steps are weighted. Consider the steps above where a relative weight is assigned to each step:

1. Create folder structure. [Weight = 1]
2. Decompress and copy 1 GB worth of files. [Weight = 7]
3. Create registry entries. [Weight = 1]
4. Create start menu entries. [Weight = 1]

Using this method, the progress bar would move in increments of 10% (as the total weight is 10) with steps 1, 3, and 4 moving the bar 10% on completion and step 2 moving it 70%. While certainly not perfect, methods like this are a simple way to add a bit more accuracy to the progress bar percentage.
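The weighted approach can be sketched in a few lines of Python. This is only an illustration using the weights from the example; the names `weighted_steps` and `weighted_progress` are hypothetical.

```python
from itertools import accumulate

# (step name, relative weight) pairs, using the weights from the example.
weighted_steps = [
    ("Create folder structure", 1),
    ("Decompress and copy 1 GB worth of files", 7),
    ("Create registry entries", 1),
    ("Create start menu entries", 1),
]


def weighted_progress(steps):
    """Return the percent complete shown after each step finishes."""
    weights = [weight for _, weight in steps]
    total_weight = sum(weights)
    # accumulate() yields the running total of completed weight.
    return [100.0 * done / total_weight for done in accumulate(weights)]


weighted_readings = weighted_progress(weighted_steps)
print(weighted_readings)
```

With these weights the bar reads 10% after step 1, 80% after step 2, then 90% and 100%. The weights themselves are still guesses made ahead of time, so the result is only as good as those guesses.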

Past Results Do Not Guarantee Future Performance

Consider a simple example of me asking you to count to 50 while I use a stopwatch to time you. Let's say you count to 25 in 10 seconds. It would be reasonable to assume you will count the remaining numbers in an additional 10 seconds, so a progress bar tracking this would show 50% complete with 10 seconds remaining.

Once your count reaches 25, however, I start throwing tennis balls at you. Likely, this will break your rhythm as your concentration has moved from strictly counting numbers to dodging balls thrown your way. Assuming you are able to continue counting, your pace has certainly slowed a bit. So now the progress bar is still moving, but at a much slower pace, with the estimated time remaining either at a standstill or actually climbing higher.

For a more practical example of this, consider a file download. You are currently downloading a 100 MB file at a rate of 1 MB/s. At this point, it is very easy to determine the estimated time of completion. But 75% of the way there, some network congestion hits and your download rate drops to 500 KB/s.

Depending on how the browser calculates the remaining time, your ETA could instantly go from 25 seconds to 50 seconds (using present state only: Size Remaining / Download Speed). More likely, the browser uses a rolling average algorithm, which adjusts for fluctuations in transfer speed without displaying dramatic jumps to the user.
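The "present state only" calculation can be sketched as follows. This is a hypothetical helper, not how any particular browser works, and it assumes 1 MB = 1,000 KB to keep the numbers round.

```python
def naive_eta_seconds(size_remaining_kb: float, current_speed_kb_per_s: float) -> float:
    """Time remaining as Size Remaining / Download Speed, using only the current speed."""
    if current_speed_kb_per_s <= 0:
        return float("inf")  # a stalled transfer has no meaningful estimate
    return size_remaining_kb / current_speed_kb_per_s


# 25 MB of the 100 MB file remain (assuming 1 MB = 1,000 KB).
eta_before_dip = naive_eta_seconds(25_000, 1_000)  # at 1 MB/s
eta_after_dip = naive_eta_seconds(25_000, 500)     # at 500 KB/s
print(eta_before_dip, eta_after_dip)
```

Because only the latest speed is considered, the estimate doubles the moment the speed halves.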

An example of a rolling average algorithm for a file download might work something like this:

The transfer speed for the previous 60 seconds is remembered with the newest value replacing the oldest (e.g. the 61st value replaces the first).
The effective transfer rate for the purpose of calculation is the average of these measurements.
Time remaining is calculated as: Size Remaining / Effective Download Speed

So using our scenario above (for the sake of simplicity, we will use 1 MB = 1,000 KB):

At 75 seconds into the download, our 60 remembered values would each be 1,000 KB/s. The effective transfer rate is 1,000 KB/s (60,000 / 60), which yields a time remaining of 25 seconds (25,000 KB / 1,000 KB/s).
At 76 seconds (where the transfer speed drops to 500 KB/s), the effective download speed becomes ~992 KB/s (59,500 / 60), which yields a time remaining of ~24.7 seconds (24,500 KB / 992 KB/s).
At 77 seconds: Effective speed = ~983 KB/s (59,000 / 60), yielding a time remaining of ~24.4 seconds (24,000 KB / 983 KB/s).
At 78 seconds: Effective speed = 975 KB/s (58,500 / 60), yielding a time remaining of ~24.1 seconds (23,500 KB / 975 KB/s).

You can see the pattern emerging here as the dip in download speed is slowly incorporated into the average, which is used to estimate the time remaining. Under this method, if the dip only lasted for 10 seconds and then returned to 1 MB/s, the user is unlikely to notice the difference (save for a very minor stall in the estimated time countdown).
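The rolling average walkthrough above can be reproduced with a short Python sketch. The class name `RollingEta` and its methods are invented for this illustration, one speed sample per second is assumed, and real browsers may well smooth their estimates differently.

```python
from collections import deque


class RollingEta:
    """Estimate time remaining from the average of the most recent speed samples."""

    def __init__(self, window_seconds: int = 60):
        # A deque with maxlen drops the oldest sample when a new one arrives,
        # so the 61st value replaces the first.
        self.samples = deque(maxlen=window_seconds)

    def add_sample(self, speed_kb_per_s: float) -> None:
        self.samples.append(speed_kb_per_s)

    def effective_speed(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def eta_seconds(self, size_remaining_kb: float) -> float:
        speed = self.effective_speed()
        return float("inf") if speed <= 0 else size_remaining_kb / speed


estimator = RollingEta()
remaining_kb = 100_000.0  # 100 MB file, assuming 1 MB = 1,000 KB

# First 75 seconds at a steady 1,000 KB/s.
for _ in range(75):
    estimator.add_sample(1_000)
    remaining_kb -= 1_000
eta_at_75 = estimator.eta_seconds(remaining_kb)

# Seconds 76 to 78 after the speed drops to 500 KB/s.
etas_after_dip = []
for _ in range(3):
    estimator.add_sample(500)
    remaining_kb -= 500
    etas_after_dip.append(round(estimator.eta_seconds(remaining_kb), 1))

print(eta_at_75, etas_after_dip)
```

The estimates come out at 25 seconds, then roughly 24.7, 24.4, and 24.1 seconds, matching the figures worked out by hand. A larger window gives a smoother countdown at the cost of reacting more slowly to a lasting change in speed.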

Getting down to brass tacks, all of this is simply methodology for relaying information to the end user. It does nothing about the actual underlying cause...

You Cannot Accurately Determine Something that is Nondeterministic

Ultimately, the progress bar inaccuracy boils down to the fact that it is trying to determine a time for something that is nondeterministic. Because computers process tasks both on demand and in the background, it is almost impossible to know what system resources will be available at any point in the future - and it is the availability of system resources which is needed for any task to complete.

Using another example, suppose you are running a program upgrade on a server which performs a fairly intensive database update. During this update process, a user sends a demanding request to another database running on the same system. Now the server resources, specifically for the database, have to process requests for both your upgrade and the user-initiated query, a scenario which will certainly be mutually detrimental to execution time. Alternatively, a user could initiate a large file transfer that taxes the storage throughput, which would detract from performance as well. Or a scheduled task could kick off which performs a memory-intensive process. You get the idea.

For perhaps a more realistic instance for an everyday user, consider running Windows Update or a virus scan. Both of these perform resource-intensive operations in the background. As a result, the progress each makes depends on what the user is doing at the time. If you are reading your email while this runs, most likely the demand on system resources will be low and the progress bar will move consistently. On the other hand, if you are doing graphics editing, then your demand on system resources will be much greater, which will cause the progress bar movement to be erratic.

Overall, it is simply that there is no crystal ball. Not even the system itself knows what load it will be under at any point in the future.

Ultimately, It Really Does Not Matter

The intent of the progress bar is to, well, indicate that progress is indeed being made and the respective process is not hung. It is nice when the progress indicator is accurate, but typically it is only a minor annoyance when it is not. For the most part, developers are not going to devote a great deal of time and effort into progress bar algorithms because, frankly, there are much more important tasks to spend time on.

Of course, you have every right to be annoyed when a progress bar jumps to 99% complete instantly and then makes you wait 5 minutes for the remaining one percent. But if the respective program works well overall, just remind yourself that the developer had their priorities straight.