Sometimes the faithful download progress meter in your browser (or other application) just throws its hands in the air and gives up on displaying the remaining download time. Why does it sometimes nail the projected download time and sometimes fail to report it altogether?

Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

The Question

SuperUser reader Coldblackice wants to know why his browser doesn't always dish the dirt:

Occasionally, when downloading a file in a web browser, the download progress doesn't "know" the total size of the file, or how far along in the download it is -- it just shows the speed at which it's downloading, with a total as "Unknown".

Why wouldn't the browser know the final size of some files? Where does it get this information in the first place?

Where indeed?

The Answers

SuperUser contributor Gronostaj offers the following insight:

To request documents from web servers, browsers use the HTTP protocol. You may know that name from your address bar (it may be hidden now, but when you click the address bar, copy the URL and paste it in some text editor, you'll see http:// at the beginning). It's a simple text-based protocol and it works like this:

First, your browser connects to the website's server and sends a URL of the document it wants to download (web pages are documents, too) and some details about the browser itself (User-Agent etc.). For example, to load the main page on the SuperUser site, http://superuser.com/, my browser sends a request that looks like this:

        GET / HTTP/1.1
        Host: superuser.com
        Connection: keep-alive
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
        User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64)
        Accept-Encoding: gzip,deflate,sdch
        Accept-Language: pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4
        Cookie: [removed for security]
        DNT: 1
        If-Modified-Since: Tue, 09 Jul 2013 07:14:17 GMT

The first line specifies which document the server should return. The other lines are called headers; they look like this:

        Header name: Header value

These lines send additional information that helps the server decide what to do.
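To make the "Header name: Header value" layout concrete, here is a minimal Python sketch of how a program might pull a request apart. The function name parse_request is hypothetical, the sample request is a shortened version of the one above, and real servers handle many edge cases this sketch ignores.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into its first line and a dict of headers."""
    lines = raw.strip().splitlines()
    # The first line names the method, the document, and the protocol version.
    method, path, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # an empty line ends the header section
        # Split on the first colon only, since values may contain colons too.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


# A shortened version of the request shown above.
raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: superuser.com\r\n"
    "Connection: keep-alive\r\n"
    "DNT: 1\r\n"
    "\r\n"
)

method, path, version, headers = parse_request(raw_request)
print(method, path, version)  # GET / HTTP/1.1
print(headers["host"])        # superuser.com
```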

If all is well, the server will respond by sending the requested document. The response starts with a status line, followed by some headers (with details about the document) and finally, if the request succeeded, the document's content. This is what the SuperUser server's reply to my request looks like:

        HTTP/1.1 200 OK
        Cache-Control: public, max-age=60
        Content-Type: text/html; charset=utf-8
        Expires: Tue, 09 Jul 2013 07:27:20 GMT
        Last-Modified: Tue, 09 Jul 2013 07:26:20 GMT
        Vary: *
        X-Frame-Options: SAMEORIGIN
        Date: Tue, 09 Jul 2013 07:26:19 GMT
        Content-Length: 139672

        <!DOCTYPE html>
        <html>
            [...snip...]
        </html>

After the last line, SuperUser's server closes the connection.

The first line (HTTP/1.1 200 OK) contains the response code, which in this case is 200 OK. It means that the server will return a document, as requested. When the server doesn't manage to do so, the code will be something else: you have probably seen 404 Not Found, and 403 Forbidden is quite common, too. Then the headers follow.

When the browser finds an empty line in the response, it knows that everything past that line is the content of the document it requested. So in this case <!DOCTYPE html> is the first line of SuperUser's homepage code. If I were requesting a file to download, the content would probably look like gibberish, because most file formats are unreadable without prior processing.
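The role of the empty line can be sketched in a few lines of Python. This is only an illustration, not how any particular browser does it: split_response is a hypothetical helper, and the tiny sample response is made up so that its Content-Length matches its 15-byte body.

```python
def split_response(raw: bytes):
    """Separate an HTTP response into status code, reason, headers and body."""
    # Everything before the first empty line is the status line plus headers;
    # everything after it is the document itself.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# A made-up, minimal response used only for this example.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 15\r\n"
    b"\r\n"
    b"<!DOCTYPE html>"
)

code, reason, headers, body = split_response(raw_response)
print(code, reason)               # 200 OK
print(headers["content-length"])  # 15
print(body)                       # b'<!DOCTYPE html>'
```

The body is kept as bytes on purpose: for a downloaded file it usually isn't readable text at all.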

Back to headers. The most interesting one for us is the last one, Content-Length. It informs the browser how many bytes of data it should expect after the empty line, so basically it's the document size expressed in bytes. This header isn't mandatory and may be omitted by the server. Sometimes the document size can't be predicted (for example, when the document is generated on the fly); sometimes lazy programmers don't include it (quite common on driver download sites); and sometimes websites are created by newbies who don't know such a header exists.

Anyway, whatever the reason is, the header can be missing. In that case the browser doesn't know how much data the server is going to send, and thus displays the document size as unknown, waiting for the server to close the connection. And that's the reason for unknown document sizes.
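Here is a rough Python sketch of the decision a download meter has to make. The function names describe_progress and seconds_remaining are hypothetical, and actual browsers estimate speed and remaining time in more sophisticated ways; the point is only that both the percentage and the time estimate depend on Content-Length being present.

```python
from typing import Optional


def total_size(headers: dict) -> Optional[int]:
    """Return the announced document size in bytes, or None if it is unknown."""
    raw_length = headers.get("content-length")
    if raw_length is None or not raw_length.isdigit():
        return None  # header missing or malformed: the size is unknown
    return int(raw_length)


def describe_progress(headers: dict, bytes_received: int) -> str:
    """Build the text a progress meter could show for a running download."""
    total = total_size(headers)
    if total is None:
        return f"{bytes_received} bytes received, total Unknown"
    percent = 100 * bytes_received / total if total else 100
    return f"{bytes_received} of {total} bytes ({percent:.0f}%)"


def seconds_remaining(headers: dict, bytes_received: int,
                      bytes_per_second: float) -> Optional[float]:
    """Estimate the remaining time; impossible without a known total size."""
    total = total_size(headers)
    if total is None or bytes_per_second <= 0:
        return None
    return (total - bytes_received) / bytes_per_second


with_length = {"content-length": "139672"}
without_length = {}

print(describe_progress(with_length, 69836))     # 69836 of 139672 bytes (50%)
print(describe_progress(without_length, 69836))  # 69836 bytes received, total Unknown
print(seconds_remaining(with_length, 69836, 34918.0))     # 2.0
print(seconds_remaining(without_length, 69836, 34918.0))  # None
```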


Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.