Facebook’s six-hour-long outage on October 4, 2021, had people scrambling to find out just what was going on. Part of the answer lies in an integral part of the internet called Border Gateway Protocol, or BGP.
What Exactly Is BGP Anyway?
Several very apt metaphors have been used in recent articles to explain BGP. People have likened it to everything from an air traffic controller to a constantly evolving map of the internet. It’s even been called “the duct tape of the internet.” And they’re all right.
BGP is the protocol that tells data requests what path they need to take to reach the server. If, for example, you log in to Facebook or open the app to pull up your feed, BGP is what guides your data packets along the best available route to retrieve that feed from Facebook’s servers.
Cloudflare describes BGP as “the postal service of the internet,” in that it chooses the fastest and most efficient route for your requests to reach their intended server. BGP looks at all the available routes your data could take, then chooses what it sees as the best one.
That usually means routing your data across several of the autonomous systems that make up the internet as a whole. BGP figures out which systems talk to one another, then sends your data along the best path between them so it can arrive at the proper destination.
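To make "chooses what it sees as the best one" a little more concrete, here is a minimal Python sketch of route selection. It is a simplification, not how any real router is implemented: it assumes only two tie-breakers (an operator-set preference, then the number of autonomous systems a route passes through), and the `Route` class, the `best_route` function, and the AS numbers (taken from the range reserved for documentation) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One advertised way to reach a block of addresses (hypothetical model)."""
    prefix: str            # destination block, e.g. "203.0.113.0/24"
    as_path: tuple         # autonomous systems the advertisement passed through
    local_pref: int = 100  # operator-set preference; higher wins


def best_route(candidates):
    """Pick one route: highest local_pref first, then the shortest AS path."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


routes = [
    Route("203.0.113.0/24", (64500, 64501, 64510)),
    Route("203.0.113.0/24", (64502, 64510)),
    Route("203.0.113.0/24", (64503, 64504, 64505, 64510), local_pref=200),
]

# With no preference set, the two-hop path would win. Here the operator has
# marked the third route as preferred, so it is chosen despite being longer.
print(best_route(routes).as_path)
```

The point of the sketch is that "best" is a decision made from the advertised information and local policy, which is why a shorter path does not always win.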
Continuing the post office metaphor, each autonomous system on the internet is like a branch of the post office. Even though your city might have thousands of mailboxes, every piece of mail still has to go through the post office before it’s delivered.
Examples of autonomous systems on the internet include:
- An internet service provider (ISP) like Comcast, AT&T, Verizon, etc.
- A company like Facebook
- Other large organizations like governments or universities
Mitchell Clark, writing for The Verge, likens BGP to a constantly updating map and autonomous systems to islands on that map. Since there are way too many “islands” on the internet to build bridges between each and every one, BGP tells you where the bridges already are.
There are in fact two types of BGP:
- External BGP (eBGP): BGP as it is used between autonomous systems, across the internet at large. In our post office metaphor, this is akin to international shipping.
- Internal BGP (iBGP): The same protocol used inside a single autonomous system to route data within its own network. This is similar to the mail services in different individual countries.
It’s not necessary to have iBGP set up in order to access the wider internet’s eBGP, but some autonomous systems like big tech companies use iBGP anyway to route internal traffic.
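One way to picture the difference: every autonomous system has a number, and whether a BGP session counts as internal or external comes down to whether the two routers on either end belong to the same one. The short Python sketch below illustrates that rule; the `session_type` function is a made-up helper and the AS numbers are placeholders from the range reserved for documentation.

```python
def session_type(local_asn: int, peer_asn: int) -> str:
    """Classify a BGP session by comparing the AS numbers on each side."""
    return "iBGP" if local_asn == peer_asn else "eBGP"


# Two routers inside the same (hypothetical) company network.
print(session_type(64500, 64500))

# A company router talking to its (hypothetical) internet provider.
print(session_type(64500, 64510))
```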
How Do BGP and DNS Work Together?
BGP is what makes data routing on the internet possible, which makes it the glue—or the duct tape—that holds the internet together. Part of the way BGP works is that it advertises viable routes for data. If BGP stops working, those routes can’t be found and disappear from the internet, so the data has nowhere to go.
That’s part of what happened at Facebook. Facebook’s VP of Infrastructure Santosh Janardhan put it this way in his blog post explaining the mechanics of the outage:
“One of the jobs performed by our smaller facilities is to respond to DNS queries. DNS is the address book of the internet, enabling the simple web names we type into browsers to be translated into specific server IP addresses. Those translation queries are answered by our authoritative name servers that occupy well known IP addresses themselves, which in turn are advertised to the rest of the internet via another protocol called the border gateway protocol (BGP).”
In other words, the internet’s Domain Name System (DNS) protocol functions like a list of addresses, and BGP is the postal service that gets the mail to those houses. Mail can’t get delivered if you’ve got an address but no directions to the house.
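The two-step lookup can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `ADDRESS_BOOK` stands in for DNS, `ROUTES` stands in for the routes BGP has advertised, and the names, addresses, and functions are hypothetical. When more than one route covers an address, the sketch picks the most specific one, which is how routers generally choose.

```python
import ipaddress

# Stand-in for DNS: names mapped to addresses (documentation-only values).
ADDRESS_BOOK = {"example.com": "203.0.113.10"}

# Stand-in for advertised routes: address blocks mapped to directions.
ROUTES = {
    "203.0.113.0/24": "via AS64500",
    "198.51.100.0/24": "via AS64501",
}


def resolve(name):
    """Look the name up in the address book; None if it is not listed."""
    return ADDRESS_BOOK.get(name)


def find_route(ip, routes):
    """Return directions for the most specific block containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # we have an address, but no directions to the house
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return routes[str(most_specific)]


ip = resolve("example.com")
print(ip, find_route(ip, ROUTES))
print(ip, find_route(ip, {}))  # same address, but every route is gone
```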
“…DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection. In the recent outage the entire backbone was removed from operation, making these locations declare themselves unhealthy and withdraw those BGP advertisements. The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.”
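The behavior Janardhan describes can be modeled roughly as follows. This is a toy sketch, not Facebook's actual system: the `DnsSite` class, the site names, and the address blocks are invented for illustration, and the health check is reduced to a single yes-or-no flag.

```python
class DnsSite:
    """A small facility that answers DNS queries (hypothetical model)."""

    def __init__(self, name, prefix):
        self.name = name
        self.prefix = prefix    # address block this site advertises via BGP
        self.running = True     # the DNS software itself is fine
        self.advertised = True  # whether the route is currently announced

    def health_check(self, can_reach_data_center):
        # A site that cannot reach the data centers assumes its own
        # connection is unhealthy and withdraws its advertisement.
        self.advertised = can_reach_data_center


def reachable_prefixes(sites):
    """Address blocks the rest of the internet can still find."""
    return {site.prefix for site in sites if site.advertised}


sites = [
    DnsSite("site-a", "192.0.2.0/24"),
    DnsSite("site-b", "198.51.100.0/24"),
]

# The backbone goes down, so every site fails its check at the same time.
for site in sites:
    site.health_check(can_reach_data_center=False)

print(all(site.running for site in sites))  # servers are still operational
print(reachable_prefixes(sites))            # but nothing is advertised
```

A safeguard that makes sense for one unhealthy site removes every route at once when all sites fail the same check together.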
How BGP Can Mess Up the Internet
Multiple factors can affect the route your data takes through the internet’s map. Cost can be one, as some providers charge for access to their systems. The changing nature of the internet itself is another.
Autonomous systems and websites can move or be removed entirely from the map of the internet. They can also change or add service providers; a college might switch ISPs from Comcast to AT&T, for example. BGP has to regularly update the routes data can take to make sure they stay current and your request doesn’t run into a dead end, Wile E. Coyote style.
Autonomous systems run BGP updates without incident all the time. But when they go wrong, they can go very wrong. In the Verge article, Clark explains that because BGP updates are designed to spread from system to system quickly, an error can have a ripple effect like the one we saw at Facebook.
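The ripple effect is easier to see with a small simulation. The Python sketch below assumes a made-up network of five autonomous systems in which each one simply passes an update along to its neighbors; real networks apply filters and policies that this ignores. The `propagate` function, the `NEIGHBORS` map, and the AS numbers are all hypothetical.

```python
from collections import deque


def propagate(neighbors, origin):
    """Return {asn: hops} for every system that eventually hears the update."""
    heard = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in neighbors.get(current, ()):
            if peer not in heard:  # each system passes the update on once
                heard[peer] = heard[current] + 1
                queue.append(peer)
    return heard


# Which systems talk to which (hypothetical topology).
NEIGHBORS = {
    64500: [64501, 64502],
    64501: [64500, 64503],
    64502: [64500, 64503],
    64503: [64501, 64502, 64504],
    64504: [64503],
}

# An update, good or bad, that starts at AS 64500 reaches every system.
for asn, hops in propagate(NEIGHBORS, 64500).items():
    print(asn, "hears it after", hops, "hop(s)")
```

Nothing in this loop checks whether the update is correct, which is the property that lets a single mistake travel so far.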
Fixing the Bugs
According to Cloudflare, a bad BGP update in 2004 by Turkish ISP TTNet temporarily advertised TTNet as the best destination for all traffic on the internet. That resulted in connection problems for an entire day until the issue was sorted out.
Incidents like these point to certain weaknesses in BGP, namely that the autonomous systems that make up the internet at large will implicitly trust what BGP tells them is the best route for data. While glitches don’t happen often, some have argued for the need to make BGP more secure. An update on that scale, however, would require every autonomous system on the internet to update at once. That means implementing major changes to the protocol would be challenging, to say the least.
BGP is just one of several elements that make the internet work. Understanding the basics of how it works can help you make sense of outages and other issues in the future.