You click a reference on Wikipedia, only to find the URL is broken. It’s frustrating, but it should happen less often now, thanks to the Internet Archive.

Websites die, and even those that survive sometimes take down articles and pages. That’s a problem for Wikipedia, which builds credibility in part by citing other websites. Thanks to a three-year effort by the Internet Archive, 9 million previously broken Wikipedia citations now point to the Archive’s Wayback Machine, giving users access to source materials that would otherwise be hard to track down.

Here’s Mark Graham, writing in an official Internet Archive blog post about the program:

“For more than 5 years, the Internet Archive has been archiving nearly every URL referenced in close to 300 wikipedia sites as soon as those links are added or changed at the rate of about 20 million URLs/week.

“And for the past 3 years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a ‘404’, or ‘Page Not Found’). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: ‘Verifiability’.”
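For the curious, here’s roughly what that check-then-replace loop could look like. This is a minimal sketch, not IABot’s actual code: it assumes a link counts as broken only when it returns a 404, and it looks for a replacement using the Wayback Machine’s public availability endpoint. The function names (`is_broken`, `build_availability_query`, `closest_snapshot`, `find_replacement`) are hypothetical and chosen for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def is_broken(url, timeout=10):
    """Return True if the URL answers with HTTP 404 (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 404
    except urllib.error.URLError:
        # Unreachable host: this sketch does not count that as a confirmed 404.
        return False


def build_availability_query(url, timestamp=None):
    """Build the availability query; timestamp is an optional YYYYMMDD string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Pull the closest archived URL out of a decoded availability response."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_replacement(url, timestamp=None, timeout=10):
    """Return an archived URL for a broken link, or None if nothing applies."""
    if not is_broken(url, timeout):
        return None
    query = build_availability_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_replacement("http://example.com/old-page")` makes live network requests, so the result depends on what the Archive has saved. A real bot would also need to handle redirects, rate limits, soft 404s, and the editing of the citation itself, none of which this sketch attempts.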

The bot fixed 6 million links by pointing to the Archive, while Wikipedia editors linked to 3 million more. It’s a real service for internet users, who can now check references that would otherwise be lost. It’s a little scary that a nonprofit has to do this work, but I’m glad someone is.

Justin Pot
Justin Pot has been writing about technology for over a decade, with work appearing in Digital Trends, The Next Web, Lifehacker, MakeUseOf, and the Zapier Blog. He also runs the Hillsboro Signal, a volunteer-driven local news outlet he founded.