A spider trap is a web page or set of pages that traps any web crawler or search bot that comes across it. While some spider traps are created unintentionally (e.g. a page element such as a dynamic calendar generates an essentially infinite series of forward links for the crawler to follow), many are created deliberately to trap spam crawlers looking for email addresses and other personal information. The heart of the trap, whether intentional or not, is a series of links or a dynamic link system that a crawler gets stuck following, like a maze with no exit.
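The "maze with no exit" idea can be sketched in a few lines: each dynamically generated page links only to another page that does not exist until it is requested, so a crawler that blindly follows links never runs out of new URLs. This is a minimal illustration, not a recommended implementation; the `/trap/` path name is an invented example.

```python
def trap_page(n: int) -> str:
    """Return the HTML for trap page n, whose only link points to
    trap page n + 1 (which is generated on demand, so the chain of
    'new' pages never ends)."""
    # The /trap/ URL prefix is a hypothetical example path.
    return (
        "<html><body>"
        f"<p>Trap page {n}</p>"
        f'<a href="/trap/{n + 1}">next</a>'
        "</body></html>"
    )

# Every page a crawler fetches yields exactly one unvisited link:
print(trap_page(1))   # links to /trap/2
```

A real trap served this way would be wired into a web server's routing so that any request under `/trap/` produces the corresponding page; often a small delay is added per response so the trapped crawler also wastes time, not just requests.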
Spider traps are particularly effective against malicious crawlers because the site's robots.txt file (which contains instructions for legitimate crawlers to follow) can be updated to include the location of the trap, allowing legitimate crawlers to avoid it. Malicious crawlers, on the other hand, routinely ignore the rules in robots.txt and end up stuck in the trap.
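A robots.txt entry for this purpose is just a Disallow rule covering the trap's location. Assuming the trap lives under a hypothetical `/trap/` directory, the relevant lines would look like:

```
User-agent: *
Disallow: /trap/
```

Well-behaved crawlers such as search engine bots read this file before crawling and will skip anything under `/trap/`; a crawler that requests those URLs anyway has effectively identified itself as one that ignores the rules.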
- By Jason Fitzpatrick on 11/16/13