Search Info

How Search Engines Work

 

The World Wide Web contains billions of pages, and trillions of files that are reached through those pages.  So how can a search engine come back with what you want in a second or two?  Clearly, the billions of pages and trillions of files are not searched in real time.

Rather, each search engine company runs multiple applications that "crawl" around the web, gathering information on pages and files.  That information is catalogued and indexed for fast searching in massive databases on high-speed servers.  When you initiate a search, those databases are searched in parallel, and the results are collected, ranked and presented to you.
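
The idea of searching a prebuilt index instead of the pages themselves can be sketched in a few lines of Python. This is only a toy illustration, not how any particular search engine is implemented: the page addresses and text in CRAWLED_PAGES are made up, and the function names build_index and search are hypothetical. It builds a simple "inverted index" (a lookup from each word to the pages that contain it) and answers a query from that index alone.

```python
from collections import defaultdict

# Hypothetical stand-in for pages a crawler has already fetched.
CRAWLED_PAGES = {
    "example.org/tanks": "history of tanks in the war",
    "example.org/ships": "history of ships and naval war",
    "example.org/planes": "fighter planes of the war",
}


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow the set
    # down with each further word. No page text is read here.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(CRAWLED_PAGES)
print(sorted(search(INDEX, "history war")))
# prints ['example.org/ships', 'example.org/tanks']
```

The expensive work (reading every page) happens once, when the index is built. Each search is then just a handful of fast lookups, which is why results can come back so quickly.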

This process is very efficient and yields excellent results.  There are a few minor problems from the user's perspective:

  • The crawlers may not find new content immediately; it may be days or even weeks before new content is indexed and "available" to the search engines.  This is true for the ww2hc site: recent content may not be "found" yet.

  • The process does not do a very good job of removing information about pages that are no longer available; this is one of the reasons for the "404 - page not found" errors we all encounter.  Encountering a 404 error is not a big deal, but many ISPs (Internet Service Providers) trap the error and redirect you to a site or sites of their choosing.  That is why you may click on a link, get a 404 error and then immediately be redirected to Walmart (or something equally useful).
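
Both problems come from the index being a snapshot taken at the time of the last crawl. The short Python sketch below illustrates this under made-up assumptions: the addresses in INDEXED_URLS and LIVE_PAGES are invented, and fetch_status only simulates a web request rather than making one.

```python
# Hypothetical snapshot: pages the crawler saw on its last visit.
INDEXED_URLS = {"example.org/tanks", "example.org/ships"}

# Hypothetical live site today: one page was removed, one was added.
LIVE_PAGES = {"example.org/tanks", "example.org/planes"}


def fetch_status(url, live_pages):
    """Simulate a request: 200 if the page exists, 404 if it does not."""
    return 200 if url in live_pages else 404


def stale_entries(indexed, live_pages):
    """Indexed pages that now give '404 - page not found'."""
    return sorted(u for u in indexed if fetch_status(u, live_pages) == 404)


def unindexed_pages(indexed, live_pages):
    """Live pages the crawler has not found yet."""
    return sorted(live_pages - indexed)


print(stale_entries(INDEXED_URLS, LIVE_PAGES))
# prints ['example.org/ships']
print(unindexed_pages(INDEXED_URLS, LIVE_PAGES))
# prints ['example.org/planes']
```

Until the crawler returns and the index is rebuilt, a search can still list the removed page and cannot list the new one.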