Crawler-Based Search Engines

Google, Yahoo and MSN are all crawler-based search engines. They are called crawler-based because they create their listings automatically by "crawling" or "spidering" the web.

Crawler-based search engines have three major elements.

  1. First is the spider or crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. The spider returns to the site on a regular basis, such as every month or two, to look for changes.
  2. Everything the spider finds goes into the second part of the search engine, the index. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is added to the index it is not available to those searching with the search engine.
  3. Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and ranks them in the order it believes is most relevant.
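
The three parts described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built. The in-memory PAGES table, the function names crawl, build_index and search, and the ranking rule (count how often the query words appear on each page) are all assumptions made for the example.

```python
from collections import Counter, deque

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
# A real spider would fetch pages over the network instead.
PAGES = {
    "/home": ("welcome to our garden shop", ["/roses", "/tools"]),
    "/roses": ("roses roses and more roses for your garden", ["/home"]),
    "/tools": ("garden tools and gloves", ["/home", "/roses"]),
}

def crawl(start):
    """Spider: visit a page, read it, then follow its links to other pages."""
    seen, queue, fetched = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        fetched[url] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def build_index(fetched):
    """Index: map each word to the pages that contain it, with a count."""
    index = {}
    for url, text in fetched.items():
        for word, count in Counter(text.split()).items():
            index.setdefault(word, {})[url] = count
    return index

def search(index, query):
    """Search software: find matching pages and rank them by word count."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]

index = build_index(crawl("/home"))
print(search(index, "garden roses"))  # ['/roses', '/home', '/tools']
```

Note that a page only becomes searchable once build_index has processed it. A page that crawl has fetched but that has not yet been passed to build_index corresponds to a page that is "spidered" but not yet "indexed."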
