How search engines organize the web


By Laura Zurawski, web editor, March 1, 1999

If you’ve ever used a search engine for research on the web, you know that the number of matching web sites returned can vary dramatically with the search term and with the search engine used. For example, if you enter the word “automation” on the Yahoo! search engine, you get a list of 96 categories and 2,688 specific sites to peruse. If you enter the same word on Excite’s site, you get no fewer than 79,254 individual sites. The same search on Lycos’ site returns only 3 categories and about 100 individual sites, selected by how closely they match the word “automation.”

How search engines work

Why such a difference between the sites? It all has to do with how each search engine chooses to find and keep track of the web sites out there. Search engines do not search the entire World Wide Web when you enter a search. That would be impossible, given the number of web sites in existence and the fact that new sites are being added every day. Instead, most search engines use robots to go out onto the Internet and collect the URLs and descriptions of as many web sites as possible. They are not robots in the true sense of the word; rather, they are computer programs designed to automatically locate and index URLs, along with the URLs referenced within those pages. Like web surfing itself, following links can seem like an endless process, which is why it must be an automated task.

The robots, sometimes called “spiders,” then index the URLs they find into a database, and it is this database that the search engine actually searches. Because each search engine has its own robot program or programs, and each indexes URLs differently, the number and presentation of subject-relevant web sites will differ from one search engine to the next. That is not to say that any one search engine is better than the others. It all depends on how you prefer your information organized and presented.
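The collect-then-index process described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three `example` URLs, their descriptions, and the function names `crawl`, `build_index`, and `search` are hypothetical, and the "web" is a small in-memory table rather than the real Internet. No actual search engine's robot is being reproduced here.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (description, outgoing links).
TOY_WEB = {
    "http://hub.example/": ("Directory of automation links",
                            ["http://plc.example/", "http://robots.example/"]),
    "http://plc.example/": ("PLC programming and factory automation",
                            ["http://robots.example/"]),
    "http://robots.example/": ("Industrial robots for assembly lines", []),
}


def crawl(start_url, web):
    """Follow links breadth-first from start_url; return {url: description}."""
    seen = {start_url}
    queue = deque([start_url])
    collected = {}
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # a link that leads nowhere in the toy web
        description, links = web[url]
        collected[url] = description
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return collected


def build_index(collected):
    """Map each lowercase word in a description to the URLs that use it."""
    index = {}
    for url, description in collected.items():
        for word in description.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, word):
    """Look the word up in the database, not on the live web."""
    return sorted(index.get(word.lower(), set()))


pages = crawl("http://hub.example/", TOY_WEB)
index = build_index(pages)
print(search(index, "automation"))
# ['http://hub.example/', 'http://plc.example/']
```

Note that the search touches only the index. A robot that started from a different page, or that split descriptions into words differently, would build a different database and return a different list for the same word.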

Where to go?

Most search engines’ robots or spiders start in places where they can find large numbers of URLs to visit, usually pages that contain many links. From there they can explore countless different avenues and collect URLs and site descriptions efficiently. However, because some sites are not linked from anywhere else and robots cannot reach them, the majority of search engines also allow users to submit URLs for robots to visit, usually by filling out an online form. Together, these two methods help ensure that the search engine has a diverse collection of URLs in its database. Each search engine’s database is then organized into categories to make searching easier. Users can search the database in two ways: by entering a specific word or phrase, or by browsing the category or categories that interest them.
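To see why submitted URLs matter, consider a rough sketch in which one site has no incoming links. The URLs, category names, and the functions `collect` and `by_category` are hypothetical, chosen only for illustration, and the category labels are simply assumed to be known in advance.

```python
from collections import deque

# Hypothetical toy web: URL -> (category, outgoing links).
TOY_WEB = {
    "http://links.example/": ("Directories", ["http://plc.example/"]),
    "http://plc.example/": ("Automation", []),
    "http://island.example/": ("Automation", []),  # no page links here
}


def collect(start_urls, web):
    """Visit every URL reachable from start_urls; return the visited set."""
    seen = set(start_urls)
    queue = deque(start_urls)
    visited = set()
    while queue:
        url = queue.popleft()
        if url not in web:
            continue
        visited.add(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def by_category(urls, web):
    """Group collected URLs under their category for browsing."""
    categories = {}
    for url in sorted(urls):
        categories.setdefault(web[url][0], []).append(url)
    return categories


link_rich_pages = ["http://links.example/"]
submitted = ["http://island.example/"]  # assumed to arrive via the online form

crawled_only = collect(link_rich_pages, TOY_WEB)
with_submissions = collect(link_rich_pages + submitted, TOY_WEB)

print("http://island.example/" in crawled_only)  # False
print(by_category(with_submissions, TOY_WEB)["Automation"])
# ['http://island.example/', 'http://plc.example/']
```

Starting only from the link-rich page, the robot never finds the unlinked site. Once the submitted URL is added to the starting list, the site lands in the database and appears when a user browses the "Automation" category.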

Virtual libraries

Many of the places where robots start collecting are classified as online “virtual libraries,” dedicated to specific subjects or fields. Some virtual libraries operate exactly as a regular search engine does, differing only in the content they keep in their databases. Others simply keep an online list of links to related web sites that the user can browse, with no search option. Virtual libraries can be valuable tools for researching a particular subject, and are worth keeping in a bookmark list for frequent reference. Most can be found through the larger search engines.

Author Information

Laura Zurawski, web editor, lzurawski@cahners.com

Virtual Libraries for Manufacturing and Engineering

Manufacturing Marketplace

Control Engineering Virtual Library

1999 Automation Integrator Guide

Most Commonly Used Search Engines

Alta Vista

Excite

HotBot

InfoSeek

Lycos

WebCrawler

Yahoo!

