How search engines organize the web

03/01/1999


If you've ever used a search engine for research on the web, you know that the number of matching web sites returned can vary dramatically with both the search terms and the search engine used. For example, if you enter the word "automation" on the Yahoo! search engine, you get a list of 96 categories and 2,688 specific sites to peruse. If you enter the exact same word on Excite's site, you get no fewer than 79,254 individual sites. The same search on Lycos' site returns only 3 categories and about 100 individual sites, selected by how closely they match the word "automation."

How search engines work

Why such a difference between the sites? It all comes down to how each search engine chooses to find and keep track of the web sites out there. Search engines do not search the entire World Wide Web. That would be impossible, given the number of web sites in existence and the fact that new sites are being added every day. Instead, most search engines use robots to go out onto the Internet and collect the URLs and descriptions of as many web sites as possible. They are not robots in the true sense of the word; rather, they are computer programs designed to automatically locate and index URLs, along with the URLs referenced within each page. Like web surfing itself, following links can seem like an endless process, which is why it must be an automated task.
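To make the link-following idea concrete, here is a minimal sketch of how a robot might move from page to page. It assumes a made-up, in-memory "web" (the TOY_WEB dictionary and every example URL are hypothetical) rather than real network requests, and it is not the method of any particular search engine.

```python
from collections import deque

# Hypothetical miniature web: each URL maps to the URLs its page links to.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
    "http://island.example/": [],  # no other page links here
}


def crawl(seeds, web, max_pages=1000):
    """Follow links breadth-first and return URLs in the order visited."""
    queue = deque()
    seen = set()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to read
            continue
        visited.append(url)
        for link in web[url]:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


found = crawl(["http://a.example/"], TOY_WEB)
print(found)
```

Starting from a single page, the sketch reaches four of the five sites. The fifth is never found because no visited page links to it, which hints at why no robot can promise complete coverage.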

The robots, sometimes called "spiders," then index the URLs they find into a database, which is what the search engine actually searches. Since each search engine has its own robot program or programs to do its indexing, and each indexes URLs differently, the number and presentation of subject-relevant web sites in the database will differ from search engine to search engine. That is not to say that any one search engine is better than any other. It all depends on how you prefer your information organized and presented to you.
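A small sketch can show why two engines looking at the same pages may return different counts for the same word. It assumes a hypothetical set of collected records (PAGES) and two invented indexing policies, one that indexes titles only and one that indexes titles plus descriptions; real engines differ in many more ways than this.

```python
import re
from collections import defaultdict

# Hypothetical records a robot might have collected: URL -> (title, description).
PAGES = {
    "http://plc.example/": ("PLC basics", "Introduction to factory automation with PLCs"),
    "http://robots.example/": ("Automation robots", "Industrial robot arms for assembly"),
    "http://valves.example/": ("Valve catalog", "Control valves for process automation"),
}


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, fields):
    """Map each word to the set of URLs whose chosen fields contain it."""
    index = defaultdict(set)
    for url, record in pages.items():
        for field in fields:
            for word in tokenize(record[field]):
                index[word].add(url)
    return index


def search(index, word):
    """Return the matching URLs for a single word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


title_only = build_index(PAGES, fields=[0])    # assumed policy A: titles only
full_text = build_index(PAGES, fields=[0, 1])  # assumed policy B: titles and descriptions

print(len(search(title_only, "automation")))
print(len(search(full_text, "automation")))
```

With these three records, the title-only index matches one site for "automation" while the fuller index matches all three, even though both were built from identical pages.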

Where to go?

Most search engines' robots or spiders start in places where they can find large numbers of URLs to visit, usually pages that contain many links. From there they can explore countless different avenues and collect URLs and site descriptions most efficiently. However, since some sites will almost certainly not be linked from anywhere a robot can reach, the majority of search engines also allow users to submit URLs for robots to visit, usually by filling out an online form. Together, these two methods help to ensure that the search engine has a diverse collection of URLs in its database. Each search engine's database is then organized into categories to help make searching easier. Users can then search the database in two ways: by entering a specific word or phrase, or by browsing the category or categories from which they would like further information.
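The two collection methods and the category view described above can be sketched as follows. Everything here is illustrative: the link graph (LINKS), the category labels (CATEGORIES), and the example URLs are all hypothetical, and the category assignment is simply given rather than computed.

```python
# Hypothetical link graph: URL -> URLs that page links to.
LINKS = {
    "http://hub.example/": ["http://sensors.example/", "http://drives.example/"],
    "http://sensors.example/": [],
    "http://drives.example/": ["http://hub.example/"],
    "http://newsite.example/": [],  # nothing links to this site
}

# Hypothetical category assigned to each site in the database.
CATEGORIES = {
    "http://hub.example/": "Directories",
    "http://sensors.example/": "Instrumentation",
    "http://drives.example/": "Motion control",
    "http://newsite.example/": "Instrumentation",
}


def reachable(start_urls, links):
    """Return every known URL reachable by following links from start_urls."""
    stack = list(start_urls)
    seen = set()
    while stack:
        url = stack.pop()
        if url in seen or url not in links:
            continue
        seen.add(url)
        stack.extend(links[url])
    return seen


def by_category(urls, categories):
    """Group URLs under their category so users can browse instead of search."""
    grouped = {}
    for url in sorted(urls):
        grouped.setdefault(categories[url], []).append(url)
    return grouped


seeds = ["http://hub.example/"]          # a link-rich starting page
submitted = ["http://newsite.example/"]  # a URL entered through an online form

crawled_only = reachable(seeds, LINKS)
with_submissions = reachable(seeds + submitted, LINKS)
catalog = by_category(with_submissions, CATEGORIES)

print(sorted(with_submissions - crawled_only))
print(catalog["Instrumentation"])
```

In this sketch the submitted site only enters the database because a user supplied it, and once there it appears under its category alongside sites the robot found on its own.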

Virtual libraries

Many of the places where robots start collecting are classified as online "virtual libraries," dedicated to specific subjects or fields. Some virtual libraries operate exactly as a regular search engine does, differing only in the content they keep in their databases. Other virtual libraries simply keep an online list of links to related web sites that the user can browse and do not have a search option. These virtual libraries can become valuable tools for researching a particular subject, and are worth keeping in a bookmark list for frequent reference. Most virtual libraries can be found through the larger search engines.


Author Information

Laura Zurawski, web editor, lzurawski@cahners.com


Virtual Libraries for Manufacturing and Engineering

Manufacturing Marketplace

Control Engineering Virtual Library

1999 Automation Integrator Guide

Most Commonly Used Search Engines

Alta Vista

Excite

HotBot

InfoSeek

Lycos

WebCrawler

Yahoo!


