NetRegister.biz: Network Domain Registrar (English Version)

Automated Search Engine Tutorial

How can anyone find information on the Web?

Search engines are programs that have two basic functions: identifying and collecting information (pages), and indexing that information into a searchable database. Programs called spiders (or webcrawlers) work from lists of servers to find Web pages to collect.
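The collecting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the WEB dictionary is a hypothetical in-memory stand-in for real servers, the example.com-style URLs are made up, and crawl() is a name chosen here, not part of any real search engine.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real spider would fetch pages over the network instead.
WEB = {
    "http://a.example/": ("search engine basics",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("keyword index tutorial", ["http://a.example/"]),
    "http://c.example/": ("spider and webcrawler notes", []),
}


def crawl(seeds, web):
    """Visit every page reachable from the seed list, breadth first."""
    collected = {}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in collected or url not in web:
            continue  # already collected, or the page cannot be found
        text, links = web[url]
        collected[url] = text   # keep the page text for later indexing
        queue.extend(links)     # follow links to discover more pages
    return collected


pages = crawl(["http://a.example/"], WEB)
```

Starting from a single seed, the sketch collects all three pages, because each one is linked from a page already visited.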

Basically, the search engine uses words from a page to identify that page.

Indexing is a way of creating a database that can be searched for information. A keyword index is built from the words in the text of a page, and each word points back to that page. Note that 'search engine' and 'keyword' both appear on this page: if a spider found this page and it was indexed, you would be able to retrieve the page by searching on those keywords. (Many search engines also allow searching by HTML fields, such as <TITLE>. Thus, you could limit your search for this page to 'title:indexing' to find only those pages that have 'indexing' in the title.)
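A keyword index of this kind can be sketched as a mapping from each word to the set of pages that contain it. The code below is a minimal illustration, not how any particular search engine works: the page data, the function names, and the handling of the 'title:' prefix are all assumptions made for the example, and only single-word queries are supported.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> title and body text.
PAGES = {
    "/tutorial": {"title": "Indexing tutorial",
                  "body": "A search engine builds a keyword index."},
    "/news": {"title": "Site news",
              "body": "New notes on indexing were added."},
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Return (keyword index, title index); each maps word -> set of URLs."""
    keyword_index = defaultdict(set)
    title_index = defaultdict(set)
    for url, page in pages.items():
        for word in tokenize(page["title"]):
            title_index[word].add(url)
            keyword_index[word].add(url)
        for word in tokenize(page["body"]):
            keyword_index[word].add(url)
    return keyword_index, title_index


def search(query, keyword_index, title_index):
    """Look up one word; a 'title:' prefix limits the search to titles."""
    if query.startswith("title:"):
        return title_index.get(query[len("title:"):].lower(), set())
    return keyword_index.get(query.lower(), set())


keyword_index, title_index = build_index(PAGES)
everywhere = search("indexing", keyword_index, title_index)
titles_only = search("title:indexing", keyword_index, title_index)
```

Here the plain search for 'indexing' matches both pages, while 'title:indexing' matches only the page whose title contains the word.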

Most search engines use relevancy ranking, claiming that the results which best match your search are displayed first.

Relevancy is basically determined by the number of times a word appears in a given page, its placement on the page (a word in the title is deemed more "relevant" than one at the end of the page), and the size of the document. Ranking assigns a value to these occurrences of a word (or words) and expresses that number in comparison to other documents.
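The three factors above (word count, placement, document size) can be combined into a single score in many ways. The sketch below shows one simple possibility; the formula, the TITLE_WEIGHT value, and the sample documents are assumptions chosen for illustration, not the method of any actual search engine.

```python
import re

# Assumed weighting: one occurrence in the title counts as three in the body.
TITLE_WEIGHT = 3.0

# Hypothetical documents: URL -> title and body text.
DOCS = {
    "/a": {"title": "Indexing basics",
           "body": "how a spider collects pages"},
    "/b": {"title": "Web notes",
           "body": "indexing indexing and more indexing in a long page "
                   "about many other things"},
    "/c": {"title": "Cooking", "body": "recipes"},
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy(query, title, body):
    """Weighted occurrences of the query words, divided by document size."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    size = len(title_words) + len(body_words)
    if size == 0:
        return 0.0
    hits = 0.0
    for term in tokenize(query):
        hits += TITLE_WEIGHT * title_words.count(term) + body_words.count(term)
    return hits / size


def rank(query, docs):
    """Return URLs of matching documents, best score first."""
    scored = [(relevancy(query, d["title"], d["body"]), url)
              for url, d in docs.items()]
    matching = [(score, url) for score, url in scored if score > 0]
    return [url for score, url in sorted(matching, key=lambda p: (-p[0], p[1]))]


ranking = rank("indexing", DOCS)
```

With these sample documents, the short page with 'indexing' in its title outranks the longer page that repeats the word in its body, and the page that never mentions the word is left out.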

Unlike the automated spider, subject-oriented search engines use human mediation for filtering. That means that a person reviews Web pages to decide whether they will be included.

Often, URLs are either submitted for review, or a reviewer comes across a page and considers it. In this example, Pat the Internet Librarian reviews Web pages to determine if they meet the coverage, level and subject focus of the search engine.

As with the automated spider, indexing in a subject-oriented search engine involves creating an index of keywords from Web pages.

The biggest difference is that in subject-oriented search engines, someone assigns a Subject Heading or Category to help describe the overall topic or focus of the page. This makes it easier to search by common subjects, instead of having to know the specific terms used by the author of the Web page. In addition, for some of these search engines a human mediator assigns a rating or review to each page, expressing its strengths or uniqueness.
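A subject-oriented entry can be pictured as a page record carrying the human-assigned fields alongside the URL. The following is a rough sketch only: the Entry fields, the 1 to 5 rating scale, the sample entries, and the by_category() helper are all hypothetical and not taken from any real directory.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    url: str
    category: str  # Subject Heading or Category assigned by a reviewer
    rating: int    # assumed scale: 1 (weak) to 5 (strong)
    review: str    # reviewer's short note on strengths or uniqueness


# Hypothetical reviewed entries.
ENTRIES = [
    Entry("http://a.example/", "Astronomy", 3, "Good beginner overview."),
    Entry("http://b.example/", "Astronomy", 5, "Unique image archive."),
    Entry("http://c.example/", "Cooking", 4, "Clear recipes."),
]


def by_category(entries, category):
    """Return entries under one subject heading, highest rated first."""
    matches = [e for e in entries if e.category.lower() == category.lower()]
    return sorted(matches, key=lambda e: -e.rating)


astronomy = [e.url for e in by_category(ENTRIES, "astronomy")]
```

A searcher who asks for the 'Astronomy' category finds both astronomy pages without knowing any of the words their authors used, and the higher-rated page is listed first.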

Thanks to:
D. Scott Brandt (a.k.a. techman) Technology Training Librarian - Professor of Library Science Purdue University Libraries