| How can anyone find information on the Web?
 
  
  Search engines are programs that have two basic functions: identifying and
collecting pages, and indexing that information into a searchable database.
Programs called spiders (or webcrawlers) work from lists of servers to find
Web pages to collect.
  Basically, the search engine uses the words on a page to identify that page.
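
The collecting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular spider works: `FAKE_WEB` is a hypothetical in-memory stand-in for real servers, the `example` URLs are made up, and following links from page to page is an assumption added here (the text above only says spiders start from lists of servers).

```python
from collections import deque

# Hypothetical in-memory "Web": each URL maps to (page text, outgoing links).
FAKE_WEB = {
    "http://a.example/": ("search engine basics", ["http://b.example/"]),
    "http://b.example/": ("keyword index notes",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("relevancy ranking", []),
}


def crawl(seed_urls, fetch):
    """Visit pages breadth-first from a seed list and return {url: text}.

    `fetch(url)` must return (text, links), or None if the page is missing.
    """
    collected = {}
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in collected:
            continue  # already visited; avoids looping on circular links
        result = fetch(url)
        if result is None:
            continue  # unreachable page, skip it
        text, links = result
        collected[url] = text
        queue.extend(links)
    return collected


# Start from a one-item "list of servers" and collect everything reachable.
pages = crawl(["http://a.example/"], FAKE_WEB.get)
```

The collected text is what the indexing step then works on.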
 
  Indexing is a way of creating a database that can be searched for
information. A keyword index is built from the words in the text of a page,
and each word points back to that page. Note that 'search engine' and
'keyword' are on this page: if a spider found this page and it was indexed,
you would be able to retrieve the page by searching on those keywords. (Many
search engines also allow searching by HTML fields, such as <TITLE>. Thus,
you could limit your search for this page to 'title:indexing' to find only
those pages which have indexing in the title.)
  Most search engines use relevancy ranking so that the results which best
match your search are displayed first.
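
A keyword index of this kind can be sketched as a mapping from each word to the set of pages containing it. The sketch below is a minimal illustration, assuming pages arrive as a title and a body; the function names, the sample URLs, and the handling of the `title:` prefix are all hypothetical and not taken from any real search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: {url: {"title": str, "body": str}} -> (keyword_index, title_index).

    Each index maps a word to the set of URLs where that word appears.
    """
    keyword_index = defaultdict(set)
    title_index = defaultdict(set)
    for url, page in pages.items():
        for word in tokenize(page["title"]):
            title_index[word].add(url)
            keyword_index[word].add(url)  # title words are keywords too
        for word in tokenize(page["body"]):
            keyword_index[word].add(url)
    return keyword_index, title_index


def search(query, keyword_index, title_index):
    """Look up one word; a 'title:' prefix restricts the search to titles."""
    if query.startswith("title:"):
        word, index = query[len("title:"):], title_index
    else:
        word, index = query, keyword_index
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample pages.
sample_pages = {
    "http://example.org/indexing.html": {
        "title": "Indexing the Web",
        "body": "A search engine builds a keyword index.",
    },
    "http://example.org/spiders.html": {
        "title": "Spiders",
        "body": "Spiders collect pages for indexing.",
    },
}
keyword_index, title_index = build_index(sample_pages)
```

With these sample pages, searching for `indexing` matches both pages, while `title:indexing` matches only the page with that word in its title.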
 
  Relevancy is based mainly on the number of times a word appears in a given
page, its placement on the page (a word in the title is deemed more
"relevant" than one at the end of the page), and the size of the document.
Ranking assigns a value to these occurrences of a word (or words) and
expresses that number in comparison to other documents.
  Unlike the automated spider, subject-oriented search engines use human
mediation for filtering. That means that a person reviews Web pages to decide
whether they will be included.
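
The relevancy ranking used by automated engines, as described above, can be illustrated with a toy scoring function. This is a sketch under assumptions, not any engine's actual formula: the `TITLE_WEIGHT` value of 3.0 is arbitrary, placement is modeled only as title versus body, and document size is handled by dividing by the total word count.

```python
import re

TITLE_WEIGHT = 3.0  # assumed: a title occurrence counts three times a body one


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy(word, title, body):
    """Score one page: weighted occurrences divided by document size."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    total = len(title_words) + len(body_words)
    if total == 0:
        return 0.0
    weighted = TITLE_WEIGHT * title_words.count(word) + body_words.count(word)
    return weighted / total


def rank(word, pages):
    """Return URLs of matching pages, best score first (ties broken by URL)."""
    word = word.lower()
    scored = [(relevancy(word, p["title"], p["body"]), url)
              for url, p in pages.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for score, url in scored if score > 0]


# Hypothetical sample pages.
sample_pages = {
    "a": {"title": "Indexing", "body": "indexing builds an index"},
    "b": {"title": "Spiders",
          "body": "spiders help with indexing of many many pages"},
    "c": {"title": "Ranking", "body": "no match here"},
}
results = rank("indexing", sample_pages)
```

Page `a` ranks ahead of page `b` here because the word appears in its title and the document is shorter; page `c` is left out because it never mentions the word.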
 
  Often, URLs are either submitted for review, or a reviewer comes across a
page that is worth considering. In this example, Pat the Internet Librarian
reviews Web pages to determine whether they meet the coverage, level and
subject focus of the search engine.
  Similar to the automated spider, indexing in a subject-oriented search
engine involves creating an index of keywords from Web pages.
 
  The biggest difference is that in subject-oriented search engines, someone
assigns a Subject Heading or Category to help describe the overall topic or
focus of the page. This makes it easier to search by common subjects, without
having to know the specific terms used by the author of the Web page. In
addition, for some of these search engines a human mediator assigns a rating
or review to each page, expressing its strengths or uniqueness.

  Thanks to: D. Scott Brandt (a.k.a. techman), Technology Training Librarian -
Professor of Library Science, Purdue University Libraries
 |