How can anyone find information on the Web?

Search engines are programs that have two basic functions: identifying and
collecting pages, and indexing that information into a searchable database.
Programs called spiders (or web crawlers) work from lists of servers to find Web
pages to index. Basically, the search engine uses the words on a page to
identify that page.
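
The collecting step can be sketched roughly as follows. This is only an illustration under stated assumptions: `FAKE_WEB` is a made-up in-memory stand-in for real servers, the `crawl` function and all URLs are hypothetical, and following links from page to page is an assumption about how a spider moves beyond its starting list (the text above only says it starts from lists of servers).

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real spider would fetch pages over the network instead of reading a dict.
FAKE_WEB = {
    "http://a.example/": ("welcome to search engine basics",
                          ["http://a.example/index", "http://b.example/"]),
    "http://a.example/index": ("a keyword index points to pages",
                               ["http://a.example/"]),
    "http://b.example/": ("spiders follow links between servers",
                          ["http://c.example/missing"]),
}

def crawl(seed_urls, web, max_pages=100):
    """Breadth-first crawl from a seed list; returns {url: text} for pages found."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    collected = {}
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        if url not in web:          # missing or unreachable page: skip it
            continue
        text, links = web[url]
        collected[url] = text       # hand the text on to the indexing step
        for link in links:
            if link not in seen:    # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return collected

pages = crawl(["http://a.example/"], FAKE_WEB)
print(sorted(pages))
```

The `seen` set is what keeps the spider from looping forever when two pages link to each other, as the first two pages in this toy data do.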

Indexing is a way of creating a database to search for information. A keyword
index is built from the words in the text of a page, each of which then points
back to that page. Note that 'search engine' and 'keyword' are on this page: if
a spider found this page and it was indexed, you would be able to retrieve the
page by searching on those keywords. (Many search engines also allow searching
by HTML fields, such as <TITLE>. Thus, you could limit your search for this page
to 'title:indexing' to find only those pages which have indexing in the title.)

Most search engines use relevancy ranking, which is meant to display the results
that best match your search first.
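
A minimal sketch of a keyword index with a title field and a toy relevancy score might look like this. Everything here is illustrative rather than how any particular search engine works: the `KeywordIndex` class, the `TITLE_BONUS` weight, and the sample pages are all hypothetical. The score simply counts occurrences of the word, adds a bonus when the word is in the title, and divides by the size of the page. Only single-word queries are handled.

```python
import re
from collections import defaultdict

TITLE_BONUS = 2  # assumed extra weight for a word that appears in the title

def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class KeywordIndex:
    def __init__(self):
        self.body = defaultdict(dict)   # word -> {url: occurrences on the page}
        self.title = defaultdict(set)   # word -> urls with the word in the title
        self.size = {}                  # url -> total number of words on the page

    def add_page(self, url, title, text):
        title_words = tokenize(title)
        words = title_words + tokenize(text)
        self.size[url] = len(words)
        for w in words:
            self.body[w][url] = self.body[w].get(url, 0) + 1
        for w in title_words:
            self.title[w].add(url)

    def search(self, query):
        """Return matching URLs, best score first; 'title:word' limits to titles."""
        if query.startswith("title:"):
            word = query[len("title:"):].lower()
            candidates = set(self.title.get(word, set()))
        else:
            word = query.lower()
            candidates = set(self.body.get(word, {}))

        def score(url):
            count = self.body.get(word, {}).get(url, 0)
            bonus = TITLE_BONUS if url in self.title.get(word, set()) else 0
            return (count + bonus) / self.size[url]

        # Sort by descending score; break ties by URL so results are stable.
        return sorted(candidates, key=lambda u: (-score(u), u))

idx = KeywordIndex()
idx.add_page("http://a.example/index", "Indexing basics",
             "A keyword index is built from words in the text of a page")
idx.add_page("http://b.example/", "Search engine overview",
             "Indexing creates a database and a keyword points to a page")

print(idx.search("title:indexing"))  # only the page with 'indexing' in its title
print(idx.search("indexing"))        # both pages, the title match ranked first
```

Dividing by page size is one simple way to keep a long page from outranking a short one just because it has room for more repetitions.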

Relevancy is basically a combination of the number of times a word appears in a
given page, its placement on the page (a word in the title is deemed more
"relevant" than one at the end of the page), and the size of the document.
Ranking assigns a value to these occurrences of a word (or words) and expresses
that number in comparison to other documents.

Unlike the automated spider, subject-oriented search engines use human mediation
for filtering. That means that a person reviews Web pages to decide whether they
will be included.

Often, URLs are either submitted for review, or a reviewer comes across a page
that is worth considering. In this example, Pat the Internet Librarian reviews
Web pages to determine whether they meet the coverage, level and subject focus
of the search engine.

As with the automated spider, indexing in a subject-oriented search engine
involves creating an index of keywords from Web pages.
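
As a rough sketch of what a subject-oriented engine might store for each reviewed page, assume a record that holds the page's keywords plus two fields filled in by a human reviewer: a subject heading and an optional rating. The `CatalogEntry` and `SubjectCatalog` names, the 0 to 5 rating scale, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    url: str
    keywords: set        # words taken from the page itself
    subject: str         # heading assigned by a human reviewer
    rating: int = 0      # reviewer's rating (assumed 0-5 scale)

class SubjectCatalog:
    def __init__(self):
        self.by_subject = {}   # lowercased subject heading -> list of entries

    def add(self, entry):
        self.by_subject.setdefault(entry.subject.lower(), []).append(entry)

    def browse(self, subject):
        """Entries filed under a subject, highest reviewer rating first."""
        entries = self.by_subject.get(subject.lower(), [])
        return sorted(entries, key=lambda e: (-e.rating, e.url))

catalog = SubjectCatalog()
catalog.add(CatalogEntry("http://a.example/tomatoes",
                         {"growing", "tomatoes"}, "Gardening", rating=3))
catalog.add(CatalogEntry("http://b.example/hort",
                         {"horticulture", "soil"}, "Gardening", rating=5))
catalog.add(CatalogEntry("http://c.example/spiders",
                         {"spider", "index"}, "Search Engines", rating=4))

for entry in catalog.browse("gardening"):
    print(entry.url, entry.rating)
```

The two gardening pages share no keywords, yet both turn up under the same heading, because the grouping comes from the reviewer rather than from the author's choice of words.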

The biggest difference is that in subject-oriented search engines, someone
assigns a Subject Heading or Category to help describe the overall topic or
focus of the page. This makes it easier to search by common subjects, instead of
having to know the specific terms used by the author of the Web page. In
addition, for some of these search engines a human mediator assigns a rating or
review to each page, expressing its strengths or uniqueness.

Thanks to: D. Scott Brandt (a.k.a. techman), Technology Training Librarian -
Professor of Library Science, Purdue University Libraries