By Billie Peterson
Dear Tech Talk--

There are just too many indexes on the World Wide Web. I never know which one to use, and why do there have to be so many of them anyway?

-Goin' Buggy

Dear Buggy--

Sure enough -- there are a lot of spiders crawling around on the World Wide Web, and confusion reigns. Why are there so many of them? Because the amount of information on the Internet is so vast that one search engine can't possibly capture everything, and they all create their databases differently. Because each of these "indexes" has its own strengths and weaknesses, there are definitely times when one may be more appropriate to use than another.

First, here's a list of some basic features you should know about any WWW search engine:

  1. What does the database contain? Only WWW sites; WWW sites and other Internet sites (gopher, ftp, etc.)?
  2. What kind of Boolean searching is provided? And; or; both?
  3. How are phrases handled? As a Boolean search; as an adjacency search?
  4. What is the default method of searching? Can the default be changed?
  5. Is a relevancy score attached to each retrieved document, with the more "relevant" documents listed first, or are documents retrieved and listed randomly?
  6. Are summaries of each search result provided?
  7. Can a site be browsed by subject?
  8. Are terms searched as whole words or as part of words (substrings)?
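To make questions 2, 3, and 8 concrete, here is a minimal Python sketch of the difference between an "and" search, an "or" search, an adjacency (phrase) search, and a partial-word search. The function names and the sample sentence are my own illustrations, not the code of any search engine listed in this column.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_and(doc, terms):
    """Boolean AND: every term must appear as a whole word."""
    words = set(tokenize(doc))
    return all(t.lower() in words for t in terms)

def match_or(doc, terms):
    """Boolean OR: at least one term must appear as a whole word."""
    words = set(tokenize(doc))
    return any(t.lower() in words for t in terms)

def match_phrase(doc, phrase):
    """Adjacency search: the words must appear side by side, in order."""
    words = tokenize(doc)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

def match_substring(doc, term):
    """Partial-word search: the term may sit inside a longer word."""
    return any(term.lower() in w for w in tokenize(doc))

# Hypothetical sample document.
doc = "Library instruction for searching the World Wide Web"
print(match_and(doc, ["library", "web"]))
print(match_or(doc, ["library", "gopher"]))
print(match_phrase(doc, "wide web"))
print(match_substring(doc, "search"))
```

Notice that "search" matches as a partial word (it sits inside "searching") but would fail a whole-word "and" search, which is why question 8 matters when a search returns too much or too little.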

Below I've listed some of the "indexes" with which I am familiar and some of their features. There are many others, but this list provides a good place to begin. Often these "indexes" can be put into two categories: Subject Trees with search engines, and Search Engines only. With subject trees, the documents are put into the subject categories (from which the database is usually created) by people; with Search Engines only, the databases are created by automated spiders, wanderers, or robots, which "crawl" through the Internet and automatically build the databases using a variety of indexing techniques.
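For the curious, the way a spider builds a database can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "Web" here is a small made-up dictionary of pages (the example.* addresses are hypothetical), nothing is fetched over the network, and real spiders use far more elaborate indexing techniques.

```python
from collections import deque
import re

# Hypothetical pages: URL -> (page text, links found on the page).
PAGES = {
    "http://a.example/": ("library catalog search",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("gopher archive search", ["http://a.example/"]),
    "http://c.example/": ("library hours", []),
}

def crawl(start, pages):
    """Follow links from the start page and build a word -> URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = pages.get(url, ("", []))
        # Record every word on the page under the page's URL.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        # Queue each link that has not been visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("http://a.example/", PAGES)
print(sorted(index["library"]))
```

A search for "library" then becomes a simple lookup in the index rather than a fresh trip across the Internet.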

Subject Trees With Search Engines

Lycos

URL: http://lycos.cs.cmu.edu/
One of the largest search engine databases currently available. It includes WWW, gopher, and ftp sites.

And -- Yes
Or -- Yes (default)
Phrase Searches -- No
Relevancy Ranking -- Yes
Summary of Search Results -- Yes
Partial Word Searches -- Yes; to achieve an exact match, end each word with a period.
Subject Browsing -- Lycos 250; based on what the spider finds, the 250 sites that are found most frequently as links on other pages are listed in 10 broad subject categories.
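The idea of ranking sites by how often they turn up as links on other pages can be sketched as follows. This is my own rough illustration with made-up addresses, not Lycos's actual method; in particular, counting each linking page only once is an assumption on my part.

```python
from collections import Counter

# Hypothetical data: page URL -> links found on that page.
LINKS = {
    "http://a.example/": ["http://c.example/", "http://b.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://d.example/": ["http://c.example/", "http://b.example/",
                          "http://c.example/"],
}

def most_linked(links, top_n):
    """Return the top_n sites ranked by how many other pages link to them."""
    counts = Counter()
    for page, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != page:       # ignore a page linking to itself
                counts[target] += 1
    return counts.most_common(top_n)

print(most_linked(LINKS, 2))
```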

Yahoo

URL: http://www.yahoo.com/
One of the most popular places to begin looking for information. Although the database is manually maintained and relatively small, its value is enhanced because whenever a search is performed, links to the following search engines are automatically provided: OpenText, Lycos, WebCrawler, InfoSeek, Inktomi, and DejaNews.

And -- Yes (default)
Or -- Yes
Phrase Searches -- Yes
Relevancy Ranking -- No
Summary of Search Results -- No
Partial Word Searches -- Yes
Subject Browsing -- 14 broad subject categories listed

Search Engines Only

DejaNews

URL: http://www.dejanews.com/
Indexes only Usenet archives.

And -- Yes
Or -- Yes (default)
Phrase Searches -- No
Relevancy Ranking -- Yes
Summary of Search Results -- No
Partial Word Searches -- Yes

InfoSeek

URL: http://www.infoseek.com/
Indexes titles and comments on pages. InfoSeek charges a fee to have complete access to the database, but often the demo search access provides the needed information.

And -- Yes
Or -- Yes (default)
Phrase Searches -- Yes (enclose phrase in quotes)
Relevancy Ranking -- Yes
Summary of Search Results -- Yes
Partial Word Searches -- Yes

Inktomi

URL: http://inktomi.cs.berkeley.edu/
A relatively new, large database which rivals Lycos and WebCrawler.

And -- Yes (use a + in front of any word that must be contained in the returned references)
Or -- Yes (default)
Phrase Searches -- No
Relevancy Ranking -- Yes
Summary of Search Results -- No
Partial Word Searches -- Yes
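To show how a default "or" search combines with a + prefix for required words, here is a small Python sketch. It follows the behavior described above, but the parsing rules and function names are my own assumptions, not Inktomi's implementation.

```python
import re

def parse_query(query):
    """Split a query into required (+word) terms and optional terms."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token != "+":
            optional.append(token)
    return required, optional

def matches(doc, query):
    """Every +word must be present; otherwise any one word is enough."""
    words = set(re.findall(r"[a-z0-9]+", doc.lower()))
    required, optional = parse_query(query)
    if any(term not in words for term in required):
        return False
    if required:
        return True
    return any(term in words for term in optional)

print(matches("library hours", "+library web"))
print(matches("web hours", "+library web"))
print(matches("web hours", "library web"))
```

In this sketch, a page about "web hours" is returned for the plain query but dropped once "library" is marked as required.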

Open Text

URL: http://www.opentext.com/
Indexes all words on every page, but searches can be limited to specific areas (URL's, titles, summaries, etc.). An option is provided to improve the results of any search.

And -- Yes
Or -- Yes
Phrase Searches -- Yes (default)
Relevancy Ranking -- Yes
Summary of Search Results -- No
Partial Word Searches -- Yes

WebCrawler

URL: http://webcrawler.com/
Indexes text of pages, including Web, gopher, and ftp sites, so it can return extensive results. WebCrawler is owned by America Online, but no fees are charged.

And -- Yes (default)
Or -- Yes
Phrase Searches -- No
Relevancy Ranking -- Yes
Summary of Search Results -- No
Partial Word Searches -- Yes

World Wide Web Worm

URL: http://www.cs.colorado.edu/home/mcbryan/WWWW.html
Searches titles and URL's only. It's a good search engine to use when looking for an image or a moving picture because the URL's can be searched using extensions such as "gif" or "mpg".

And -- Yes (default)
Or -- Yes
Phrase Searches -- No
Relevancy Ranking -- No
Partial Word Searches -- Yes
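Searching URLs by file extension can be sketched in Python like this. The addresses are made up, and the sketch checks the extension strictly at the end of the path, which is an assumption on my part; a plain partial-word search on "gif" would also match a directory named "gif".

```python
from urllib.parse import urlparse
import posixpath

def has_extension(url, ext):
    """True if the path part of the URL ends with the given extension."""
    path = urlparse(url).path
    suffix = posixpath.splitext(path)[1]  # for example ".gif"
    return suffix.lower() == "." + ext.lower().lstrip(".")

def find_by_extension(urls, ext):
    """Keep only the URLs whose file extension matches."""
    return [u for u in urls if has_extension(u, ext)]

# Hypothetical list of indexed URLs.
URLS = [
    "http://www.example.edu/images/logo.GIF",
    "http://www.example.edu/movies/tour.mpg",
    "http://www.example.edu/gif/index.html",
]

print(find_by_extension(URLS, "gif"))
```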

Finally, there are some Web pages that list several search engines on one page, and in some cases you can actually perform the search from these pages. Some pages to investigate are:

CUI Meta-Index -- http://cuiwww.unige.ch/meta-index.html
Global Search -- http://ngwwmall.com/search/
Internet Search -- http://home.netscape.com/home/internet-search.html
SavvySearch -- http://www.cs.colostate.edu/~dreiling/smartform.html
Ted Slater's Search Engines -- http://www.regent.edu/~tedslat/tools.html

For more detailed information on search engines and spiders, read the following:

December, John. "Spiders and Indexes: Keyword-Oriented Searching." In World Wide Web Unleashed. Indianapolis: Sams Publishing, 1994, 386-407.

Ernst, Warren. "Finding the Web Pages You Want." In Using Netscape: The User-Friendly Reference. Indianapolis: QUE Corporation, 1995, 73-82.

Notess, Greg R. "Searching the World-Wide Web: Lycos, WebCrawler and More." Online 19 (July-August 1995): 48-52.

Paul, Kathryn and Kathleen Matthews. "Is the Web Navigable?" (Handouts from "Making Sense of the Internet" a preconference prior to the British Columbia Library Association meeting, May 4-5, 1995). http://burns.library.uvic.ca/BCLA_Overhead4.html


As always, send questions and comments to:

Snail Mail:
Tech Talk
Billie Peterson
Moody Memorial Library
P.O. Box 97143
Waco TX 76798-7143
Phone:
Voice: (817) 755-2344
FAX: (817) 752-5332
E-Mail:
INTERNET: petersonb@baylor.edu

LIRT News, December 1995. Volume 18, number 2.
To report problems, please contact the LIRT News Production editor at edwards@ufl.edu

Last revised December 21, 1999.