By Billie Peterson
Dear Tech Talk--
There are just too many indexes on the World Wide Web. I never know
which one to use, and why do there have to be so many of them anyway?
-Goin' Buggy
Dear Buggy--
Sure enough -- there are a lot of spiders crawling around on the World
Wide Web, and confusion reigns. Why are there so many of them?
Because the amount of information on the Internet is so vast that one
search engine can't possibly capture everything, and they all create
their databases differently. Because each of these "indexes" has its
own strengths and weaknesses, there are definitely times when one may
be more appropriate to use than another.
First, here's a list of some basic features you should know about any
WWW search engine:
- What does the database contain? Only WWW sites; WWW sites and
other Internet sites (gopher, ftp, etc.)?
- What kind of Boolean searching is provided? And; or; both?
- How are phrases handled? As a Boolean search; as an adjacency
search?
- What is the default method of searching? Can the default be
changed?
- Is a relevancy score attached to each retrieved document, with
the more "relevant" documents listed first, or are documents retrieved
and listed randomly?
- Are summaries of each search result provided?
- Can a site be browsed by subject?
- Are terms searched as whole words or as part of words
(substrings)?
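To make a few of these questions concrete, here is a minimal Python sketch of the difference between And, Or, phrase (adjacency), and partial-word searching. The three "pages" and the function names are made up for illustration; no real search engine is claimed to work exactly this way.

```python
import re

# Three made-up documents standing in for indexed Web pages.
DOCS = {
    "page1": "Spiders crawl the World Wide Web",
    "page2": "A web of spiderlike robots",
    "page3": "The wide world of gopher sites",
}

def words(text):
    """Lower-case a text and split it into whole words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search_and(terms):
    """Boolean And: every term must appear as a whole word."""
    return sorted(d for d, t in DOCS.items() if all(x in words(t) for x in terms))

def search_or(terms):
    """Boolean Or: at least one term must appear as a whole word."""
    return sorted(d for d, t in DOCS.items() if any(x in words(t) for x in terms))

def search_phrase(phrase):
    """Adjacency search: the words must appear side by side, in order."""
    target = words(phrase)
    n = len(target)
    hits = []
    for d, t in DOCS.items():
        w = words(t)
        if any(w[i:i + n] == target for i in range(len(w) - n + 1)):
            hits.append(d)
    return sorted(hits)

def search_substring(term):
    """Partial-word search: the term may match inside a longer word."""
    return sorted(d for d, t in DOCS.items() if term.lower() in t.lower())
```

In this toy data, an And search for "world" and "wide" finds both page1 and page3, while a phrase search for "world wide" finds only page1, because page3 has the words in the opposite order. A partial-word search for "spider" also picks up "spiderlike" on page2, which a whole-word search would miss.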
Below I've listed some of the "indexes" with which I am familiar and
some of their features. There are many others, but this list provides a
good place to begin. Often these "indexes" can be put into two
categories: Subject Trees with Search Engines, and Search Engines Only.
With Subject Trees, the documents are put into the subject categories
(from which the database is usually created) by people; with Search
Engines Only, the databases are created by automated spiders, wanderers,
or robots, which "crawl" through the Internet and automatically build the
databases using a variety of indexing techniques.
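For the curious, here is a rough Python sketch of what "crawling" means: start at one page, follow its links to other pages, and record which words appear where. The tiny in-memory "web" (the `WEB` dictionary and its example URLs) is invented for illustration, and a real spider would fetch pages over the network and use far more elaborate indexing techniques.

```python
from collections import deque
import re

# A tiny made-up "web": each URL maps to (page text, list of linked URLs).
WEB = {
    "http://a.example/": ("Library news and search tips",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Search engines and spiders",
                          ["http://a.example/"]),
    "http://c.example/": ("Gopher and ftp archives",
                          ["http://b.example/", "http://d.example/"]),
}

def crawl(start):
    """Follow links breadth-first from start and build a word -> URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url not in WEB:
            # Dead link: the page cannot be fetched, so there is nothing to index.
            continue
        text, links = WEB[url]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return index
```

Calling `crawl("http://a.example/")` visits all three pages and quietly skips the dead link to `http://d.example/`. Looking up a word in the resulting index then gives the set of pages that contain it.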
Subject Trees With Search Engines
Lycos
URL: http://lycos.cs.cmu.edu/
One of the largest search engine databases currently available. It
includes WWW, gopher, and ftp sites.
- And -- Yes
- Or -- Yes (default)
- Phrase Searches -- No
- Relevancy Ranking -- Yes
- Summary of Search Results -- Yes
- Partial Word Searches -- Yes; to achieve an exact match, end each
word with a period.
- Subject Browsing -- Lycos 250; based on what the spider finds, the
250 sites that are found most frequently as links on other pages are
listed in 10 broad subject categories.
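The idea behind a "most frequently linked" list can be sketched in a few lines of Python. This is only an illustration of counting inbound links; the link data and the `most_linked` function are hypothetical, and nothing here is taken from how Lycos actually compiles its list.

```python
from collections import Counter

# Made-up link data: each page maps to the pages it links to.
LINKS = {
    "pageA": ["pageC", "pageD"],
    "pageB": ["pageC"],
    "pageC": ["pageD"],
    "pageD": ["pageC"],
}

def most_linked(links, top_n):
    """Return the top_n (page, inbound link count) pairs, most linked first."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore a page linking to itself
                counts[target] += 1
    # Sort by count (highest first), then by name so ties come out the same way.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

With this sample data, pageC is linked from three other pages and pageD from two, so they head the list in that order.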
Yahoo
URL: http://www.yahoo.com/
One of the most popular places to begin looking for information.
Although the database is manually maintained and relatively small,
its value is enhanced because whenever a search is performed, links to
the following search engines are automatically provided: Open Text,
Lycos, WebCrawler, InfoSeek, Inktomi, and DejaNews.
- And -- Yes (default)
- Or -- Yes
- Phrase Searches -- Yes
- Relevancy Ranking -- No
- Summary of Search Results -- No
- Partial Word Searches -- Yes
- Subject Browsing -- 14 broad subject categories listed
Search Engines Only
DejaNews
URL: http://www.dejanews.com/
Indexes only Usenet archives.
- And -- Yes
- Or -- Yes (default)
- Phrase Searches -- No
- Relevancy Ranking -- Yes
- Summary of Search Results -- No
- Partial Word Searches -- Yes
InfoSeek
URL: http://www.infoseek.com/
Indexes titles and comments on pages. InfoSeek charges a fee to
have complete access to the database, but often the demo search access
provides the needed information.
- And -- Yes
- Or -- Yes (default)
- Phrase Searches -- Yes (enclose phrase in quotes)
- Relevancy Ranking -- Yes
- Summary of Search Results -- Yes
- Partial Word Searches -- Yes
Inktomi
URL: http://inktomi.cs.berkeley.edu/
A relatively new, large database which rivals Lycos and WebCrawler.
- And -- Yes (use a + in front of any word that must be contained
in the returned references)
- Or -- Yes (default)
- Phrase Searches -- No
- Relevancy Ranking -- Yes
- Summary of Search Results -- No
- Partial Word Searches -- Yes
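The "+" convention combines And and Or in one query: plain words are Or'ed together, while any word marked with "+" must be present. Here is a small Python sketch of that idea. The functions `parse_query` and `matches` are hypothetical names, and the exact rules (whole-word matching, and treating unmarked words as optional once a "+" word is given) are my assumptions, not a description of Inktomi's internals.

```python
def parse_query(query):
    """Split a query into required terms (+word) and optional terms."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])   # strip the leading "+"
        elif token != "+":
            optional.append(token)
    return required, optional

def matches(text, query):
    """Return True if the text satisfies the query."""
    required, optional = parse_query(query)
    doc_words = set(text.lower().split())
    # Every "+" word must be present.
    if not all(term in doc_words for term in required):
        return False
    # Assumption: once the required words match, optional words are a bonus.
    if required:
        return True
    # With no "+" words this is a plain Or search.
    return any(term in doc_words for term in optional)
```

For example, the query "+library catalog" matches a page containing "library" whether or not "catalog" appears, but it rejects a page that mentions only "catalog".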
Open Text
URL: http://www.opentext.com/
Indexes all words on every page, but searches can be limited to
specific areas (URLs, titles, summaries, etc.). An option is
provided to improve the results of any search.
- And -- Yes
- Or -- Yes
- Phrase Searches -- Yes (default)
- Relevancy Ranking -- Yes
- Summary of Search Results -- No
- Partial Word Searches -- Yes
WebCrawler
URL: http://webcrawler.com/
Indexes text of pages, including Web, gopher, and ftp sites, so it
can return extensive results. WebCrawler is owned by America Online,
but no fees are charged.
- And -- Yes (default)
- Or -- Yes
- Phrase Searches -- No
- Relevancy Ranking -- Yes
- Summary of Search Results -- No
- Partial Word Searches -- Yes
World Wide Web Worm
URL: http://www.cs.colorado.edu/home/mcbryan/WWWW.html
Searches titles and URLs only. It's a good search engine to use
when looking for an image or a moving picture because the URLs can be
searched using extensions such as "gif" or "mpg".
- And -- Yes (default)
- Or -- Yes
- Phrase Searches -- No
- Relevancy Ranking -- No
- Partial Word Searches -- Yes
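Searching URLs by file extension is simple substring matching, as this small Python sketch shows. The entries and the `search_urls` function are made up for illustration; this is not the World Wide Web Worm's own code.

```python
# Made-up index entries: (title, URL) pairs.
ENTRIES = [
    ("Campus map", "http://www.example.edu/maps/campus.gif"),
    ("Library home page", "http://www.example.edu/library/index.html"),
    ("Shuttle launch", "http://www.example.org/movies/launch.mpg"),
]

def search_urls(entries, fragment):
    """Return the entries whose URL contains the fragment, ignoring case."""
    fragment = fragment.lower()
    return [(title, url) for title, url in entries if fragment in url.lower()]
```

Searching these entries for "gif" returns only the campus map, and searching for "mpg" returns only the shuttle launch movie. Because the match is on any part of the URL, a fragment such as "library" would also find pages by directory name.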
Finally, there are some Web pages that list several search engines on
one page, and in some cases you can actually perform the search from
these pages. Some pages to investigate are:
CUI Meta-Index -- http://cuiwww.unige.ch/meta-index.html
Global Search -- http://ngwwmall.com/search/
Internet Search -- http://home.netscape.com/home/internet-search.html
SavvySearch -- http://www.cs.colostate.edu/~dreiling/smartform.html
Ted Slater's Search Engines --
http://www.regent.edu/~tedslat/tools.html
For more detailed information on search engines and spiders, read the
following:
December, John. "Spiders and Indexes: Keyword-Oriented Searching."
In World Wide Web Unleashed. Indianapolis: Sams
Publishing, 1994, 386-407.
Ernst, Warren. "Finding the Web Pages You Want." In Using
Netscape: The User-Friendly Reference. Indianapolis: QUE Corporation,
1995, 73-82.
Notess, Greg R. "Searching the World-Wide Web: Lycos, WebCrawler
and More." Online 19 (July-August 1995): 48-52.
Paul, Kathryn and Kathleen Matthews. "Is the Web Navigable?"
(Handouts from "Making Sense of the Internet" a preconference prior to
the British Columbia Library Association meeting, May 4-5, 1995).
http://burns.library.uvic.ca/BCLA_Overhead4.html
As always, send questions and comments to:
- Snail Mail:
- Tech Talk
- Billie Peterson
- Moody Memorial Library
- P.O. Box 97143
- Waco TX 76798-7143
- Phone:
- Voice: (817) 755-2344
- FAX: (817) 752-5332
- E-Mail:
- INTERNET: petersonb@baylor.edu