Finding Information on the Internet: A Tutorial

Recommended Search Engines
UC Berkeley - Teaching Library Internet Workshops

Google is currently the most-used search engine. It has one of the largest databases of Web pages, including many other types of web documents (blog posts, wiki pages, group discussion threads) and document formats (e.g., PDFs, Word or Excel documents, PowerPoints). Despite the presence of all these formats, Google's popularity ranking often places worthwhile pages near the top of search results.

Google alone is not always sufficient, however. Not everything on the Web is fully searchable in Google. Overlap studies show that more than 80% of the pages in a major search engine's database exist only in that database. For this reason, getting a "second opinion" from another search engine can be worth your time; we recommend Yahoo! Search or Exalead. We do not recommend using meta-search engines as your primary search tool.

Table of features
Some common techniques will work in any search engine. However, in this very competitive industry, search engines also strive to offer unique features. When in doubt, look for "help", "FAQ", or "about" links.

Search engines compared: Google, Yahoo! Search, Exalead

Links to help
Google: Google help
Yahoo! Search: Yahoo! help
Exalead: Exalead help and FAQ

Size, type
Google: IMMENSE. Size not disclosed in any way that allows comparison. Probably the biggest.
Yahoo! Search: HUGE. Claims over 20 billion total "web objects."
Exalead: LARGE. Claims to have over 8 billion searchable pages.

Noteworthy features
Google: PageRank™ system includes hundreds of factors, emphasizing pages most heavily linked from other pages. Many additional databases including Book Search, Scholar (journal articles), Blog Search, Patents, Images, etc.
Yahoo! Search: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.
Exalead: Truncation lets you search by the first few letters of a word. Proximity search lets you find terms NEAR each other or NEXT to each other. Thumbnail page previews. Extensive options for refining and limiting your search.

Phrase searching
All three engines: Enclose phrase in "double quotes".

Boolean logic
Google: Partial. AND assumed between words. Capitalize OR. ( ) accepted but not required. In Advanced Search, partial Boolean available in boxes.
Yahoo! Search: Accepts AND, OR, NOT or AND NOT. Must be capitalized. ( ) accepted but not required.
Exalead: Partial. AND assumed between words. Capitalize OR. ( ) accepted. See Web Search Syntax for more options.

+Requires/-Excludes
Google: - excludes. + retrieves "stop words" (e.g., +in).
Yahoo! Search: - excludes. + will allow you to search common words: "+in truth".
Exalead: - excludes. + retrieves "stop words" (e.g., +in).

To modify a search in any of the three engines, use the search box at the top of the results page, which shows your current search (e.g., add more terms at the end).

Results Ranking
Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Fuzzy AND also invoked. Matching and ranking based on "cached" version of pages that may not be the most recent version.
Yahoo! Search: Automatic Fuzzy AND.
Exalead: Popularity ranking emphasizes pages most heavily linked from other pages.

Field limiting

Offers U.S. Gov't Search and other special searches. Patent search.

(Explanation of these distinctions.)

after:[time period]
before:[time period]
(For details, click on "Advanced search")

Truncation, stemming
Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating with OR (capitalized): airline OR airlines
Yahoo! Search: Neither. Search with OR as in Google.
Exalead: Use *. Example: messag*

Language
Google: Yes. Major Romanized and non-Romanized languages in Advanced Search.
Yahoo! Search: Yes. Major Romanized and non-Romanized languages.
Exalead: Extensive language and geographic options. Use "Advanced Search".

Translation
Google: Yes, in "Translate this page" link following some pages. To and sometimes from English and major European languages and Chinese, Japanese, Korean. Uses its own translation software with user feedback.
Yahoo! Search: Available as a separate service.
Exalead: Yes, in "Translate this page" link following some pages.
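
The syntax conventions listed in the table above (double quotes for phrases, capitalized OR, a leading minus to exclude, AND assumed between plain words) can be combined in a single query. The Python sketch below only assembles the query text; the function name `build_query` and its parameters are hypothetical, chosen for illustration, and are not part of any search engine's interface.

```python
def build_query(required=(), phrases=(), any_of=(), excluded=()):
    """Compose a query string using the conventions from the table.

    Hypothetical helper for illustration only.
    """
    parts = list(required)                  # AND is assumed between plain words
    parts += [f'"{p}"' for p in phrases]    # exact phrases go in double quotes
    if any_of:
        # OR must be capitalized; parentheses are accepted but optional
        parts.append("(" + " OR ".join(any_of) + ")")
    parts += [f"-{w}" for w in excluded]    # a leading minus excludes a term
    return " ".join(parts)


query = build_query(
    required=["fares"],
    phrases=["frequent flyer"],
    any_of=["airline", "airlines"],
    excluded=["cruise"],
)
print(query)
```

This prints `fares "frequent flyer" (airline OR airlines) -cruise`, a query that could be typed into the search box as is. Engine-specific features such as Exalead's `*` truncation would need to be added by hand.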


You may also wish to consult "What Makes a Search Engine Good?" - a table (PDF file) summarizing useful factors for evaluating search engines.

How do Search Engines Work?

Search engines do not really search the World Wide Web directly. Each one searches a database of web pages that it has harvested and cached. When you use a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve the current version of the page.

Search engine databases are selected and built by computer robot programs called spiders. These "crawl" the web, finding pages for potential inclusion by following the links in the pages they already have in their database. They cannot use imagination or enter terms in search boxes that they find on the web.

If a web page is never linked from any other page, search engine spiders cannot find it. The only way a brand new page can get into a search engine is for other pages to link to it, or for a human to submit its URL for inclusion. All major search engines offer ways to do this.
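
A small sketch may make the link-following idea concrete. It assumes a made-up miniature "web" held in a Python dictionary (the page names and the `crawl` function are hypothetical), and it leaves out everything a real spider must handle, such as fetching pages over the network. It shows only that a spider discovers pages by following links from pages it already knows, so a page nobody links to is never reached unless its URL is submitted.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story"],
    "story": [],
    "orphan": ["home"],   # links out, but no other page links to it
}


def crawl(seeds, links):
    """Return pages reachable from the seeds by following links, in discovery order."""
    found = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


print(crawl(["home"], LINKS))            # "orphan" is never discovered
print(crawl(["home", "orphan"], LINKS))  # submitting its URL adds it as a seed
```

The first call returns `['home', 'about', 'news', 'story']`; the second, which treats the submitted URL as an extra starting point, also includes `'orphan'`.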

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in each page and stores it in the search engine's database files, so that the database can be searched by keyword and by whatever more advanced approaches are offered. A page will then be found if your search matches its content.
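
One common way to make stored pages searchable by keyword is an "inverted index," which maps each word to the pages containing it. The sketch below is an illustration under that assumption, not a description of how any particular search engine stores its data; the page URLs, texts, and function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical harvested pages: URL -> page text.
PAGES = {
    "a.example/fares": "Airline fares and airline routes",
    "b.example/ships": "Cruise fares for ships",
    "c.example/news": "Airlines announce new routes",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word (AND assumed between words)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "airline fares"))
print(search(index, "routes"))
```

The first search returns only `a.example/fares`, the one page containing both words; the second returns `a.example/fares` and `c.example/news`. Because this sketch does no stemming, a search for "airline" does not match the page that says "Airlines", which is why searching variant endings with OR can matter.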

Many web pages are excluded from most search engines by policy. The contents of most of the searchable databases mounted on the web, such as library catalogs and article databases, are excluded because search engine spiders cannot access them. All this material is referred to as the "Invisible Web" -- what you don't see in search engine results.

