Evaluating Search Engines: Their Tools and their Results
March 20, 2010
While Google, Yahoo! and Bing currently dominate internet search, there are other engines out there, and they too claim to reach an enormous volume of valuable and unique information at impressive speeds, in creative ways, and with the nonprofessional researcher in mind. For a simple search such as tomorrow’s weather or where to go for dinner, we can go almost anywhere on the web for the answer. But when we’re searching for information of a more scholarly nature, choosing the “right” search engine becomes extremely important: the decision determines whether or not we receive relevant and reliable information.
In this paper I will examine three single search engines (Google, Bing and Exalead) and one meta search engine, Dogpile. I’ve chosen Google because it’s the biggest and most popular; Bing because, although a relative newcomer, it seems to be quickly gaining market share and is worthy of evaluation; and Exalead because, while previously unknown to me, it’s receiving good reviews and appears to have a fresh interface with a lot to offer. I also chose the meta searcher Dogpile because it gets results from Google, Yahoo!, Bing MSN Search, MIVA, Looksmart and Ask, which are the most popular search engines of 2010. It also claims to search 50% more of the web than a single engine.
What features do these web search engines offer? How do they differ from one another in performance? What makes a search engine good? Should we use more than one search engine? These are some of the questions I will address.
Objectives of Study
To analyze the quality of the tools and features available on Google, Bing, Exalead and Dogpile in doing an advanced and scholarly search of the compound search topic, “Ethics and the Tuskegee Syphilis Study”.
To evaluate the performance of Google, Bing, Exalead and Dogpile with particular attention to the relevance of the end product, the quality of retrieved information and user satisfaction.
Method
Research was done on the variety of search engines currently available for free on the Internet. I investigated several sites that had favorable reviews such as Excite, Lycos, Wold, and Yebol, but for a variety of reasons ruled them out. They were either not up to date or their contact information seemed undocumented. The Web is constantly changing and the best way to navigate it is to constantly stay current, always reading, surfing, and experimenting with what is out there. Change is the only constant.
Several search engines which had received favorable reviews only a short time ago are no longer maintained. All four search engines were evaluated March 10th, March 13th, and March 14th so as to check for consistent search results from day to day. Internet Explorer was the only browser used in the search evaluations although I did do some comparison testing between Internet Explorer and Safari.
Search Engines for Study
Google http://www.google.com
Google's mission is ambitious: “to organize the world's information and make it universally accessible and useful” (Google 2010). The search engine company was founded in 1998 by Larry Page and Sergey Brin, two Stanford University Ph.D. candidates, and is based in Mountain View, California. Google currently processes over one billion search requests every day, controls 65% of the search market, and claims to be three times larger than any other search engine. It uses PageRank, an algorithm that helps determine the relative importance of a web page, so a big part of retrieval is based on the popularity of the site.
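To make the idea of ranking by popularity more concrete, here is a minimal sketch of the textbook form of PageRank, computed by repeated passes over a tiny link graph. It is only an illustration: the three-page web, the function name `pagerank`, and the damping value of 0.85 are assumptions for this example, and Google's production ranking is certainly far more elaborate than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, and C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy web, page C ends up ranked first because two pages link to it, while B, which nothing links to, ends up last. That is the sense in which a page's "popularity" drives its position.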
Bing http://bing.com
Bing (“Bing and Decide”) calls itself a decision engine and is owned and operated by Microsoft, whose CEO is currently Steve Ballmer. “It’s the go to search engine for those who need to make a decision quickly” (Bing 2010). It is a relative newcomer in the field of internet searching, having launched in June 2009. It currently commands 11.5% of the market (Bing February 2010), behind Google and gaining quickly on Yahoo!. Its searching is based on semantic technology from Powerset, which uses natural language. Rather than relying on keywords, it uses a targeted answer strategy.
Exalead http://exalead.com
Exalead is a French firm founded in 2000 by search engine pioneer François Bourdoncle. Its search engine attempts to add value to search results by displaying statistics and information about the results along the left-hand side of the page. “Its goal is to bring structure, meaning and accessibility to previously unused or under-utilized data in the disparate, heterogeneous enterprise information cloud” (Exalead 2010). The front page is clean, with bookmarks to CNN, BBC, the weather forecast and the Guardian, as well as a space for a custom page, a clear indication that the target audience may be both European and business oriented. The index size as of June 2009 (undocumented) is over 8 billion pages, 2 billion images, and 200 million videos.
Dogpile http://dogpile.com
Dogpile is a meta search site which searches Google, Yahoo!, Bing MSN Search, About, MIVA (an e-commerce directory), Looksmart (a directory) and Ask, claiming to “fetch” results from all of them with a single click. It filters for duplication and reports that 88% of the top search results are unique to one of the four major search providers (Google, Yahoo!, Bing and Ask). At the end of each retrieved result Dogpile displays which search engine provided it. If more than one engine retrieved the source, this is also indicated. The site was developed by Aaron Flin and started operations in November 1996. It is currently owned by Infospace, which owns other consumer brands including WebCrawler, DoGreatGood, and MetaCrawler.
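The merge-and-credit behavior described above can be sketched in a few lines. This is not Dogpile's actual method, which is not documented here; the function names, the sample URLs, and the crude URL normalization (lowercasing and dropping a trailing slash) are all assumptions made for illustration.

```python
def merge_results(results_by_engine):
    """Merge per-engine URL lists, dropping duplicates and recording sources."""
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            # Crude normalization (an assumption): ignore case and a trailing slash.
            key = url.rstrip("/").lower()
            sources = merged.setdefault(key, [])
            if engine not in sources:
                sources.append(engine)
    return merged


def unique_share(merged):
    """Fraction of merged results that only one engine reported."""
    if not merged:
        return 0.0
    singles = sum(1 for sources in merged.values() if len(sources) == 1)
    return singles / len(merged)


# Hypothetical result lists; real engines return far more than this.
sample = {
    "Google": ["http://example.org/a", "http://example.org/b"],
    "Yahoo!": ["http://example.org/b/", "http://example.org/c"],
    "Bing": ["http://example.org/d"],
}
merged = merge_results(sample)
for url, sources in merged.items():
    print(url, "found by", ", ".join(sources))
print("share found by only one engine:", unique_share(merged))
```

Here three of the four merged results come from a single engine, the same kind of figure as the 88% uniqueness that Dogpile reports, though on a made-up sample.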
Search Engine Tools
The following table summarizes what I discovered in my research and compares the different tools and strategies used by Google, Bing, Exalead and Dogpile. I tested all of these features at some point in my research, although I may not have been able to include all of them in my final results.
| Search Engine | Google | Bing | Exalead | Dogpile (Metasearcher) |
| --- | --- | --- | --- | --- |
| Size | IMMENSE, always growing - billions of pages | LARGE, but not nearly as big as Google | SMALLER database - 8 billion pages | accesses pages from multiple engines |
| Search Technology | PageRank (page popularity) | Powerset (semantic) | regular expression | searches with other engines' technologies |
| Help links, FAQs | lots and lots, and all kinds | limited | good | very little |
| Can you choose document type? (.pdf, Word, .ppt, etc.) | yes, can request .pdf, .ps, .dwf, .xls, .ppt, .doc, .rtf, and .swf | include document type in search box | pdf, swf, word, ps, rtf, doc & more; the choices appear when that format is available in your search | recognizes document preference |
| Limit domain/site type? [edu, gov, etc.] | yes | could not find this option | yes | yes |
| Limit by exact phrase? | uses “double quotes” to create phrases | uses “double quotes” to create phrases | uses “double quotes” for exact phrase | uses “double quotes” for exact word phrase |
| Boolean Operators | assumes AND; accepts OR; [ ] accepted, but not necessary | did not work well with Boolean operators | assumes AND; [ ] accepted | supports what the single search engines require |
| Punctuation & Symbols | does not recognize symbols or punctuation - some exceptions | does not recognize symbols or punctuation - some exceptions | does not recognize symbols or punctuation - some exceptions | does not recognize symbols or punctuation - some exceptions |
| Case sensitive? | no | no | no | no |
| Can you add terms as you search? | easy to add/subtract terms | easy to add/subtract terms | easy to add/subtract terms | easy to add/subtract terms |
| Where do you want to find it? | link: site: intitle: inurl: | no | link: site: intitle: inurl: after/before: time period | could not find this option, but does seem to accept terms when you type in search box |
| Truncation, Wildcard *, Proximity? | use OR for different word endings; uses * for initials | no stemming or truncation | uses * for truncation; also has proximity searching | no mention of these search tools |
| Filters | adult content | adult content | adult content | adult content |
| Special features | Scholar has timeline & Wonder Wheel; search by Gov., Univ., patent, etc. | great multimedia searching; photos are arranged so you can view all choices at once | provides excellent narrowing options to help refine search | tricks of the trade; view of other popular searches |
Chart Format Based on UC Berkeley - Teaching Library Internet Workshops
The Search
I chose to do a search on a compound/non-unitary subject that crossed several disciplines of study: historical, medical, philosophical, and political.
Search Topic - The Tuskegee Syphilis Study and Ethics
Search Using Google
I started my search in Google with a simple query, tuskegee. Google immediately provided several suggestions to help me narrow my search; the two most relevant were:
tuskegee experiment results 320,000 (.44 seconds)
tuskegee syphilis study results 65,000 (.14 seconds)
I then refined the search with:
[“tuskegee syphilis study” OR “tuskegee experiment”] AND [ethics OR bioethics] results 26,600 (.37 seconds)
[intitle:tuskegee syphilis study OR intitle:tuskegee experiment] AND [ethics OR bioethics] results 1760 (1.16 seconds)
Then:
[intitle:tuskegee syphilis study OR intitle:tuskegee experiment] [ethics OR bioethics] AND “human experiment” results 498 (.45 seconds)
[intitle:tuskegee syphilis study OR intitle:tuskegee experiment] [ethics OR bioethics] human experiment racism results 141 (.26 seconds)
[intitle:tuskegee syphilis study OR intitle:tuskegee experiment] [ethics OR bioethics] human experiment racism “black men” results 110 (.26 seconds)
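The refinement steps above all follow one pattern: groups of alternatives joined with OR, then further terms added to narrow the results. A small helper can assemble such query strings so the variations stay consistent. This is a sketch only; the function names are hypothetical, it follows the bracket notation used in this paper, and whether a given engine actually honors brackets or implied AND is something to test rather than assume.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join alternatives with OR inside square brackets."""
    return "[" + " OR ".join(quote(t) for t in terms) + "]"


def build_query(groups, extra_terms=()):
    """Combine OR-groups and extra narrowing terms, relying on implied AND."""
    parts = [or_group(group) for group in groups]
    parts.extend(quote(term) for term in extra_terms)
    return " ".join(parts)


query = build_query(
    groups=[
        ["tuskegee syphilis study", "tuskegee experiment"],
        ["ethics", "bioethics"],
    ],
    extra_terms=["racism", "black men"],
)
print(query)
```

Adding or removing an entry in `extra_terms` mirrors the way each search step above narrowed the previous one by a single term.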
Analysis of Retrieved information for Google Search
While 110 results is still a lot of pages to examine when you’re doing research, it is much more manageable than the original 320,000. The descriptions of the web pages are not all that informative at first glance. The URL is listed along with a brief description of the page. All of the search words are bolded. If you have chosen a particular file type, this is also included and bolded. There is also a link to similar web pages, which will take you to new related links.
However, if you click on “show options” in the upper left-hand corner, you will be able to view some wonderful options to help you either expand or limit your search. You can choose the types of results and the age of your results. From standard view you can look at the timeline, related searches, and the Wonder Wheel, all of great use if you’re unfamiliar with your topic and need suggestions for more query terms.
From standard results you can greatly enhance your results page by adding page preview which gives a nice thumbnail of the searched page; it also expands the abstract of the page providing much more detailed information. With the images option you can see images included on the page.
Still, the results from Google were not all that impressive for a scholarly paper. The first sources listed were Wikipedia and the review of a book on Amazon. Cited numbers for the articles (included in the abstract) were also quite low. I decided to dig a little deeper by clicking on the advanced search link.
Advanced search takes you to a page with fields to fill in. It automatically brings the search you started in Google Search. A link for help in Advanced Search provides you with Boolean Logic rules and explains how to limit your search by domain/site and file.
Search Using Advanced Search in Google
Fill in the fields on the advanced search page
Search using all these words: tuskegee-syphilis-study
This exact wording or phrase: tuskegee experiment
One or more of these words: ethics OR bioethics
File type - any
Search within a site or domain - left blank, as can only put in a single domain
results 2,210 (.30 seconds)
I decided to refine the search by specifying the domain type. I experimented with .edu, .gov, and .org and found that .edu gave the “best” results.
Search using all these words: tuskegee-syphilis-study
This exact wording or phrase: tuskegee experiment
One or more of these words: ethics OR bioethics
File type - any
Search within a site or domain - .edu
results 276 (.39 seconds)
Further Refined
Search using all these words: nonconsensual
This exact wording or phrase: tuskegee-syphilis-experiment
One or more of these words: ethics OR bioethics OR racism
File type - any
Search within a site or domain - .edu
Results 122 (.44 seconds)
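The fields on the advanced search form correspond closely to operators that can be typed straight into the search box. The sketch below shows one plausible mapping; the class and field names are hypothetical, and the translation is an assumption about how the form fields relate to operators, not a description of what Google does internally.

```python
from dataclasses import dataclass, field


@dataclass
class AdvancedSearch:
    """Hypothetical stand-in for the fields on an advanced search form."""
    all_words: list = field(default_factory=list)   # "all these words"
    exact_phrase: str = ""                          # "this exact wording or phrase"
    any_words: list = field(default_factory=list)   # "one or more of these words"
    site: str = ""                                  # "search within a site or domain"

    def to_query(self):
        """Translate the filled-in fields into one search-box query."""
        parts = list(self.all_words)
        if self.exact_phrase:
            parts.append(f'"{self.exact_phrase}"')
        if self.any_words:
            parts.append(" OR ".join(self.any_words))
        if self.site:
            parts.append(f"site:{self.site}")
        return " ".join(parts)


# The last refinement above, expressed as a single query string.
form = AdvancedSearch(
    all_words=["nonconsensual"],
    exact_phrase="tuskegee-syphilis-experiment",
    any_words=["ethics", "bioethics", "racism"],
    site=".edu",
)
print(form.to_query())
```

Keeping the fields separate like this makes it easy to change one limit at a time, for example swapping `.edu` for `.gov`, and compare the result counts.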
Analysis of Retrieved Information Using Advanced Search Google
This advanced search gave relevant scholarly documents that would provide helpful background on a research paper. The thumbnail, expanded abstract, image view, url, contact information, and citations were all available in Advanced Search. Wikipedia dropped off the list as did some basic information sites. The pages retrieved at the top were scholarly with good bibliographies, helpful for expanded research. In many cases primary sources such as interviews and letters from key players in the Tuskegee Study were referenced.
Search Using Google Scholar
[tuskegee syphilis study OR tuskegee experiment] ethics OR bioethics results 38,909 (.16 seconds)
[intitle:tuskegee syphilis study OR intitle:tuskegee experiment] [ethics OR bioethics] results 89 (.34 seconds)
Analysis of Retrieved Information Using Google Scholar
After refining my search and experimenting with the best terms, domains, and document types it was easy to plug the information into Google Scholar for a quick comparison of results. I actually used fewer terms and did not limit by domain or document type, so while the search had fewer limits I actually got better quality and more precise results. I was able to retrieve .gov,.edu, and .org documents.
Search Using Bing
[“tuskegee syphilis experiment” OR “tuskegee study”] results 36,000
[“tuskegee syphilis experiment” OR “tuskegee study”] [bioethics OR ethics] results 5,040,000
I decided to try the search without [ ] and was surprised by the results. Bing says parentheses are not necessary, but I think they are not supported. I had much better results when I left them off.
"tuskegee syphilis experiment" AND ethics results 7,690
"tuskegee syphilis experiment" and ethics results 3,420
Analysis of Information Retrieved Using Bing
I experimented with many more terms, both ANDing and ORing. AND seems to be the default operator for Bing. My search grew very large when I added “racism,” “human experimentation,” “nonconsensual,” and other terms that gave me good results in other search engines. The results list, while too large, was easy to navigate. Moving the cursor to the right of a retrieved page brought a “more on this page” feature into view. This had very complete information, with content, keys, contacts, etc. all included. But Bing was giving me such huge returns that I decided it was time to move on to an advanced Bing search.
Search Using Advanced Search in Bing
[“tuskegee syphilis experiment” OR “tuskegee study”] [bioethics OR ethics] human experimentation results 2,930
[“tuskegee syphilis experiment” OR “tuskegee study”] [bioethics OR ethics] human experimentation nonconsensual results 30
[“tuskegee syphilis experiment” OR “tuskegee study”] [bioethics OR ethics] human experimentation cover-up results 134
Analysis of Retrieved Information Using Advanced Search Bing
Advanced Search in Bing suggests terms to refine your search, which was helpful. From the advanced search page you are also able to limit or expand your search by terms, country, domain/site, and language. The options for the results did not change from the basic Bing search: there was still the nice “more on this page” feature, but no ability to look at a thumbnail or view images from the retrieved website.
Search Using Exalead
“tuskegee syphilis study” results - 4,566
“tuskegee syphilis study” AND ethics -results - 1,106
“tuskegee syphilis study” AND ethics AND nonconsensual results - 5
“tuskegee syphilis study” AND ethics AND racism - results 167
“tuskegee syphilis study” AND ethics AND racism AND 1972 - results 98
Analysis of Retrieved Information in Exalead/Advanced Search Exalead
I started in basic search but quickly moved to Advanced Search, which provides tips under three headings: “What are you looking for?” (exact phrase, exact word, Boolean logic, etc.), “Where do you want to find it?” (site, title, etc.) and “Which time?” From this page you can also select file type and site type. The results were relevant and of scholarly value. A thumbnail of each retrieved page appears to the left of the abstract, and each result includes all the information needed to make a decision on its value without having to visit the site.
Search Using Dogpile
Search term Tuskegee gave the following suggestion:
Tuskegee study and ethics
Analysis of Retrieved Information from Dogpile
I searched with this phrase plus many other variations I had used on the single search engines. The searches brought up pages of results, many of them irrelevant to the entered search topic. The list included some great examples of false drops, everything from how to cure cold sores to where to get treated for syphilis or how to be more ethical. Some good sites did appear further down the page, and strong abstracts made it possible to evaluate the sites without leaving Dogpile. At the end of the abstract Dogpile credits the search engine(s) which found the site. It was very interesting to see how many of the websites listed only one search engine. Google had the lowest retrieval rate of the four search engines used for this particular search. Yahoo! was first, followed by Bing, and then Ask. Dogpile does not give the number of retrieved results at the top of the page. Digging deeper into the site indicated that a statistics bar provides those numbers; however, I could not find the statistics bar.
Performance Evaluation
What Makes a Search Engine Good?
| Search Engine | Google | Bing | Exalead | Dogpile |
| --- | --- | --- | --- | --- |
| % of Web searched? | Small % of total web | Small % of total web | Smaller % of total web | 50% more than a single search - still small % |
| Are pages refreshed? | search option of last 24 hours | unknown | questionable | unknown |
| Is database full text? | complete | complete | complete | complete |
| Is every word indexed? | yes | yes | yes | yes |
| Speed of retrieval | in hundredths of a second - displayed | very fast | very fast | very fast |
| Different results from day to day | different days retrieved different # of results | different days retrieved different # of results | different days retrieved different # of results | different days retrieved different # of results |
| Pages ranked by popularity, relevancy or both? | popularity, then relevancy | relevancy | relevancy | relevancy |
| Search terms highlighted? | bolded | bolded | bolded | bolded |
| Gives a thumbnail of site from search page? | thumbnail with images too, also a more info pullout | no thumbnail | thumbnail to left of abstract | thumbnail |
| Readable abstract [URL, contact info, outline, keys, etc.] | in page preview able to view complete info; in Scholar incl. cited # | has some relevant info, plus “more on this page” feature | includes cached, date, bookmark, etc. | the page is packed; provides a lot of pertinent information |
| Is the site easy to navigate? | yes, lots of info | yes, but sparse | simple site - easy to navigate | yes, packs a lot of info |
| Is the site visually pleasing? | logo is fun | beautiful clean front page | uncluttered front page | fun graphics with Arfie |
| Are there tutorials, FAQs and help pages? | plethora of info including blogs, discussions and help forums | limited | good amount of help/support | limited; had to “dig” for help page |
| Can you customize look of the results? [images, etc.] | yes | yes, images well supported | yes | yes, images are included |
| Languages | includes major languages + Pig Latin, Elmer Fudd, and Klingon | lists all major languages | lists all major languages | lists all major languages |
Discussion
It’s extremely important to explore all of the different features when using a search engine for the first time. Take tutorials, and read the help pages and the FAQs. It’s also important to visit the “about” page of a new or unknown website. I had planned to evaluate Yebol as one of my three websites for this project. I loved their page layout, which is set up like a newspaper, with related topics on the right and other features to the left. At first glance it seemed to be a great website for a Readers’ Advisory, as it listed authors with their works plus other read-alike authors. However, when I read that their name comes from Yebol (Yesheah [Jesus in Hebrew] is the Bread of Life), I was concerned there could be some filtering going on. What if I needed to research a non-Christian topic?
Before starting this project, the only search engines I had used were Google and Google Scholar, and even then I didn’t realize all that Google could do. I am impressed with Google’s ability both to refine a search by limiting search topics and domains and to expand it across a variety of media, from blogs and discussions to video and Twitter. At first glance the Wonder Wheel and timeline features seemed a little gimmicky, but the timeline actually proved very helpful in pinpointing the best dates to search for documents. The Tuskegee Syphilis Experiment began in 1932, so it’s not surprising that a lot was written around the start of the study. For the rest of the 40-year experiment there is very little. Then, after the experiment was exposed in 1972, there is a huge amount of literature. I would not have intuitively understood the best years for finding information. The timeline was instrumental in helping me understand this.
After spending a lot of time on Google, moving to Bing was an adjustment. The site is beautiful, the front page a piece of art, and the image search full of great photos and graphics that you can view at a glance, but the help and FAQs are so sparse and difficult to find that you wonder if you’re taking full advantage of all of the features available. Bing does offer suggestions if you misspell your search terms and has recommendations for expanded searches, but otherwise advice is very limited. One suggestion on the help page reads, “Can’t find an answer? Get additional support.” What does that mean? Where? Who? How? The tagline for Bing is “The official site to make key decisions quick and easy.” And perhaps that’s exactly all it’s good for.
Exalead’s site is uncluttered and easy to navigate, a little easier for a novice to use, with a learning curve a little less steep. The information retrieved was comparable to that found on Google. The one piece of information that only Google provided, in Scholar, was the number of works citing the retrieved item. This is the one important evaluating tool that would make Google my first choice, but otherwise Exalead has everything I would want in a search engine: a clean, user-friendly layout, an easy-to-use format, and precise and relevant retrieval of materials.
Overlap studies indicate that 80% of the pages in a major search engine’s database are unique, so it is always wise to search your topic on more than one search engine (UC Berkeley). Side-by-side sites such as Bing vs. Google or Twingine can help you evaluate the overlap for your particular search query. Experts on internet searching recommend against meta search engines, and the results from Dogpile do seem to bear this out. The information retrieved was less relevant and consisted largely of wiki and general information sites, far more so than the results from the single engines. I also had trouble narrowing my search and retrieved a lot of false drops. It would also seem that a meta search engine is only as good as the individual search engines it’s using (UC Berkeley). It provides no real new information; it just combines results and filters for redundancy. I don’t think I’ll be using meta search engines much in the near future, not until I hear that they have improved.
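The overlap claim can be checked for any particular query by comparing the result lists from two engines. The sketch below assumes the top results have already been copied into two lists of URLs; the function name and the placeholder URLs are hypothetical, and the Jaccard ratio is just one common way to express overlap, not the measure used in the studies cited.

```python
def overlap_report(results_a, results_b):
    """Summarize how much two result lists have in common."""
    a, b = set(results_a), set(results_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard ratio: shared results divided by all distinct results.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical top results for the same query on two engines.
engine_a = ["http://example.org/1", "http://example.org/2",
            "http://example.org/3", "http://example.org/4"]
engine_b = ["http://example.org/3", "http://example.org/4",
            "http://example.org/5"]
report = overlap_report(engine_a, engine_b)
print(report)
```

A low ratio for a given query is a practical signal that a second engine is worth the extra search.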
Conclusion and Recommendation
In conclusion, I have to say search engines are not created equal. Bing was a real disappointment. The visuals on the site are stunning and I do think their media features offer a nice dimension to search. They did provide some wonderful NPR audio stories of the Tuskegee subjects that were valuable, but overall their text results were superficial and redundant. Dogpile, too, was below par: it had a lot of false drops and consistently picked up ethics over the Tuskegee syphilis study.
Google still comes out on top as providing the most relevant and precise searching, especially if you use Google Scholar, but with so little overlap of results between search engines it’s wise to investigate your topic on at least one other search engine you trust. Just as with surgery, the more serious the operation (research), the more important it becomes to seek a second and even third opinion. I will continue to use Google/Google Scholar but will check other search engines as I read about newcomers. I will also keep my eye on Exalead, as I like both the user interface and the quality of the documents I retrieved.
This was a very valuable exercise for me and while I learned a lot about search I’ve also learned that unlike riding a bike, you can’t learn it and know it for life. Search on the internet is something you need to practice constantly. The search engines change as do the tools. Hopefully in time the Web will get more organized and users will find better consistency across the different search engines.
References
http://www.bing.com
http://www.dogpile.com
http://www.exalead.com
http://www.google.com
Chu, H., & Rosenthal, M. "Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology." ASIS 1996 Annual Conference Proceedings. Web. 08 Mar 2010. <http://www.asis.org/annual-96/ElectronicProceedings/chu.html>
"UC Berkeley Finding Information on the Internet." Recommended Search Engines. n.p., 07012010. Web. 08 Mar 2010. <http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html>