Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, and data was collected using specialised
software, the Relevance Assessment Tool (RAT). We found that while Google
outperforms Bing in both query types, the difference in the performance for
informational queries was rather low. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines.
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done
properly, it enables better management, new services, lower transaction costs
and better customer relations. Success depends on skilled information
technologists, among whom are statisticians. This paper focuses on some of the
contributions that statisticians are making to help change the business world,
especially through the development and application of data mining methods. This
is a very large area, and the topics we cover are chosen to avoid overlap with
other papers in this special issue, as well as to respect the limitations of
our expertise. Inevitably, electronic commerce has raised and is raising fresh
research problems in a very wide range of statistical areas, and we try to
emphasize those challenges.
Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Combining information seeking services into a meta supply chain of facts
The World Wide Web has become a vital supplier of information that allows organizations to carry on such tasks as business intelligence, security monitoring, and risk assessments. Having a quick and reliable supply of correct facts from perspective is often mission critical. By following design science guidelines, we have explored ways to recombine facts from multiple sources, each with possibly different levels of responsiveness and accuracy, into one robust supply chain. Inspired by prior research on keyword-based meta-search engines (e.g., metacrawler.com), we have adapted the existing question answering algorithms for the task of analysis and triangulation of facts. We present a first prototype for a meta approach to fact seeking. Our meta engine sends a user's question to several fact seeking services that are publicly available on the Web (e.g., ask.com, brainboost.com, answerbus.com, NSIR, etc.) and analyzes the returned results jointly to identify and present to the user those that are most likely to be factually correct. The results of our evaluation on the standard test sets widely used in prior research support the following: 1) the added value of the meta approach: its performance surpasses the performance of each supplier, 2) the importance of using fact seeking services as suppliers to the meta engine rather than keyword-driven search portals, and 3) the resilience of the meta approach: eliminating a single service does not noticeably impact the overall performance. We show that these properties make the meta approach a more reliable supplier of facts than any of the currently available stand-alone services.
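The joint analysis of answers from several services can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simple reciprocal-rank voting scheme, and the names `normalize`, `triangulate`, and the service labels are hypothetical.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lower-case and collapse whitespace so equal answers share one key."""
    return " ".join(answer.lower().split())


def triangulate(candidates_by_service, weights=None):
    """Rank candidate answers by agreement across services.

    candidates_by_service: dict mapping a (hypothetical) service name to
    its list of candidate answers, best first.
    weights: optional dict of per-service trust weights (default 1.0).
    Each service contributes weight / rank to every distinct answer it
    returns; this voting rule is an assumption made for illustration.
    """
    scores = defaultdict(float)
    for service, candidates in candidates_by_service.items():
        weight = 1.0 if weights is None else weights.get(service, 1.0)
        seen = set()
        for rank, answer in enumerate(candidates, start=1):
            key = normalize(answer)
            if key in seen:  # count each answer once per service
                continue
            seen.add(key)
            scores[key] += weight / rank
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    results = {
        "service_a": ["Paris", "Lyon"],
        "service_b": ["paris"],
        "service_c": ["Marseille", "Paris"],
    }
    for answer, score in triangulate(results):
        print(f"{answer}: {score:.2f}")
```

In this toy input, dropping any single service still leaves the same answer on top, which mirrors the resilience property the abstract describes.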
Measuring success of open source projects using web search engines
What makes an open source project successful?
In this paper we show that the traditional factors of success of open source projects, such as the number of downloads, deployments, or commits, are sometimes inconvenient or even insufficient. We then correlate the success of an open source project with its popularity on the Web. We present several ideas for measuring such popularity using Web search engines and provide experimental results from a quantitative analysis of the proposed measures on large representative samples of open source projects from SourceForge.
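One way to correlate a traditional success factor with Web popularity is a rank correlation between, say, download counts and search engine hit counts. The sketch below assumes Spearman's coefficient, which the abstract does not name, and the input numbers are invented purely for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-project figures, not data from the paper.
    downloads = [120, 4500, 98000, 310, 15000]
    hit_counts = [900, 12000, 400000, 700, 52000]
    print(f"Spearman rho = {spearman(downloads, hit_counts):.3f}")
```

A rank-based coefficient is a natural choice here because download and hit counts are typically heavily skewed, but other measures would serve equally well for the same comparison.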
What Users See – Structures in Search Engine Results Pages
This paper investigates the composition of search engine results pages. We define what elements the most
popular web search engines use on their results pages (e.g., organic results, advertisements, shortcuts) and to
which degree they are used for popular vs. rare queries. To this end, we send 500 queries of both types to the
major search engines Google, Yahoo, Live.com and Ask. We count how often the different elements are used by
the individual engines. In total, our study is based on 42,758 elements. Findings include that search engines use
quite different approaches to results pages composition and therefore, the user gets to see quite different results
sets depending on the search engine and search query used. Organic results still play the major role in the results
pages, but different shortcuts are of some importance, too. Regarding the frequency of certain hosts within the
results sets, we find that all search engines show Wikipedia results quite often, while other hosts shown depend
on the search engine used. Both Google and Yahoo prefer results from their own offerings (such as YouTube or
Yahoo Answers). Since we used the .com interfaces of the search engines, results may not be valid for other
country-specific interfaces.
Exploring the academic invisible web
Purpose: To provide a critical review of Bergman's 2001 study on the Deep
Web. In addition, we bring a new concept into the discussion, the Academic
Invisible Web (AIW). We define the Academic Invisible Web as consisting of all
databases and collections relevant to academia but not searchable by the
general-purpose internet search engines. Indexing this part of the Invisible
Web is central to scientific search engines. We provide an overview of
approaches followed thus far. Design/methodology/approach: Discussion of
measures and calculations, estimation based on informetric laws. Literature
review on approaches for uncovering information from the Invisible Web.
Findings: Bergman's size estimate of the Invisible Web is highly questionable.
We demonstrate some major errors in the conceptual design of the Bergman paper.
A new (raw) size estimate is given. Research limitations/implications: The
precision of our estimate is limited due to a small sample size and lack of
reliable data. Practical implications: We can show that no single library alone
will be able to index the Academic Invisible Web. We suggest collaboration to
accomplish this task. Originality/value: Provides library managers and those
interested in developing academic search engines with data on the size and
attributes of the Academic Invisible Web.
Comment: 13 pages, 3 figures
Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding
Modern expert finding algorithms are developed under the
assumption that all possible expertise evidence for a person
is concentrated in a company that currently employs the
person. The evidence that can be acquired outside of an
enterprise is traditionally unnoticed. At the same time, the
Web is full of personal information which is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.