3,537 research outputs found
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, and data was collected using specialised
software, the Relevance Assessment Tool (RAT). We found that while Google
outperforms Bing for both query types, the difference in performance for
informational queries was rather small. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. Thus,
performance on this query type may contribute to explaining user satisfaction
with search engines.
Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines. Comment: Published in WWW'17; v2 fixes caption of Table
Language Centres, Online Authentic Materials and Learners’ Needs: Improving Autonomy and Discovery in Language Learning
Unlike a few decades ago, all sorts of authentic foreign-language materials are now easily available and accessible outside our language centres via phones, tablets, phablets or computers. Nevertheless, learners might find it difficult to select what is most effective for their learning process and be daunted by some complex features of naturally occurring language. This paper draws on previous studies and personal teaching experience to suggest that, in order to fully exploit these resources, language centres should aim at helping learners improve their ability to deal with online authentic materials inside and outside the centre’s premises. In this perspective, they might consider the introduction of a relatively new approach based on corpora and online resources to enhance the learner’s autonomy and confidence when dealing with online authentic and unfiltered materials in a foreign language.
Crowdsourcing based curation and user engagement in digital library design
A historical perspective on the development and success of Trove, including the original concept, the user-centric design principles and social engagement with thousands of volunteers. The Trove service, which is now ten years old, is used by millions of Australians, with the digitised Australian newspapers being the most popular resource.
Rose Holley, Special Collections Curator at UNSW Canberra, discusses the findings of her research into crowdsourcing-based curation. Using the digitised historic Australian Newspapers as an example, she looks at how the functionality and interface were developed in close relationship with the users, and how this led to text correction of newspaper articles. It is nearly ten years since this pioneering project began, and the motivations and achievements of the 50,000 volunteers are examined over this time. She questions how successfully the goal of improving text quality, and therefore search, has been achieved. She proposes that if a similar project were begun now, artificial intelligence software such as the OverProof post-OCR correction tool would be used to improve the quality of the text. OverProof has been trained on the manual corrections of the Australian newspaper corpus, and trials demonstrate that it is able to dramatically improve the quality of the corpus. Volunteer text correction could still continue afterwards for difficult text, but the software would do the main donkey work, giving users a better-quality search.
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
This paper presents the first few-shot LLM-based chatbot that almost never
hallucinates and has high conversationality and low latency. WikiChat is
grounded on the English Wikipedia, the largest curated free-text corpus.
WikiChat generates a response from an LLM, retains only the grounded facts,
and combines them with additional information it retrieves from the corpus to
form factual and engaging responses. We distill WikiChat based on GPT-4 into a
7B-parameter LLaMA model with minimal loss of quality, to significantly improve
its latency, cost and privacy, and facilitate research and deployment.
Using a novel hybrid human-and-LLM evaluation methodology, we show that our
best system achieves 97.3% factual accuracy in simulated conversations. It
significantly outperforms all retrieval-based and LLM-based baselines, and by
3.9%, 38.6% and 51.0% on head, tail and recent knowledge compared to GPT-4.
Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is
also significantly more informative and engaging, just like an LLM.
WikiChat achieves 97.9% factual accuracy in conversations with human users
about recent topics, 55.0% better than GPT-4, while receiving significantly
higher user ratings and more favorable comments. Comment: Findings of EMNLP 202
GOOGLE TRENDS DATA AS A PROXY FOR INTEREST IN LEADERSHIP
The purpose of this quantitative study was to investigate the observable patterns of online search behavior on the topic of leadership using Google Trends data. Institutions have historically had a difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who is interested in leadership, and where and when. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value on leadership throughout history, and though overall interest remains strong, it appears that the expression of that interest may have changed over time. Key findings revealed that interest in leadership often peaks during the spring and fall seasons while dipping during the summer and the winter holiday seasons. Leadership interest also appears to be more concentrated in geographic locations that are home to certain universities and political arenas.
Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership
Hundreds of scholarly studies have investigated various aspects of the immensely popular Wikipedia. Although a number of literature reviews have provided overviews of this vast body of research, none of them has specifically focused on the readers of Wikipedia and issues concerning its readership. In this systematic literature review, we review 99 studies to synthesize current knowledge regarding the readership of Wikipedia and also provide an analysis of research methods employed. The scholarly research has found that Wikipedia is popular not only for lighter topics such as entertainment, but also for more serious topics such as health information and legal background. Scholars, librarians and students are common users of Wikipedia, and it provides a unique opportunity for educating students in digital literacy. We conclude with a summary of key findings, implications for researchers, and implications for the Wikipedia community.