3,537 research outputs found

    Evaluating the retrieval effectiveness of Web search engines using a representative query sample

    Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples. Furthermore, queries are selected by the researchers. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results based on this sample. Jurors were found through crowdsourcing, and data was collected using specialised software, the Relevance Assessment Tool (RAT). We found that while Google outperforms Bing on both query types, the difference in performance for informational queries was rather small. However, for navigational queries, Google found the correct answer in 95.3 per cent of cases, whereas Bing found the correct answer only 76.6 per cent of the time. We conclude that search engine performance on navigational queries is of great importance, as users in this case can clearly identify queries that have returned correct results. Performance on this query type may therefore contribute to explaining user satisfaction with search engines.

    Why We Read Wikipedia

    Wikipedia is one of the most popular sites on the Web, with millions of users relying on it to satisfy a broad range of information needs every day. Although it is crucial to understand what exactly these needs are in order to be able to meet them, little is currently known about why users visit Wikipedia. The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. Based on an initial series of user surveys, we build a taxonomy of Wikipedia use cases along several dimensions, capturing users' motivations to visit Wikipedia, the depth of knowledge they are seeking, and their knowledge of the topic of interest prior to visiting Wikipedia. Then, we quantify the prevalence of these use cases via a large-scale user survey conducted on live Wikipedia with almost 30,000 responses. Our analyses highlight the variety of factors driving users to Wikipedia, such as current events, media coverage of a topic, personal curiosity, work or school assignments, or boredom. Finally, we match survey responses to the respondents' digital traces in Wikipedia's server logs, enabling the discovery of behavioral patterns associated with specific use cases. For instance, we observe long and fast-paced page sequences across topics for users who are bored or exploring randomly, whereas those using Wikipedia for work or school spend more time on individual articles focused on topics such as science. Our findings advance our understanding of reader motivations and behavior on Wikipedia and can have implications for developers aiming to improve Wikipedia's user experience, editors striving to cater to their readers' needs, third-party services (such as search engines) providing access to Wikipedia content, and researchers aiming to build tools such as recommendation engines. Comment: Published in WWW'17; v2 fixes caption of Table

    Language Centres, Online Authentic Materials and Learners’ Needs: Improving Autonomy and Discovery in Language Learning

    Unlike a few decades ago, all sorts of authentic foreign-language materials are now easily available and accessible outside our language centres through phones, tablets, phablets or computers. Nevertheless, learners might find it difficult to select what is most effective for their learning process and may be daunted by some complex features of naturally occurring language. This paper draws on previous studies and personal teaching experience to suggest that, in order to fully exploit these resources, language centres should aim to help learners increase their ability to deal with online authentic materials inside and outside the centre’s premises. From this perspective, they might consider introducing a relatively new approach based on corpora and online resources to enhance the learner’s autonomy and confidence when dealing with authentic, unfiltered online materials in a foreign language.

    Crowdsourcing based curation and user engagement in digital library design

    A historical perspective on the development and success of Trove is presented, covering the original concept, the user-centric design principles and social engagement with thousands of volunteers. The Trove service, which is now ten years old, is used by millions of Australians, with the digitised Australian newspapers being the most popular resource. Rose Holley, Special Collections Curator at UNSW Canberra, discusses the findings of her research into crowdsourcing-based curation. Using the digitised historic Australian newspapers as an example, she looks at how the functionality and interface were developed in close relationship with the users, and how this led to text correction of newspaper articles. It is nearly ten years since this pioneering project began, and the motivations and achievements of the 50,000 volunteers are examined over this time. She questions how successfully the goal of improving text quality, and therefore search, has been achieved. She proposes that if a similar project were begun now, artificial intelligence software such as the OverProof post-OCR correction tool would be used to improve the quality of the text. OverProof has been trained on the manual corrections of the Australian newspaper corpus, and trials demonstrate that it is able to dramatically improve the quality of the corpus. Volunteer text correction could still continue afterwards for difficult text, but the software would do the main donkey work, giving users a better-quality search.

    WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

    This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus. WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engaging responses. We distill WikiChat based on GPT-4 into a 7B-parameter LLaMA model with minimal loss of quality, to significantly improve its latency, cost and privacy, and facilitate research and deployment. Using a novel hybrid human-and-LLM evaluation methodology, we show that our best system achieves 97.3% factual accuracy in simulated conversations. It significantly outperforms all retrieval-based and LLM-based baselines, and exceeds GPT-4 by 3.9%, 38.6% and 51.0% on head, tail and recent knowledge, respectively. Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is also significantly more informative and engaging, just like an LLM. WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments. Comment: Findings of EMNLP 202

    Beyond Microsoft: Intellectual Property, Peer Production and the Law’s Concern with Market Dominance.

    GOOGLE TRENDS DATA AS A PROXY FOR INTEREST IN LEADERSHIP

    The purpose of this quantitative study was to investigate the observable patterns of online search behavior on the topic of leadership using Google Trends data. Institutions have historically had a difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who is interested in leadership, and where and when. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value on leadership throughout history, and though overall interest remains strong, it appears that the expression of that interest may have changed over time. Key findings revealed that interest in leadership often peaks during the spring and fall seasons while dipping during the summer and the winter holiday seasons. Leadership interest also appears to be more concentrated in geographic locations that are home to certain universities and political arenas.

    Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership

    Hundreds of scholarly studies have investigated various aspects of the immensely popular Wikipedia. Although a number of literature reviews have provided overviews of this vast body of research, none of them has specifically focused on the readers of Wikipedia and issues concerning its readership. In this systematic literature review, we review 99 studies to synthesize current knowledge regarding the readership of Wikipedia and also provide an analysis of the research methods employed. The scholarly research has found that Wikipedia is popular not only for lighter topics such as entertainment, but also for more serious topics such as health information and legal background. Scholars, librarians and students are common users of Wikipedia, and it provides a unique opportunity for educating students in digital literacy. We conclude with a summary of key findings, implications for researchers, and implications for the Wikipedia community.