44,704 research outputs found

    ANTIQUE: A Non-Factoid Question Answering Benchmark

    Full text link
    Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34,011 manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and recently developed neural IR models

    Finding Structured and Unstructured Features to Improve the Search Result of Complex Question

    Get PDF
    -Recently, search engine got challenge deal with such a natural language questions. Sometimes, these questions are complex questions. A complex question is a question that consists several clauses, several intentions or need long answer. In this work we proposed that finding structured features and unstructured features of questions and using structured data and unstructured data could improve the search result of complex questions. According to those, we will use two approaches, IR approach and structured retrieval, QA template. Our framework consists of three parts. Question analysis, Resource Discovery and Analysis The Relevant Answer. In Question Analysis we used a few assumptions, and tried to find structured and unstructured features of the questions. Structured feature refers to Structured data and unstructured feature refers to unstructured data. In the resource discovery we integrated structured data (relational database) and unstructured data (webpage) to take the advantaged of two kinds of data to improve and reach the relevant answer. We will find the best top fragments from context of the webpage In the Relevant Answer part, we made a score matching between the result from structured data and unstructured data, then finally used QA template to reformulate the question. In the experiment result, it shows that using structured feature and unstructured feature and using both structured and unstructured data, using approach IR and QA template could improve the search result of complex questions

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    Full text link
    This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR , the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets

    Reading Wikipedia to Answer Open-Domain Questions

    Full text link
    This paper proposes to tackle open- domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.Comment: ACL2017, 10 page

    Machine learning research 1989-90

    Get PDF
    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    Get PDF
    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources including those of the World Wide Web. More specifically the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed on the design of the test system and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion the project dissemination programme and future work are outlined

    Beyond English text: Multilingual and multimedia information retrieval.

    Get PDF
    Non

    Planning strategically, designing architecturally : a framework for digital library services

    Get PDF
    In an era of unprecedented technological innovation and evolving user expectations and information seeking behaviour, we are arguably now an online society, with digital services increasingly common and increasingly preferred. As a trusted information provider, libraries are in an advantageous position to respond, but this requires integrated strategic and enterprise architecture planning, for information technology (IT) has evolved from a support role to a strategic role, providing the core management systems, communication networks, and delivery channels of the modern library. Further, IT components do not function in isolation from one another, but are interdependent elements of distributed and multidimensional systems encompassing people, processes, and technologies, which must consider social, economic, legal, organisational, and ergonomic requirements and relationships, as well as being logically sound from a technical perspective. Strategic planning provides direction, while enterprise architecture strategically aligns and holistically integrates business and information system architectures. While challenging, such integrated planning should be regarded as an opportunity for the library to evolve as an enterprise in the digital age, or at minimum, to simply keep pace with societal change and alternative service providers. Without strategy, a library risks being directed by outside forces with independent motivations and inadequate understanding of its broader societal role. Without enterprise architecture, it risks technological disparity, redundancy, and obsolescence. Adopting an interdisciplinary approach, this conceptual paper provides an integrated framework for strategic and architectural planning of digital library services. The concept of the library as an enterprise is also introduced
    corecore