    Text categorization and similarity analysis: similarity measure, literature review

    Document classification and provenance have become important areas of computer science as the amount of digital information grows. Organisations increasingly store documents on computers rather than on paper. Software is now needed that can show the similarities between documents (i.e. document classification), point out duplicates and possibly trace the history of each document (i.e. provenance). Poor document organisation is common and leads to exactly these problems: unrecognised similarities, accumulated duplicates and lost document histories. A number of software solutions exist in this area, designed to make document organisation as simple as possible. I am doing my project with Pingar, an Auckland-based company that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in this area with the aim of determining what already exists and how my project will differ from existing solutions.
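
    As a rough illustration of the duplicate-detection task described above, here is a minimal sketch, assuming that near-duplicates can be flagged by the Jaccard similarity of their word 3-gram shingles. The function names, the shingle size k=3, the 0.8 threshold and the example documents are hypothetical choices, not taken from the report.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole word sequence as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8, k: int = 3):
    """Yield (id1, id2, score) for each document pair whose similarity >= threshold."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for (id1, s1), (id2, s2) in combinations(sigs.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            yield id1, id2, score


# Hypothetical example documents; "a" and "b" differ only in punctuation.
docs = {
    "a": "The quarterly report covers sales in Auckland and Wellington.",
    "b": "The quarterly report covers sales in Auckland and Wellington!",
    "c": "Meeting notes from the product planning session.",
}
print(list(near_duplicates(docs)))  # → [('a', 'b', 1.0)]
```

    Comparing every pair is quadratic in the number of documents; for large collections, MinHash signatures combined with locality-sensitive hashing are the usual way to avoid that cost.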

    Systematic reviews of health effects of social interventions: 1. Finding the evidence: how far should you go?

    Study objective: There is little guidance on how to identify useful evidence about the health effects of social interventions. The aim of this study was to assess the value of different ways of finding this type of information. Design: Retrospective analysis of the sources of studies for one systematic review. Setting: Case study of a systematic review of the effectiveness of interventions in promoting a population shift from using cars towards walking and cycling. Main results: Only four of the 69 relevant studies were found in a "first-line" health database such as Medline. About half of all relevant studies were found through the specialist Transport database. Nine relevant studies were found through purposive internet searches and seven relevant studies were found by chance. The unique contribution of experts was not to identify additional studies, but to provide more information about those already found in the literature. Conclusions: Most of the evidence needed for this review was not found in studies indexed in familiar literature databases. Applying a sensitive search strategy across multiple databases and interfaces is very labour-intensive. Retrospective analysis suggests that a more efficient method might have been to search a few key resources, then to ask authors and experts directly for the most robust reports of studies identified. However, internet publications and serendipitous discoveries did make a significant contribution to the total set of relevant evidence. Undertaking a comprehensive search may provide unique evidence and insights that would not be obtained using a more focused search.

    Reading Wikipedia to Answer Open-Domain Questions

    This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with those of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) their combination, trained with multitask learning and distant supervision, is an effective complete system for this challenging task.
    Comment: ACL 2017, 10 pages.
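
    The retrieval component described above can be illustrated with a simplified sketch; it is not the authors' implementation. Unigrams and bigrams are hashed into a fixed number of buckets (here with MD5, an assumption chosen because Python's built-in hash() is randomised per process), weighted by log(1 + tf) times a smoothed IDF, and documents are ranked by the dot product with the query vector. The class name, bucket count, weighting and example documents are hypothetical choices.

```python
import hashlib
import math
import re
from collections import Counter


def ngram_ids(text: str, num_buckets: int = 2**20) -> Counter:
    """Hash every unigram and bigram of text into one of num_buckets ids."""
    tokens = re.findall(r"\w+", text.lower())
    grams = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    ids = Counter()
    for g in grams:
        digest = hashlib.md5(g.encode("utf-8")).digest()
        ids[int.from_bytes(digest[:8], "little") % num_buckets] += 1
    return ids


class HashedTfidfRetriever:
    """Rank documents by the dot product of hashed TF-IDF vectors."""

    def __init__(self, docs: list[str], num_buckets: int = 2**20):
        self.num_buckets = num_buckets
        counts = [ngram_ids(d, num_buckets) for d in docs]
        df = Counter()
        for c in counts:
            df.update(c.keys())  # document frequency per bucket
        n = len(docs)
        # Smoothed IDF log((N - df + 0.5) / (df + 0.5)), clipped at 0 so that
        # buckets occurring in most documents contribute nothing.
        self.idf = {h: max(0.0, math.log((n - f + 0.5) / (f + 0.5))) for h, f in df.items()}
        self.doc_vecs = [self._weight(c) for c in counts]

    def _weight(self, counts: Counter) -> dict[int, float]:
        # log(1 + tf) * idf; buckets never seen in the collection get weight 0.
        return {h: math.log1p(tf) * self.idf.get(h, 0.0) for h, tf in counts.items()}

    def search(self, query: str, k: int = 5) -> list[tuple[int, float]]:
        """Return the top-k (document index, score) pairs, best first."""
        q = self._weight(ngram_ids(query, self.num_buckets))
        scores = [
            (i, sum(w * vec.get(h, 0.0) for h, w in q.items()))
            for i, vec in enumerate(self.doc_vecs)
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


docs = [
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
    "Mount Everest is the highest mountain.",
]
retriever = HashedTfidfRetriever(docs)
print(retriever.search("capital of France", k=1)[0][0])  # → 0
```

    Hashing bounds memory by the bucket count regardless of vocabulary size, at the cost of occasional collisions between unrelated n-grams.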

    Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs

    Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, which avoids training-data bottlenecks and lets it cope with rapidly evolving ad hoc topics and formulation styles in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.

    Analogy Mining for Specific Design Needs

    Finding analogical inspirations in distant domains is a powerful way of solving problems. However, as the number of inspirations that could be matched and the number of dimensions on which that matching could occur grow, it becomes challenging for designers to find inspirations relevant to their needs. Furthermore, designers are often interested in exploring specific aspects of a product. For example, one designer might be interested in improving the brewing capability of an outdoor coffee maker, while another might wish to optimize for portability. In this paper we introduce a novel system for targeting analogical search at specific needs. Specifically, we contribute a novel analogical search engine for expressing and abstracting specific design needs that returns more distant yet relevant inspirations than alternative approaches.

    Matching Queries to Frequently Asked Questions: Search Functionality for the MRSA Web-Portal

    As part of the long-term EUREGIO MRSA-net project, a system was developed that enables health care workers and the general public to quickly find answers to their questions regarding the MRSA pathogen. This paper focuses on how these questions can be answered using Information Retrieval (IR) and Natural Language Processing (NLP) techniques on a Frequently Asked Questions (FAQ) style database.
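
    The abstract does not name the retrieval model, so the following is only a sketch under assumptions: it ranks FAQ questions against a user query with Okapi BM25, a standard IR ranking function swapped in here for illustration, after lowercasing and removing a small hypothetical stopword list. The FAQ entries, answer identifiers and class name are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "i", "can", "how", "what",
             "does", "do", "get", "to", "of", "with", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split into word tokens and drop stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


class FaqMatcher:
    """Match a free-text query to the closest FAQ question with Okapi BM25.

    Assumes a non-empty FAQ given as (question, answer) pairs; only the
    question text is indexed.
    """

    def __init__(self, faq: list[tuple[str, str]], k1: float = 1.5, b: float = 0.75):
        self.faq = faq
        self.k1, self.b = k1, b
        self.questions = [tokenize(q) for q, _ in faq]
        self.tfs = [Counter(q) for q in self.questions]
        # Average question length; guarded so an all-stopword FAQ cannot divide by zero.
        self.avgdl = (sum(map(len, self.questions)) / len(self.questions)) or 1.0
        df = Counter(t for q in self.questions for t in set(q))
        n = len(self.questions)
        # BM25 IDF with +1 inside the log, which keeps every weight positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens: list[str], i: int) -> float:
        """BM25 score of FAQ question i for the tokenized query."""
        tf, dl = self.tfs[i], len(self.questions[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query_tokens
            if t in tf
        )

    def best_answer(self, query: str) -> Optional[str]:
        """Return the answer of the best-scoring question, or None if nothing matches."""
        q = tokenize(query)
        scores = [self.score(q, i) for i in range(len(self.faq))]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.faq[best][1] if scores[best] > 0 else None


faq = [
    ("What is MRSA?", "faq-definition"),
    ("How is MRSA transmitted?", "faq-transmission"),
    ("Can I visit a patient with MRSA?", "faq-visiting"),
]
matcher = FaqMatcher(faq)
print(matcher.best_answer("How does MRSA get transmitted?"))  # → faq-transmission
print(matcher.best_answer("opening hours"))  # → None
```

    Returning None when no query term matches lets the portal fall back to another channel instead of showing an unrelated answer.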