Search CORE

123,189 research outputs found

Text categorization and similarity analysis: similarity measure, literature review

Author: Fowke Michael
Heese Ralf
Hinze Annika
Publication venue: University of Waikato, Department of Computer Science
Publication date: 01/12/2013

Document classification and provenance have become an important area of computer science as the amount of digital information grows significantly. Organisations are storing documents on computers rather than in paper form. Software is now required that will show the similarities between documents (i.e. document classification) and point out duplicates and possibly the history of each document (i.e. provenance). Poor organisation is common and leads to exactly these problems. There exist a number of software solutions in this area designed to make document organisation as simple as possible. I am doing my project with Pingar, a company based in Auckland that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in this area with the aim of determining what already exists and how my project will differ from existing solutions.

Research Commons@Waikato
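
The abstract above talks about showing similarities between documents and pointing out duplicates. As a rough illustration only, and not the method of the report or of Pingar, a bag-of-words cosine similarity can be sketched as follows. The function names and the 0.9 threshold are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def likely_duplicates(documents, threshold=0.9):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(documents)):
        for j in range(i + 1, len(documents)):
            if cosine_similarity(documents[i], documents[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Calling `likely_duplicates(["annual report 2013", "Annual Report 2013", "meeting notes"])` would flag the first two entries, since they differ only in letter case. Real systems typically add term weighting and scale better than this pairwise loop.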

Systematic reviews of health effects of social interventions: 1. Finding the evidence: how far should you go?

Author: Egan M.
Hamilton V.
Ogilvie D.
Petticrew M.
Publication venue: 'BMJ'
Publication date: 01/01/2005

Study objective: There is little guidance on how to identify useful evidence about the health effects of social interventions. The aim of this study was to assess the value of different ways of finding this type of information. Design: Retrospective analysis of the sources of studies for one systematic review. Setting: Case study of a systematic review of the effectiveness of interventions in promoting a population shift from using cars towards walking and cycling. Main results: Only four of the 69 relevant studies were found in a "first-line" health database such as Medline. About half of all relevant studies were found through the specialist Transport database. Nine relevant studies were found through purposive internet searches and seven relevant studies were found by chance. The unique contribution of experts was not to identify additional studies, but to provide more information about those already found in the literature. Conclusions: Most of the evidence needed for this review was not found in studies indexed in familiar literature databases. Applying a sensitive search strategy across multiple databases and interfaces is very labour intensive. Retrospective analysis suggests that a more efficient method might have been to search a few key resources, then to ask authors and experts directly for the most robust reports of studies identified. However, internet publications and serendipitous discoveries did make a significant contribution to the total set of relevant evidence. Undertaking a comprehensive search may provide unique evidence and insights that would not be obtained using a more focused search

Crossref

LSHTM Research Online

PubMed Central

Enlighten

Reading Wikipedia to Answer Open-Domain Questions

Author: Bordes Antoine
Chen Danqi
Fisch Adam
Weston Jason
Publication date: 01/01/2017

This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task. Comment: ACL2017, 10 pages

arXiv.org e-Print Archive

Crossref
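
The abstract of "Reading Wikipedia to Answer Open-Domain Questions" mentions a search component based on bigram hashing and TF-IDF matching. The following is a minimal sketch of that general idea, not the authors' implementation: the bucket count, the CRC32 hash, the smoothed IDF formula, and all function names are assumptions chosen for illustration.

```python
import math
import re
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 20  # assumed size of the hashed feature space


def hashed_features(text):
    """Count unigrams and bigrams after hashing each into a fixed bucket."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    # CRC32 is used because it is deterministic across runs (assumption).
    return Counter(zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams)


def build_index(documents):
    """Return (idf, vectors): per-bucket IDF and TF-IDF vectors per document."""
    counts = [hashed_features(doc) for doc in documents]
    doc_freq = Counter(bucket for c in counts for bucket in c)
    n = len(documents)
    # Smoothed IDF so that every seen bucket gets a positive weight.
    idf = {b: math.log((n + 1) / (df + 1)) + 1.0 for b, df in doc_freq.items()}
    vectors = [{b: tf * idf[b] for b, tf in c.items()} for c in counts]
    return idf, vectors


def rank(query, idf, vectors):
    """Return document indices sorted by cosine similarity to the query."""
    q = {b: tf * idf.get(b, 0.0) for b, tf in hashed_features(query).items()}
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    scored = []
    for index, vec in enumerate(vectors):
        dot = sum(w * vec.get(b, 0.0) for b, w in q.items())
        norm = q_norm * math.sqrt(sum(w * w for w in vec.values()))
        scored.append((dot / norm if norm else 0.0, index))
    # Highest score first; ties fall back to the original document order.
    return [index for _, index in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Hashing bigrams into a fixed number of buckets keeps memory bounded without storing a vocabulary, at the cost of occasional collisions. In this sketch a query such as "capital of France" would favour a document containing the bigram "of France" over one that only shares "capital of".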

Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs

Author: Abujabal A.
Lu X.
Pramanik S.
Saha Roy R.
Wang Y.
Weikum G.
Publication date: 01/01/2019

Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines

MPG.PuRe

Analogy Mining for Specific Design Needs

Author: Bahdanau Dzmitry
Bird Edward Loper
Blei David M.
Fu Katherine
Gentner Dedre
Guroff Margaret
Mikolov Tomas
Pennington Jeffrey
Venema Vibeke
Publication date: 19/12/2017

Finding analogical inspirations in distant domains is a powerful way of solving problems. However, as the number of inspirations that could be matched and the dimensions on which that matching could occur grow, it becomes challenging for designers to find inspirations relevant to their needs. Furthermore, designers are often interested in exploring specific aspects of a product: for example, one designer might be interested in improving the brewing capability of an outdoor coffee maker, while another might wish to optimize for portability. In this paper we introduce a novel system for targeting analogical search for specific needs. Specifically, we contribute a novel analogical search engine for expressing and abstracting specific design needs that returns more distant yet relevant inspirations than alternative approaches.

arXiv.org e-Print Archive

Crossref

Matching Queries to Frequently Asked Questions: Search Functionality for the MRSA Web-Portal

Author: Akker Rieks op den
Tigelaar Almer S.
Verhoeven Fenne
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

As part of the long-term EUREGIO MRSA-net project, a system was developed that enables health care workers and the general public to quickly find answers to their questions regarding the MRSA pathogen. This paper focuses on how these questions can be answered using Information Retrieval (IR) and Natural Language Processing (NLP) techniques on a Frequently-Asked-Questions-style (FAQ) database.

University of Twente Research Information