Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
Most successful information extraction systems operate with access to a large
collection of documents. In this work, we explore the task of acquiring and
incorporating external evidence to improve extraction accuracy in domains where
the amount of training data is scarce. This process entails issuing search
queries, extracting from new sources, and reconciling the extracted values,
steps that are repeated until sufficient evidence is collected. We approach the
problem using a reinforcement learning framework where our model learns to
select optimal actions based on contextual information. We employ a deep
Q-network, trained to optimize a reward function that reflects extraction
accuracy while penalizing extra effort. Our experiments on two databases -- of
shooting incidents, and food adulteration cases -- demonstrate that our system
significantly outperforms traditional extractors and a competitive
meta-classifier baseline. Comment: Appearing in EMNLP 2016 (12 pages incl. supplementary material)
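To make the idea of a reward that "reflects extraction accuracy while penalizing extra effort" concrete, here is a minimal sketch. It is not the authors' reward function: the name `step_reward`, the per-field exact-match notion of accuracy, and the `step_penalty` value are all assumptions made for illustration.

```python
from typing import Dict


def step_reward(prev_values: Dict[str, str],
                new_values: Dict[str, str],
                gold: Dict[str, str],
                step_penalty: float = 0.1) -> float:
    """Hypothetical per-step reward for an evidence-gathering agent.

    The reward is the change in the number of correctly extracted fields
    after reconciling a new source, minus a fixed cost for the extra query.
    """
    def n_correct(values: Dict[str, str]) -> int:
        # Count gold fields whose extracted value matches exactly.
        return sum(1 for field, answer in gold.items()
                   if values.get(field) == answer)

    accuracy_gain = n_correct(new_values) - n_correct(prev_values)
    return accuracy_gain - step_penalty
```

Under these assumptions, a query that corrects one field earns a positive reward, while a query that changes nothing costs the agent `step_penalty`, which discourages unnecessary searching.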
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to increase image quality. This research focuses on enhancing image quality in optical tomography. Image quality can be quantified by the Peak Signal to Noise Ratio (PSNR) and the Normalized Mean Square Error (NMSE). The findings of this research show that by combining parallel and fan beam projection, image quality can be improved by more than 10% in terms of PSNR and by more than 100% in terms of NMSE compared to a single parallel beam.
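The two metrics named in this abstract can be computed as in the sketch below. The abstract does not give its formulas, so this assumes the common definitions: PSNR as 10·log10(MAX² / MSE), and NMSE as the squared error normalized by the energy of the reference image. Images are represented as flat lists of pixel values, and the function names are illustrative.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], rec: Sequence[float]) -> float:
    """Mean squared error between a reference and a reconstructed image."""
    if len(ref) != len(rec) or len(ref) == 0:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)


def psnr(ref: Sequence[float], rec: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    err = mse(ref, rec)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def nmse(ref: Sequence[float], rec: Sequence[float]) -> float:
    """Squared error normalized by the energy of the reference image."""
    if len(ref) != len(rec):
        raise ValueError("images must be the same size")
    energy = sum(a * a for a in ref)
    if energy == 0:
        raise ValueError("reference image has zero energy")
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / energy
```

Higher PSNR and lower NMSE both indicate a reconstruction closer to the reference, so an "improvement" in NMSE corresponds to a decrease in its value.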
Query-Based Summarization using Rhetorical Structure Theory
Research on Question Answering is focused mainly on classifying the question type and finding
the answer. Presenting the answer in a way that suits the user’s needs has received little
attention. This paper shows how existing question answering systems—which aim at finding
precise answers to questions—can be improved by exploiting summarization techniques to extract
more than just the answer from the document in which the answer resides. This is done
using a graph search algorithm which searches for relevant sentences in the discourse structure,
which is represented as a graph. The Rhetorical Structure Theory (RST) is used to create a
graph representation of a text document. The output is an extensive answer, which not only
answers the question, but also gives the user an opportunity to assess the accuracy of the answer
(is this what I am looking for?), and to find additional information that is related to the question,
and which may satisfy an information need. This has been implemented in a working multimodal
question answering system where it operates with two independently developed question
answering modules.
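The abstract describes searching a graph built from the discourse structure for sentences related to the answer, but does not specify the algorithm. The sketch below is one plausible reading, not the paper's method: it assumes sentences are graph nodes, edges carry a hypothetical cost derived from the rhetorical relation, and relevance means "reachable from the answer sentence within a cost budget" (a bounded Dijkstra search). The names `related_sentences` and `max_cost` are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: sentence id -> list of (neighbor id, edge cost).
Graph = Dict[str, List[Tuple[str, float]]]


def related_sentences(graph: Graph, start: str, max_cost: float) -> List[str]:
    """Return sentences reachable from `start` within `max_cost`.

    Results are ordered by increasing path cost (ties broken by id),
    so the answer sentence itself always comes first.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost <= max_cost and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return sorted(best, key=lambda n: (best[n], n))
```

Raising `max_cost` yields a longer, more extensive answer; setting it to zero returns only the sentence containing the answer.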
Title-Guided Encoding for Keyphrase Generation
Keyphrase generation (KG) aims to generate a set of keyphrases given a
document, which is a fundamental task in natural language processing (NLP).
Most previous methods solve this problem in an extractive manner, while
recently, several attempts have been made under the generative setting using deep
neural networks. However, the state-of-the-art generative methods simply treat
the document title and the document main body equally, ignoring the leading
role of the title to the overall document. To solve this problem, we introduce
a new model called Title-Guided Network (TG-Net) for automatic keyphrase
generation task based on the encoder-decoder architecture with two new
features: (i) the title is additionally employed as a query-like input, and
(ii) a title-guided encoder gathers the relevant information from the title to
each word in the document. Experiments on a range of KG datasets demonstrate
that our model outperforms the state-of-the-art models by a large margin,
especially for documents with either very low or very high title length ratios. Comment: AAAI 1
Toward Entity-Aware Search
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we have obtained so far show clear promise for entity-aware search in its usefulness, effectiveness, efficiency, and scalability.
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.