Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives: A Survey
Text-based Question Answering (QA) is a challenging task that aims to find
short, concrete answers to users' questions. This line of research has been
widely studied with information retrieval techniques and has received
increasing attention in recent years through deep neural network approaches.
Deep learning approaches, which are the main focus of this paper, provide a
powerful technique for learning multiple layers of representations and
interactions between questions and texts. In this paper, we provide a
comprehensive overview of different models proposed for the QA task, covering
both the traditional information retrieval perspective and the more recent
deep neural network perspective. We also introduce well-known datasets for the
task and present available results from the literature to enable a comparison
between different techniques.
Text Embeddings for Retrieval From a Large Knowledge Base
Text embedding representing natural language documents in a semantic vector
space can be used for document retrieval using nearest neighbor lookup. In
order to study the feasibility of neural models specialized for retrieval in a
semantically meaningful way, we suggest the use of the Stanford Question
Answering Dataset (SQuAD) in an open-domain question answering context, where
the first task is to find paragraphs useful for answering a given question.
First, we compare the quality of various text-embedding methods for retrieval
and give an extensive empirical comparison of the retrieval performance of
various non-augmented base embeddings, with and without IDF weighting. Our
main result is that training deep residual neural models specifically for
retrieval purposes can yield significant gains when they are used to augment
existing embeddings. We also establish that deeper models are superior for
this task. The best baseline embeddings augmented by our learned neural
approach improve the top-1 paragraph recall of the system by 14%.
Comment: 12 pages, 7 figures
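The retrieval setup described above, nearest-neighbour lookup over IDF-weighted averages of word embeddings, can be sketched in a few lines. Everything below (the toy word vectors, tokenisation, and function names) is a hypothetical illustration of the general technique, not the paper's actual implementation:

```python
import math
from collections import Counter

def idf_weights(corpus_tokens):
    # corpus_tokens: one token list per paragraph
    n = len(corpus_tokens)
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def embed(tokens, word_vecs, idf):
    # IDF-weighted average of the word vectors of the tokens
    dim = len(next(iter(word_vecs.values())))
    vec, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in word_vecs:
            w = idf.get(t, 0.0)  # terms unseen in the corpus contribute nothing
            vec = [v + w * c for v, c in zip(vec, word_vecs[t])]
            total += w
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, paragraphs, word_vecs):
    # return the index of the top-1 paragraph for the question
    corpus = [p.lower().split() for p in paragraphs]
    idf = idf_weights(corpus)
    q_vec = embed(question.lower().split(), word_vecs, idf)
    scored = [(cosine(q_vec, embed(doc, word_vecs, idf)), i)
              for i, doc in enumerate(corpus)]
    return max(scored)[1]
```

In practice the embeddings would come from a trained model and the nearest-neighbour search would use an index structure rather than a linear scan, but the scoring logic is the same.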
Convolutional Neural Network: Text Classification Model for Open Domain Question Answering System
Recently, machine learning has been applied to almost every data domain, one
of which is Question Answering Systems (QAS). A typical Question Answering
System is essentially an information retrieval system, which matches documents
or text and retrieves the most accurate one. The open-domain question
answering system put forth here involves convolutional neural network text
classifiers. The classification model presented in this paper is a multi-class
text classifier. The neural network classifier can be trained on large
datasets. We report a series of experiments conducted on a Convolutional
Neural Network (CNN) by training it on two different datasets. The neural
network model is trained on top of word embeddings. A softmax layer is applied
to calculate the loss and the mapping of semantically related words. The
gathered results help justify the feasibility of the proposed QAS. We further
propose a method to integrate the convolutional neural network classifier into
an open-domain question answering system. The idea of open domain will be
further explained, but in general it refers to a system of domain-specific
trainable models, which makes it open domain.
Comment: 12 pages, typos corrected, tables added, references added
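A minimal forward pass of such a CNN text classifier, token embeddings, a 1-D convolution with ReLU, max-over-time pooling, and a softmax output layer, might look like the following sketch. All names, toy weights, and dimensions are illustrative assumptions, not the paper's model:

```python
import math

def conv1d_text(embeddings, filt, bias):
    # slide a window of len(filt) token vectors over the sequence,
    # producing one ReLU-activated feature value per window position
    k = len(filt)
    feats = []
    for i in range(len(embeddings) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(filt[j], embeddings[i + j]))
        feats.append(max(0.0, s))  # ReLU
    return feats

def softmax(logits):
    # numerically stable softmax over the class logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(tokens, word_vecs, filters, weights, biases):
    # embed tokens, apply each conv filter, max-pool over time,
    # then a dense layer followed by softmax over the classes
    emb = [word_vecs[t] for t in tokens if t in word_vecs]
    pooled = [max(conv1d_text(emb, f, b), default=0.0) for f, b in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

A real implementation would of course learn the filter and dense-layer weights by backpropagating a cross-entropy loss; the sketch only shows the inference path.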
A Compare-Aggregate Model with Latent Clustering for Answer Selection
In this paper, we propose a novel method for a sentence-level
answer-selection task that is a fundamental problem in natural language
processing. First, we explore the effect of additional information by adopting
a pretrained language model to compute the vector representation of the input
text and by applying transfer learning from a large-scale corpus. Second, we
enhance the compare-aggregate model by proposing a novel latent clustering
method to compute additional information within the target corpus and by
changing the objective function from listwise to pointwise. To evaluate the
performance of the proposed approaches, experiments are performed with the
WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority
of our proposed approach, which achieves state-of-the-art performance on both
datasets.
Comment: 5 pages, Accepted as a conference paper at CIKM 201
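The switch from a listwise to a pointwise objective means scoring each (question, candidate-answer) pair independently rather than normalising scores across the whole candidate list. A minimal pointwise loss, binary cross-entropy over a sigmoid, shown here as an illustrative sketch rather than the paper's exact objective, could look like:

```python
import math

def pointwise_loss(scores, labels):
    # scores: raw model score for each (question, candidate) pair
    # labels: 1 if the candidate answers the question, else 0
    # each pair is penalised on its own; a listwise objective would
    # instead compare scores across the full candidate list
    loss = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid turns the score into a probability
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(scores)
```

The loss is low when correct candidates get high scores and incorrect ones get low scores, and high when the ordering is reversed.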
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
We propose a model to learn visually grounded word embeddings (vis-w2v) to
capture visual notions of semantic relatedness. While word embeddings trained
using text have been extremely successful, they cannot uncover notions of
semantic relatedness implicit in our visual world. For instance, although
"eats" and "stares at" seem unrelated in text, they share semantics visually.
When people are eating something, they also tend to stare at the food.
Grounding diverse relations like "eats" and "stares at" into vision remains
challenging, despite recent progress in vision. We note that the visual
grounding of words depends on semantics, and not the literal pixels. We thus
use abstract scenes created from clipart to provide the visual grounding. We
find that the embeddings we learn capture fine-grained, visually grounded
notions of semantic relatedness. We show improvements over text-only word
embeddings (word2vec) on three tasks: common-sense assertion classification,
visual paraphrasing and text-based image retrieval. Our code and datasets are
available online.
Comment: 15 pages, 11 figures
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques
The amount of text that is generated every day is increasing dramatically.
This tremendous volume of mostly unstructured text cannot be simply processed
and perceived by computers. Therefore, efficient and effective techniques and
algorithms are required to discover useful patterns. Text mining is the task of
extracting meaningful information from text, which has gained significant
attention in recent years. In this paper, we describe several of the most
fundamental text mining tasks and techniques, including text pre-processing,
classification, and clustering. Additionally, we briefly explain text mining
in the biomedical and health care domains.
Comment: some of the reference formats have been updated
Biomedical Question Answering via Weighted Neural Network Passage Retrieval
The amount of publicly available biomedical literature has been growing
rapidly in recent years, yet question answering systems still struggle to
exploit the full potential of this source of data. In a preliminary processing
step, many question answering systems rely on retrieval models for identifying
relevant documents and passages. This paper proposes a weighted cosine distance
retrieval scheme based on neural network word embeddings. Our experiments are
based on publicly available data and tasks from the BioASQ biomedical question
answering challenge and demonstrate significant performance gains over a wide
range of state-of-the-art models.
Comment: To appear in ECIR 201
Automated text summarisation and evidence-based medicine: A survey of two domains
The practice of evidence-based medicine (EBM) urges medical practitioners to
utilise the latest research evidence when making clinical decisions. Because of
the massive and growing volume of published research on various medical topics,
practitioners often find themselves overloaded with information. As such,
natural language processing research has recently commenced exploring
techniques for medical domain-specific automated text summarisation (ATS),
targeted towards the task of condensing large medical texts. However, the
development of effective summarisation techniques for this task requires
cross-domain knowledge. We present a survey of EBM, the domain-specific needs
for EBM, automated summarisation techniques, and how they have been applied
hitherto. We envision that this survey will serve as a first resource for the
development of future operational text summarisation techniques for EBM.
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
A paraphrase is a restatement of the meaning of a text in other words.
Paraphrases have been studied to enhance the performance of many natural
language processing tasks. In this paper, we propose a novel task iParaphrasing
to extract visually grounded paraphrases (VGPs), which are different phrasal
expressions describing the same visual concept in an image. These extracted
VGPs have the potential to improve language and image multimodal tasks such as
visual question answering and image captioning. How to model the similarity
between VGPs is the key to iParaphrasing. We apply various existing methods as
well as propose a novel neural network-based method with image attention, and
report the results of the first attempt toward iParaphrasing.
Comment: COLING 201
State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"
Several Networks of Excellence have been set up in the framework of the
European FP5 research program. Among these Networks of Excellence, the NEMIS
project focuses on the field of Text Mining.
Within this field, document processing and visualization was identified as
one of the key topics and the WG1 working group was created in the NEMIS
project, to carry out a detailed survey of techniques associated with the text
mining process and to identify the relevant research topics in related research
areas.
In this document we present the results of this comprehensive survey. The
report includes a description of the current state-of-the-art and practice, a
roadmap for follow-up research in the identified areas, and recommendations for
anticipated technological development in the domain of text mining.
Comment: 54 pages, Report of Working Group 1 for the European Network of
Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS
- …