Complex question answering : minimizing the gaps and beyond
xi, 192 leaves : ill. ; 29 cm
Current Question Answering (QA) systems have been significantly advanced in demonstrating
finer abilities to answer simple factoid and list questions. Such questions are easier
to process as they require small snippets of texts as the answers. However, there is
a category of questions that represents a more complex information need, which cannot
be satisfied easily by simply extracting a single entity or a single sentence. For example,
the question: “How was Japan affected by the earthquake?” suggests that the inquirer is
looking for information in the context of a wider perspective. We call these “complex questions”
and focus on the task of answering them with the intention to minimize the existing
gaps in the literature.
The major limitation of the available search and QA systems is that they lack a way of
measuring whether a user is satisfied with the information provided. This was our motivation
to propose a reinforcement learning formulation to the complex question answering
problem. Next, we presented an integer linear programming formulation where sentence
compression models were applied for the query-focused multi-document summarization
task in order to investigate if sentence compression improves the overall performance.
Both compression and summarization were considered as global optimization problems.
We also investigated the impact of syntactic and semantic information in a graph-based
random walk method for answering complex questions. Decomposing a complex question
into a series of simple questions and then reusing the techniques developed for answering
simple questions is an effective means of answering complex questions. We proposed a
supervised approach for automatically learning good decompositions of complex questions
in this work. A complex question often asks about a topic of the user’s interest. Therefore, the
problem of complex question decomposition closely relates to the problem of topic-to-question
generation. We addressed this challenge and proposed a topic-to-question generation
approach to enhance the scope of our problem domain.
A reinforcement learning formulation to the complex question answering problem
We use extractive multi-document summarization techniques to perform complex question answering and formulate it as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human-generated summaries (i.e. answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries, i.e. answers to previously unseen complex questions. A reward function is used to measure the similarities between the candidate (machine-generated) summary sentences and the abstract summaries. In the training stage, the learner iteratively selects the important document sentences to be included in the candidate summary, analyzes the reward function and updates the related feature weights accordingly. The final weights are used to generate summaries as answers to unseen complex questions in the testing stage. Evaluation results show the effectiveness of our system. We also incorporate user interaction into the reinforcement learner to guide the candidate summary sentence selection process. Experiments reveal the positive impact of the user interaction component on the reinforcement learning framework.
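The training loop described in this abstract (select a sentence, measure a reward against the human summary, adjust feature weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' system: the two toy features, the word-overlap reward, the epsilon-greedy selection, and the names `features`, `reward`, `train` and `answer` are all hypothetical.

```python
import random

def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())

def features(sentence, question):
    """Two toy features (assumed): question overlap and a length prior."""
    s, q = tokens(sentence), tokens(question)
    overlap = len(s & q) / max(len(q), 1)
    length = min(len(s) / 20.0, 1.0)
    return [overlap, length]

def score(w, f):
    """Linear score of a feature vector under weights w."""
    return sum(wi * fi for wi, fi in zip(w, f))

def reward(sentence, reference):
    """Assumed reward: fraction of reference-summary words the sentence covers."""
    s, r = tokens(sentence), tokens(reference)
    return len(s & r) / max(len(r), 1)

def train(examples, epochs=50, lr=0.1, epsilon=0.2, seed=0):
    """examples: list of (question, candidate_sentences, reference_summary)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(epochs):
        for question, sentences, reference in examples:
            feats = [features(s, question) for s in sentences]
            scores = [score(w, f) for f in feats]
            if rng.random() < epsilon:
                i = rng.randrange(len(sentences))  # explore
            else:
                i = max(range(len(sentences)), key=scores.__getitem__)  # exploit
            # Move the predicted score of the chosen sentence toward its reward.
            error = reward(sentences[i], reference) - scores[i]
            w = [wi + lr * error * fi for wi, fi in zip(w, feats[i])]
    return w

def answer(question, sentences, w, k=1):
    """Return the k highest-scoring sentences under the learned weights."""
    ranked = sorted(sentences, key=lambda s: -score(w, features(s, question)))
    return ranked[:k]
```

At test time only `answer` is needed: the learned weights rank the sentences of unseen documents without access to any reference summary.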
Improvements to the complex question answering models
x, 128 leaves : ill. ; 29 cm
In recent years the amount of information on the web has increased dramatically. As a
result, it has become a challenge for the researchers to find effective ways that can help us
query and extract meaning from these large repositories. Standard document search engines
try to address the problem by presenting the users a ranked list of relevant documents. In
most cases, this is not enough as the end-user has to go through the entire document to find
the answer they are looking for. Question answering, which is the retrieval of answers
to natural language questions from a document collection, tries to remove the onus on the
end-user by providing direct access to relevant information.
This thesis is concerned with open-domain complex question answering. Unlike simple
questions, complex questions cannot be answered easily as they often require inferencing
and synthesizing information from multiple documents. Hence, we considered the task
of complex question answering as query-focused multi-document summarization. In this
thesis, to improve complex question answering we experimented with both empirical and
machine learning approaches. We extracted several features of different types (i.e. lexical,
lexical semantic, syntactic and semantic) for each of the sentences in the document
collection in order to measure its relevancy to the user query.
We have formulated the task of complex question answering using a reinforcement learning framework,
which to the best of our knowledge has not been applied to this task before and has the
potential to improve itself by fine-tuning the feature weights from user feedback. We have
also used unsupervised machine learning techniques (random walk, manifold ranking) and
augmented them with semantic and syntactic information to improve them. Finally, we experimented
with question decomposition where, instead of trying to find the answer to the complex
question directly, we decomposed the complex question into a set of simple questions and
synthesized the answers to get our final result.
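The random walk technique mentioned in this abstract can be sketched as a query-biased walk over a sentence similarity graph. This is a generic illustration under stated assumptions, not the thesis's model: similarity here is plain bag-of-words cosine (the thesis adds syntactic and semantic information, which is not reproduced), and the function names and the `damping` and `iterations` parameters are hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lower-cased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def random_walk_rank(sentences, query, damping=0.85, iterations=50):
    """Return one score per sentence; higher means more central and query-relevant."""
    vecs = [bow(s) for s in sentences]
    q = bow(query)
    n = len(sentences)

    # Restart distribution: relevance of each sentence to the query.
    rel = [cosine(v, q) for v in vecs]
    total = sum(rel)
    restart = [r / total for r in rel] if total else [1.0 / n] * n

    # Row-normalised transition matrix from sentence-to-sentence similarity.
    trans = []
    for i in range(n):
        row = [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        s = sum(row)
        trans.append([x / s for x in row] if s else [1.0 / n] * n)

    # Power iteration: restart at query-relevant sentences, else follow an edge.
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [
            (1 - damping) * restart[j]
            + damping * sum(p[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return p
```

Sentences that are both similar to the query and well connected to other relevant sentences receive the highest scores, and the top-ranked ones would form the answer summary.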
Answering complex questions : supervised approaches
x, 108 leaves : ill. ; 29 cm
The term “Google” has become a verb for most of us. Search engines, however, have
certain limitations. For example, ask one for the impact of the current global financial crisis
in different parts of the world, and you can expect to sift through thousands of results for
the answer. This motivates the research in complex question answering where the purpose
is to create summaries of large volumes of information as answers to complex questions,
rather than simply offering a listing of sources. Unlike simple questions, complex questions
cannot be answered easily as they often require inferencing and synthesizing information
from multiple documents. Hence, this task is accomplished by query-focused multi-document
summarization systems. In this thesis we apply different supervised learning
techniques to confront the complex question answering problem. To run our experiments,
we consider the DUC-2007 main task.
A huge amount of labeled data is a prerequisite for supervised training. It is expensive
and time consuming when humans perform the labeling task manually. Automatic labeling
can be a good remedy to this problem. We employ five different automatic annotation
techniques to build extracts from human abstracts using ROUGE, Basic Element (BE) overlap,
syntactic similarity measure, semantic similarity measure and Extended String Subsequence
Kernel (ESSK). The representative supervised methods we use are Support Vector
Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM) and
Maximum Entropy (MaxEnt). We annotate DUC-2006 data and use them to train our systems,
whereas 25 topics of DUC-2007 data set are used as test data. The evaluation results
reveal the impact of automatic labeling methods on the performance of the supervised approaches
to complex question answering. We also experiment with two ensemble-based
approaches that show promising results for this problem domain.
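The automatic labeling step described in this abstract, building extracts from human abstracts, can be sketched with a simple unigram-recall score in the spirit of ROUGE-1. This is an assumed simplification for illustration only: it is not the official ROUGE toolkit, it does not cover the BE, syntactic, semantic or ESSK measures, and the names `unigram_recall`, `label_sentences` and the `top_n` parameter are hypothetical.

```python
from collections import Counter

def unigram_recall(candidate, reference):
    """Share of reference unigrams (with counts) that also occur in the candidate."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum(min(c[t], r[t]) for t in r)
    return overlap / max(sum(r.values()), 1)

def label_sentences(doc_sentences, abstract_sentences, top_n=2):
    """Label the top_n document sentences closest to the abstract as 1, the rest as 0."""
    scored = [
        max((unigram_recall(s, a) for a in abstract_sentences), default=0.0)
        for s in doc_sentences
    ]
    # Indices of document sentences, best match first.
    order = sorted(range(len(doc_sentences)), key=lambda i: -scored[i])
    positive = set(order[:top_n])
    return [(s, 1 if i in positive else 0) for i, s in enumerate(doc_sentences)]
```

The resulting (sentence, label) pairs are the kind of training data a supervised classifier such as an SVM or MaxEnt model would consume.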
Selecting and Generating Computational Meaning Representations for Short Texts
Language conveys meaning, so natural language processing (NLP) requires representations of meaning. This work addresses two broad questions: (1) What meaning representation should we use? and (2) How can we transform text to our chosen meaning representation? In the first part, we explore different meaning representations (MRs) of short texts, ranging from surface forms to deep-learning-based models. We show the advantages and disadvantages of a variety of MRs for summarization, paraphrase detection, and clustering. In the second part, we use SQL as a running example for an in-depth look at how we can parse text into our chosen MR. We examine the text-to-SQL problem from three perspectives (methodology, systems, and applications) and show how each contributes to a fuller understanding of the task.
PHD
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/143967/1/cfdollak_1.pd