6 research outputs found
Similarity measures and diversity rankings for query-focused sentence extraction
Query-focused sentence extraction generally refers to an extractive approach that selects a set of sentences responding to a specific information need. It is one of the major approaches employed in multi-document summarization, focused summarization, and complex question answering. The major advantage of most extractive methods over natural language processing (NLP) intensive methods is that they are relatively simple, theoretically sound (drawing upon several supervised and unsupervised learning techniques), and often produce equally strong empirical performance. Many research areas, including information retrieval and text mining, have recently moved toward extractive query-focused sentence generation, as its outputs have great potential to support everyday information-seeking activities. In particular, as more information has been created and stored online, extraction-based summarization systems can quickly exploit ubiquitous resources, such as Google search results and social media, to extract summaries that answer users' queries.

This thesis explores how the performance of sentence extraction tasks can be improved to create higher-quality outputs. Specifically, two major areas are investigated. First, we examine the issue of natural language variation, which affects the similarity judgment of sentences. As sentences are much shorter than documents, they generally contain fewer words; moreover, the similarity notions of sentences differ from those of documents, as sentences tend to be very specific in meaning. Thus many document-level similarity measures are unlikely to perform well at the sentence level. In this work, we address these issues in two application domains. First, we present a hybrid method, utilizing both unsupervised and supervised techniques, to compute the similarity of interrogative sentences for factoid question reuse.
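To make the sparsity problem concrete: a purely lexical measure such as bag-of-words cosine assigns near-zero similarity to short paraphrases that share almost no surface words, which is exactly why sentence-level similarity needs more than document-level lexical overlap. A minimal sketch (the example sentences are illustrative, not from the thesis's data):

```python
from collections import Counter
import math

def cosine_sim(s1, s2):
    """Bag-of-words cosine similarity between two sentences."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Identical sentences score 1.0, but two short paraphrases that share
# almost no surface words score near zero despite meaning the same thing.
print(cosine_sim("When did the Middle Ages begin",
                 "What year marks the start of the medieval period"))
```

Despite being near-paraphrases, the two questions above share only the word "the", so the lexical score is about 0.25.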
Next, we propose a novel structural similarity measure based on sentence semantics for paraphrase identification and textual entailment recognition tasks. The empirical evaluations suggest the effectiveness of the proposed methods in improving the accuracy of sentence similarity judgments.

Furthermore, we examine the effects of the proposed similarity measure in two specific sentence extraction tasks: focused summarization and complex question answering. In conjunction with the proposed similarity measure, we also explore the issues of novelty, redundancy, and diversity in sentence extraction. To that end, we present a novel approach to promote diversity of extracted sets of sentences based on the negative endorsement principle. Negative-signed edges represent a redundancy relation between sentence nodes in graphs, and sentences are reranked according to the long-term negative endorsements accumulated by a random walk. Additionally, we propose a unified centrality and diversity ranking based on the same principle. The results from a comprehensive evaluation confirm that the proposed methods perform competitively compared to many state-of-the-art methods.

Ph.D., Information Science -- Drexel University, 201
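One simplified reading of the negative-endorsement idea can be sketched as follows: run a random walk over the redundancy (similarity) graph, and treat the long-term mass a sentence accumulates as evidence that it is redundant with many others, subtracting it from the sentence's relevance. This is a hedged illustration of the principle, not the thesis's exact model:

```python
def diversity_rerank(sim, relevance, d=0.85, iters=60):
    """Rerank sentences by subtracting long-term redundancy endorsement.
    sim: symmetric pairwise similarity matrix (list of lists, zero diagonal);
    relevance: query-relevance score per sentence; d: damping factor.
    A simplified sketch of negative-endorsement reranking."""
    n = len(sim)
    rowsum = [sum(row) for row in sim]
    # Transition probabilities of the walk over redundancy edges
    P = [[(sim[i][j] / rowsum[i]) if rowsum[i] else 1.0 / n
          for j in range(n)] for i in range(n)]
    score = [1.0 / n] * n
    for _ in range(iters):  # power iteration toward the stationary distribution
        score = [(1 - d) / n + d * sum(score[j] * P[j][i] for j in range(n))
                 for i in range(n)]
    # Negative endorsement: high walk mass (much redundancy) lowers the rank
    final = [relevance[i] - score[i] for i in range(n)]
    return sorted(range(n), key=lambda i: final[i], reverse=True)
```

With two near-duplicate sentences and one distinct sentence, the duplicates endorse each other heavily and are demoted, so the distinct sentence rises to the top even with slightly lower relevance.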
Advanced techniques for personalized, interactive question answering
Using a computer to answer questions has been a human dream since the beginning of
the digital era. A first step towards the achievement of such an ambitious goal is to deal
with natural language to enable the computer to understand what its user asks.
The discipline that studies the connection between natural language and the representation
of its meaning via computational models is computational linguistics. According
to this discipline, Question Answering can be defined as the task that, given a question
formulated in natural language, aims at finding one or more concise answers in the form
of sentences or phrases.
Question Answering can be interpreted as a sub-discipline of information retrieval
with the added challenge of applying sophisticated techniques to identify the complex
syntactic and semantic relationships present in text. Although it is widely accepted that
Question Answering represents a step beyond standard information retrieval, allowing a
more sophisticated and satisfactory response to the user's information needs, it still shares
a series of unsolved issues with the latter.
First, in most state-of-the-art Question Answering systems, the results are created
independently of the questioner's characteristics, goals and needs. This is a serious limitation
in several cases: for instance, a primary school child and a History student may
need different answers to the question: When did the Middle Ages begin?
Moreover, users often issue queries not as standalone but in the context of a wider
information need, for instance when researching a specific topic. Although it has recently been proposed that providing Question Answering systems with dialogue interfaces
would encourage and accommodate the submission of multiple related questions
and handle the user's requests for clarification, interactive Question Answering is still in
its early stages.
Furthermore, an issue which still remains open in current Question Answering is
that of efficiently answering complex questions, such as those invoking definitions and
descriptions (e.g. What is a metaphor?). Indeed, it is difficult to design criteria to assess
the correctness of answers to such complex questions.
These are the central research problems addressed by this thesis, and they are solved as
follows.
An in-depth study on complex Question Answering led to the development of classifiers
for complex answers. These exploit a variety of lexical, syntactic and shallow
semantic features to perform textual classification using tree-kernel functions for Support
Vector Machines.
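The classic tree kernel used with Support Vector Machines for this kind of textual classification is the Collins-Duffy subset-tree kernel, which counts the syntactic subtrees two parse trees share. The sketch below is a minimal recursive implementation over trees encoded as nested tuples; production systems use optimized implementations and feed the kernel to the SVM as a precomputed Gram matrix:

```python
def tree_kernel(t1, t2, lam=0.5):
    """Collins-Duffy subset-tree kernel: counts shared subtrees of two
    parse trees, down-weighting larger subtrees by lam. Trees are nested
    tuples (label, child, ...); leaves are plain strings. A minimal
    sketch, not the tuned implementation used in the thesis."""
    def nodes(t):
        out = [t]
        if isinstance(t, tuple):
            for c in t[1:]:
                out.extend(nodes(c))
        return out

    def prod(t):  # a node's production: its label plus its children's labels
        return (t[0],) + tuple(c if isinstance(c, str) else c[0] for c in t[1:])

    def C(a, b):  # number of common subtrees rooted at nodes a and b
        if isinstance(a, str) or isinstance(b, str) or prod(a) != prod(b):
            return 0.0
        if all(isinstance(c, str) for c in a[1:]):  # matching preterminals
            return lam
        r = lam
        for ca, cb in zip(a[1:], b[1:]):
            r *= 1.0 + C(ca, cb)
        return r

    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))

np_tree = ('NP', ('D', 'the'), ('N', 'dog'))
print(tree_kernel(np_tree, np_tree))  # 2.125: NP subtree plus the two preterminals
```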
The issue of personalization is solved by the integration of a User Modelling component
within the Question Answering model. The User Model is able to filter and
re-rank results based on the user's reading level and interests.
The issue of interactivity is approached by the development of a dialogue model and a
dialogue manager suitable for open-domain interactive Question Answering. The utility
of this model is corroborated by the integration of an interactive interface, supporting reference
resolution and follow-up conversation, into the core Question Answering system, and
by its evaluation.
Finally, the models of personalized and interactive Question Answering are integrated
in a comprehensive framework forming a unified model for future Question Answering
research.
Complex question answering : minimizing the gaps and beyond
xi, 192 leaves : ill. ; 29 cm
Current Question Answering (QA) systems have advanced significantly in demonstrating
finer abilities to answer simple factoid and list questions. Such questions are easier
to process as they require small snippets of texts as the answers. However, there is
a category of questions that represents a more complex information need, which cannot
be satisfied easily by simply extracting a single entity or a single sentence. For example,
the question "How was Japan affected by the earthquake?" suggests that the inquirer is
looking for information in the context of a wider perspective. We call these "complex questions"
and focus on the task of answering them with the intention to minimize the existing
gaps in the literature.
The major limitation of the available search and QA systems is that they lack a way of
measuring whether a user is satisfied with the information provided. This was our motivation
to propose a reinforcement learning formulation to the complex question answering
problem. Next, we presented an integer linear programming formulation where sentence
compression models were applied for the query-focused multi-document summarization
task in order to investigate if sentence compression improves the overall performance.
Both compression and summarization were considered as global optimization problems.
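The global optimization view can be illustrated with the standard concept-coverage objective: choose the subset of sentences that maximizes the total weight of covered concepts under a length budget. The toy below solves it by exhaustive search; this is a hypothetical stand-in for the integer linear program that a solver handles at realistic scale, with illustrative weights:

```python
from itertools import combinations

def best_summary(sentences, concept_weight, budget):
    """Exhaustive solution of a concept-coverage selection objective:
    pick sentences maximizing the total weight of covered concepts
    (here, words) within a word-length budget. Note a concept counts
    once no matter how many chosen sentences contain it, which is what
    discourages redundant selections."""
    best, best_val = [], 0.0
    for r in range(len(sentences) + 1):
        for combo in combinations(range(len(sentences)), r):
            if sum(len(sentences[i].split()) for i in combo) > budget:
                continue  # violates the length constraint
            covered = {w for i in combo for w in sentences[i].lower().split()}
            val = sum(concept_weight.get(w, 0.0) for w in covered)
            if val > best_val:
                best_val, best = val, list(combo)
    return best, best_val
```

Because covering a concept twice adds nothing, the optimum here skips the sentence that merely repeats an already-covered concept and spends the budget on new information instead.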
We also investigated the impact of syntactic and semantic information in a graph-based
random walk method for answering complex questions. Decomposing a complex question
into a series of simple questions and then reusing the techniques developed for answering
simple questions is an effective means of answering complex questions. We proposed a
supervised approach for automatically learning good decompositions of complex questions
in this work. A complex question often asks about a topic of the user's interest. Therefore, the
problem of complex question decomposition closely relates to the problem of topic to question
generation. We addressed this challenge and proposed a topic to question generation
approach to enhance the scope of our problem domain.
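At its simplest, topic-to-question generation can be pictured as expanding a topic into one question per aspect of interest. The templates and aspects below are purely hypothetical illustrations, not the learned generation model the abstract describes:

```python
def topic_to_questions(topic):
    """Illustrative template-based sketch of topic-to-question generation:
    expand a topic into simple questions, one per (hypothetical) aspect."""
    templates = [
        "What is the {t}?",           # definition aspect
        "What caused the {t}?",       # cause aspect
        "How were people affected by the {t}?",  # effect aspect
    ]
    return [tpl.format(t=topic) for tpl in templates]

print(topic_to_questions("Japan earthquake"))
```

Each generated simple question can then be answered independently and the answers merged, which is the decomposition strategy the abstract advocates.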
Infra-estrutura de um serviço online de resposta-a-perguntas com base na web portuguesa [Infrastructure of an online question-answering service based on the Portuguese web]
Master's project in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2007. The Internet promoted a new form of global communication, with deep impact on the dissemination of information. As a consequence, new technological solutions are needed for the exploitation of the resources thus made available. At a time when document search engines are already part of the daily life of Internet users, the next step is to allow these users to obtain brief answers to specific questions. The QueXting project was undertaken by the Natural Language and Speech Group (NLX) at the Department of Informatics of the Faculty of Sciences of the University of Lisbon, with the main goal of contributing to better access to information available in the Portuguese language. To this end, a web service supporting questions in Portuguese should be made freely available, gathering answers from documents written in this language, starting with factoid questions. The QueXting question-answering system is implemented through a general methodology and architecture that have recently matured in this scientific domain. Furthermore, it is supported by several linguistic tools, specific to Portuguese, that the NLX group has been developing. This specific linguistic processing is a key factor distinguishing the task of question answering from the remaining tasks of information retrieval and extraction, allowing the deep processing of information requests and the extraction of exact answers. This dissertation reports on the development of the infrastructure of the QueXting system, which will support the specific processing of several types of factoid questions. Such processing has already been applied to one specific type of question, for which some preliminary evaluations were completed.
Beyond Question Answering: Understanding the Information Need of the User
Intelligent interaction between humans and computers has been a dream since the beginning of the digital era and one of the original motivations behind the creation of artificial intelligence. A key step towards the achievement of such an ambitious goal is to enable Question Answering systems to understand the information need of the user.
In this thesis, we attempt to improve a QA system's ability to understand the user's information need through three approaches. First, a clarification question generation method is proposed to help the user clarify the information need and bridge the information-need gap between the QA system and the user. Next, a translation-based model is learned from large archives of Community Question Answering data to model the information need behind a question and boost the performance of question recommendation. Finally, a fine-grained classification framework is proposed to enable systems to recommend answered questions based on information need satisfaction.
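The translation-based retrieval idea can be sketched with an IBM Model 1-style score: a candidate archived question is ranked by the probability of "translating" it into the new question, word by word. The translation table below is hypothetical; in the setting the abstract describes, such probabilities are learned from large Community QA archives:

```python
import math

def translation_score(query, candidate, t_table, eps=1e-4):
    """log P(query | candidate) under an IBM Model 1-style sketch: each
    query word is generated with the average translation probability over
    the candidate's words. t_table maps (query_word, candidate_word) to a
    probability; eps smooths unseen pairs to avoid log(0)."""
    c_words = candidate.lower().split()
    logp = 0.0
    for qw in query.lower().split():
        p = sum(t_table.get((qw, cw), 0.0) for cw in c_words) / len(c_words)
        logp += math.log(p + eps)
    return logp
```

Unlike lexical overlap, this rewards candidates whose words merely *translate* to the query's words, so a paraphrased archived question can outrank an unrelated one that happens to share vocabulary.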
Parallel corpus multi stream question answering with applications to the Qu'ran
Question Answering (QA) is an important research area concerned with developing an automated process that answers questions posed by humans in natural language. QA is a task shared by the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP) communities. A technical review of different QA system models and methodologies reveals that a typical QA system consists of several components that accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have usually been aimed at structured/unstructured data collected from everyday English text, i.e. text collected from television programmes, news wires, conversations, novels and other similar genres. Despite all up-to-date research in the subject area, a notable fact is that none of the existing QA systems has been tested on a parallel corpus of religious text with the aim of question answering. Religious text has peculiar characteristics and features which make it more challenging for traditional QA methods than other kinds of text.
This thesis proposes the PARMS (Parallel Corpus Multi Stream) Methodology: a novel method applying existing advanced Information Retrieval (IR) techniques and combining them with Natural Language Processing (NLP) methods and additional semantic knowledge to implement Question Answering (QA) for a parallel corpus. A parallel corpus involves the use of multiple forms of the same corpus, where each form differs from the others in a certain aspect, e.g. translations of a scripture from one language to another by different translators. Additional semantic knowledge can be regarded as a stream of information related to a corpus. PARMS uses multiple streams of semantic knowledge, including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge has been used in embedded form for query expansion, corpus enrichment and answer ranking.
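Of those three uses, query expansion is the simplest to picture: each query term is augmented with related terms drawn from a knowledge stream. The sketch below uses a small hypothetical synonym dictionary standing in for WordNet or the domain ontologies named above; it is an illustration of the general technique, not the PARMS implementation:

```python
def expand_query(query, synonym_stream):
    """Sketch of query expansion from a semantic-knowledge stream: append
    each query term's synonyms (deduplicated) to the term list.
    synonym_stream is a hypothetical dict standing in for an ontology."""
    expanded = query.lower().split()
    for term in list(expanded):  # iterate a copy; only original terms expand
        for syn in synonym_stream.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

# Illustrative stream: a domain ontology might relate these terms
print(expand_query("charity", {"charity": ["alms", "zakat"]}))
```

The expanded term list then goes to the retrieval engine, raising recall for verses that use a synonym rather than the query's exact wording.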
The PARMS Methodology has wider applications. This thesis applies it to the Quran, the core text of Islam, as a first case study. The PARMS Method uses a parallel corpus comprising ten different English translations of the Quran. An individual Quranic verse is treated as an answer to questions asked in natural language, English. This thesis also implements a PARMS QA application as a proof of concept for the PARMS Methodology, which aims to evaluate the semantic knowledge streams separately and in combination, and to evaluate alternative subsets of the data source: QA from one stream vs. the parallel corpus. Results show that the use of a parallel corpus and multiple streams of semantic knowledge has clear advantages. To the best of my knowledge, this method has been developed here for the first time, and it is expected to serve as a benchmark for further research in this area.