Query-based extracting: how to support the answer?
Human-made query-based summaries commonly contain information that was not explicitly asked for: they answer the user query, but also provide supporting information. In order to find this supporting information in the source text, a graph is used to model the strength and type of relations between sentences of the query and the document cluster, based on various features. The resulting extracts ranked second in overall readability in the DUC 2006 evaluation. Employing better question answering methods is the key to improving the content-based evaluation results as well.
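The graph idea described above can be sketched in a few lines. In this minimal illustration, query and document sentences become nodes and edge weights come from a simple word-overlap feature; the tokenizer, similarity measure, and threshold are illustrative assumptions, not the features used in the original system.

```python
def tokens(sentence):
    """Lower-cased word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in sentence.split()}

def build_sentence_graph(query, sentences, threshold=0.1):
    """Map sentence index -> edge weight (Jaccard overlap with the query)."""
    q = tokens(query)
    edges = {}
    for i, sent in enumerate(sentences):
        s = tokens(sent)
        weight = len(q & s) / len(q | s) if q | s else 0.0
        if weight >= threshold:  # keep only sufficiently related sentences
            edges[i] = weight
    return edges

sentences = [
    "The festival takes place in Cannes every May.",
    "Ticket sales rose sharply last year.",
    "Cannes hosts the festival jury each May.",
]
edges = build_sentence_graph("When does the Cannes festival take place?", sentences)
supporting = max(edges, key=edges.get)  # strongest supporting sentence
```

A real system would replace the Jaccard weight with the richer relation features the abstract mentions, but the graph structure, nodes linked to the query by weighted edges, stays the same.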
Bare-Bones Dependency Parsing — A Case for Occam's Razor?
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011).
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 6-11.
© 2011 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955
On the voice-activated question answering
[EN] Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines capable of extracting concise, precise fragments of text that contain an answer to a question posed by the user. The incorporation of voice interfaces into QA systems adds a more natural and very appealing perspective to these systems. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, the scenarios that will emerge from the introduction of speech recognition in QA are discussed. © 2006 IEEE.
This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik.
Rosso, P.; Hurtado Oliver, LF.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620
Topic indexing and retrieval for open domain factoid question answering
Factoid Question Answering is an exciting area of Natural Language Engineering that
has the potential to replace one major use of search engines today. In this dissertation,
I introduce a new method of handling factoid questions whose answers are proper
names. The method, Topic Indexing and Retrieval, addresses two issues that prevent
current factoid QA systems from realising this potential: they cannot satisfy users' demand
for almost immediate answers, and they cannot produce answers based on evidence
distributed across a corpus.
The first issue arises because the architecture common to QA systems does not scale
easily to heavy use: much of the work is done on-line, as text retrieved by
information retrieval (IR) undergoes expensive and time-consuming answer extraction
while the user awaits an answer. If QA systems are to become as heavily used as
popular web search engines, this massive processing bottleneck must be overcome.
The second issue, how to make use of evidence distributed across a corpus, is relevant
when no single passage in the corpus provides sufficient evidence for an answer
to a given question. QA systems commonly look for a text span that contains sufficient
evidence to both locate and justify an answer. But this will fail in the case of questions
that require evidence from more than one passage in the corpus.
The Topic Indexing and Retrieval method developed in this thesis addresses both of these
issues for factoid questions with proper name answers by restructuring the corpus in
such a way that it enables direct retrieval of answers using off-the-shelf IR. The method
has been evaluated on 377 TREC questions with proper name answers and 41 questions
that require multiple pieces of evidence from different parts of the TREC AQUAINT
corpus. With regard to the first evaluation, scores of 0.340 in Accuracy and 0.395 in
Mean Reciprocal Rank (MRR) show that Topic Indexing and Retrieval performs
well for this type of question. A second evaluation compares performance on a corpus
of 41 multi-evidence questions by a question-factoring baseline method that can
be used with the standard QA architecture and by my Topic Indexing and Retrieval
method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates
its value in answering such questions.
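The core idea above, restructure the corpus so that off-the-shelf IR can retrieve answers directly, can be sketched as follows. Every proper name becomes a pseudo-document aggregating the sentences that mention it, so a question is answered by retrieving names rather than passages. The capitalized-token name detector and the term-counting scorer are crude stand-ins for a real named-entity recognizer and IR engine, not the thesis's actual components.

```python
from collections import defaultdict

def build_topic_index(sentences):
    """Index each (crudely detected) proper name by the sentences mentioning it."""
    index = defaultdict(list)
    for sent in sentences:
        for tok in sent.split():
            word = tok.strip(".,")
            if word[:1].isupper() and word.lower() not in {"the", "a"}:
                index[word].append(sent)
    return index

def answer(question_terms, index):
    """Rank names by how many question terms appear in their evidence text."""
    def score(name):
        text = " ".join(index[name]).lower()
        return sum(t in text for t in question_terms)
    return max(index, key=score)

corpus = [
    "Oslo is the capital of Norway.",
    "Bergen lies on the west coast of Norway.",
]
index = build_topic_index(corpus)
best = answer(["capital", "norway"], index)  # a proper-name answer
```

Because each name's pseudo-document can pool sentences from anywhere in the corpus, this layout naturally combines evidence that is distributed across documents, which is exactly the multi-evidence case the standard passage-based architecture struggles with.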
Acquiring syntactic and semantic transformations in question answering
One and the same fact in natural language can be expressed in many different ways by
using different words and/or a different syntax. This phenomenon, commonly called
paraphrasing, is the main reason why Natural Language Processing (NLP) is such a
challenging task. This becomes especially obvious in Question Answering (QA) where
the task is to automatically answer a question posed in natural language, usually in a
text collection likewise consisting of natural language texts. It cannot be assumed that an
answer sentence uses the same words as the question, or that those words are
combined in the same way by the same syntactic rules.
In this thesis we describe methods that can help to address this problem. First,
we explore how lexical resources, namely FrameNet, PropBank and VerbNet, can be used
to recognize the wide range of syntactic realizations that an answer sentence to a given
question can have. We find that our methods based on these resources work well for
web-based Question Answering. However, we identify two problems: 1) all three resources
still have significant coverage issues; 2) these resources are not suitable
for identifying answer sentences that show some form of indirect evidence. While the
first problem currently hinders performance, it is not a theoretical problem that renders
the approach unsuitable; rather, it shows that more effort must be made to produce
more complete resources. The second problem is more persistent. Many valid answer
sentences, especially in small, journalistic corpora, do not provide direct evidence for
a question; rather, they strongly suggest an answer without logically implying it. Semantically
motivated resources like FrameNet, PropBank and VerbNet cannot easily
be employed to recognize such forms of indirect evidence.
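The use of lexical resources to recognize alternative syntactic realizations can be illustrated with a toy, hand-made frame lexicon. Real systems would draw these frames from FrameNet, PropBank or VerbNet; the verb, role names, and surface patterns below are invented for the example.

```python
# A hand-made stand-in for a frame lexicon: each verb maps to surface
# patterns that realize the same underlying fact with different syntax.
FRAMES = {
    "buy": ["{buyer} bought {goods}",
            "{goods} was sold to {buyer}",
            "{buyer}'s purchase of {goods}"],
}

def realizations(verb, **roles):
    """Expand every known surface pattern for a verb with the role fillers."""
    return [pattern.format(**roles) for pattern in FRAMES.get(verb, [])]

sents = realizations("buy", buyer="Acme", goods="the startup")
```

Matching a question against any of these expansions lets a QA system find answer sentences that paraphrase the question, which is precisely where the coverage gaps in the real resources become the limiting factor the abstract describes.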
In order to investigate ways of dealing with indirect evidence, we used Amazon’s
Mechanical Turk to collect over 8,000 manually identified answer sentences from the
AQUAINT corpus for more than 1,900 TREC questions from the 2002 to 2006 QA tracks.
The pairs of answer sentences and their corresponding questions form the QASP corpus,
which we released to the public in April 2008. In this dissertation, we use the
QASP corpus to develop an approach to QA based on matching dependency relations
between answer candidates and question constituents in the answer sentences. By
acquiring knowledge about syntactic and semantic transformations from dependency
relations in the QASP corpus, additional answer candidates can be identified that could
not be linked to the question with our first approach.
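Matching dependency relations between a question and an answer sentence can be sketched with (head, relation, dependent) triples. The triples and the exact-match-with-wildcard rule below are illustrative assumptions; the thesis instead learns syntactic and semantic transformations from the QASP corpus, so its matching is far more flexible than this.

```python
def relation_overlap(question_rels, answer_rels):
    """Count dependency triples shared by question and answer sentence.
    The wh-word slot (marked "WH") is treated as a wildcard that any
    dependent may fill -- that filler is the answer candidate."""
    matches = 0
    for head, rel, dep in question_rels:
        for h2, r2, d2 in answer_rels:
            if head == h2 and rel == r2 and (dep == d2 or dep == "WH"):
                matches += 1
                break
    return matches

# "Who founded Acme?" vs. "Jane Doe founded Acme in 1990."
q = [("founded", "nsubj", "WH"), ("founded", "dobj", "Acme")]
a = [("founded", "nsubj", "Jane Doe"), ("founded", "dobj", "Acme"),
     ("founded", "prep_in", "1990")]
score = relation_overlap(q, a)
```

Here both question relations are matched, so the sentence is a strong candidate and the subject filling the wildcard slot would be proposed as the answer.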
Multilingual passage retrieval for question answering
JAVA Information Retrieval System (JIRS) is an Information Retrieval system specifically oriented to Question Answering tasks. Traditional search engines rely on the keywords of the question to retrieve the documents relevant to a query. JIRS, by contrast, tries to retrieve pieces of text, that is, passages, with a higher probability of containing the answer. To do so, it performs a search based on the n-grams of the question, posed in natural language, using three possible models. The n-gram models developed are language-independent, which makes JIRS well suited to multilingual environments.
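The n-gram passage ranking idea described above can be sketched as follows: passages score higher when they contain longer n-grams of the question. The length-proportional weighting and whitespace tokenization are illustrative assumptions, a simplification of the three actual JIRS models.

```python
def ngrams(words, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_score(question, passage):
    """Reward passages containing long contiguous chunks of the question."""
    qw = question.lower().split()
    pw = passage.lower().split()
    score = 0
    for n in range(1, len(qw) + 1):
        pgrams = set(ngrams(pw, n))
        for g in ngrams(qw, n):
            if g in pgrams:
                score += n  # longer matches weigh more
    return score

question = "what is the capital of france"
passages = [
    "the capital of france is paris",
    "france exports wine and cheese",
]
ranked = sorted(passages, key=lambda p: ngram_score(question, p), reverse=True)
```

Because the scoring looks only at surface n-grams, it needs no stemmer, parser, or stopword list for any particular language, which is what makes this family of models language-independent.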
Furthermore, JIRS incorporates a powerful core that allows adaptation and scalability unprecedented among modern search engines. From the outset it was designed as a powerful tool that could be adapted without difficulty to very different functions. This makes it very easy and intuitive to extend or modify aspects of JIRS without the end user having to know code developed by others. In addition, new applications with a client/server, distributed, or other structure can be generated simply by modifying the configuration file.
This work presents the state of the art in Information Retrieval, focusing on multilingual Question Answering, together with a detailed description of JIRS and its search models, and finally reports the results obtained by this system in the CLEF competitions. Gómez Soriano, JM. (2007). Recuperación de pasajes multilingües para la búsqueda de respuestas [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1930