Combining Wikipedia and Newswire Texts for Question Answering in Spanish
4 pages, 1 figure. Contributed to: Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum (CLEF 2007, Budapest, Hungary, Sep 19-21, 2007). This paper describes the adaptations of the MIRACLE group QA system for participation in the Spanish monolingual question answering task at QA@CLEF 2007. A system initially developed for the EFE collection was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from the questions and the collections. Reusing the EFE subsystem proved infeasible, and questions whose answers appear only in Wikipedia obtained low accuracy. In addition, a heuristics-based co-reference module was introduced for processing topic-related questions. This module achieves good coverage in different situations, but it is hindered by the moderate accuracy of the base system and by the chaining of incorrect answers. This work has been partially supported by the Regional Government of Madrid under the Research Network MAVIR (S-0505/TIC-0267) and by projects of the Spanish Ministry of Education and Science (TIN2004/07083, TIN2004-07588-C03-02, TIN2007-67407-C03-01).
IBM’s PIQUANT II in TREC2004
PIQUANT II, the system we used for TREC 2004, is a completely reengineered system whose core functionalities for answering factoid and list questions remain largely unchanged from previous years. We continue to address these questions using our multi-strategy and multi-source approach.
Vers une prédiction automatique de la difficulté d'une question en langue naturelle
We propose and test two methods for predicting a system's ability to answer a factoid question. Such a prediction makes it possible to decide whether a dialogue should be initiated to clarify or reformulate the question posed by the user. The first approach we propose is an adaptation of a prediction method from the document retrieval domain, based either on support vector machines (SVM) or on decision trees, with criteria such as the content of the questions or documents, and cohesion measures between the documents or document passages from which the answers are extracted. The other approach uses the expected answer type to decide whether the system can answer. Both approaches were tested on data from the Technolangue EQUER evaluation campaign for French question-answering systems. The SVM-based approach obtains the best results. It best distinguishes easy questions, those to which our system gives a correct answer, from difficult ones, those left unanswered or answered incorrectly. Conversely, we show that for our system the expected answer type (persons, quantities, locations, ...) is not a determining factor in a question's difficulty.
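The SVM-based difficulty prediction described above can be sketched as a standard supervised classifier over per-question features. The features and data below are hypothetical stand-ins (the paper's actual criteria include question/document content and passage-cohesion measures); this is a minimal illustration with scikit-learn, not the authors' implementation.

```python
# Minimal sketch of an SVM question-difficulty predictor.
# Features (hypothetical): [question_length, n_named_entities, passage_cohesion]
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[5, 1, 0.9], [12, 0, 0.2], [6, 2, 0.8], [15, 0, 0.1]]
y = [1, 0, 1, 0]  # 1 = system answered correctly ("easy"), 0 = "difficult"

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# A short, entity-bearing question backed by a cohesive passage
# falls near the "easy" training examples.
print(clf.predict([[7, 1, 0.85]])[0])
```

In a real system, a "difficult" prediction would trigger the clarification dialogue mentioned in the abstract instead of returning a low-confidence answer.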
EQueR : Evaluation de systèmes de Question-Réponse
A question-answering (QA) system lets a user ask a question in natural language and aims to extract the answer, when it is present, from a set of texts. These systems thus deal with retrieving precise, or factual, information, i.e. information that can be specified in a single question and whose answer fits in a few words. Typically these are answers giving dates or names of public figures, for example "When did Henri IV die?" or "Who killed Henri IV?", but also answers giving characteristics of entities or events that are harder to type, for example "How did Henri IV die?" or "What colour is the French flag?". Research in question answering has grown considerably in recent years. This can be seen in the information retrieval evaluation conferences, all of which now offer a question-answering task, in the many conferences that include this topic in their calls for papers, and in the existence of dedicated workshops at the major conferences in information retrieval (IR) as well as in natural language processing and artificial intelligence. This is probably due to a conjunction of factors: 1) the inadequacy, for various user needs, of information retrieval systems that systematically return a list of documents.
Indeed, when the user is looking for a precise piece of information, it seems more appropriate both to let them ask the question in natural language, which allows the query to be specified more precisely, and to return as a result only a short passage containing the sought answer; 2) the maturity of a number of IR and natural language processing techniques, which makes large-scale application feasible without restriction on the domain; 3) the possibility of defining an evaluation framework for these systems.
An Intelligent Framework for Natural Language Stems Processing
This work describes an intelligent framework that derives stems from inflected words. Word stemming is one of the most important factors affecting the performance of many language applications, including parsing, syntactic analysis, speech recognition, retrieval systems, medical systems, tutoring systems, biological systems, and translation systems. Computational stemming is essential for natural language processing of a highly inflected language such as Arabic. The framework is based on logic programming: it creates a program that enables the computer to reason logically. It provides information on the semantics of words and resolves ambiguity. It determines the position of each addition or bound morpheme and identifies whether the inflected word is a subject, object, or something else; position identification is vital for enhancing understandability mechanisms. The proposed framework adopts a bi-directional approach: it can deduce morphemes from inflected words, or build inflected words from stems. It also handles multi-word expressions and name identification. The framework is based on definite-clause grammars, where rules are built according to Arabic patterns (templates) using the programming language Prolog, with predicates in first-order logic. Combining first-order predicates with object-oriented programming conventions addresses the complexity of natural language processing, which comes from the huge amount of storage required; this storage reduces the efficiency of the software system. To deal with this complexity, the research uses Prolog, as it is based on efficient and simple proof routines and has dynamic memory allocation with automatic garbage collection.
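The affix-stripping half of such a templatic stemmer can be illustrated with a toy analogue: remove a known prefix and suffix, subject to a minimum remaining stem length. The affix inventory here is hypothetical (transliterated for readability), and this Python sketch stands in for the framework's actual Prolog definite-clause-grammar rules over Arabic templates.

```python
# Toy affix-stripping stemmer (illustrative analogue; the described
# framework uses Prolog DCG rules over Arabic patterns/templates).
PREFIXES = ("al", "wa", "bi")   # hypothetical prefix inventory
SUFFIXES = ("un", "at", "in")   # hypothetical suffix inventory

def stem(word):
    """Strip at most one matching prefix and one matching suffix,
    keeping a remainder of at least 3 characters (a crude root-size check)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

print(stem("alkitabun"))  # → "kitab"
print(stem("kitab"))      # → "kitab" (already a bare stem)
```

The bi-directional aspect of the framework would add the inverse operation, composing a stem with affixes to regenerate inflected forms.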
Topic indexing and retrieval for open domain factoid question answering
Factoid Question Answering is an exciting area of Natural Language Engineering that
has the potential to replace one major use of search engines today. In this dissertation,
I introduce a new method of handling factoid questions whose answers are proper
names. The method, Topic Indexing and Retrieval, addresses two issues that prevent
current factoid QA systems from realising this potential: they cannot satisfy users' demand
for almost immediate answers, and they cannot produce answers based on evidence
distributed across a corpus.
The first issue arises because the architecture common to QA systems does not scale
easily to heavy use, since so much of the work is done on-line: text retrieved by
information retrieval (IR) undergoes expensive and time-consuming answer extraction
while the user awaits an answer. If QA systems are to become as heavily used as
popular web search engines, this massive processing bottleneck must be overcome.
The second issue, how to make use of the distributed evidence in a corpus, is relevant
when no single passage in the corpus provides sufficient evidence for an answer
to a given question. QA systems commonly look for a text span that contains sufficient
evidence both to locate and to justify an answer, but this fails for questions
that require evidence from more than one passage in the corpus.
The Topic Indexing and Retrieval method developed in this thesis addresses both these
issues for factoid questions with proper-name answers by restructuring the corpus in
such a way that answers can be retrieved directly using off-the-shelf IR. The method
has been evaluated on 377 TREC questions with proper-name answers and on 41 questions
that require multiple pieces of evidence from different parts of the TREC AQUAINT
corpus. In the first evaluation, scores of 0.340 in Accuracy and 0.395 in
Mean Reciprocal Rank (MRR) show that Topic Indexing and Retrieval performs
well for this type of question. The second evaluation compares performance on the corpus
of 41 multi-evidence questions between a question-factoring baseline method that can
be used with the standard QA architecture and my Topic Indexing and Retrieval
method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates
its value in answering such questions.
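The core restructuring idea, aggregating everything the corpus says about each proper-name topic into one pseudo-document so that standard IR retrieves the answer itself, can be sketched as follows. The topics, texts, and term-overlap scoring below are toy stand-ins (the thesis uses off-the-shelf IR over the TREC AQUAINT corpus), shown only to make the indexing step concrete.

```python
# Sketch of topic indexing: one pseudo-document per proper-name topic,
# answered by retrieving topics with simple term-frequency matching.
from collections import Counter

topic_docs = {
    "Mozart": "composer born salzburg wrote operas symphonies",
    "Einstein": "physicist relativity nobel prize born ulm",
}
# Inverted view: topic -> term frequencies over its aggregated text.
index = {t: Counter(d.split()) for t, d in topic_docs.items()}

def answer(question):
    """Score each topic by summed frequency of the question's terms."""
    terms = question.lower().split()
    scores = {t: sum(c[w] for w in terms) for t, c in index.items()}
    return max(scores, key=scores.get)

print(answer("which composer was born in salzburg"))  # → "Mozart"
```

Because answer extraction happens at indexing time rather than query time, the on-line cost reduces to a single retrieval, which is the scalability argument made above.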
A Multi-Strategy and Multi-Source Approach to Question Answering
In this paper, we first describe the architecture on which PIQUANT is based. We then describe the answering agents currently implemented within the PIQUANT system, and how they were configured for our TREC2002 runs. Finally, we show that significant performance improvement was achieved by our multi-agent architecture by comparing our TREC2002 results against individual answering agent performance.
Advanced techniques for personalized, interactive question answering
Using a computer to answer questions has been a human dream since the beginning of
the digital era. A first step towards the achievement of such an ambitious goal is to deal
with natural language to enable the computer to understand what its user asks.
The discipline that studies the connection between natural language and the representation
of its meaning via computational models is computational linguistics. According
to such discipline, Question Answering can be defined as the task that, given a question
formulated in natural language, aims at finding one or more concise answers in the form
of sentences or phrases.
Question Answering can be interpreted as a sub-discipline of information retrieval
with the added challenge of applying sophisticated techniques to identify the complex
syntactic and semantic relationships present in text. Although it is widely accepted that
Question Answering represents a step beyond standard information retrieval, allowing a
more sophisticated and satisfactory response to the user's information needs, it still shares
a series of unsolved issues with the latter.
First, in most state-of-the-art Question Answering systems, the results are created
independently of the questioner's characteristics, goals and needs. This is a serious limitation
in several cases: for instance, a primary school child and a History student may
need different answers to the question: When did the Middle Ages begin?
Moreover, users often issue queries not as standalone but in the context of a wider
information need, for instance when researching a specific topic. Although it has recently been proposed that providing Question Answering systems with dialogue interfaces
would encourage and accommodate the submission of multiple related questions
and handle the user's requests for clarification, interactive Question Answering is still in
its early stages.
Furthermore, an issue which still remains open in current Question Answering is
that of efficiently answering complex questions, such as those invoking definitions and
descriptions (e.g. What is a metaphor?). Indeed, it is difficult to design criteria to assess
the correctness of answers to such complex questions.
These are the central research problems addressed by this thesis, and they are solved as
follows.
An in-depth study on complex Question Answering led to the development of classifiers
for complex answers. These exploit a variety of lexical, syntactic and shallow
semantic features to perform textual classification using tree-kernel functions for Support
Vector Machines.
The issue of personalization is solved by the integration of a User Modelling component
within the Question Answering model. The User Model is able to filter and
re-rank results based on the user's reading level and interests.
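The re-ranking step just described can be illustrated with a deliberately crude sketch: score each candidate answer by its distance from the user's reading level plus its overlap with the user's interests. Both heuristics here (average word length as a reading-level proxy, keyword-set interests) are my simplifications, not the thesis's actual User Model.

```python
# Toy user-model re-ranking (simplified proxies for reading level and interests).
def reading_level(text):
    """Crude proxy for reading level: average word length."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

def rerank(answers, user_level, interests):
    def score(ans):
        level_fit = -abs(reading_level(ans) - user_level)
        interest_fit = sum(1 for w in ans.lower().split() if w in interests)
        return interest_fit + level_fit
    return sorted(answers, key=score, reverse=True)

answers = [
    "The medieval period conventionally begins in 476 AD.",
    "Historiographers debate the periodisation of late antiquity.",
]
# A younger reader interested in "medieval" gets the plainer answer first.
print(rerank(answers, user_level=4.5, interests={"medieval"})[0])
```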
The issue of interactivity is approached by the development of a dialogue model and a
dialogue manager suitable for open-domain interactive Question Answering. The utility
of such a model is corroborated by the integration of an interactive interface, allowing reference
resolution and follow-up conversation, into the core Question Answering system, and
by its evaluation.
Finally, the models of personalized and interactive Question Answering are integrated
in a comprehensive framework, forming a unified model for future Question Answering
research.