8 research outputs found
Dealing with ambiguities in an answer extraction system
We report on the treatment of ambiguity in ExtrAns, a system that performs an exhaustive linguistic analysis of UNIX manpages to do answer extraction over them. Disambiguation is performed in two stages. The first stage consists of a set of simple rules that delete some of the wrong interpretations that can be spotted with purely syntactic information. The second stage extends the use of Brill and Resnik's algorithm to disambiguate several types of attachment ambiguities. Ambiguities that pass the disambiguation procedures are handled by ExtrAns by displaying the answers to the user with graded selective highlighting
A real world implementation of answer extraction
In this paper we describe ExtrAns, an answer extraction system. Answer extraction (AE) aims at retrieving those exact passages of a document that directly answer a given user question. AE is more ambitious than information retrieval and information extraction in that the retrieval results are phrases, not entire documents, and in that the queries may be arbitrarily specific. It is less ambitious than full-fledged question answering in that the answers are not generated from a knowledge base but looked up in the text of documents. The current version of ExtrAns is able to parse unedited Unix "man pages", and derive the logical form of their sentences. User queries are also translated into logical forms. A theorem prover then retrieves the relevant phrases, which are presented through selective highlighting in their context
Automatic answer extraction: implementation of the ExtrAns system
In this article we describe an automatic answer extraction system. Automatic answer extraction (AE) aims to find the passages of a document that directly answer a question posed by a user. AE is more ambitious than information retrieval and information extraction in that the search results are sentences rather than entire documents, and in that questions can be freely formulated. It is, however, less ambitious than question answering systems, because the answers are not generated from a knowledge base but extracted from the texts.
The current version of ExtrAns can analyse the online documentation (in English) of the Unix system (the "man pages") and builds a semantic representation (in the form of logical expressions) of the sentences. A theorem prover then finds the relevant passages, which are highlighted in their context
Towards answer extraction: an application to technical domains
The shortcomings of traditional Information Retrieval are most evident when users require exact information rather than relevant documents. This practical need is pushing the research community towards systems that can exactly pinpoint those parts of documents that contain the information requested. Answer Extraction (AE) systems aim to satisfy this need. This paper presents one such system (ExtrAns) which works by transforming documents and queries into a semantic representation called Minimal Logical Form (MLF) and derives the answers by logical proof from the documents. MLFs use underspecification to overcome the problems associated with a complete semantic representation and offer the possibility of monotonic, non-destructive extension
Exploiting paraphrases in a question answering system
We present a Question Answering system for technical domains which makes intelligent use of paraphrases to increase the likelihood of finding the answer to the user's question. The system implements a simple and efficient logic representation of questions and answers that maps paraphrases to the same underlying semantic representation. Further, paraphrases of technical terminology are dealt with by a separate process that detects surface variants
NLP for answer extraction in technical domains
In this paper we argue that question answering (QA) over technical domains is distinctly different from TREC-based QA or Web-based QA and it cannot benefit from data-intensive approaches. Technical questions arise in situations where concrete problems require specific answers and explanations. Finding a justification of the answer in the context of the document is essential if we have to solve a real-world problem. We show that NLP techniques can be used successfully in technical domains for high-precision access to information stored in documents. We present ExtrAns, an answer extraction system over technical domains, its architecture, its use of logical forms for answer extractions and how terminology extraction becomes an important part of the system
Anaphora resolution in ExtrAns
The true power of anaphora resolution algorithms can only be gauged when embedded into specific Natural Language Processing (NLP) applications. In this paper we report on the anaphora resolution module from ExtrAns, an answer extraction system. The anaphora resolution module is based on Lappin and Leass' original algorithm, which used McCord's Slot Grammar as the inherent parser. We report how to port Lappin and Leass' algorithm to Link Grammar, a freely available dependency-based parsing system that is used in a range of NLP applications. Finally, we report on how the equivalence classes that result from the anaphora resolution algorithm are incorporated into the logical forms used by ExtrAns
Knowledge-based Question Answering
Large amounts of technical documentation are available in machine readable format; however, there is a lack of effective ways to access them. In this paper we propose an approach based on linguistic techniques, geared towards the creation of a domain-specific Knowledge Base, starting from the available technical documentation. We then discuss an effective way to access the information encoded in the Knowledge Base. Given a user question phrased in natural language the system is capable of retrieving the encoded semantic information that most closely matches the user input, and presenting it by highlighting the textual elements that were used to deduce it