Answer Extraction from Technical Texts
For most companies and organizations, technical documents are highly valued knowledge sources because they combine the know-how and experience of specialists in a particular domain. To guarantee the optimal use of these documents in specific problem situations, people must be able to quickly find precise and highly reliable information. Answer extraction is a new technology that helps users find precise answers to their questions in technical documents. In this article, the authors present ExtrAns, a real-world answer extraction system designed for technical domains. ExtrAns uses robust natural language processing technology and a semantic representation of the propositional content of the information. Knowing the forms of a domain's terminology and understanding the relations between terms is vital for answer extraction. By applying rewrite rules in a systematic way, ExtrAns gets a grip on technical terminology.
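The systematic application of rewrite rules to terminology can be sketched as follows. This is a minimal illustration of the general idea only, not the actual ExtrAns rule set; the terms, patterns, and canonical forms are invented for the example.

```python
import re

# Hypothetical rewrite rules: each maps surface variants of a domain term
# onto a single canonical form, so that matching can ignore spelling variation.
REWRITE_RULES = [
    (re.compile(r"\bhard[- ]?disk drive\b", re.IGNORECASE), "harddisk"),
    (re.compile(r"\bhard[- ]?disk\b", re.IGNORECASE), "harddisk"),
    (re.compile(r"\bprint(?:er)? queue\b", re.IGNORECASE), "printqueue"),
]

def normalize_terms(text: str) -> str:
    """Apply each rewrite rule in order, collapsing term variants."""
    for pattern, canonical in REWRITE_RULES:
        text = pattern.sub(canonical, text)
    return text

print(normalize_terms("How do I format the hard disk drive?"))
# -> How do I format the harddisk?
```

Applying the same normalization to both documents and questions means that a question phrased with one variant can still match an answer phrased with another.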
Question answering in terminology-rich technical domains
The current tendency in Question Answering is towards the processing of large volumes of open-domain text. This tendency has been spurred by the creation of the Question Answering track in TREC and by the recent increase in systems that use the Web to extract the answers to questions. This undoubtedly has the advantage that narrow, application-specific concerns can be overlooked in favor of more general approaches. However, the unconstrained nature of the domain and of the questions does not necessarily lead to systems that are better at specific tasks, such as might be required in a deployed application.
It has already been observed in other competitions (notably the Information Extraction competitions organized under the name of Message Understanding Conferences) that the nature of the competitive process tends to select the type of system that best adapts to the evaluation itself, rather than systems that deal in an optimal way with the problem. [To use a comparison from evolutionary theory, overly severe selection in a given local environment leads to a convergence of the population to a very limited genetic pool, which is then incapable of coping with even a minor change in the environment.]
In restricted domains, systems cannot take advantage of the so-called "Zipf's law of Questions" [Prager], which states that there is an inverse relation between the frequency of certain types of questions and their complexity. In other words, the questions most frequently asked are those that can be solved with simpler techniques. By targeting a smaller set of frequent question types, a system can achieve good results with limited effort.
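The distributional claim behind this strategy can be illustrated with a small sketch. The question-type names and frequencies below are invented (roughly Zipfian), not taken from Prager's data; the point is only that a few frequent types can cover most of a query stream.

```python
# Invented, roughly Zipfian frequencies of question types (illustrative only).
QUESTION_TYPE_FREQS = {
    "where-is": 500,
    "how-do-i": 250,
    "what-is": 170,
    "why-does": 50,
    "what-if": 20,
    "compare": 10,
}

def coverage_of_top(freqs: dict, k: int) -> float:
    """Fraction of all questions accounted for by the k most frequent types."""
    counts = sorted(freqs.values(), reverse=True)
    return sum(counts[:k]) / sum(counts)

# In this invented distribution, the three most frequent (and, by the
# hypothesis, simplest) question types already cover 92% of all queries.
print(f"{coverage_of_top(QUESTION_TYPE_FREQS, 3):.0%}")  # -> 92%
```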
By contrast, the non-redundant nature of most technical documentation, together with the use of domain-specific sublanguage and terminology, makes it unsuitable for (some of) the approaches seen in the TREC QA competition. In the proposed contribution we will discuss the specific nature of technical documentation, with examples from real domains (e.g. the Maintenance Manual of a large commercial aircraft), and illustrate solutions that have been adopted in a deployed system.
An example of the difference between technical documents and open-domain texts is the focus on specific types of entities. While Named Entities play a major role in open-domain systems, they are almost irrelevant in technical documentation; by contrast, a far greater role is played by domain terminology.
Technical domains present the additional problem of "domain navigation". Because such documents assume that users are familiar with domain concepts, inexpert users are presented with a barrier separating questions from answers. Unfamiliarity with domain terminology might lead to questions that contain imperfect formulations of domain terms. A question answering system for junior doctors or trainee technicians therefore needs to use whatever scarce domain knowledge is contained in a query to extract relevant answers. Detecting terminological variants and exploiting the relations between terms (such as synonymy, meronymy, and antonymy) is vital to this task.
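Exploiting such term relations can be sketched as a simple query-expansion step. The thesaurus below is a hypothetical stand-in for a real domain resource (the entries and relation names are invented), but it shows the mechanism: a query term is expanded with its known variants before retrieval.

```python
# Hypothetical domain thesaurus mapping canonical terms to related terms.
THESAURUS = {
    "harddisk": {
        "synonyms": ["hard drive", "HDD"],
        "meronyms": ["platter", "spindle"],
    },
}

def expand_query(terms: list[str]) -> set[str]:
    """Expand query terms with their synonyms from the thesaurus."""
    expanded = set(terms)
    for term in terms:
        relations = THESAURUS.get(term, {})
        expanded.update(relations.get("synonyms", []))
    return expanded

print(sorted(expand_query(["harddisk"])))
# -> ['HDD', 'hard drive', 'harddisk']
```

Whether meronyms or antonyms should also be added depends on the question type; for "what is part of X?" questions, for instance, meronyms rather than synonyms are the relevant relation.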
Another idiosyncrasy of technical domains is the tendency towards definitional questions ("what is the ANT connection?"), which prove tricky to answer precisely in a generic document collection (and for this reason they were deliberately left out of the recent TREC 2002). In technical domains it can be expected that such questions will play a major role, and systems must therefore be capable of coping with them.
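Recognizing a definitional question and extracting the term to be defined can be done with a simple surface pattern, sketched below. The pattern is an illustrative assumption, not the grammar of any actual system; real definitional questions take many more forms.

```python
import re

# Hypothetical pattern for definitional questions of the form
# "What is (the|a|an) X?" -- illustrative only.
DEF_PATTERN = re.compile(
    r"^\s*what\s+(?:is|are)\s+(?:the\s+|an?\s+)?(.+?)\s*\?\s*$",
    re.IGNORECASE,
)

def definitional_target(question: str):
    """Return the term being defined, or None for non-definitional questions."""
    match = DEF_PATTERN.match(question)
    return match.group(1) if match else None

print(definitional_target("What is the ANT connection?"))
# -> ANT connection
```

Once the target term is isolated, the system can look specifically for defining contexts of that term (glossary entries, appositions, "X is a ..." sentences) rather than for generic keyword matches.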
In this book chapter we aim to explain the above concepts and illustrate them with examples taken from texts from technical domains. We will also illustrate why techniques that are typically used in data-intensive open-domain question-answering systems would not work effectively in technical domains with less data redundancy. In sum, we will show that question answering in technical domains presents a better opportunity to explore content-based approaches to question answering, while at the same time offering the possibility of producing commercially viable systems in the short term.