9 research outputs found

    Communication with www in Czech

    This paper describes UIO, a multi-domain question-answering system for the Czech language that looks for answers on the web. UIO draws on two fields: natural language interfaces to databases and question answering. In its current version, UIO can answer questions about train and coach timetables, cinema and theatre performances, currency exchange rates, name-days, and the Diderot Encyclopaedia. Much effort has been put into making the addition of a new domain very easy. UIO sets no limits on the words or the form of a question: users can ask syntactically correct as well as incorrect questions, or use keywords. A Czech morphological analyser and a bottom-up chart parser are employed to analyse the question. The database of multiword expressions is updated automatically when a new item is found on the web. For all domains UIO has an accuracy rate about 8

    Herramientas para la investigación en Recuperación de Información: Karpanta, un motor de búsqueda experimental

    This paper presents Karpanta, an extremely flexible search engine that implements a large number of different algorithms (more than 300) and separates automatic indexing and query resolution from the phases of lexical analysis and visualization. The code is very simple and easy to modify, since it performs all of its operations with simple SQL statements and stores the data in relational tables. Karpanta is free, open-source software under the GPL license that any researcher can use, modify and adapt freely; it was designed specifically as a research tool for the interdisciplinary field of Information Retrieval. Karpanta can also be used operationally, in real environments and for real tasks such as those that arise in a documentation centre.

    How NLP Can Improve Question Answering

    Answering open-domain factual questions requires Natural Language Processing to refine document selection and answer identification. With our system QALC, we participated in the Question Answering track of the TREC8, TREC9, and TREC10 evaluations. QALC analyses documents by searching for multi-word terms and their linguistic variants, both to minimize the number of documents selected and to provide additional clues when comparing question and sentence representations. This comparison also uses the results of syntactic parsing of the questions and Named Entity recognition. Answer extraction relies on applying syntactic patterns chosen according to the kind of information sought and categorized by the syntactic form of the question. These patterns allow QALC to handle linguistic variations well at the answer level

    Finding answers to questions, in text collections or web, in open domain or specialty domains

    This chapter is dedicated to factual question answering, i.e. extracting precise and exact answers from texts to questions posed in natural language. A question in natural language gives more information than a bag-of-words query (i.e. a query made of a list of words) and provides clues for finding precise answers. We first focus on the underlying problems, which are mainly due to linguistic variation between questions and the pieces of text that answer them, and which affect both the selection of relevant passages and the extraction of reliable answers. We then present how to answer factual questions in open domain. We also present question answering in specialty domains, which requires dealing with semi-structured knowledge and specialized terminologies and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application framework and introduces specificities linked to Web redundancy or collaborative usage. Moreover, the Web is multilingual, and a challenging problem consists in searching for answers in documents written in a language other than that of the question. For all these topics, we present the main approaches and the remaining problems.

    Coping with Alternate Formulations of Questions and Answers

    In this chapter we present the QALC system, which has participated in the four TREC QA evaluations. We focus on the problem of linguistic variation, which must be handled in order to relate questions and answers. We first present variation at the term level, which consists in retrieving question terms in document sentences even when morphological, syntactic or semantic variations alter them. Our second subject is variation at the sentence level, which we handle as different partial reformulations of questions. Questions are associated with extraction patterns based on the syntactic type of the question and the object being queried. We present the whole system, which makes it possible to situate how QALC deals with variation, along with different evaluations.

    QUESTION ANSWERING SYSTEMS

    Question answering systems are a form of information retrieval that responds to a question asked in natural language. They consist of three main components: question classification, information retrieval and answer extraction. The introduction of large-scale joint evaluations, such as the TREC conference, has created strong communities of interest and accelerated progress in research on question answering systems. This paper presents the problem domains of question answering. It reviews the historical development of question answering systems, from the simpler, narrowly specialized ones to today's far more sophisticated and higher-quality systems, which can give short and concise answers to questions from a variety of domains. The systems are analyzed according to their purpose. The paper provides an overview of previous research on question answering systems. It defines the general architecture of question answering systems and analyzes current approaches at each phase of that architecture. Question answering systems are still at the stage of development and experimentation. The paper concludes that the largest remaining problem in question answering systems is selecting the most accurate and most concise answer that will satisfy the user.

    EQueR : Evaluation de systèmes de Question-Réponse

    A question-answering (QA) system lets a user ask a question in natural language and aims to extract the answer, when it is present, from a collection of texts. Such systems therefore deal with the search for precise, or factual, information, that is, information that can be specified in a single question and whose answer fits in a few words. Typically, these are answers that give dates or names of people, as in "When did Henri IV die?" or "Who killed Henri IV?", but also answers that give characteristics of entities or events that are harder to type, for example "How did Henri IV die?" or "What colour is the French flag?". Research in question answering has grown considerably in recent years. This can be seen in the information retrieval evaluation conferences, which now all offer a question-answering task, in the many conferences that list this topic in their calls for papers, and in the existence of workshops dedicated to this topic at the major conferences on information retrieval (IR) as well as on natural language processing and artificial intelligence. This is probably due to a combination of factors: 1) the inadequacy of information retrieval systems, which systematically return a list of documents, in the face of differing user needs. Indeed, when a user is looking for a precise piece of information, it seems more relevant both to let them ask the question in natural language, which allows them to specify the request better, and to return only a short passage containing the answer sought; 2) the maturing of a number of techniques in IR and language processing, which makes large-scale application conceivable without restriction on the domain covered; 3) the possibility of defining an evaluation framework for the systems.

    Questions-Réponses en domaine ouvert (sélection pertinente de documents en fonction du contexte de la question)

    This thesis aims to define a unified adaptation of the document selection and answer extraction strategies, based on document and question types, in a Question-Answering (QA) context. The solution is integrated into RITEL (a LIMSI QA system) to assess its contribution. We develop and investigate a method based on an Information Retrieval approach for selecting relevant documents in QA. The method relies on a language model and a binary text classification model that assigns documents to a relevant or irrelevant category. It is used to filter out documents that are unusable for answer extraction by automatically matching lists of a priori relevant documents to the question type. First, we present the method along with its underlying models and evaluate it on the QA task with RITEL in French. The evaluation is done on a corpus of 500,000 unsegmented web pages with factoid questions provided by the Quaero program (i.e. evaluation at the document level, or D-level). Then, we evaluate the method on segmented web pages (i.e. evaluation at the segment level, or S-level). The idea is that information content is more consistent within segments, which facilitates answer extraction. D-filtering brings a small improvement over the baseline (no filtering). S-filtering outperforms both the baseline and D-filtering, but not significantly. Finally, we study at the S-level the links between RITEL's performance and the key parameters of the method. In order to apply the method to segments, we created a web page segmentation system. We present and evaluate it on the QA task with the same corpora used to evaluate our document selection method. This evaluation follows the former hypothesis and measures the impact of natural web page variability (in terms of size and content) on RITEL in its task. In general, the experimental results suggest that our IR-based method helps a QA system in its task; however, further investigations should be conducted, especially with larger corpora of questions, to make the results significant.