13 research outputs found

    The bilingual system MUSCLEF at QA@CLEF 2006

    International audience. This paper presents our bilingual question answering system MUSCLEF. We highlight the difficulties encountered when moving from a monolingual to a cross-lingual system, and then focus on the evaluation of three MUSCLEF modules: question analysis, answer extraction and fusion. Finally, we show how we reused different MUSCLEF modules to participate in AVE (Answer Validation Exercise).

    Using syntax to validate answers to questions across multiple documents (Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents)

    National audience. This article presents FIDJI, a question answering system for French that combines syntactic information about the question and the documents with more traditional techniques of the field, such as named entity recognition and term weighting. In this system we experiment in particular with validating answers across several documents, and with specific techniques for answering different types of questions (such as questions expecting multiple answers (lists) or a Boolean answer).

    Finding answers to questions, in text collections or web, in open domain or specialty domains

    International audience. This chapter is dedicated to factual question answering, i.e. extracting precise and exact answers from texts to questions asked in natural language. A natural language question carries more information than a bag-of-words query (i.e. a query made of a list of words) and provides clues for finding precise answers. We first present the underlying problems, which stem mainly from linguistic variations between questions and the text passages that answer them, and which affect both the selection of relevant passages and the extraction of reliable answers. We then present how to answer factual questions in the open domain. We also present question answering in specialty domains, since it requires dealing with semi-structured knowledge and specialized terminologies and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application framework and introduces specificities linked to Web redundancy or collaborative usage. Moreover, the Web is multilingual, and a challenging problem consists in searching for answers in target-language documents written in a language other than that of the question. For each of these topics, we present the main approaches and the remaining problems.

    Selecting answers to questions from Web documents by a robust validation process

    International audience. Question answering (QA) systems aim at finding answers to questions posed in natural language using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts differ considerably from those of newspaper articles. We developed a QA system based on an answer validation process able to handle the specificities of Web documents. A large number of candidate answers are extracted from short passages and then validated according to the characteristics of the question and the passages. The validation module is based on a machine learning approach. It takes into account criteria characterizing both passage and answer relevance at the surface, lexical, syntactic and semantic levels, in order to deal with different types of texts. We present and compare results obtained for factual questions posed on a Web collection and on a newspaper collection. We show that our system outperforms a baseline by up to 48% in MRR.
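MRR (Mean Reciprocal Rank), the metric behind the 48% improvement reported above, averages over all questions the reciprocal rank of the first correct answer, counting 0 when no correct answer is returned. The following is a minimal Python sketch. It assumes each question comes with a ranked list of candidate strings and a set of accepted answers. Exact string matching is a simplification: evaluation campaigns usually rely on human judgments or answer patterns. The function name `mean_reciprocal_rank` is illustrative.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(ranked_answers: Sequence[Sequence[str]],
                         gold: Sequence[Set[str]]) -> float:
    """Mean over questions of 1/rank of the first correct answer (0 if none)."""
    if len(ranked_answers) != len(gold):
        raise ValueError("need exactly one gold answer set per question")
    if not gold:
        return 0.0
    total = 0.0
    for candidates, accepted in zip(ranked_answers, gold):
        for rank, answer in enumerate(candidates, start=1):
            if answer in accepted:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold)


# Toy run: correct at rank 1, correct at rank 3, never correct.
runs = [["Paris", "Lyon"], ["1998", "1999", "2000"], ["Nice"]]
gold = [{"Paris"}, {"2000"}, {"Marseille"}]
print(round(mean_reciprocal_rank(runs, gold), 3))  # 0.444
```

Because only the first correct answer contributes, the metric rewards a validation step that pushes a correct candidate to the top of the ranked list.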

    Boosting Chinese Question Answering with Two Lightweight Methods: ABSPs and SCO-QAT

    Question Answering (QA) research has been conducted in many languages. Nearly all top-performing systems use heavy methods that require sophisticated techniques, such as parsers or logic provers. However, such techniques are usually unavailable or unaffordable for under-resourced languages or in resource-limited situations. In this article, we describe how a top-performing Chinese QA system can be designed by using lightweight methods effectively. We propose two lightweight methods, namely the Sum of Co-occurrences of Question and Answer Terms (SCO-QAT) and Alignment-based Surface Patterns (ABSPs). SCO-QAT is a co-occurrence-based answer-ranking method that needs no extra knowledge, word-ignoring heuristic rules, or tools; it calculates co-occurrence scores from the passage retrieval results. ABSPs are syntactic patterns trained from question-answer pairs with a multiple alignment algorithm. They capture the relations between terms, and these relations are then used to filter answers. We attribute the success of ABSPs and SCO-QAT to the effective use of local syntactic information and global co-occurrence information. By using SCO-QAT and ABSPs, we improved the RU-Accuracy of our testbed QA system, ASQA, from 0.445 to 0.535 on the NTCIR-5 dataset. ASQA also achieved the top RU-Accuracy, 0.5, on the NTCIR-6 dataset. The results show that lightweight methods are not only cheaper to implement but also have the potential to achieve state-of-the-art performance.
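The abstract describes SCO-QAT only as a sum of co-occurrences of question and answer terms, computed over passage retrieval results. The sketch below is a simplified reading and not the published formula. For each candidate, it visits every retrieved passage that contains the candidate and counts how many question terms also occur in that passage. Substring matching stands in for tokenization. This avoids any word segmenter, in the spirit of the "no tools" claim, but the choice is an assumption. The names `sco_qat_scores` and `rank_candidates` are hypothetical.

```python
from typing import Dict, List


def sco_qat_scores(passages: List[str], question_terms: List[str],
                   candidates: List[str]) -> Dict[str, int]:
    """Simplified co-occurrence score per candidate answer (hypothetical)."""
    lowered = [p.lower() for p in passages]
    terms = [t.lower() for t in question_terms]
    scores: Dict[str, int] = {}
    for cand in candidates:
        c = cand.lower()
        score = 0
        for passage in lowered:
            if c in passage:
                # Count the question terms that co-occur with the candidate here.
                score += sum(1 for t in terms if t in passage)
        scores[cand] = score
    return scores


def rank_candidates(scores: Dict[str, int]) -> List[str]:
    # Highest score first; Python's sort is stable, so ties keep input order.
    return sorted(scores, key=lambda c: scores[c], reverse=True)


passages = ["Yao Ming was born in Shanghai",
            "Shanghai is a city in China",
            "Yao Ming played in Houston"]
scores = sco_qat_scores(passages, ["Yao Ming", "born"], ["Shanghai", "Houston"])
print(scores)                   # {'Shanghai': 2, 'Houston': 1}
print(rank_candidates(scores))  # ['Shanghai', 'Houston']
```

The score depends only on retrieved passages, so it needs no parser or external knowledge. That is the "global co-occurrence information" the abstract contrasts with the local syntactic patterns of ABSPs.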

    Selecting answers to questions in a Web corpus through validation (Sélection de réponses à des questions dans un corpus Web par validation)

    National audience. Question answering systems search for the answer to a question posed in natural language within a set of documents. Web collections differ from newspaper articles in their structure and style. To account for these specificities, we developed a system based on a robust validation approach in which candidate answers are extracted from short text passages and then ranked by machine learning. The results show a 48% improvement in MRR (Mean Reciprocal Rank) over the baseline.

    Answer extraction in a question answering system (L’extraction des réponses dans un système de question-réponse)

    National audience. Question answering systems are usually composed of three main modules: question analysis, document selection and answer extraction. In this article, we focus on the third module, and in particular on the more difficult case where the expected answer is not a named entity. We describe how the Cass parser is used to mark the answer in candidate sentences, and we evaluate the results of this approach. Beforehand, we describe and evaluate the question analysis module, because the information it produces is needed by our final extraction step.

    Improvements to GeoQA, a Question Answering system for Geospatial Questions

    This work is an attempt to gather, study and compare question answering systems such as QUINT, TEMPO and NEQA, as well as the Qanary methodology and the Frankenstein framework for question answering systems. The study focuses on question answering over geospatial data, and more specifically on GeoQA. This recently proposed system is the first template-based question answering system for linked geospatial data. We improve GeoQA by exploiting the data schema information of the knowledge bases it uses, by adding templates for more complex questions, and by improving the natural language processing module so that it recognizes these patterns.

    Passage retrieval in multilingual legal texts and patents (Recuperación de pasajes en textos legales y patentes multilingües)

    This work presents the problem of passage retrieval, the domain of legal texts and patents, and their characteristic linguistic diversity. It presents techniques for solving information retrieval problems and analyzes two participations in competitions that proposed novel approaches.
    Correa García, S. (2010). RECUPERACIÓN DE PASAJES EN TEXTOS LEGALES Y PATENTES MULTILINGÜES. http://hdl.handle.net/10251/14084

    Arabic named entity recognition

    This doctoral thesis describes the research carried out to determine the best techniques for building an Arabic Named Entity Recognizer. Such a system would be able to identify and classify the named entities found in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing tasks (for example, Information Retrieval, Question Answering, Machine Translation, etc.) achieve better results thanks to the enrichment it adds to the text. The literature contains various works that study NER for a specific language or from a language-independent perspective. However, very few works studying this task for Arabic have been published so far. Arabic has a special orthography and a complex morphology, and these aspects raise new challenges for NER research. A thorough investigation of NER for Arabic would not only provide the techniques needed to achieve high performance, but would also provide an error analysis and a discussion of the results that would benefit the NER research community. The main goal of this thesis is to meet that need. To that end, we have: 1. studied the aspects of Arabic that are relevant to this task; 2. analyzed the state of the art in NER; 3. compared the results obtained by different machine learning techniques; 4. developed a method based on combining different classifiers, where each classifier handles a single class of named entities and uses the feature set and machine learning technique best suited to that class. Our experiments have been evaluated on nine test sets.
    Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318
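The thesis abstract describes one classifier per entity class, but it does not say how their outputs are merged into a single tagging. The following is a hypothetical sketch of one plausible merging rule, not the thesis's method. Each per-class classifier is assumed to return a confidence per token. Each token then takes the most confident class above a threshold, or `O` when no class qualifies. The `Classifier` type, `combine_per_class`, the 0.5 threshold and the toy scorers are all illustrative assumptions. A real tagger would also handle multi-token spans, for example with BIO tags.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a class-specific classifier returns one confidence
# in [0, 1] per token, for its own entity class only.
Classifier = Callable[[List[str]], List[float]]


def combine_per_class(tokens: List[str], classifiers: Dict[str, Classifier],
                      threshold: float = 0.5) -> List[str]:
    """Tag each token with the most confident class above threshold, else 'O'."""
    scores = {label: clf(tokens) for label, clf in classifiers.items()}
    tags: List[str] = []
    for i in range(len(tokens)):
        best_label, best_score = "O", threshold
        for label, per_token in scores.items():
            if per_token[i] > best_score:  # strict: ties keep the earlier label
                best_label, best_score = label, per_token[i]
        tags.append(best_label)
    return tags


# Toy stand-ins for trained per-class models (purely illustrative).
def person(toks: List[str]) -> List[float]:
    return [0.9 if t in {"Yasser", "Arafat"} else 0.1 for t in toks]


def location(toks: List[str]) -> List[float]:
    return [0.8 if t == "Cairo" else 0.2 for t in toks]


print(combine_per_class(["Yasser", "Arafat", "visited", "Cairo"],
                        {"PER": person, "LOC": location}))
# ['PER', 'PER', 'O', 'LOC']
```

Keeping the classifiers independent is what lets each entity class use its own features and learner, as the abstract describes. Conflicts are resolved only at the final merging step.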