    OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora

    Peer reviewed

    Recognizing Textual Entailment with Tree Edit Distance: Application to Question Answering and Information Extraction

    This thesis addresses the problem of Recognizing Textual Entailment (i.e. recognizing that the meaning of one text entails the meaning of another) using a Tree Edit Distance algorithm over the syntactic trees of the two texts. A key aspect of the approach is the estimation of the cost of the editing operations (i.e. Insertion, Deletion, Substitution) among words. Our aim is to compare the contribution of different resources providing entailment rules, including lexical rules from WordNet and the UniAlberta thesaurus, and syntactic rules automatically acquired by the DIRT and TEASE systems. We carried out a number of experiments over the PASCAL-RTE dataset in order to estimate the contribution of different combinations of the available resources. In addition, we have developed and evaluated an Answer Validation module for Question Answering and a Relation Extraction system, both of them based on textual entailment.
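
To make the role of the operation costs concrete, here is a minimal sketch of an edit distance between ordered labeled trees. It is not the thesis's implementation: it uses the generic memoized forest recursion, and the uniform costs (1 per insertion or deletion, 0 or 1 per substitution) are placeholder assumptions, whereas the thesis estimates costs from entailment resources. The names `tree` and `tree_edit_distance` are hypothetical.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered tree as (label, tuple_of_child_trees)."""
    return (label, tuple(children))


def tree_edit_distance(
    t1,
    t2,
    ins_cost=lambda label: 1.0,                       # assumed uniform cost
    del_cost=lambda label: 1.0,                       # assumed uniform cost
    sub_cost=lambda a, b: 0.0 if a == b else 1.0,     # assumed 0/1 cost
):
    """Minimum total cost of edits turning ordered tree t1 into t2."""

    @lru_cache(maxsize=None)
    def forest_dist(f, g):
        # f and g are forests: tuples of trees, compared right to left.
        if not f and not g:
            return 0.0
        if not f:
            # Only insertions remain: insert the rightmost root of g.
            label, kids = g[-1]
            return forest_dist(f, g[:-1] + kids) + ins_cost(label)
        if not g:
            # Only deletions remain: delete the rightmost root of f.
            label, kids = f[-1]
            return forest_dist(f[:-1] + kids, g) + del_cost(label)
        (lf, kf), (lg, kg) = f[-1], g[-1]
        return min(
            # Delete the rightmost root of f; its children join the forest.
            forest_dist(f[:-1] + kf, g) + del_cost(lf),
            # Insert the rightmost root of g.
            forest_dist(f, g[:-1] + kg) + ins_cost(lg),
            # Match the two roots, then solve children and remainder.
            forest_dist(kf, kg) + forest_dist(f[:-1], g[:-1]) + sub_cost(lf, lg),
        )

    return forest_dist((t1,), (t2,))


text = tree("eats", tree("cat"), tree("fish", tree("fresh")))
hypothesis = tree("eats", tree("cat"), tree("fish"))
distance = tree_edit_distance(text, hypothesis)
```

In an entailment setting, a low cost of transforming the text's tree into the hypothesis's tree would be read as evidence for entailment; swapping in resource-derived `sub_cost` functions is where rules such as those from WordNet would enter.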

    FBK_NK: a WordNet-based System for Multi-Way Classification of Semantic Relations

    We describe a WordNet-based system for the extraction of semantic relations between pairs of nominals appearing in English texts. The system adopts a lightweight approach, based on training a Bayesian Network classifier using large sets of binary features. Our features consider: i) the context surrounding the annotated nominals, and ii) different types of knowledge extracted from WordNet, including direct and explicit relations between the annotated nominals, and more general and implicit evidence (e.g. semantic boundary collocations). The system achieved a Macro-averaged F1 of 68.02% on the “Multi-Way Classification of Semantic Relations Between Pairs of Nominals” task (Task #8) at SemEval-2010.
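
For readers unfamiliar with the reported metric, the following is a minimal sketch of a plain macro-averaged F1: the per-class F1 scores are computed separately and then averaged with equal weight. The function name `macro_f1` and the toy labels are illustrative assumptions, and the official task scorer may differ in which classes it averages over.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy relation labels, invented for illustration only.
gold = ["Cause-Effect", "Cause-Effect", "Part-Whole", "Part-Whole"]
predicted = ["Cause-Effect", "Part-Whole", "Part-Whole", "Part-Whole"]
score = macro_f1(gold, predicted)
```

Because every class contributes equally, a rare relation that the classifier handles badly lowers the score as much as a frequent one would.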

    An Open-Source Package for Recognizing Textual Entailment

    This paper presents a general-purpose open-source package for Recognizing Textual Entailment (RTE). The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Its modular architecture can be extended, which also allows fast prototyping of new solutions. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.

    Document Filtering and Ranking Using Syntax and Statistics for Open Domain Question Answering

    This paper presents a strategy for syntax-based ranking of documents specifically oriented to Question Answering (QA). The strategy is intended to limit the number of documents processed by the answer extraction module of a syntax-oriented QA system. Several measures for statistical scoring of expressions are presented and evaluated on 400 factoid questions from the TREC-12 competition. We prove that syntax-based document filtering can outperform classical inverse document frequency (idf) approaches.
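
As a point of reference for the baseline named in the abstract, here is a minimal sketch of classical idf-based document scoring. It illustrates only the baseline, not the paper's syntax-based measures, which the abstract does not specify. The function names, the unsmoothed formula log(N / df), and the toy documents are assumptions made for illustration.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def idf_score(query_terms, document, idf):
    """Sum the idf weights of the query terms that occur in the document."""
    present = set(document)
    return sum(idf.get(term, 0.0) for term in query_terms if term in present)


# Toy tokenized documents, invented for illustration only.
documents = [
    ["who", "invented", "the", "telephone"],
    ["the", "telephone", "history"],
    ["who", "is", "the", "author"],
]
idf = inverse_document_frequency(documents)
query = ["invented", "telephone"]
ranked = sorted(documents, key=lambda d: idf_score(query, d, idf), reverse=True)
```

Terms that appear in every document, such as "the" here, receive weight zero, so the ranking is driven by the rarer question terms.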

    Mining Wikipedia for Large-scale Repositories of Context-Sensitive Entailment Rules

    This paper focuses on the central role played by lexical information in the task of Recognizing Textual Entailment. In particular, the usefulness of lexical knowledge extracted from several widely used static resources, represented in the form of entailment rules, is compared with a method to extract lexical information from Wikipedia as a dynamic knowledge resource. The proposed acquisition method aims at maximizing two key features of the resulting entailment rules: coverage (i.e. the proportion of rules successfully applied over a dataset of TE pairs), and context sensitivity (i.e. the proportion of rules applied in appropriate contexts). Evaluation results show that Wikipedia can be effectively used as a source of lexical entailment rules, featuring both higher coverage and higher context sensitivity than other resources.

    Is it Worth Submitting this Run? Assess your RTE System with a Good Sparring Partner.

    We address two issues related to the development of systems for Recognizing Textual Entailment. The first is the impossibility to capitalize on lessons learned over the different datasets available, due to the changing nature of traditional RTE evaluation settings. The second is the lack of simple ways to assess the results achieved by our system on a given training corpus, and figure out its real potential on unseen test data. Our contribution is the extension of an open-source RTE package with an automatic way to explore the large search space of possible configurations, in order to select the most promising one over a given dataset. From the developers’ point of view, the efficiency and ease of use of the system, together with the good results achieved on all previous RTE datasets, represent a useful support, providing an immediate term of comparison to position the results of their approach.

    Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

    In this paper we investigate the effectiveness of a novel resource for Multilingual Question Answering (QA). The resource consists of a set of multilingual pattern libraries for answer extraction and validation. In the spirit of the ongoing attempts to develop freely available resources for QA, we argue that the distribution and use of pattern libraries will contribute to making Multilingual QA a more feasible task.