37 research outputs found

    Tree edit distance as a baseline approach for paraphrase representation

    Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using the Edit Distance Textual Entailment Suite show that, because Tree Edit Distance is a purely syntactic approach, paraphrase alternations not based on structural reorganizations do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned.

    Paraphrase concept and typology. A linguistically based and computationally oriented approach

    In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that no existing characterization of paraphrasing is comprehensive, linguistically based and computationally tractable at the same time. We then set out to define and delimit the concept on the basis of propositional content. We present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.

    WRPA: A system for relational paraphrase acquisition from Wikipedia

    In this paper we present WRPA, a system for Relational Paraphrase Acquisition from Wikipedia. WRPA extracts paraphrasing patterns that express a particular relation between two entities, taking advantage of Wikipedia structure. What is new in this system is that its exploitation of Wikipedia goes beyond infoboxes, reaching itemized information embedded in Wikipedia pages. WRPA is language independent, provided that a Wikipedia edition and shallow linguistic tools exist for the language in question, and is also independent of the relation addressed.

    Terminology in the construction of the gastronomic sciences. A teaching experience in the degree in Culinary and Gastronomic Sciences

    In September 2014, under the guidance of Joan Roca, the degree in Culinary and Gastronomic Sciences was inaugurated. It is the first inter-university degree in gastronomic sciences in Spain led by two public universities: the Universitat de Barcelona (through the Escola Universitària d'Hoteleria i Turisme CETT-UB and the Campus de l'Alimentació de Torribera) and the Universitat Politècnica de Catalunya (through the Escola Superior d'Agricultura de Barcelona), in collaboration with the Fundació Alícia. This initiative and projects such as Ferran Adrià's BulliLab and the Unitat UB-Bullipèdia (a collaboration unit between the UB and elBulliFoundation during the period 2012-2015) are examples of the process of academization that gastronomy is currently undergoing.

    Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection

    Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.

    CoCo, a web interface for corpora compilation

    CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition. Peer reviewed. Postprint (published version).

    ClInt: A bilingual Spanish-Catalan spoken corpus of clinical interviews

    In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. No comparable resource previously existed for these languages, and it offers wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.

    The TALP participation at TAC-KBP 2012

    This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation at TAC-KBP 2012, in both the Entity Linking and the Slot Filling tasks. Peer reviewed. Postprint (author’s final draft).