52 research outputs found

    Translation of "It" in a Deep Syntax Framework

    We present a novel approach to translating the English personal pronoun it into Czech. We conduct a linguistic analysis of how the distinct categories of it are usually mapped to their Czech counterparts. Armed with these observations, we design a discriminative translation model of it, which is then integrated into the TectoMT deep syntax MT framework. Features in the model take advantage of the rich syntactic annotation on which TectoMT is based, external tools for anaphoricity resolution, lexical co-occurrence frequencies measured on a large parallel corpus, and gold coreference annotation. Even though the new model for it exhibits no improvement in terms of BLEU, manual evaluation shows that it outperforms the original solution in 8.5% of sentences containing it.

    Coreference chains in Czech, English and Russian: Preliminary findings

    This article is a pilot comparative study of coreference chains in Czech, English and Russian. We analyzed 16 comparable texts in the three languages. Our motivation was to determine the linguistic structure of coreference chains in these languages and to identify which factors influence this structure.

    Two Case Studies on Translating Pronouns in a Deep Syntax Framework

    We focus on improving the translation of the English pronoun it and English reflexive pronouns in an English-Czech syntax-based machine translation framework. Our evaluation, from both intrinsic and extrinsic perspectives, shows that adding specialized syntactic and coreference-related features leads to an improvement in translation quality.

    CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

    The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed.

    Internship report in community pharmacy

    Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra.

    Extended nominal coreference and bridging anaphora (an approach to annotation of Czech data in Prague dependency treebank)

    The dissertation presents one possible model for processing extended textual coreference and bridging anaphora in large textual corpora, which we then use to annotate certain relations in the texts of the Prague Dependency Treebank (PDT). Based, on the one hand, on the literature concerning the theory of reference, discourse and some findings of theoretical linguistics, and, on the other hand, on the existing annotation methodology, we created a detailed classification of textual coreferential relations and types of bridging anaphora. Within textual coreference, we distinguish between two types of relations: coreferential relations between noun phrases with specific reference, and coreferential relations between noun phrases with non-specific, primarily generic, reference. We determined six types of relations for bridging anaphora: PART, between a part and a whole; SUBSET, between a set and a subset or element of the set; FUNCT, between an object and a unique function on that entity; CONTRAST, between semantic and contextual opposites; ANAF, anaphoric reference between non-coreferential objects; and REST, for other examples of bridging anaphora. One of the goals of the research is to create a system of theoretical principles that would be used...

    Towards Automatic Minuting of Meetings

    Many meetings of different kinds could benefit from technological support such as the automatic creation of meeting minutes. To prepare reasonable automation, we need a detailed understanding of common types of meetings, of the linguistic properties and commonalities in the structure of meeting minutes, and of methods for their automation. In this paper, we summarize the quality criteria and linguistic properties of meeting minutes, describe the available meeting corpora and meeting datasets, and propose a classification of the types of meetings and minutes. Furthermore, we analyze the methods and tools for automatic minuting with respect to their use with existing types of datasets. We summarize the obtained knowledge with respect to our goal of designing automatic minuting and present our first steps in this direction.

    Extended Textual Coreference and Bridging Relations in PDT 2.0

    Annotation of extended textual coreference and bridging relations in the Prague Dependency Treebank 2.0.

    Coreferential expressions in English and Czech

    In this talk, we present a comprehensive study of mappings between certain classes of coreferential expressions in English and Czech. We focus on central pronouns, relative pronouns and anaphoric zeros. For instance, the English sentence "It switched to a caffeine-free formula using its new Coke in 1985" has been translated in PCEDT as "V roce 1985 přešla na bezkofeinovou recepturu, kterou používá pro svojí novou kolu". This pair of sentences exhibits several types of changes in expressing coreference: the English personal pronoun turns into a Czech zero, the possessive pronoun into a possessive reflexive, and finally the -ing participle has been translated as a relative clause. In a similar manner, we have collected statistics on mappings from a subsection of PCEDT, which we will support with multiple examples and contrast with the theoretical assumptions. For such a study, the quality of word alignment is crucial. We therefore designed a rule-based refining algorithm for English personal and possessive pronouns and Czech relative pronouns, which served as an automatic alignment pre-annotation. This annotation was subsequently corrected and completed manually, providing the basis for this empirical study.