Translation of "It" in a Deep Syntax Framework
We present a novel approach to the translation of the English personal pronoun it to Czech. We conduct a linguistic analysis of how the distinct categories of it are usually mapped to their Czech counterparts. Armed with these observations, we design a discriminative translation model of it, which is then integrated into the TectoMT deep syntax MT framework. Features in the model take advantage of the rich syntactic annotation that TectoMT is based on, external tools for anaphoricity resolution, lexical co-occurrence frequencies measured on a large parallel corpus, and gold coreference annotation. Even though the new model for it exhibits no improvement in terms of BLEU, manual evaluation shows that it outperforms the original solution in 8.5% of the sentences containing it
Coreference chains in Czech, English and Russian: Preliminary findings
This paper is a pilot comparative study of coreference chains in Czech, English and Russian. We analyzed 16 comparable texts in the three languages. Our motivation was to determine the linguistic structure of coreference chains in these languages and to identify which factors influence this structure
Two Case Studies on Translating Pronouns in a Deep Syntax Framework
We focus on improving the translation of the English pronoun it and English reflexive pronouns in an English-Czech syntax-based machine translation framework. Our evaluation from both intrinsic and extrinsic perspectives shows that adding specialized syntactic and coreference-related features leads to an improvement in translation quality
CoNLL 2017 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed
Relatório de estágio em farmácia comunitária
Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculty of Pharmacy of the University of Coimbra
Extended nominal coreference and bridging anaphora (an approach to annotation of Czech data in Prague dependency treebank)
The dissertation presents one possible model for processing extended textual coreference and bridging anaphora in large textual corpora, which we then use for the annotation of certain relations in texts of the Prague Dependency Treebank (PDT). Based, on the one hand, on the literature concerning the theory of reference, discourse and some findings of theoretical linguistics, and, on the other hand, on the existing methodology of annotations, we created a detailed classification of textual coreferential relations and types of bridging anaphora. Within textual coreference, we distinguish between two types of relations: coreferential relations between noun phrases with specific reference and coreferential relations between noun phrases with non-specific, primarily generic, reference. We determined six types of relations for bridging anaphora: PART, between a part and a whole; SUBSET, between a set and a subset or element of a set; FUNCT, between an object and a unique function on that entity; CONTRAST, between semantic and contextual opposites; ANAF, for anaphoric reference between non-coreferential objects; and REST, for other examples of bridging anaphora. One of the goals of the research is to create a system of theoretical principles that would be used..
Towards Automatic Minuting of Meetings
Many meetings of different kinds will potentially benefit from technological support like automatic creation of meeting minutes. To prepare a reasonable automation, we need to have a detailed understanding of common types of meetings, of the linguistic properties and commonalities in the structure of meeting minutes, as well as of methods for their automation.
In this paper, we summarize the quality criteria and linguistic properties of meeting minutes, describe the available meeting corpora and meeting datasets and propose a classification of meetings and minutes types. Furthermore, we analyze the methods and tools for automatic minuting with respect to their use with existing types of datasets. We summarize the obtained knowledge with respect to our goal of designing automatic minuting and present our first steps in this direction
Extended Textual Coreference and Bridging Relations in PDT 2.0
Annotation of extended textual coreference and bridging relations in the Prague Dependency Treebank 2.0
Coreferential expressions in English and Czech
In this talk, we present a comprehensive study of mappings between certain classes of coreferential expressions in English and Czech. We focus on central pronouns, relative pronouns and anaphoric zeros. For instance, the English sentence "It switched to a caffeine-free formula using its new Coke in 1985" has been translated in PCEDT as "V roce 1985 přešla na bezkofeinovou recepturu, kterou používá pro svojí novou kolu". This pair of sentences exhibits several types of changes in expressing coreference: the English personal pronoun turns into a Czech zero, the possessive pronoun into a possessive reflexive, and finally the -ing participle has been translated as a relative clause. In a similar manner, we have collected statistics of mappings from a subsection of PCEDT, which we will support with multiple examples and contrast with the theoretical assumptions. For such a study, the quality of word alignment is crucial. Thus, we designed a rule-based refining algorithm for English personal and possessive pronouns and Czech relative pronouns, which served as an automatic alignment pre-annotation. Subsequently, this annotation was manually corrected and completed, providing a basis for this empirical study