41 research outputs found

    German Perception Verbs: Automatic Classification of Prototypical and Multiple Non-literal Meanings

    This paper presents a token-based automatic classification of German perception verbs into literal vs. multiple non-literal senses. Based on a corpus-based dataset of German perception verbs and their systematic meaning shifts, we identify one verb from each of the four perception classes (optical, acoustic, olfactory, haptic) and use Decision Trees relying on syntactic and semantic corpus-based features to classify the verb uses into 3-4 senses each. Our classifier reaches accuracies between 45.5% and 69.4%, in comparison to baselines between 27.5% and 39.0%. In three of the four cases analyzed, our classifier’s accuracy is significantly higher than the corresponding baseline.

    German compound splitting using the compound productivity of morphemes

    In this work, we present a novel compound splitting method for German that captures the compound productivity of morphemes. We use a giga web corpus to create a lexicon and decompose noun compounds by computing the probabilities of compound elements as bound and free morphemes. Furthermore, we provide a uniform evaluation of several unsupervised approaches and morphological analysers for the task. Our method achieved a high F1 score of 0.92, a result comparable to state-of-the-art methods.

    CUNI System for the WMT17 Multimodal Translation Task

    In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption, which is then translated by the best system used in Task 1. We also present negative results, which are based on ideas that we believe have the potential to yield improvements but did not prove useful in our particular setup. Comment: 8 pages; Camera-ready submission to WMT1

    Alternative Solutions to a Language Design Problem: The Role of Adjectives and Gender Marking in Efficient Communication

    A central goal of typological research is to characterize linguistic features in terms of both their functional role and their fit to social and cognitive systems. One long-standing puzzle concerns why certain languages employ grammatical gender. In an information-theoretic analysis of German noun classification, Dye, Milin, Futrell, and Ramscar (2017) enumerated a number of important processing advantages that gender confers. Yet this raises a further puzzle: if gender systems are so beneficial to processing, what does this mean for languages that make do without them? Here, we compare the communicative function of gender marking in German (a deterministic system) to that of prenominal adjectives in English (a probabilistic one), finding that despite their differences, both systems act to efficiently smooth information over discourse, making nouns more equally predictable in context. We examine why evolutionary pressures may favor one system over another and discuss the implications for compositional accounts of meaning and Gricean principles of communication.

    Neural reranking for dependency parsing: An evaluation

    Recent work has shown that neural rerankers can improve results for dependency parsing over the top k trees produced by a base parser. However, all neural rerankers so far have been evaluated on English and Chinese only, both languages with a configurational word order and poor morphology. In this paper, we re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). We show that the GCN not only outperforms previous models on English but is also the only model able to improve results over the baselines on German and Czech. We explain the differences in reranking performance based on an analysis of a) the gold tree ratio and b) the variety in the k-best lists.

    Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings


    A functional theory of gender paradigms
