German Perception Verbs: Automatic Classification of Prototypical and Multiple Non-literal Meanings
This paper presents a token-based automatic classification of German perception verbs into literal vs. multiple non-literal senses. Based on a corpus-based dataset of German perception verbs and their systematic meaning shifts, we identify one verb from each of the four perception classes (optical, acoustic, olfactory, haptic) and use Decision Trees relying on syntactic and semantic corpus-based features to classify the verb uses into 3-4 senses each. Our classifier reaches accuracies between 45.5% and 69.4%, compared to baselines between 27.5% and 39.0%. In three out of four cases analyzed, our classifier's accuracy is significantly higher than the corresponding baseline.
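The token-level setup can be sketched as follows. This is a minimal illustration with invented binary features and sense labels; the paper's actual syntactic/semantic corpus features are not reproduced here.

```python
# Minimal sketch of token-based verb-sense classification with a Decision
# Tree. Features and labels are hypothetical, purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row encodes one verb token: [subject_is_animate, has_direct_object,
# object_is_abstract] -- assumed binary features, not the paper's.
X = [
    [1, 1, 0],  # e.g. "sehen" with a concrete object -> literal
    [1, 1, 1],  # abstract object -> non-literal sense ("understand")
    [0, 1, 1],
    [1, 0, 0],  # intransitive use -> literal
    [0, 0, 0],
]
y = ["literal", "understand", "understand", "literal", "literal"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict([[1, 1, 1]])[0]
print(pred)
```

A majority-class baseline on such data would always predict "literal", which is the kind of baseline the reported accuracies are compared against.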
German compound splitting using the compound productivity of morphemes
In this work, we present a novel compound splitting method for German that captures the compound productivity of morphemes. We use a giga-scale web corpus to build a lexicon and decompose noun compounds by computing the probabilities of compound elements occurring as bound and free morphemes. Furthermore, we provide a uniform evaluation of several unsupervised approaches and morphological analysers for this task. Our method achieves a high F1 score of 0.92, comparable to state-of-the-art methods.
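The core idea can be sketched as a probability-scored segmentation search. The lexicon counts and the scoring function below are toy assumptions, not the paper's model: non-final elements are scored by their frequency as bound morphemes, the final element by its frequency as a free morpheme.

```python
# Toy sketch of probability-based compound splitting.
from functools import lru_cache

# Hypothetical counts of each morpheme as a free word and as a bound
# element inside compounds (e.g. estimated from a large web corpus).
FREE = {"haus": 900, "tür": 700, "schlüssel": 400, "haustür": 50}
BOUND = {"haus": 300, "tür": 120, "schlüssel": 80, "haustür": 40}
TOTAL = sum(FREE.values()) + sum(BOUND.values())

def prob(morpheme, last):
    # The final element behaves like a free morpheme, earlier ones
    # like bound morphemes.
    counts = FREE if last else BOUND
    return counts.get(morpheme, 0) / TOTAL

def best_split(word):
    """Return the highest-probability segmentation of `word`."""
    @lru_cache(maxsize=None)
    def solve(s):
        best = (prob(s, last=True), (s,))  # option: leave s unsplit
        for i in range(1, len(s)):
            head, rest = s[:i], s[i:]
            p_head = prob(head, last=False)
            if p_head == 0:
                continue
            p_rest, parts = solve(rest)
            if p_head * p_rest > best[0]:
                best = (p_head * p_rest, (head,) + parts)
        return best
    return list(solve(word)[1])

print(best_split("haustürschlüssel"))  # -> ['haustür', 'schlüssel']
```

Because every additional split multiplies in another probability, the scoring naturally prefers fewer, more productive elements over many rare ones.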
CUNI System for the WMT17 Multimodal Translation Task
In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption which is then translated by the best system used in Task 1. We also present negative results, which are based on ideas that we believe have the potential to yield improvements, but did not prove useful in our particular setup.
Comment: 8 pages; camera-ready submission to WMT17
Alternative Solutions to a Language Design Problem: The Role of Adjectives and Gender Marking in Efficient Communication
A central goal of typological research is to characterize linguistic features in terms of both their functional role and their fit to social and cognitive systems. One long-standing puzzle concerns why certain languages employ grammatical gender. In an information-theoretic analysis of German noun classification, Dye, Milin, Futrell, and Ramscar (2017) enumerated a number of important processing advantages gender confers. Yet this raises a further puzzle: If gender systems are so beneficial to processing, what does this mean for languages that make do without them? Here, we compare the communicative function of gender marking in German (a deterministic system) to that of prenominal adjectives in English (a probabilistic one), finding that despite their differences, both systems act to efficiently smooth information over discourse, making nouns more equally predictable in context. We examine why evolutionary pressures may favor one system over another and discuss the implications for compositional accounts of meaning and Gricean principles of communication.
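The "smoothing" idea can be made concrete with surprisal, the information content of a word in context. The probabilities below are invented for illustration only: a gendered article or an informative adjective raises the conditional probability of the upcoming noun, lowering its surprisal and spreading information more evenly across the utterance.

```python
# Toy surprisal calculation; probabilities are hypothetical.
import math

def surprisal(p):
    """Information content in bits: -log2(p)."""
    return -math.log2(p)

p_noun = 0.01            # P(noun) with no preceding cue
p_noun_given_cue = 0.08  # P(noun | gendered article or adjective)

before = surprisal(p_noun)
after = surprisal(p_noun_given_cue)
print(f"{before:.2f} bits -> {after:.2f} bits")  # 6.64 bits -> 3.64 bits
```

The cue carries part of the noun's information itself, so the peak in per-word information at the noun is flattened, which is the efficiency argument both systems share.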
Neural reranking for dependency parsing: An evaluation
Recent work has shown that neural rerankers can improve results for dependency parsing over the top k trees produced by a base parser. However, all neural rerankers so far have been evaluated only on English and Chinese, both languages with a configurational word order and poor morphology. In this paper, we re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). We show that the GCN not only outperforms previous models on English but is the only model able to improve results over the baselines on German and Czech. We explain the differences in reranking performance based on an analysis of (a) the gold tree ratio and (b) the variety in the k-best lists.
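The reranking setup and the gold tree ratio diagnostic can be sketched as follows. Candidates are reduced to score dictionaries with invented values; real rerankers score full trees, which is omitted here.

```python
# Toy sketch of k-best reranking and the gold tree ratio.
def rerank(kbest):
    """Pick the candidate tree the reranker scores highest."""
    return max(kbest, key=lambda c: c["reranker_score"])

def gold_tree_ratio(kbest_lists):
    """Fraction of k-best lists containing the gold tree at all --
    an upper bound on what any reranker can recover."""
    hits = sum(any(c["is_gold"] for c in kbest) for kbest in kbest_lists)
    return hits / len(kbest_lists)

# Two hypothetical sentences with k=2 candidate trees each.
lists = [
    [{"id": 0, "reranker_score": 0.2, "is_gold": False},
     {"id": 1, "reranker_score": 0.9, "is_gold": True}],
    [{"id": 0, "reranker_score": 0.7, "is_gold": False},
     {"id": 1, "reranker_score": 0.1, "is_gold": False}],  # gold missing
]

print(rerank(lists[0])["id"])   # reranker recovers the gold tree here
print(gold_tree_ratio(lists))   # 0.5: only one list contains the gold tree
```

A low gold tree ratio, as the abstract suggests for some settings, caps reranking gains no matter how good the reranker is, which is why it is a useful explanatory statistic.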