German particle verbs: Compositionality at the syntax-semantics interface
Particle verbs are a type of multi-word expression composed of a base verb and a particle. The meaning of the particle verb is often, but not always, derived from the meaning of the base verb, sometimes in quite complex ways. In this work, we computationally assess the degree of compositionality of German particle verbs using distributional semantic methods. Our results demonstrate that particle verb compositionality can be predicted with statistical significance. Furthermore, we investigate properties of German particle verbs that are relevant for their compositionality: the particular subcategorization behavior of particle verbs and their corresponding base verbs, and the question of to what extent the verb particles can be attributed a meaning of their own that they contribute to the particle verb.
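The core idea — comparing the distributional vector of a particle verb against that of its base verb — can be sketched as follows. The toy vectors below are invented for illustration; real vectors would come from corpus co-occurrence statistics, and the interpretation of the similarity score is an assumption, not the study's actual setup:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical distributional vectors (real ones are derived from corpus data).
vectors = {
    "anfangen": [0.9, 0.1, 0.2],   # particle verb "to begin"
    "fangen":   [0.2, 0.8, 0.3],   # base verb "to catch"
}

# A low similarity between particle verb and base verb suggests
# low compositionality (the particle verb's meaning has shifted).
sim = cosine(vectors["anfangen"], vectors["fangen"])
print(f"cosine(anfangen, fangen) = {sim:.3f}")
```

In this toy setup, a semantically opaque pair like *anfangen*/*fangen* would be expected to score low, while a transparent pair would score high.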
Proceedings of the Conference on Natural Language Processing 2010
This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing.
KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention to linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology.
The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as authoring aids, text summarisation and information retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.
Extraction of ontology schema components from financial news
In this thesis we describe an incremental multi-layer rule-based methodology for the extraction of ontology schema components from German financial newspaper text. By extraction of ontology schema components we mean the detection of new concepts, and of relations between these concepts, for ontology building. The process of detecting concepts and relations between them corresponds to the intensional part of an ontology and is often referred to as ontology learning. We present both the process of rule generation for the extraction of ontology schema components and the application of the generated rules.
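A minimal illustration of what one such extraction rule might look like — here a single hypothetical pattern for German copula sentences ("X ist ein/eine Y") signalling an is-a relation, standing in for the thesis's multi-layer rule system, which this sketch does not reproduce:

```python
import re

# Hypothetical single-layer rule: "X ist ein/eine Y" suggests that
# concept X stands in an is-a relation to concept Y.
ISA_PATTERN = re.compile(r"(?P<sub>\w+) ist ein(?:e)? (?P<sup>\w+)")

def extract_isa(sentence):
    """Return (subconcept, superconcept) pairs matched in the sentence."""
    return [(m.group("sub"), m.group("sup"))
            for m in ISA_PATTERN.finditer(sentence)]

# From "Die Commerzbank ist eine Bank." the rule extracts
# the candidate relation (Commerzbank, Bank).
print(extract_isa("Die Commerzbank ist eine Bank."))
```

A real system would layer many such rules over linguistically annotated text rather than raw strings, and validate candidates before adding them to the ontology.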
The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective
Multiword expressions (MWEs), such as noun compounds (e.g. nickname in English, and Ohrwurm in German), complex verbs (e.g. give up in English, and aufgeben in German) and idioms (e.g. break the ice in English, and das Eis brechen in German), may be interpreted literally but often undergo meaning shifts with respect to their constituents. Theoretical, psycholinguistic as well as computational linguistic research remains puzzled by when and how MWEs receive literal vs. meaning-shifted interpretations, what the contributions of the MWE constituents are to the degree of semantic transparency (i.e., meaning compositionality) of the MWE, and how literal vs. meaning-shifted MWEs are processed and computed. This edited volume presents an interdisciplinary selection of seven papers on recent findings across linguistic, psycholinguistic, corpus-based and computational research fields and perspectives, discussing the interaction of constituent properties and MWE meanings, and how MWE constituents contribute to the processing and representation of MWEs. The collection is based on a workshop at the 2017 annual conference of the German Linguistic Society (DGfS) that took place at Saarland University in Saarbrücken, Germany.
An empirical analysis of phrase-based and neural machine translation
Two popular types of machine translation (MT) are phrase-based and neural
machine translation systems. Both of these types of systems are composed of
multiple complex models or layers. Each of these models and layers learns
different linguistic aspects of the source language. However, for some of these
models and layers, it is not clear which linguistic phenomena are learned or
how this information is learned. For phrase-based MT systems, it is often clear
what information is learned by each model, and the question is rather how this
information is learned, especially for the phrase reordering model. For neural
machine translation systems, the situation is even more complex, since in many
cases it is not clear what information is learned, let alone how it is learned.
To shed light on what linguistic phenomena are captured by MT systems, we
analyze the behavior of important models in both phrase-based and neural MT
systems. We consider phrase reordering models from phrase-based MT systems to
investigate which words from inside of a phrase have the biggest impact on
defining the phrase reordering behavior. Additionally, to contribute to the
interpretability of neural MT systems we study the behavior of the attention
model, which is a key component in neural MT systems and the closest model in
functionality to phrase reordering models in phrase-based systems. The
attention model together with the encoder hidden state representations form the
main components to encode source side linguistic information in neural MT. To
this end, we also analyze the information captured in the encoder hidden state
representations of a neural MT system. We investigate the extent to which
syntactic and lexical-semantic information from the source side is captured by
hidden state representations of different neural MT architectures.
Comment: PhD thesis, University of Amsterdam, October 2020.
https://pure.uva.nl/ws/files/51388868/Thesis.pd
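As a rough illustration of the attention behavior analyzed in work like this — a single scaled dot-product attention step, computing how strongly one decoder query attends to each source position. The encoder states and query vectors below are invented toy values, and the formulation is a generic one, not the specific architectures studied in the thesis:

```python
from math import exp, sqrt

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one decoder query
    over a sequence of encoder hidden states."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / sqrt(d) for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy encoder hidden states for a three-word source sentence (invented values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_query = [0.9, 0.1]

weights = attention_weights(decoder_query, encoder_states)
# The source position with the highest weight is the one the
# decoder "attends to" when producing the current target word.
print([round(w, 3) for w in weights])
```

Analyses of the kind described above inspect these weight distributions (and the encoder states themselves) to ask which source words drive each translation decision.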
Investigating the Selection of Example Sentences for Unknown Target Words in ICALL Reading Texts for L2 German
Institute for Communicating and Collaborative Systems
This thesis considers possible criteria for the selection of example
sentences for difficult or unknown words in reading texts for students of German
as a Second Language (GSL). The examples are intended to be provided
within the context of an Intelligent Computer-Aided Language Learning (ICALL) Vocabulary
Learning System, where students can choose
among several explanation options for difficult words. Some of these options (e.g. glosses)
have received a good deal of attention in the ICALL/Second Language (L2) Acquisition
literature; in contrast, literature on examples has been the near exclusive province
of lexicographers.
The selection of examples is explored from an educational,
L2 teaching point of view: the thesis is intended as a first
exploration of the question of what makes an example helpful to the
L2 student from the perspective of L2 teachers. An important motivation for this work is that
selecting examples from a dictionary or randomly from a corpus has
several drawbacks: first, the number of available dictionary
examples is limited; second, the examples fail to take into account the context
in which the word was encountered; and third, the rationale
and precise principles behind the selection of dictionary examples is usually
less than clear.
Central to this thesis is the hypothesis that a random selection of example
sentences from a suitable corpus can be improved by a guided selection process that takes
into account characteristics of helpful examples.
This is investigated by an empirical study conducted with teachers of L2 German.
The teacher data show that four dimensions are significant criteria amenable to analysis:
(a) reduced syntactic complexity, (b) sentence similarity,
provision of (c) significant co-occurrences and (d) semantically related words.
Models based on these dimensions are developed using logistic regression analysis,
and evaluated through two further empirical studies with teachers and students of L2 German.
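Such a model can be sketched schematically as a logistic regression that scores candidate example sentences on the four dimensions. The weights, bias, and feature values below are invented for illustration; they are not the fitted coefficients from the study:

```python
from math import exp

# Hypothetical fitted weights for the four dimensions (not the thesis's values):
# (a) reduced syntactic complexity, (b) sentence similarity,
# (c) significant co-occurrences, (d) semantically related words.
WEIGHTS = {"complexity": -1.2, "similarity": 0.8, "cooccurrence": 1.5, "related": 0.9}
BIAS = -0.5

def helpfulness(features):
    """Predicted probability that an example sentence is helpful,
    via a logistic regression over the four feature dimensions."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))

# Two candidate example sentences described by their feature values (invented).
candidates = [
    {"complexity": 0.9, "similarity": 0.2, "cooccurrence": 0.1, "related": 0.3},
    {"complexity": 0.2, "similarity": 0.7, "cooccurrence": 0.8, "related": 0.6},
]

# Rank candidate sentences by predicted helpfulness, best first.
ranked = sorted(candidates, key=helpfulness, reverse=True)
```

Ranking corpus sentences by a score of this shape is what allows the guided selection to be compared against random corpus selections and dictionary examples.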
The results of the teacher evaluation are encouraging: they indicate that, for one of the
models, the top-ranked selections perform on the same level as dictionary examples.
In addition, the model provides a ranking of potential examples that roughly
corresponds to that of experienced teachers of L2 German. The student evaluation confirms
and notably improves on the teacher evaluation in that the best-performing model of the
teacher evaluation significantly outperforms both random corpus selections
and dictionary examples (when a penalty for missing entries is included).