A distributional semantic study on German event nominalizations
We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, 'the evaluation') and nominal infinitives (e.g., das Evaluieren, 'the evaluating'). Among the many event nominalization patterns available in German, we selected these two because they are both highly productive and semantically challenging. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency that highlight different nuances: the first, cosine, detects nominalizations that are semantically similar to their bases; the second, distributional inclusion, detects nominalizations that are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps characterize the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader context of the inflection vs. derivation cline.
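The two transparency measures can be illustrated on sparse context-count vectors. This is a minimal sketch under stated assumptions: the abstract does not say which inclusion measure is used, so Weeds precision (the share of the nominalization's context weight that falls in contexts also attested for the base verb) is assumed here, and the context labels and counts are invented for illustration.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors (symmetric)."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weeds_precision(narrow: Counter, broad: Counter) -> float:
    """Distributional inclusion (asymmetric, assumed measure): share of
    `narrow`'s context weight falling in contexts where `broad` also occurs."""
    total = sum(narrow.values())
    if total == 0:
        return 0.0
    shared = sum(w for ctx, w in narrow.items() if broad.get(ctx, 0) > 0)
    return shared / total


# Invented toy counts: contexts c1..c5 stand in for co-occurring words.
verb = Counter({"c1": 4, "c2": 3, "c3": 2, "c4": 1})  # base verb
noun = Counter({"c1": 2, "c2": 1, "c5": 1})           # its nominalization

print(round(cosine(noun, verb), 3))  # 0.82
print(weeds_precision(noun, verb))   # 0.75
print(weeds_precision(verb, noun))   # 0.7
```

Unlike cosine, the inclusion score is asymmetric: `weeds_precision(noun, verb)` asks whether the nominalization's contexts form a subset of the verb's, which is the direction described in the abstract.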
First International Workshop on Lexical Resources
Lexical resources are one of the main sources of linguistic information for research and applications in Natural Language Processing and related fields. In recent years, advances have been achieved both in symbolic aspects of lexical resource development (lexical formalisms, rule-based tools) and in statistical techniques for the acquisition and enrichment of lexical resources, both monolingual and multilingual. The latter have allowed for faster development of large-scale morphological, syntactic and/or semantic resources, for widely used as well as resource-scarce languages. Moreover, the notion of a dynamic lexicon is increasingly used to account for the fact that the lexicon undergoes permanent evolution. This workshop aims at sketching a broad picture of the state of the art in lexical resource modeling and development. It is also dedicated to research on the application of lexical resources for improving corpus-based studies and language processing tools, both in NLP and in other language-related fields, such as linguistics, translation studies, and didactics.
Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval
In everyday medical practice, which involves a great deal of documentation and literature search, the majority of textually encoded information is now available electronically. The development of powerful methods for efficient retrieval has therefore become a priority.
Assessed from the perspective of medical language, common text retrieval systems lack morphological functionality (inflection, derivation and compounding), lexical-semantic functionality, and the ability to analyze large document collections across languages.
This doctoral thesis addresses the theoretical foundations of the MorphoSaurus system (an acronym for Morphem-Thesaurus, 'morpheme thesaurus'). Its methodological core is a thesaurus organized around morphemes of medical technical and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval.
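The segmentation step can be illustrated with a minimal sketch. It assumes a greedy longest-match strategy and a tiny hand-made lexicon; the morpheme entries, the concept symbols (`#stomach`, `#inflamm`, `#mucosa`) and the functions `normalize` and `segment` are hypothetical and do not reproduce the MorphoSaurus lexicon or its actual segmentation procedure.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: morphemes from several languages mapped to
# language-independent concept-class symbols (invented for illustration).
LEXICON: Dict[str, str] = {
    "gastr": "#stomach",
    "magen": "#stomach",
    "itis": "#inflamm",
    "entzuendung": "#inflamm",
    "schleimhaut": "#mucosa",
}

# Orthographic normalization so that umlaut spellings match the lexicon keys.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def normalize(word: str) -> str:
    word = word.lower()
    for char, repl in UMLAUTS.items():
        word = word.replace(char, repl)
    return word


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Greedy longest-match segmentation into concept symbols.

    Returns None if some part of the word is not covered by the lexicon.
    """
    word = normalize(word)
    symbols: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate morpheme starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                symbols.append(lexicon[word[i:j]])
                i = j
                break
        else:
            return None  # no lexicon entry starts at position i
    return symbols


print(segment("Gastritis", LEXICON))        # ['#stomach', '#inflamm']
print(segment("Magenentzündung", LEXICON))  # ['#stomach', '#inflamm']
```

Because *Gastritis* and *Magenentzündung* map to the same symbol sequence, a retrieval system comparing symbols rather than surface strings could match them across languages. A greedy matcher without backtracking can fail on words where a shorter first morpheme would have allowed a full segmentation; this is a simplification of the sketch, not a claim about the original system.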
In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel procedure for resolving semantic ambiguities.
The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardized evaluations and compared with common approaches.
Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications
Reading plays an important role in the process of learning and knowledge acquisition
for both children and adults. However, not all texts are accessible to every
prospective reader. Reading difficulties can arise when there is a mismatch between
a reader’s language proficiency and the linguistic complexity of the text
they read. In such cases, simplifying the text in its linguistic form while retaining
all the content could aid reader comprehension. In this thesis, we study text
complexity and simplification from a computational linguistic perspective.
We propose a new approach to automatically predict text complexity using
a wide range of word-level and syntactic features of the text. We show that this
approach results in accurate, generalizable models of text readability that work
across multiple corpora, genres and reading scales. Moving from documents to
sentences, we show that our text complexity features also accurately distinguish
different versions of the same sentence in terms of the degree of simplification
performed. This is useful for evaluating the quality of simplifications, whether
produced by a human expert or generated by a machine, and for choosing targets to simplify
in a difficult text. We also experimentally show the effect of text complexity on
readers’ performance outcomes and cognitive processing through an eye-tracking
experiment.
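As a rough illustration of feature-based complexity analysis, the sketch below computes three simple surface features. It is not the feature set used in the thesis, which also includes syntactic features that require a parser; the regex tokenization, the function `complexity_features` and its feature names are assumptions made for this example.

```python
import re
from typing import Dict


def complexity_features(text: str) -> Dict[str, float]:
    """Compute a few surface-level proxies for text complexity (illustrative only)."""
    # Naive sentence split on terminal punctuation; real systems use a trained tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"mean_sentence_length": 0.0, "mean_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of words per sentence: a crude proxy for syntactic load.
        "mean_sentence_length": len(words) / len(sentences),
        # Average word length in characters: longer words tend to be rarer.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Lexical variety: distinct (lowercased) word forms divided by total tokens.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


# Comparing an original sentence with a simplified version of it.
original = "The committee, having deliberated extensively, postponed its decision."
simplified = "The committee talked for a long time. It decided later."
print(complexity_features(original))
print(complexity_features(simplified))
```

In a learned model, vectors like these (extended with richer lexical and syntactic features) would serve as inputs to a classifier or regressor trained on graded texts.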
Turning from analyzing text complexity and identifying sentential simplifications
to generating simplified text, one can view automatic text simplification as a
process of translation from English to simple English. In this thesis, we propose
a statistical machine translation based approach for text simplification, exploring
the role of focused training data and language models in the process.
Exploring the linguistic complexity analysis further, we show that our text
complexity features can be useful in assessing the language proficiency of English
learners. Finally, we analyze German school textbooks in terms of their
linguistic complexity, across various grade levels, school types and among different
publishers, by applying a pre-existing set of text complexity features developed
for German.
The semantic transparency of English compound nouns
What is semantic transparency, why is it important, and which factors play a role in its assessment? This work approaches these questions by investigating English compound nouns. The first part of the book gives an overview of semantic transparency in the analysis of compound nouns, discussing its role in models of morphological processing and differentiating it from related notions. After a chapter on the semantic analysis of complex nominals, it closes with a chapter on previous attempts to model semantic transparency. The second part presents new empirical work on semantic transparency, introducing two different sets of statistical models for compound transparency. In particular, two semantic factors were explored: the semantic relations holding between compound constituents, and the role of different readings of the constituents and of the whole compound, operationalized in terms of meaning shifts and in terms of the distribution of specific readings across constituent families. All semantic annotations used in the book are freely available.
CLARIN. The infrastructure for language resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU).
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and the challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.
- …