Translation inference through multi-lingual word embedding similarity
This paper describes our contribution to the Shared Task on Translation Inference across Dictionaries (TIAD-2019). In our approach, we construct a multilingual word embedding space by projecting new languages into the feature space of a language for which a pretrained embedding model exists. We use the similarity of the word embeddings to predict candidate translations. Even though our projection methodology is rather simplistic, our system outperforms the other participating systems with respect to the F1 measure for the language pairs we predicted.
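The projection idea can be sketched in a few lines: learn a linear map from a new language's embedding space into the pivot space from seed translation pairs, then rank candidates by cosine similarity. All words, vectors, and the hand-picked rotation below are toy placeholders, not the actual TIAD data or embedding models.

```python
import numpy as np

# Pivot-language embeddings (in practice: from a pretrained model).
pivot = {
    "dog": np.array([1.0, 0.0]),
    "cat": np.array([0.0, 1.0]),
    "house": np.array([1.0, 1.0]),
}

# Simulate the "new" language's space as a rotated copy of the pivot space.
rot = np.array([[0.0, 1.0], [-1.0, 0.0]])
new_lang = {"hund": pivot["dog"] @ rot,
            "katze": pivot["cat"] @ rot,
            "haus": pivot["house"] @ rot}

# Learn the projection W minimizing ||X W - Y|| over seed translation pairs.
seeds = [("hund", "dog"), ("katze", "cat")]
X = np.stack([new_lang[s] for s, _ in seeds])
Y = np.stack([pivot[t] for _, t in seeds])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def candidates(word, k=1):
    """Project a new-language word into the pivot space, rank by similarity."""
    v = new_lang[word] @ W
    return sorted(pivot, key=lambda t: cosine(v, pivot[t]), reverse=True)[:k]

print(candidates("haus"))  # the held-out word maps back onto "house"
```

Here the seed pairs fully determine the rotation, so the held-out word is recovered exactly; with real embeddings the least-squares map is only approximate and the ranking provides the translation candidates.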
Intelligent lock approach control for increasing the efficiency of inland waterway shipping
Remarks on the poster; see the poster contribution: https://hdl.handle.net/20.500.11970/11044
Intelligent lock approach control for increasing the efficiency of inland waterway shipping
The poster contribution refers to the poster; see https://hdl.handle.net/20.500.11970/11044
Towards LLOD-based language contact studies: a case study in interoperability
We describe a methodological and technical framework for conducting qualitative and quantitative studies of linguistic research questions over diverse and heterogeneous data sources such as corpora and elicitations. We demonstrate how LLOD formalisms can be employed to develop pipelines that extract features and linguistic examples from corpora and collections of interlinear glossed text, and, furthermore, how SPARQL UPDATE can be used
(1) to normalize diverse data against a reference data model (here, POWLA),
(2) to harmonize annotation vocabularies by reference to terminology repositories (here, OLiA),
(3) to extract examples from these normalized data structures regardless of their origin, and
(4) to implement this extraction routine in a tool-independent manner for different languages with different annotation schemes.
We demonstrate our approach with language contact studies for two genetically unrelated but neighboring languages of the Caucasus area, Eastern Armenian and Georgian.
Universal morphologies for the Caucasus region
The Caucasus region is famed for its rich and diverse array of languages and language families, often challenging European-centered views established in traditional linguistics. In this paper, we describe ongoing efforts to improve the coverage of the Universal Morphologies for languages of the Caucasus region. The Universal Morphologies (UniMorph) are a recent community project aiming to complement the Universal Dependencies, which focus on morphosyntax and syntax. We describe the development of UniMorph resources for Nakh-Daghestanian and Kartvelian languages as well as for Classical Armenian, we discuss challenges that the complex morphology of these and related languages poses to the current design of UniMorph, and we suggest possibilities for improving the applicability of UniMorph to languages of the Caucasus region in particular and to low-resource languages in general. We also criticize the UniMorph TSV format for its limited expressiveness and suggest complementing the existing UniMorph workflow with support for additional source formats based on Linked Open Data technology.
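The TSV format under discussion is simply one lemma–form–features triple per line. A minimal sketch of reading such rows into per-lemma paradigms, using invented English toy rows rather than actual UniMorph data for Caucasian languages:

```python
import csv
import io

# UniMorph-style rows: lemma <TAB> inflected form <TAB> feature bundle.
# These rows are illustrative placeholders, not real UniMorph data.
tsv = (
    "walk\twalked\tV;PST\n"
    "walk\twalks\tV;PRS;3;SG\n"
    "walk\twalking\tV;V.PTCP;PRS\n"
)

# Group forms into per-lemma paradigms keyed by the feature bundle.
paradigms = {}
for lemma, form, feats in csv.reader(io.StringIO(tsv), delimiter="\t"):
    paradigms.setdefault(lemma, {})[feats] = form

print(paradigms["walk"]["V;PST"])
```

The flat triple structure is what the expressiveness criticism targets: anything beyond lemma, form, and feature bundle (variant forms, segmentations, provenance) has no natural place in it.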
Using machine learning for translation inference across dictionaries
This paper describes our contribution to the closed track of the Shared Task on Translation Inference across Dictionaries (TIAD-2017), held in conjunction with the first Conference on Language Data and Knowledge (LDK-2017). In our approach, we use supervised machine learning to predict high-quality candidate translation pairs. We train a Support Vector Machine using several features, mostly derived from the translation graph, but also taking into consideration string similarity (Levenshtein distance). As the closed track does not provide manual training data, we define positive training examples as translation candidate pairs which occur in a cycle in which there is a direct connection
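A sketch of the described setup under toy assumptions: a hand-rolled Levenshtein distance and a single invented binary graph feature ("occurs in a cycle") feed a linear-kernel SVM. The training pairs are fabricated placeholders, not TIAD data, and the real system uses several more graph features.

```python
from sklearn.svm import SVC

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def features(src: str, tgt: str, in_cycle: bool) -> list:
    """Normalized string similarity plus one stand-in graph feature."""
    sim = 1 - levenshtein(src, tgt) / max(len(src), len(tgt))
    return [sim, float(in_cycle)]

# Fabricated training pairs: positives occur in a translation-graph
# cycle (the closed-track heuristic), negatives do not.
train = [("nation", "nation", True, 1), ("familie", "family", True, 1),
         ("hund", "cat", False, 0), ("tisch", "dog", False, 0)]
X = [features(s, t, c) for s, t, c, _ in train]
y = [label for *_, label in train]

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([features("musik", "music", True)])[0])
```

Deriving labels from graph structure rather than gold data is the key move here: it turns the closed track, which ships no manual annotations, into an ordinary supervised learning problem.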