
Introducing a Romanian Frequency List and the Romanian Vocabulary Levels Test

Author: Szabo Cz.
Publication venue: University of Bucharest: Bucharest University Press
Publication date: 01/01/2015

Vocabulary is considered essential to language learning; consequently, English word lists and tests based on frequency information have become a focus of attention for researchers, teachers and learners alike. This paper argues that frequency-based word lists and tests should likewise be adapted and treated as key elements in teaching and learning Romanian as an additional language. Since there are currently no reliable frequency lists or lexical tests for Romanian, the paper aims to bridge this gap by introducing the first Romanian Word List and the Romanian Vocabulary Levels Test. The list contains the 10,000 most frequent Romanian words and is based on the Romanian Balanced Annotated Corpus (ROMBAC; Ion, Irimia, Ștefănescu, Tufiș 2012). The primary objective of the paper is to elaborate on the compilation criteria, the challenges involved, and the benefits of such a list for teaching, learning and curriculum design for Romanian as an additional language. The secondary objective is to present a practical application of the word list by introducing an exemplary Romanian lexical test, the Romanian Vocabulary Levels Test, and to examine its reliability and validity.

Open Research Online (The Open University)
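The core compilation step behind a frequency list like the one above can be illustrated with a minimal sketch that ranks lemmas by raw corpus frequency. It assumes the corpus has already been tokenized and lemmatized (which an annotated corpus such as ROMBAC permits); the helper name `top_n_lemmas` and the "alphabetic tokens only" filter are hypothetical and do not reproduce the paper's actual compilation criteria.

```python
from collections import Counter
from collections.abc import Iterable


def top_n_lemmas(lemmas: Iterable[str], n: int = 10_000) -> list[tuple[str, int]]:
    """Rank lemmas by raw corpus frequency (hypothetical helper).

    Assumes the corpus is already tokenized and lemmatized; tokens that are
    not purely alphabetic (numbers, punctuation) are dropped, which is an
    illustrative filter, not the paper's criterion.
    """
    counts = Counter(lemma.lower() for lemma in lemmas if lemma.isalpha())
    # most_common() orders by descending count; equal counts keep
    # first-encountered order.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["casă", "fi", "casă", "mare", "fi", "fi", "2015", ","]
    print(top_n_lemmas(sample, n=2))  # [('fi', 3), ('casă', 2)]
```

Raw counts are only the starting point: deciding how to treat proper nouns or how to group word forms are exactly the kinds of compilation criteria the paper discusses.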

Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach

Author: Daille Béatrice
Delpech Estelle
Lemaire Claire
Morin Emmanuel
Publication date: 11/09/2012

This paper defines a method for extracting a bilingual lexicon in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits translation equivalences at the morpheme level. It can generate translations for a wide variety of morphologically constructed words and can also generate 'fertile' translations, in which a single source word is translated by several target words. We show that fertile translations improve the overall quality of the extracted lexicon for English-to-French translation.

arXiv.org e-Print Archive

HAL - Université Grenoble Alpes
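Compositional translation, as summarized in the Daille et al. entry, can be sketched in three steps: translate each morpheme of a source term, assemble the pieces into candidate target terms, and keep only candidates attested in the target-language side of the comparable corpus. In this minimal sketch the morpheme dictionary, the two assembly patterns (plain concatenation, and a reversed multi-word form standing in for a 'fertile' translation) and the target vocabulary are all hypothetical simplifications, not the authors' method.

```python
from itertools import product

# Hypothetical morpheme-level equivalences (English -> French); a real
# system would load these from a bilingual morpheme resource.
MORPHEME_DICT: dict[str, list[str]] = {
    "cardio": ["cardio", "cardiaque"],
    "vascular": ["vasculaire"],
    "toxic": ["toxique"],
}


def compositional_candidates(morphemes: list[str]) -> set[str]:
    """Assemble candidate translations from morpheme translations.

    Two assembly patterns are assumed: concatenation (a one-word term)
    and a reversed, space-separated form (a multi-word, 'fertile' term).
    """
    options = [MORPHEME_DICT.get(m, []) for m in morphemes]
    if not all(options):
        return set()  # some morpheme has no known equivalent
    candidates: set[str] = set()
    for combo in product(*options):
        candidates.add("".join(combo))
        candidates.add(" ".join(reversed(combo)))
    return candidates


def attested_translations(morphemes: list[str], target_vocab: set[str]) -> set[str]:
    """Keep only candidates that occur in the target-language corpus."""
    return compositional_candidates(morphemes) & target_vocab


if __name__ == "__main__":
    vocab = {"cardiovasculaire", "cardiotoxique", "toxique cardiaque"}
    print(sorted(attested_translations(["cardio", "toxic"], vocab)))
    # ['cardiotoxique', 'toxique cardiaque']
```

Filtering against the target corpus is what keeps the combinatorial candidate generation in check: implausible assemblies such as "cardiaquetoxique" are produced but discarded because they are never attested.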

Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

Author: Gerhard Jäger
Johann-Mattis List
Johannes Wahle
Taraka Rama
Publication venue: Modern Language Association
Publication date: 01/01/2018

We evaluate state-of-the-art algorithms for automatic cognate detection by assessing how useful automatically inferred cognates are for phylogenetic inference, compared with classical, manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to those inferred from expert-annotated ones, although on average the latter are still superior. We conclude that future work on phylogenetic reconstruction can benefit considerably from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family's phylogeny, algorithms for automatic cognate detection are a useful complement to current research on language phylogenies.

arXiv.org e-Print Archive

Crossref

Humanities Commons

MPG.PuRe

Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

Author: Markó Kornél Géza
Publication date: 05/03/2009

In everyday medical practice, which involves a great deal of documentation and research work, the majority of textually encoded information is now available electronically. This makes the development of powerful methods for efficient retrieval a matter of high priority. Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation and compounding), lexical-semantic functionality, and the ability to analyze large document collections across languages. This doctoral thesis addresses the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organized around morphemes of medical technical and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval. In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel method for resolving semantic ambiguities. The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardized evaluations and compared with common approaches.

Digitale Bibliothek Thüringen
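The segmentation step described in the MorphoSaurus abstract, where complex words are split into morphemes that are replaced by language-independent, concept-class-like symbols, can be sketched with a greedy longest-match segmenter. The lexicon entries, the `#heart` / `#inflammation` symbols and the rule of skipping characters that start no known morpheme are all hypothetical; this is not the MorphoSaurus lexicon or its actual segmentation algorithm.

```python
# Hypothetical lexicon: surface morphemes from several languages mapped to
# shared, language-independent class symbols.
LEXICON: dict[str, str] = {
    "herz": "#heart",            # German
    "kard": "#heart",            # German (Greek-derived)
    "card": "#heart",            # English (Greek/Latin-derived)
    "heart": "#heart",           # English
    "entzünd": "#inflammation",  # German
    "itis": "#inflammation",     # Greek-derived suffix
}
MAX_LEN = max(len(m) for m in LEXICON)


def segment(word: str) -> list[str]:
    """Greedy longest-match segmentation into class symbols.

    Characters that start no known morpheme are skipped (an assumption
    of this sketch).
    """
    word = word.lower()
    symbols: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest possible morpheme starting at position i first.
        for j in range(min(len(word), i + MAX_LEN), i, -1):
            if word[i:j] in LEXICON:
                symbols.append(LEXICON[word[i:j]])
                i = j
                break
        else:
            i += 1  # no morpheme starts here; skip one character
    return symbols


if __name__ == "__main__":
    for w in ["Karditis", "carditis", "Herzentzündung"]:
        print(w, segment(w))  # each -> ['#heart', '#inflammation']
```

German "Karditis", English "carditis" and German "Herzentzündung" all map to the same symbol sequence, so a query and a document can be matched on symbols rather than on surface forms, which is the idea behind cross-language, morpheme-oriented retrieval.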

Computational approaches to semantic change (Volume 6)

Publication venue: Language Science Press
Publication date: 16/10/2021

Semantic change, that is, how the meanings of words change over time, has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries, ushering in a new methodological turn in the study of language change. Compared with changes in sound and grammar, semantic change is the least understood. Since the emergence of modern linguistics, the study of semantic change has progressed steadily, accumulating a vast store of knowledge over more than a century and encompassing many languages and language families. Historical linguists also realized early on the potential of computers as research tools, with papers presented at the very first international conferences in computational linguistics in the 1960s. Such computational studies, however, still tended to be small-scale, method-oriented and qualitative. Recent years have witnessed a sea change in this regard. Big-data, empirical, quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capacity and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans.

Directory of Open Access Books (DOAB)

Computational Approaches to Historical Language Comparison

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2022

The chapter discusses recently developed computational techniques that provide concrete help with various tasks in historical language comparison, focusing specifically on tasks typically subsumed under the framework of the comparative method. These include proof of relationship, cognate and correspondence detection, phonological reconstruction and sound law induction, and the reconstruction of evolutionary scenarios.

Humanities Commons

The potential of automatic word comparison for historical linguistics

Author: Gray Russell D
Greenhill Simon
List Johann Mattis
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2020

The amount of data from languages spoken all over the world is increasing rapidly. Traditional manual methods in historical linguistics must meet the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help in pre-analyzing data that experts can later refine. In this way, computational approaches can handle the repetitive and schematic tasks, leaving experts free to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.

As part of the GlottoBank Project, this work was supported by the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand Marsden Fund grant 13-UOA-121. This paper was further supported by the DFG research fellowship grant 261553824 "Vertical and lateral aspects of Chinese dialect history" (JML), and by the Australian Research Council's Discovery Projects funding scheme (project number DE120101954, SJG).

The Australian National University
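Automatic cognate detection, as tested in the entry above, generally scores pairs of words for the same concept and then partitions them into cognate sets. The sketch below is not Infomap, the best-performing method in that study; it swaps in a much simpler technique: normalized Levenshtein distance on spelled forms, linking pairs below a threshold, and taking connected components with union-find. The 0.5 threshold and the example forms are illustrative assumptions.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_cognates(words: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Group languages whose words for one concept look similar.

    `words` maps language -> word form. Pairs closer than `threshold`
    are linked; clusters are the resulting connected components.
    """
    parent = {lang: lang for lang in words}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for l1, l2 in combinations(words, 2):
        if normalized_distance(words[l1], words[l2]) < threshold:
            parent[find(l1)] = find(l2)

    clusters: dict[str, set[str]] = {}
    for lang in words:
        clusters.setdefault(find(lang), set()).add(lang)
    return list(clusters.values())


if __name__ == "__main__":
    hand = {"English": "hand", "German": "hand", "Dutch": "hand",
            "French": "main", "Italian": "mano", "Spanish": "mano"}
    print(cluster_cognates(hand))
```

On this example French "main" ends up in its own cluster, even though it and Italian/Spanish "mano" all descend from Latin "manus": raw spelling distance misses regular sound correspondences, which illustrates why methods that model such correspondences are needed.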