Cognate facilitation effects in bilingual children of varying language dominance
A widely accepted theory is that bilinguals activate both of their languages regardless of which is in use. Though there is abundant research on this phenomenon in bilingual adults, less research has focused on bilingual children. Cognates (i.e., words that share meaning and sound across languages) have frequently been used to explore language co-activation. The present study investigates cognate facilitation effects in child bilinguals of varying language dominance. Spanish-English bilingual children between 6 and 10 years old performed a picture-naming task that included pictures of cognates and non-cognates. Children who were more English-dominant experienced larger cognate facilitation effects when producing words in their non-dominant language but not in their dominant language. In contrast, children with more balanced dominance did not experience cognate facilitation effects in either language. The findings from this study may have implications for the development of the bilingual lexicon.
Open educational resources in Europe: A triptych of actions to support participation in higher education
In contrast to the face-to-face learning of campus-based universities and the focus on traditional students, distance teaching universities focus on a mix of distance learning, e-learning, open learning, virtual mobility, learning communities, and the integration of earning and learning. In doing so, they are taking a leading role in helping to increase and widen participation in lifelong open and flexible learning in higher education by non-traditional groups. This paper discusses three leading-edge European Open Educational Resource initiatives. The initiatives are special in nature and differ from the offers of traditional universities in the sense that they: consist of pedagogically-rich learning materials, specifically designed and developed for distance learning and intended for independent self-study; are compiled in the national languages, with the EADTU initiative being multilingual, reflecting the European dimension; and support and are supported by the policies of the national governments and the European Commission.
Using Global Constraints and Reranking to Improve Cognates Detection
Global constraints and reranking have not previously been used in cognates detection research. We propose methods for applying global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Rescoring with global constraints is complementary to existing cognates detection methods and yields significant improvements over current state-of-the-art performance on publicly available datasets with different language pairs and under various conditions, including different levels of baseline performance and different data sizes, among them larger, more realistic data sizes than have been evaluated in the past.
Comment: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017
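The rescoring idea in the abstract above can be illustrated with a minimal sketch. One simple global constraint over a score matrix is that each source word should pair with at most one target word; the greedy one-to-one decoder below is only an illustration of such a constraint under invented scores, not the authors' actual method.

```python
def rescore_one_to_one(scores):
    """Greedy one-to-one decoding over a cognate score matrix.

    scores[i][j] is the similarity of source word i and target word j.
    Returns a set of (i, j) pairs in which each source index and each
    target index appears at most once.
    """
    # Flatten all candidate pairs and sort by score, best first.
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    used_src, used_tgt, matches = set(), set(), set()
    for s, i, j in pairs:
        if i not in used_src and j not in used_tgt:
            used_src.add(i)
            used_tgt.add(j)
            matches.add((i, j))
    return matches

# Toy matrix: row = source word, column = target word.
scores = [
    [0.9, 0.8, 0.1],   # source 0 matches targets 0 and 1 well...
    [0.2, 0.85, 0.3],  # ...but source 1 also competes for target 1.
    [0.1, 0.2, 0.7],
]
print(sorted(rescore_one_to_one(scores)))  # [(0, 0), (1, 1), (2, 2)]
```

Without the one-to-one constraint, a per-row argmax would also be free to assign two source words to the same target; the global constraint rules that out.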
Graphonological Levenshtein Edit Distance: Application for Automated Cognate Identification
This paper presents a methodology for calculating a modified Levenshtein edit distance between character strings, and applies it to the task of automated cognate identification from non-parallel (comparable) corpora. This task is an important stage in developing MT systems and bilingual dictionaries beyond the coverage of traditionally used aligned parallel corpora, which can be used for finding translation equivalents for the "long tail" in the Zipfian distribution: low-frequency and usually unambiguous lexical items in closely-related languages (many of them often under-resourced). Graphonological Levenshtein edit distance relies on editing hierarchical representations of phonological features for graphemes (graphonological representations) and improves on the phonological edit distance proposed for measuring dialectological variation. Graphonological edit distance works directly with character strings and does not require an intermediate stage of phonological transcription, exploiting the advantages of the historical and morphological principles of orthography, which are obscured if only the phonetic principle is applied. Difficulties associated with plain feature representations (unstructured feature sets or vectors) are addressed by using a linguistically-motivated feature hierarchy that restricts matching of lower-level graphonological features when higher-level features are not matched. The paper presents an evaluation of the graphonological edit distance in comparison with the traditional Levenshtein edit distance from the perspective of its usefulness for the task of automated cognate identification. It discusses the advantages of the proposed method, which can be used for morphology induction, for robust transliteration across different alphabets (Latin, Cyrillic, Arabic, etc.), and for robust identification of words with non-standard or distorted spelling, e.g., in user-generated content on the web such as posts on social media, blogs and comments.
Software for calculating the modified feature-based Levenshtein distance, and the corresponding graphonological feature representations (vectors and the hierarchies of graphemes' features), are released on the author's webpage: http://corpus.leeds.ac.uk/bogdan/phonologylevenshtein/. Features are currently available for the Latin and Cyrillic alphabets and will be extended to other alphabets and languages.
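The core idea of replacing uniform substitution costs with feature-based ones can be sketched as follows. The toy feature table and the flat (non-hierarchical) feature sets are invented for illustration and are far simpler than the released graphonological hierarchies.

```python
# Toy phonological feature table; the real graphonological tables are
# hierarchical, so this flat version is only illustrative.
FEATURES = {
    "b": {"cons", "labial", "voiced"},
    "p": {"cons", "labial"},
    "d": {"cons", "coronal", "voiced"},
    "a": {"vowel", "open"},
}

def sub_cost(a, b):
    """Substitution cost as feature-set dissimilarity in [0, 1]."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a, {a}), FEATURES.get(b, {b})
    return 1.0 - len(fa & fb) / len(fa | fb)

def feature_levenshtein(s, t):
    """Levenshtein distance with graded substitution costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                            # deletion
                d[i][j - 1] + 1.0,                            # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
    return d[m][n]

# "b" and "p" differ only in voicing, so "ba"/"pa" comes out cheaper
# than "da"/"pa" under the feature-based measure, unlike plain
# Levenshtein, which would score both pairs identically.
print(feature_levenshtein("ba", "pa") < feature_levenshtein("da", "pa"))  # True
```

A hierarchical version would additionally block credit for shared low-level features (e.g. voicing) when a higher-level feature (e.g. consonant vs. vowel) mismatches.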
Linear mappings: semantic transfer from transformer models for cognate detection and coreference resolution
Includes bibliographical references. 2022 Fall. Embeddings or vector representations of language and their properties are useful for understanding how Natural Language Processing technology works. The usefulness of embeddings, however, depends on how contextualized or information-rich such embeddings are. In this work, I apply a novel affine (linear) mapping technique first established in the field of computer vision to embeddings generated from large Transformer-based language models. In particular, I study its use in two challenging linguistic tasks: cross-lingual cognate detection and cross-document coreference resolution. Cognate detection for two Low-Resource Languages (LRL), Assamese and Bengali, is framed as a binary classification problem using semantic (embedding-based), articulatory, and phonetic features. Linear maps for this task are extrinsically evaluated on the extent of transfer of semantic information between monolingual as well as multilingual models, including those specialized for low-resourced Indian languages. For cross-document coreference resolution, whole-document contextual representations are generated for event and entity mentions from cross-document language models like CDLM and other BERT variants and then linearly mapped to form coreferring clusters based on their cosine similarities. I evaluate my results against gold output based on established coreference metrics like BCUB and MUC. My findings reveal that linearly transforming vectors from one model's embedding space to another carries certain semantic information with high fidelity, thereby revealing the existence of a canonical embedding space and its geometric properties for language models. Interestingly, even for a much more challenging task like coreference resolution, linear maps are able to transfer semantic information between "lighter" or less contextual models and "larger" models with near-equivalent performance, or even improved results in some cases.
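The affine-mapping idea can be sketched with synthetic data: fit a linear map between two embedding spaces by least squares, then check how well the mapped vectors align with the target space. The dimensions and data below are invented; this is a stand-in for, not a reproduction of, the thesis's pipeline.

```python
import numpy as np

# Hypothetical setup: X holds embeddings from a "source" model, and Y
# the embeddings of the same words from a "target" model. We fit an
# affine map (W, b) minimising ||XW + b - Y|| by ordinary least squares.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))        # 200 words, 16-d source space
W_true = rng.normal(size=(16, 8))
Y = X @ W_true + 0.5                  # 8-d target space, shifted bias

# Append a bias column so least squares learns b along with W.
X1 = np.hstack([X, np.ones((200, 1))])
Wb, *_ = np.linalg.lstsq(X1, Y, rcond=None)
Y_pred = X1 @ Wb

# Cosine similarity between mapped vectors and true target vectors;
# on this noise-free toy data it should be essentially 1.0.
cos = np.sum(Y_pred * Y, axis=1) / (
    np.linalg.norm(Y_pred, axis=1) * np.linalg.norm(Y, axis=1)
)
print(round(float(cos.mean()), 3))
```

With real embeddings the relation is only approximately linear, so the interesting quantity is how much cosine similarity (and downstream task performance) survives the mapping.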
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
A Computer-Assisted Approach to Lexical Borrowing in Northeast Caucasian Languages
The disambiguation of loanwords and cognates can be a challenge, especially in areas where there has been intense language contact over an extended period of time, when the contact is between genetically related languages, and when the number of languages involved is large. Over the past several decades, more and more computational approaches to automatic cognate and borrowing detection have been created in an attempt to ease the load of examining hundreds to thousands of individual lexemes, as well as to determine language-family relationships with allegedly greater accuracy. While these methods are not perfect and cannot replace the knowledge or skillset of a linguist, this paper seeks to apply a computer-assisted, as opposed to purely computational, approach to lexical borrowing detection in three Northeast Caucasian languages spoken in a cluster of villages in Dagestan: Avar, Lak, and Archi. In this thesis, I utilize computational methods for cognate detection as a starting point, as well as a lexical distribution approach to borrowing, followed by qualitative methods for distinguishing loanwords from inherited cognates as applied to the output of the computational methods.
Identifying and Modeling Code-Switched Language
Code-switching is the phenomenon by which bilingual speakers switch between multiple languages during written or spoken communication. The importance of developing language technologies that are able to process code-switched language is immense, given the large populations that routinely code-switch. Current NLP and Speech models break down when used on code-switched data, interrupting the language processing pipeline in back-end systems and forcing users to communicate in ways which for them are unnatural.
There are four main challenges that arise in building code-switched models: lack of code-switched data on which to train generative language models; lack of multilingual language annotations on code-switched examples which are needed to train supervised models; little understanding of how to leverage monolingual and parallel resources to build better code-switched models; and finally, how to use these models to learn why and when code-switching happens across language pairs. In this thesis, I look into different aspects of these four challenges.
The first part of this thesis focuses on how to obtain reliable corpora of code-switched language. We collected a large corpus of code-switched language from social media using a combination of sets of anchor words that exist in one language and sentence-level language taggers. The newly obtained corpus is superior to other corpora collected via different strategies when it comes to the amount and type of bilingualism in it. It also helps train better language tagging models. We also have proposed a new annotation scheme to obtain part-of-speech tags for code-switched English-Spanish language. The annotation scheme is composed of three different subtasks including automatic labeling, word-specific questions labeling and question-tree word labeling. The part-of-speech labels obtained for the Miami Bangor corpus of English-Spanish conversational speech show very high agreement and accuracy.
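The anchor-word filtering described above can be sketched as follows. The tiny anchor lists are invented for illustration, and the actual pipeline combined anchors with sentence-level language taggers rather than relying on word lists alone.

```python
# Anchor words are words assumed to exist in only one of the two
# languages; these short lists are illustrative, not the real lexicons.
ENGLISH_ANCHORS = {"the", "because", "weekend"}
SPANISH_ANCHORS = {"pero", "fuimos", "muy"}

def looks_code_switched(sentence):
    """True if the sentence contains anchors from both languages."""
    tokens = {t.lower().strip(".,!?") for t in sentence.split()}
    return bool(tokens & ENGLISH_ANCHORS) and bool(tokens & SPANISH_ANCHORS)

print(looks_code_switched("Fuimos al cine because it was the weekend"))  # True
print(looks_code_switched("We went to the movies on the weekend"))       # False
```

The precision of such a filter depends entirely on the anchor lists excluding cognates and shared words, which is one reason a second-pass language tagger is needed.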
The second section of this thesis focuses on the tasks of part-of-speech tagging and language modeling. For the first task, we proposed a state-of-the-art approach to part-of-speech tagging of code-switched English-Spanish data based on recurrent neural networks. Our models were tested on the Miami Bangor corpus on the task of POS tagging alone, for which we achieved 96.34% accuracy, and joint part-of-speech and language ID tagging, which achieved similar POS tagging accuracy (96.39%) and very high language ID accuracy (98.78%).
For the task of language modeling, we first conducted an exhaustive analysis of the relationship between cognate words and code-switching. We then proposed a set of cognate-based features that helped improve language modeling performance by 12% relative. Furthermore, we showed that these features can also be used across language pairs and still obtain performance improvements.
Finally, we tackled the question of how to use monolingual resources for code-switching models by pre-training state-of-the-art cross-lingual language models on large monolingual corpora and fine-tuning them on the tasks of language modeling and word-level language tagging on code-switched data. We obtained state-of-the-art results on both tasks.
Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval
In everyday medical practice, which involves a great deal of documentation and literature research, the majority of textually encoded information is now available electronically. The development of powerful methods for efficient retrieval is therefore of primary importance.
Judged from the perspective of medical terminology, common text retrieval systems lack morphological functionality (inflection, derivation, and composition), lexical-semantic functionality, and the ability to analyze large document collections across languages.
This dissertation presents the theoretical foundations of the MorphoSaurus system (an acronym for Morpheme Thesaurus). Its methodological core is a thesaurus organized around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-lingual, morpheme-oriented text retrieval.
In addition to this core technology, a method for the automatic acquisition of lexicon entries is presented, through which existing morpheme lexicons are extended to further languages. Accounting for cross-lingual phenomena then leads to a novel procedure for resolving semantic ambiguities.
The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardized evaluations and compared with common approaches.
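The segmentation-and-substitution step described above can be sketched as follows. The morpheme lexicon and concept codes are invented stand-ins for MorphoSaurus thesaurus entries; the point is only that morphemes from different languages index to the same language-independent symbols.

```python
# Toy interlingual morpheme lexicon: English and German morphemes map
# to shared concept codes, so "gastritis" and "magenentzündung" index
# to the same symbols. Invented for illustration.
MORPHEME_CODES = {
    "gastr": "#STOMACH", "magen": "#STOMACH",
    "itis": "#INFLAMMATION", "entzündung": "#INFLAMMATION",
}

def index_term(word):
    """Greedy longest-match segmentation into concept codes."""
    codes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            code = MORPHEME_CODES.get(word[i:j])
            if code:
                codes.append(code)
                i = j
                break
        else:
            i += 1                          # skip characters with no match
    return codes

print(index_term("gastritis"))        # ['#STOMACH', '#INFLAMMATION']
print(index_term("magenentzündung"))  # ['#STOMACH', '#INFLAMMATION']
```

Because both terms reduce to the same code sequence, a query in one language can retrieve documents written in the other without any translation step at query time.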