Two Uummarmiutun modals – including a brief comparison with Utkuhikšalingmiutut cognates
The paper is concerned with the meaning of two modal
postbases in Uummarmiutun, hungnaq ‘probably’ and ȓukȓau
‘should’. Uummarmiutun is an Inuktut dialect spoken in the
Western Arctic. The analyses are founded on knowledge shared
by native speakers of Uummarmiutun. Their statements and
elaborations are quoted throughout the paper to show how they
have explained the meaning nuances of modal expressions in
their language. The paper also includes a comparison with
cognates in Utkuhikšalingmiutut, which belongs to the eastern
part of the Western Canadian dialect group (Dorais, 2010).
Using categories from Cognitive Functional Linguistics (Boye,
2005, 2012), the paper shows which meanings are covered by
hungnaq and ȓukȓau. This allows us to discover subtle
differences between the meanings of Uummarmiutun hungnaq
and ȓukȓau and their Utkuhikšalingmiutut cognates
respectively.
The Role of Antonymy on Semantic Change
The role of antonymy in semantic change is investigated via the etymology of sets of English antonyms. The results show a developmental pattern wherein two words sharing an antonym tend to exhibit similar trajectories of semantic development. Metaphorical extension is proposed as the primary mechanism that produces this regularity, with antonymy playing a secondary role. These results further support the view that semantic change is regular, even in contexts not involving grammaticalization, and that metaphor is not peripheral to language use (see Lakoff & Johnson, 1980; Traugott & Dasher, 2002; Hopper & Traugott, 2003). There are also implications for formal and cognitive representations that rely on antonymous relationships for modeling aspects of gradable predicates (such as Paradis, 2001; Kennedy & McNally, 2005).
An Algorithm For Building Language Superfamilies Using Swadesh Lists
The main contributions of this thesis are the following: i. Developing an algorithm to generate language families and superfamilies given, for each input language, a Swadesh list represented in International Phonetic Alphabet (IPA) notation. ii. The algorithm is novel in using the Levenshtein distance metric on the IPA representation and in the way it measures overall distance between pairs of Swadesh lists. iii. Building a Swadesh list for the author's native Kinyarwanda language, because a Swadesh list could not be found even after an extensive search for it.
Adviser: Peter Reves
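The abstract above names Levenshtein distance over IPA-transcribed Swadesh lists but does not say how per-word distances are combined. The following is a minimal sketch of one plausible reading, not the thesis's algorithm: the length normalisation and the averaging in `list_distance` are assumptions, and the word forms in the usage note are invented placeholders rather than real transcriptions.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two symbol sequences (e.g. IPA strings)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def list_distance(list_a: Sequence[str], list_b: Sequence[str]) -> float:
    """Mean length-normalised edit distance over aligned Swadesh entries.

    ASSUMPTION: both lists give the same concepts in the same order, and
    averaging normalised distances is a stand-in for the thesis's own
    (unspecified here) aggregation.
    """
    if len(list_a) != len(list_b):
        raise ValueError("Swadesh lists must be aligned and equally long")
    if not list_a:
        return 0.0
    total = 0.0
    for word_a, word_b in zip(list_a, list_b):
        longest = max(len(word_a), len(word_b))
        total += levenshtein(word_a, word_b) / longest if longest else 0.0
    return total / len(list_a)
```

For instance, `list_distance(["abc", "de"], ["abd", "de"])` averages one substitution in a three-symbol word with an exact match. Pairwise values of this kind could then feed any standard clustering step to group languages into families.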
Identifying and Modeling Code-Switched Language
Code-switching is the phenomenon by which bilingual speakers switch between multiple languages during written or spoken communication. The importance of developing language technologies that are able to process code-switched language is immense, given the large populations that routinely code-switch. Current NLP and Speech models break down when used on code-switched data, interrupting the language processing pipeline in back-end systems and forcing users to communicate in ways which for them are unnatural.
There are four main challenges that arise in building code-switched models: lack of code-switched data on which to train generative language models; lack of multilingual language annotations on code-switched examples which are needed to train supervised models; little understanding of how to leverage monolingual and parallel resources to build better code-switched models; and finally, how to use these models to learn why and when code-switching happens across language pairs. In this thesis, I look into different aspects of these four challenges.
The first part of this thesis focuses on how to obtain reliable corpora of code-switched language. We collected a large corpus of code-switched language from social media using a combination of sets of anchor words that exist in one language and sentence-level language taggers. The newly obtained corpus is superior to other corpora collected via different strategies when it comes to the amount and type of bilingualism in it. It also helps train better language tagging models. We also have proposed a new annotation scheme to obtain part-of-speech tags for code-switched English-Spanish language. The annotation scheme is composed of three different subtasks including automatic labeling, word-specific questions labeling and question-tree word labeling. The part-of-speech labels obtained for the Miami Bangor corpus of English-Spanish conversational speech show very high agreement and accuracy.
The second section of this thesis focuses on the tasks of part-of-speech tagging and language modeling. For the first task, we proposed a state-of-the-art approach to part-of-speech tagging of code-switched English-Spanish data based on recurrent neural networks. Our models were tested on the Miami Bangor corpus on the task of POS tagging alone, for which we achieved 96.34% accuracy, and on joint part-of-speech and language ID tagging, which achieved similar POS tagging accuracy (96.39%) and very high language ID accuracy (98.78%).
For the task of language modeling, we first conducted an exhaustive analysis of the relationship between cognate words and code-switching. We then proposed a set of cognate-based features that helped improve language modeling performance by 12% relative points. Furthermore, we showed that these features can also be used across language pairs and still obtain performance improvements.
Finally, we tackled the question of how to use monolingual resources for code-switching models by pre-training state-of-the-art cross-lingual language models on large monolingual corpora and fine-tuning them on the tasks of language modeling and word-level language tagging on code-switched data. We obtained state-of-the-art results on both tasks.
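The corpus collection step described in this abstract combines anchor words that exist in only one language with sentence-level language taggers. A minimal sketch of that idea follows, under stated assumptions: the filtering rule (keep a sentence when it contains an anchor word from a language other than the one the tagger assigns), the tiny anchor sets, and the `toy_tagger` stand-in are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, Iterable, List, Set


def find_candidates(sentences: Iterable[str],
                    anchors_by_lang: Dict[str, Set[str]],
                    tag_sentence: Callable[[str], str]) -> List[str]:
    """Return sentences that look code-switched.

    ASSUMED rule: the sentence-level tagger assigns one language, yet the
    sentence contains at least one anchor word of a different language.
    """
    candidates = []
    for sentence in sentences:
        tokens = {tok.strip(".,;:!?¿¡").lower() for tok in sentence.split()}
        tagged = tag_sentence(sentence)
        for lang, anchors in anchors_by_lang.items():
            if lang != tagged and tokens & anchors:
                candidates.append(sentence)
                break
    return candidates


def toy_tagger(sentence: str) -> str:
    """Hypothetical stand-in for a real sentence-level language tagger."""
    return "es" if any(ch in "áéíóúñ¿¡" for ch in sentence.lower()) else "en"


# Hypothetical anchor sets; real ones would be much larger.
ANCHORS = {"en": {"the", "because"}, "es": {"pero", "porque"}}
```

Calling `find_candidates(corpus, ANCHORS, toy_tagger)` on a list of posts keeps only those with cross-language evidence. In practice the tagger would be a trained model and the anchor sets would be filtered so that no anchor is a valid word in the other language.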
Language control in bilingual production: Insights from error rate and error type in sentence production
First published online: 16 October 2020
Most research showing that cognates are named faster than non-cognates has focused on
isolated word production which might not realistically reflect cognitive demands in
sentence production. Here, we explored whether cognates elicit interference by examining
error rates during sentence production, and how this interference is resolved by language
control mechanisms. Twenty highly proficient Spanish–English bilinguals described visual
scenes with sentence structures ‘NP1-verb-NP2’ (NP = noun-phrase). Half the nouns
and half the verbs were cognates and two manipulations created high control demands.
Both situations that demanded higher inhibitory control pushed the cognate effect from
facilitation towards interference. These findings suggest that cognates, similar to phonologically
similar words within a language, can induce not only facilitation but robust
interference.
We thank Michael Freund and Nicholas McCloskey
for their help with data collection. This work was supported in part by the
Therapeutic Cognitive Neuroscience Fund endowed to the Cognitive
Neurology division of the Neurology Department at Johns Hopkins
University. C.D. Martin was supported by the Spanish Ministry of Economy
and Competitiveness (SEV-2015-490; PSI2017-82941-P; Europa-Excelencia
ERC2018-092833), the Basque Government (PIBA18-29), and the European
Research Council (ERC-2018-COG-819093). N. Nozari was also supported
by an NSF grant (NSF BCS-1949631).
The effects of using cognates to teach English vocabulary to Spanish speakers
This research synthesis aimed to examine the effects of the use of Spanish-English cognates
on the learning of English vocabulary by Spanish speakers. A total of 21 empirical studies
were collected to answer and support questions about the effects of using cognates to teach
English vocabulary to Spanish speakers, the most effective category to teach English
vocabulary, the advantages and disadvantages of using cognates, and the teacher and student
perspectives on the use of cognates as a way of acquiring lexicon in English. The results of
this analysis revealed that through the use of cognates, specifically identical and similar
cognates, the comprehension and development of vocabulary in English were effective for
Spanish speakers. Similarly, it was possible to identify that cognates not only help in learning
and expanding lexicon but also in speech processing, meaning inference, word recognition,
word processing, lexicon acquisition, and student confidence. Therefore, both teachers and
students agree that the use of cognates in the classroom is essential. Recommendations for
future research on the effects of using Spanish-English cognates to teach English vocabulary
to Spanish speakers and some practical implications are provided. It is worth mentioning that
recommendations for future research on false cognates were proposed due to the ongoing
debate among authors about their possible efficacy in vocabulary acquisition.
Licenciado en Pedagogía del Idioma Inglés
Cuenc
Bilingual access of homonym meanings: individual differences in bilingual access of homonym meanings
The goal of the present study was to identify the cognitive processes that underlie lexical ambiguity resolution in a second language (L2). We examined which cognitive factors predict the efficiency of accessing subordinate meanings of L2 homonyms in a sample of highly proficient Spanish–English bilinguals. The predictive ability of individual differences in (1) homonym processing in the L1, (2) working memory capacity and (3) sensitivity to cross-language form overlap was examined. In two experiments, participants were presented with cognate and noncognate homonyms either as a prime in a lexical decision task (Experiment 1) or embedded in a sentence (Experiment 2). In both experiments, speed and accuracy in accessing subordinate meanings in the L1 were the strongest predictor of speed and accuracy in accessing subordinate meanings in the L2. Sensitivity to cross-language form overlap predicted performance in lexical decision, while working memory capacity predicted processing in sentence comprehension.
Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach
False friends are pairs of words in two languages that are perceived as
similar but have different meanings. We present an improved
algorithm for acquiring false friends from a sentence-level aligned parallel corpus
based on statistical observations of word occurrences and co-occurrences
in the parallel sentences. The results are compared with an entirely semantic
measure for cross-lingual similarity between words based on using the Web
as a corpus through analyzing the words’ local contexts extracted from the
text snippets returned by searching in Google. The statistical and semantic
measures are further combined into an improved algorithm for identification
of false friends that achieves results almost twice as good as those of previously
known algorithms. The evaluation is performed for identifying cognates
between Bulgarian and Russian but the proposed methods could be adopted
for other language pairs for which parallel corpora and bilingual glossaries
are available.
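The statistical side of the approach above rests on how often two similar-looking words occur, and co-occur, in aligned sentence pairs. The abstract does not give the formula, so the sketch below swaps in a plain Dice coefficient as a stand-in; the function name and the placeholder tokens in the test data are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

SentencePair = Tuple[Sequence[str], Sequence[str]]


def cooccurrence_score(pairs: Iterable[SentencePair],
                       w_src: str, w_tgt: str) -> float:
    """Dice coefficient of w_src and w_tgt over aligned sentence pairs.

    ASSUMPTION: Dice is used here only as an illustrative statistic; it is
    not claimed to be the measure used in the paper. A low score for a pair
    of orthographically similar words hints that they rarely translate each
    other, i.e. that they may be false friends.
    """
    n_src = n_tgt = n_both = 0
    for src_tokens, tgt_tokens in pairs:
        in_src = w_src in src_tokens
        in_tgt = w_tgt in tgt_tokens
        n_src += in_src            # sentence pairs whose source has w_src
        n_tgt += in_tgt            # sentence pairs whose target has w_tgt
        n_both += in_src and in_tgt
    if n_src + n_tgt == 0:
        return 0.0
    return 2 * n_both / (n_src + n_tgt)
```

A score near 1 means the two words almost always appear in each other's translations, and a score near 0 means they almost never do. A threshold between the two would have to be tuned on labelled data.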
Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval
In everyday medical practice, which involves a great deal of documentation and search work, the majority of textually encoded information is now available electronically. This makes the development of powerful methods for efficient retrieval a priority.
Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation and compounding), lexical-semantic functionality, and the ability to analyse large document collections across languages.
This doctoral thesis presents the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organised around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval.
In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel procedure for resolving semantic ambiguities.
The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardised evaluations and compared with common approaches.
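The indexing idea in this abstract, segmenting words into morphemes and replacing each morpheme with a language-independent symbol, can be illustrated with a small sketch. Everything specific here is an assumption: greedy longest-match segmentation stands in for whatever procedure MorphoSaurus actually uses, and the lexicon entries and symbol names are invented for the example.

```python
from typing import Dict, List

# Hypothetical toy lexicon: morpheme -> language-independent concept symbol.
TOY_LEXICON: Dict[str, str] = {
    "gastr": "#STOMACH",
    "magen": "#STOMACH",
    "itis": "#INFLAMMATION",
    "entzuendung": "#INFLAMMATION",
}


def to_concepts(word: str, lexicon: Dict[str, str]) -> List[str]:
    """Greedy longest-match segmentation, emitting concept symbols.

    ASSUMPTION: characters that start no known morpheme are skipped;
    a real system would need a more careful treatment of such material.
    """
    word = word.lower()
    max_len = max((len(m) for m in lexicon), default=0)
    symbols: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in lexicon:
                symbols.append(lexicon[word[i:j]])
                i = j
                break
        else:
            i += 1  # no morpheme starts here; skip one character
    return symbols
```

With this toy lexicon, `to_concepts("Gastritis", TOY_LEXICON)` and `to_concepts("Magenentzuendung", TOY_LEXICON)` yield the same symbol sequence, which is what lets a query in one language match documents in another.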
- …