1,708 research outputs found

Multilingual Neural Machine Translation System for Indic to Indic Languages

Author: Das Sudhansu Bala
Ekbal Asif
Mishra Tapas Kumar
Panda Divyajyoti
Patra Bidyut Kr.
Publication date: 22/06/2023

This paper gives an Indic-to-Indic (IL-IL) MNMT baseline model for 11 ILs implemented on the Samanantar corpus and analyzed on the Flores-200 corpus. All the models are evaluated using the BLEU score. In addition, the languages are classified under three groups, namely East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI). The effect of language relatedness on MNMT model efficiency is studied. Owing to the presence of large corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are also built and examined. To achieve this, English-Indic (EN-IL) models are also developed, with and without the usage of related languages. Results reveal that using related languages is beneficial for the WI group only, while it is detrimental for the EI group and shows an inconclusive effect on the DR group, but it is useful for EN-IL models. Thus, related language groups are used to develop pivot MNMT models. Furthermore, the IL corpora are transliterated from the corresponding scripts to a modified ITRANS script, and the best MNMT models from the previous approaches are built on the transliterated corpus. It is observed that the usage of pivot models greatly improves MNMT baselines, with AS-TA achieving the minimum BLEU score and PA-HI achieving the maximum score. Among languages, AS, ML, and TA achieve the lowest BLEU score, whereas HI, PA, and GU perform the best. Transliteration also helps the models with few exceptions. The best increment of scores is observed in ML, TA, and BN and the worst average increment is observed in KN, HI, and PA, across all languages. The best model obtained is the PA-HI language pair trained on PAWI transliterated corpus, which gives 24.29 BLEU. Comment: 38 pages, 2 figures

arXiv.org e-Print Archive

On the scope of the referential hierarchy in the typology of grammatical relations

Author: Bickel Balthasar
Publication date: 10/08/2010

In the late seventies, Bernard Comrie was one of the first linguists to explore the effects of the referential hierarchy (RH) on the distribution of grammatical relations (GRs). The referential hierarchy is also known in the literature as the animacy, empathy or indexability hierarchy and ranks speech act participants (i.e. first and second person) above third persons, animates above inanimates, or more topical referents above less topical referents. Depending on the language, the hierarchy is sometimes extended by analogy to rankings of possessors above possessees, singulars above plurals, or other notions. In his 1981 textbook, Comrie analyzed RH effects as explaining (a) differential case (or adposition) marking of transitive subject ("A") noun phrases in low RH positions (e.g. inanimate or third person) and of object ("P") noun phrases in high RH positions (e.g. animate or first or second person), and (b) hierarchical verb agreement coupled with a direct vs. inverse distinction, as in Algonquian (Comrie 1981: Chapter 6).

Hochschulschriftenserver - Universität Frankfurt am Main

Exploring SL Writing and SL Sensitivity during Writing Tasks : poor and advanced writing in a context of second language other than English

Author: Figueiredo Sandra
Martins Margarida Alves
Silva C.
Simões C.
Publication venue: World Academy of Science, Engineering and Technology
Publication date: 01/01/2015

This study is part of a larger empirical research project that examines second language (SL) learners’ profiles and valid procedures for complete, diagnostic assessment in schools. A total of 102 learners of Portuguese as an SL, aged between 7 and 17 years and speaking distinct home languages, were assessed in several linguistic tasks. In this article, we focus on writing performance in the specific task of narrative essay composition. The written outputs were scored on six components adapted from an English SL assessment context (Alberta Education): linguistic vocabulary, grammar, syntax, strategy, socio-linguistic, and discourse. The writing processes and strategies in Portuguese used by different immigrant students were analysed to determine the features and diversity of deficits in authentic texts produced by SL writers. Differentiated performance was based on the diversity of the following variables: grades, previous schooling, home language, instruction in first language, and exposure to Portuguese as a Second Language. Speakers of Indo-Aryan languages showed low writing scores compared to their peers, and the type of language and respective cognitive mapping (such as Mandarin and Arabic) was the predictor, not linguistic distance. Home language instruction should also be prominently considered in further research to understand the specificities of the cognitive academic profile in a Romance language learning context. Additionally, this study examined teachers’ representations, which are addressed here to understand the educational implications of second language teaching for the psychological distress of different minorities in schools of specific host countries.

Repositório do ISPA

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Do linguistically diverse migrants dominate advanced mathematics? Comparing Greater Sydney with the rest of New South Wales

Author: Roberts Philip
Sikora Joanna
Publication date: 19/06/2023

University of Canberra Research Repository

Second language education context and home language effect: language dissimilarities and variation in immigrant students’ outcomes.

Author: Figueiredo Sandra
Martins Maria Margarida Alves d'Orey
Silva Carlos Fernandes da
Publication venue: Routledge
Publication date: 01/01/2016

Heritage language speakers struggle in European classrooms with insufficient material provided for second language (SL) learning and assessment. Considering the number of instruments and pertinent studies in English SL, immigrant students are better prepared than their peers in Romance language settings. This study investigates how factors such as age and home language can be used in the teaching environment to predict and examine the development outcomes of SL students in verbal reasoning and vocabulary tasks. One hundred and six Portuguese participants, SL learners, between 8 and 17 years old, were assessed in vocabulary frequency, verbal analogies and morphological extraction tasks. In alphabetic languages (Romance languages), immigrant students (in an SL learning situation) with a strong linguistic distance (a home language with a very different orthographic foundation) are expected to struggle in language learning in spite of being aware of strategies that can improve their skills. The storage and combination of morphemes can be a demanding task for individual speakers at different levels. Cognitive mapping is strongly based on linguistic features of L1 development. Results show that home language, not age, was a significant predictor of variation in students’ outcomes. Speakers of alphasyllabary languages (Indo-Aryan languages as L1) were the poorest performers, with the ‘linguistic distance’ of their languages explaining the performance results.

Camões - Repositório Institucional da Universidade Autónoma de Lisboa

In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology

Author: Cathcart Chundra A.
Wandl Florian
Publication date: 01/01/2020

This paper investigates the ability of neural network architectures to effectively learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). We find that the Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model's language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and outline directions for future research.

arXiv.org e-Print Archive

Crossref

ZORA

Etyma for 'chicken', 'duck', and 'goose' among language phyla in China and Southeast Asia

Author: Alves Mark J
Publication venue: Asia-Pacific Linguistics
Publication date: 01/01/2015

This paper considers the history of words for domesticated poultry, including ‘chicken’, ‘goose’, and ‘duck’, in China and mainland Southeast Asia to try to relate associated domestication events with specific language groups. Linguistic, archaeological and historical evidence supports Sinitic as one linguistic source, but in other cases, Tai and Austroasiatic form additional centers of lexical forms which were borrowed by neighboring phyla. It is hypothesized that these geographic regions of etyma for domesticated birds may represent instances of bird domestication, or possibly advances in bird husbandry, by speech communities in the region in the Neolithic Era, followed by the spread of both words and cultural practices.

The Australian National University

Comparative philology, French music, and the composition of Indo-Europeanism from Fétis to Messiaen.

Author: Asimov Peter
Publication venue: University of Cambridge
Publication date: 05/06/2020

This thesis argues that the disciplines of comparative philology and linguistics exerted significant force on the priorities and techniques of musicologists and composers in fin-de-siècle France, and examines how ideologies of Indo-Europeanism (or aryanism), concomitant with comparative philology, generated efforts to ‘sound out’ Indo-Europeanism in music. Using a relational approach, dense interdisciplinary networks of philologists/linguists, musicologists, and composers are reconstructed to demonstrate how musicological appropriations of linguistic research reverberated in musical composition right through the 1950s. These contexts reveal how wide-ranging repertories emerged from ethnic-nationalist projects of reclaiming Indo-European ‘patrimony’. The thesis is in two Parts. Part I, ‘Philologie comparée, musicologie, and Indo-European hypotheses’, is organised around four overlapping intellectual networks comprising comparative philologists and musicologists. Francophone musicologists’ efforts to model their discipline on that of comparative philology are surveyed. Scholars discussed include Fétis, Gevaert, Bourgault-Ducoudray, Burnouf, Meillet, Aubry, Emmanuel, and Grosset. Arguments concerning the place of music between concepts of ‘language’ and ‘race’ are retraced, with special attention paid to musicologists’ efforts to pinpoint quasi-morphological ‘Indo-European’ musical structures – in particular, ‘modes’ and ‘metres’ – construed as ‘essential’ and ‘ancestral’. Part II, ‘Composing with philology: performances of authenticity and innovation’, describes how the intellectual project elaborated in Part I infiltrated compositional practices. Close musical and paratextual readings show how composers legitimated experimentalism through ‘performances’ of philological ‘authenticity’. Over time, musical parameters such as modes and metres are abstracted and assimilated into compositional lexicons. 
Composers discussed include Bourgault-Ducoudray, Saint-Saëns, Séverac, Roussel, and Emmanuel. This root system flourishes in the music of Olivier Messiaen, whose rhythmic technique is revisited in light of manuscript materials. From his borrowings of early Indian metres (deśītālas) through his hyperformalist ‘Mode de valeurs et d’intensités’, Messiaen’s rhythmic style is radically reinterpreted as a logical extension of francophone musicology’s disciplinary and epistemological inheritance from comparative philology. Gates Cambridge Scholarship

Apollo (Cambridge)