1,708 research outputs found

    Multilingual Neural Machine Translation System for Indic to Indic Languages

    This paper presents an Indic-to-Indic (IL-IL) multilingual neural machine translation (MNMT) baseline model for 11 Indic languages (ILs), trained on the Samanantar corpus and evaluated on the Flores-200 corpus. All models are evaluated using the BLEU score. In addition, the languages are classified into three groups: East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI). The effect of language relatedness on MNMT model performance is studied. Because large corpora from English (EN) to ILs are available, IL-IL MNMT models that use EN as a pivot are also built and examined. To this end, English-Indic (EN-IL) models are also developed, with and without the use of related languages. Results show that using related languages is beneficial only for the WI group, detrimental for the EI group, and inconclusive for the DR group, but useful for EN-IL models. Related language groups are therefore used to develop the pivot MNMT models. Furthermore, the IL corpora are transliterated from their native scripts into a modified ITRANS script, and the best MNMT models from the previous approaches are rebuilt on the transliterated corpus. The pivot models greatly improve on the MNMT baselines, with AS-TA achieving the lowest BLEU score and PA-HI the highest. Among languages, AS, ML, and TA obtain the lowest BLEU scores, whereas HI, PA, and GU perform best. Transliteration also helps the models, with a few exceptions. Across all languages, the largest score increases are observed for ML, TA, and BN, and the smallest average increases for KN, HI, and PA. The best model is the PA-HI pair trained on the transliterated PAWI corpus, which achieves 24.29 BLEU. Comment: 38 pages, 2 figures
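    The EN-pivot approach described in this abstract can be pictured as composing two models: an IL-to-EN model followed by an EN-to-IL model. The sketch below is a minimal illustration, assuming only that each trained model can be treated as a function from a sentence to its translation. The names `make_pivot_translator` and `toy_translator` and the one-word lexicons are hypothetical stand-ins, not the paper's code, data, or models.

```python
from typing import Callable, Dict

# A "translator" here is any function from a source sentence to a target sentence.
Translator = Callable[[str], str]


def make_pivot_translator(src_to_en: Translator, en_to_tgt: Translator) -> Translator:
    """Chain an IL->EN model and an EN->IL model into an IL->IL translator."""
    def translate(sentence: str) -> str:
        english = src_to_en(sentence)  # hop 1: source Indic language -> English
        return en_to_tgt(english)      # hop 2: English -> target Indic language
    return translate


def toy_translator(table: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained model: word-by-word lookup; unknown words pass through."""
    return lambda s: " ".join(table.get(w, w) for w in s.split())


# Hypothetical toy lexicons (Punjabi and Hindi words for "water"); not real NMT models.
pa_to_en = toy_translator({"ਪਾਣੀ": "water"})
en_to_hi = toy_translator({"water": "पानी"})

pa_to_hi = make_pivot_translator(pa_to_en, en_to_hi)
print(pa_to_hi("ਪਾਣੀ"))  # पानी
```

    Keeping each hop a plain function means either stage could be replaced by a real trained model; the trade-off of pivoting is that errors made in the first hop are passed on to the second.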

    On the scope of the referential hierarchy in the typology of grammatical relations

    In the late 1970s, Bernard Comrie was one of the first linguists to explore the effects of the referential hierarchy (RH) on the distribution of grammatical relations (GRs). The referential hierarchy is also known in the literature as the animacy, empathy, or indexability hierarchy, and it ranks speech act participants (i.e. first and second person) above third persons, animates above inanimates, or more topical referents above less topical referents. Depending on the language, the hierarchy is sometimes extended by analogy to rankings of possessors above possessees, singulars above plurals, or other notions. In his 1981 textbook, Comrie analyzed RH effects as explaining (a) differential case (or adposition) marking of transitive subject ("A") noun phrases in low RH positions (e.g. inanimate or third person) and of object ("P") noun phrases in high RH positions (e.g. animate or first or second person), and (b) hierarchical verb agreement coupled with a direct vs. inverse distinction, as in Algonquian (Comrie 1981: Chapter 6).

    Exploring SL Writing and SL Sensitivity during Writing Tasks: poor and advanced writing in a context of a second language other than English

    This study is part of a larger empirical research project that examines second language (SL) learners' profiles and valid procedures for complete and diagnostic assessment in schools. A total of 102 learners of Portuguese as an SL, aged 7 to 17 years and speakers of distinct home languages, were assessed on several linguistic tasks. In this article, we focus on writing performance in one specific task: narrative essay composition. The written outputs were scored on six components adapted from an English SL assessment context (Alberta Education): linguistic vocabulary, grammar, syntax, strategy, sociolinguistic, and discourse. The writing processes and strategies used in Portuguese by different immigrant students were analysed to determine the features and diversity of deficits in authentic texts produced by SL writers. Differences in performance were examined across the following variables: grade, previous schooling, home language, instruction in the first language, and exposure to Portuguese as a second language. Speakers of Indo-Aryan languages showed lower writing scores than their peers, and the type of language and its associated cognitive mapping (as with Mandarin and Arabic), rather than linguistic distance, was the predictor. Home language instruction should also be given prominent consideration in further research to understand the specificities of the cognitive academic profile in a Romance language learning context. The study also examined teachers' representations, which are addressed here to understand the educational implications of second language teaching for psychological distress among different minorities in schools of specific host countries.

    Second language education context and home language effect: language dissimilarities and variation in immigrant students’ outcomes.

    Heritage language speakers struggle in European classrooms because insufficient material is provided for second language (SL) learning and assessment. Given the number of instruments and relevant studies available for English as an SL, immigrant students in English SL contexts are better prepared than their peers in Romance-language settings. This study investigates how factors such as age and home language can be used in the teaching environment to predict and examine the developmental outcomes of SL students in verbal reasoning and vocabulary tasks. One hundred and six participants learning Portuguese as an SL, aged 8 to 17 years, were assessed on vocabulary frequency, verbal analogies, and morphological extraction tasks. In alphabetic languages (such as Romance languages), immigrant students in an SL learning situation whose home language is linguistically distant (i.e. has a very different orthographic foundation) are expected to struggle with language learning despite being aware of strategies that can improve their skills. Storing and combining morphemes can be a demanding task for individual speakers at different levels, and cognitive mapping is strongly based on the linguistic features of L1 development. Results show that home language, not age, was a significant predictor of variation in students' outcomes. Speakers of alphasyllabary languages (Indo-Aryan languages as L1) were the poorest performers, with the 'linguistic distance' of their languages explaining the results.

    In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology

    This paper investigates the ability of neural network architectures to learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). The Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model's language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We also find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and we outline directions for future research.

    Etyma for 'chicken', 'duck', and 'goose' among language phyla in China and Southeast Asia

    This paper considers the history of words for domesticated poultry, including 'chicken', 'goose', and 'duck', in China and mainland Southeast Asia, in order to relate the associated domestication events to specific language groups. Linguistic, archaeological, and historical evidence supports Sinitic as one linguistic source, but in other cases Tai and Austroasiatic form additional centers of lexical forms that were borrowed by neighboring phyla. It is hypothesized that these geographic regions of etyma for domesticated birds may represent instances of bird domestication, or possibly advances in bird husbandry, by speech communities in the region during the Neolithic Era, followed by the spread of both the words and the cultural practices.