
9 research outputs found

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

Author: Pimentel Tiago
Ryskina Maria
Straughn Christopher
Publication venue: NEIU Digital Commons
Publication date: 01/01/2021

This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems’ predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems’ performance on previously unseen lemmas.

NEIU Digital Commons (Northeastern Illinois University)

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

Author: Aiton Grant
Ambridge Ben
Ataman Duygu
Ate Yustinus Ghanggo
Barta Botond
Bayyr-ool Aziyana
Bernardy Jean-Philippe
Chodroff Eleanor
Coler Matt
Cotterell Ryan
Ek Adam
El-Khaissi Charbel
Ganieva Sofya
Gasser Michael
Goldman Omer
Habash Nizar
Hatcher Richard J.
Hulden Mans
Ivanova Sardana
Khalifa Salam
Kieraś Witold
Klyachko Elena
Krizhanovsky Andrew
Krizhanovsky Natalia
Kumar Ritesh
Lakatos Dorina
Lane William
Leonard Brian
Liu Zoey
Mielke Sabrina J.
Montoya Samame Jaime Rafael
Nicolai Garrett
Nuriah Zahroh
Oncevay Arturo
Pimentel Tiago
Plugaryov Matvey
Ponti Edoardo M.
Prud'hommeaux Emily
Raj Mohit
Ratan Shyam
Ryskina Maria
Salchak Aelita
Salehi Ali
Shcherbakov Andrey
Sheifer Karina
Silva Villegas Gema Celeste
Stoehr Niklas
Straughn Christopher
Suhardijanto Totok
Szolnok Gábor
Tyers Francis M.
Vania Clara
Vylomova Ekaterina
Washington Jonathan
Woliński Marcin
Wu Shijie
Yarowsky David
Ács Judit
Publication venue: The Association for Computational Linguistics
Publication date: 01/08/2021

This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.

Edinburgh Research Explorer

Helsingin yliopiston digitaalinen arkisto

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

UniMorph 4.0: Universal Morphology

Author: Aiton Grant
Anastasopoulos Antonios
Andrushko Taras
Angulo Candy
Arora Aryaman
Ataman Duygu
Ate Yustinus Ghanggo
Batsuren Khuyagbaatar
Bautista Juan López
Baxi Jatayu
Bayyr-ool Aziyana
Bella Gábor
Bernardy Jean-Philippe
Bhatt Brijesh
Budianskaya Elena
Camaiteri Delio Siticonatzi
Chodroff Eleanor
Coler Matt
Cotterell Ryan
Cruz Hilaria
Czarnowska Paula
Dirix Peter
Dolatian Hossep
Ek Adam
El-Khaissi Charbel
Francis Didier López
Ganieva Sofya
Gasser Michael
Giunchiglia Fausto
Goldman Omer
Gorman Kyle
Guriel David
Habash Nizar
Hatcher Richard J.
Hennigen Lucas Torroba
Hulden Mans
Ivanova Sardana
Karahóǧa Ritván
Khalifa Salam
Kieraś Witold
Klyachko Elena
Krizhanovskaya Natalia
Krizhanovsky Andrew
Kumar Ritesh
Lane William
Leonard Brian
Liu Zoey
Marchenko Igor
Markantonatou Stella
Mashkovtseva Polina
Maudslay Rowan Hall
McCarthy Arya D.
Mielke Sabrina J.
Nepomniashchaya Maria
Nicolai Garrett
Nikkarinen Irene
Nuriah Zahroh
Oncevay Arturo
Pavlidis George
Pimentel Tiago
Pinter Yuval
Plugaryov Matvey
Ponti Edoardo M.
Prud'hommeaux Emily
Raj Mohit
Ratan Shyam
Rodionova Daria
Rojas Esaú Zumaeta
Ryskina Maria
Salchak Aelita
Salehi Ali
Salesky Elizabeth
Samame Jaime Rafael Montoya
Scherbakov Andrey
Serova Alexandra
Sheifer Karina
Silfverberg Miikka
Stoehr Niklas
Straughn Christopher
Suhardijanto Totok
Tsarfaty Reut
Tyers Francis M.
Valvoda Josef
Vania Clara
Villegas Gema Celeste Silva
Vylomova Ekaterina
Washington Jonathan North
White Jennifer
Wolinski Marcin
Yablonskaya Anna
Yarowsky David
Yemelina Anastasia
Young Jeremiah
Zariquiey Roberto
Zmigrod Ran
Publication venue: Center for Open Science
Publication date: 07/05/2022

University of Groningen

Entanglements of digital technologies and Indigenous language work in the Northern Territory

Author: Bow Cathy
Publication date: 01/01/2021

This thesis addresses the question of what happens when digital language resources are developed and become entangled with different types of language work in Indigenous languages of Australia's Northern Territory. It explores three specific sociotechnical assemblages, defined as heterogeneous sets of social and technical resources functioning together for various purposes. The types of language work that emerged were the role of language in practices of documentation, pedagogy and identity-making. The three projects under consideration respond to different motivations: the Living Archive of Aboriginal Languages is a digital archive of endangered literature in languages of the Northern Territory, motivated by a concern for the fate of materials produced in bilingual education programs in remote schools. The Digital Language Shell is a resource for developing and mobilising curricula in Indigenous languages and cultures, motivated by a need for a low-cost and low-tech template for sharing content under Indigenous authority. The Bininj Kunwok online course is a specific implementation of the Digital Language Shell, teaching an Indigenous language of West Arnhem Land in a university context. Each project was created by the author working collaboratively with different teams, to support various types of language work. This PhD by publication offers a set of seven academic papers, each focusing on different aspects of the projects, and written for distinct audiences. The methods entailed iterative inquiry, as I reflected on my work as project manager in developing these digital resources, first addressing the technical and practical considerations, then through the lenses of various academic disciplines, and finally in a meta-analysis of the various heterogeneous elements that make up the research.
The thesis emerges as an assemblage of heterogeneities – projects, papers, concepts, academic references, and auto-ethnographic stories – that is in itself a sociotechnical assemblage.

The Australian National University

Polysynthetic sociolinguistics: the language and culture of Murrinh Patha Youth

Author: Mansfield John Basil
Publication date: 01/01/2014

This thesis is about the life and language of kardu kigay – young Aboriginal men in the town of Wadeye, northern Australia. Kigay have attained some notoriety within Australia for their participation in “heavy metal gangs”, which periodically cause havoc in the town. But within Australianist linguistics circles, they are additionally known for speaking Murrinh Patha, a polysynthetic language that has a number of unique grammatical structures, and which is one of the few Aboriginal languages still being learnt by children. My core interest is to understand how people’s lives shape their language, and how their language shapes their lives. In this thesis these interests are focused around the following research goals: (1) To document the social structures of kigay’s day-to-day lives, including the subcultural “metal gang” dimension of their sociality; (2) To document the language that kigay speak, focusing in particular on aspects of their speech that differ from what has been documented in previous descriptions of Murrinh Patha; (3) To analyse which features of kigay speech might be socially salient linguistic markers, and which are more likely to reflect processes of grammatical change that run below the level of social or cognitive salience; (4) To analyse how kigay speech compares to other youth Aboriginal language varieties documented in northern Australia, and argue that together these can be described as a phenomenon of linguistic urbanisation. I will show that the “heavy metal gangs” are an idiosyncratic local subculture that uses foreign heavy metal bands as group totems. Social connections and loyalties are formed on the basis of peer solidarity, as opposed to the traditional totemic system, which is structured around ancestry. Lives are now shaped by the dense (and often conflict-riven) town environment, as opposed to bush life, which was inseparable from the land.
Kigay’s in-group language is a “slang” variety of Murrinh Patha (MP), which deploys new words and phrases by borrowing and reinterpreting English vocabulary. It is also characterised by substantial lenitions and deletions in the pronunciation. The MP grammatical system still underlies this speech, but some of its more complex morphosyntactic forms are restricted to the “heavy” speech of older people, and there are various mergers and reconfigurations occurring in the verb morphology. This thesis adds to the growing body of work describing how language contact and changing sociolinguistic dynamics are radically restructuring the linguistic repertoire of Aboriginal communities in northern and central Australia. At the same time, it is one of very few studies providing sociolinguistic description of a polysynthetic language, and is therefore an innovative study in polysynthetic sociolinguistics.

The Australian National University