Search CORE

21 research outputs found

A Computational Model for the Linguistic Notion of Morphological Paradigm

Author: Hulden Mans
Liu Ling
Silfverberg Miikka
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018
Field of study

Peer reviewe

Helsingin yliopiston digitaalinen arkisto

Read my points:Effect of animation type when speech-reading from EMA data

Author: James Kristy
Wieling Martijn
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016
Field of study

ARTS repository - University of Groningen

Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model

Author: Cai Anna
Dutt Ritam
Hengle Amey
Hofmann Valentin
Kabra Anubha
Kantharuban Anjali
Kulkarni Atharva
Mortensen David R.
Oflazer Kemal
Schütze Hinrich
Vijayakumar Abhishek
Weissweiler Leonie
Yu Haofei
Publication venue
Publication date: 26/10/2023
Field of study

Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.Comment: EMNLP 202

arXiv.org e-Print Archive

UniMorph 4.0:Universal Morphology

Author: Aiton Grant
Anastasopoulos Antonios
Andrushko Taras
Angulo Candy
Arora Aryaman
Ataman Duygu
Ate Yustinus Ghanggo
Batsuren Khuyagbaatar
Bautista Juan López
Baxi Jatayu
Bayyr-ool Aziyana
Bella Gábor
Bernardy Jean-Philippe
Bhatt Brijesh
Budianskaya Elena
Camaiteri Delio Siticonatzi
Chodroff Eleanor
Coler Matt
Cotterell Ryan
Cruz Hilaria
Czarnowska Paula
Dirix Peter
Dolatian Hossep
Ek Adam
El-Khaissi Charbel
Francis Didier López
Ganieva Sofya
Gasser Michael
Giunchiglia Fausto
Goldman Omer
Gorman Kyle
Guriel David
Habash Nizar
Hatcher Richard J.
Hennigen Lucas Torroba
Hulden Mans
Ivanova Sardana
Karahóǧa Ritván
Khalifa Salam
Kieraś Witold
Klyachko Elena
Krizhanovskaya Natalia
Krizhanovsky Andrew
Kumar Ritesh
Lane William
Leonard Brian
Liu Zoey
Marchenko Igor
Markantonatou Stella
Mashkovtseva Polina
Maudslay Rowan Hall
McCarthy Arya D.
Mielke Sabrina J.
Nepomniashchaya Maria
Nicolai Garrett
Nikkarinen Irene
Nuriah Zahroh
Oncevay Arturo
Pavlidis George
Pimentel Tiago
Pinter Yuval
Plugaryov Matvey
Ponti Edoardo M.
Prud'hommeaux Emily
Raj Mohit
Ratan Shyam
Rodionova Daria
Rojas Esaú Zumaeta
Ryskina Maria
Salchak Aelita
Salehi Ali
Salesky Elizabeth
Samame Jaime Rafael Montoya
Scherbakov Andrey
Serova Alexandra
Sheifer Karina
Silfverberg Miikka
Stoehr Niklas
Straughn Christopher
Suhardijanto Totok
Tsarfaty Reut
Tyers Francis M.
Valvoda Josef
Vania Clara
Villegas Gema Celeste Silva
Vylomova Ekaterina
Washington Jonathan North
White Jennifer
Wolinski Marcin
Yablonskaya Anna
Yarowsky David
Yemelina Anastasia
Young Jeremiah
Zariquiey Roberto
Zmigrod Ran
Publication venue: 'Center for Open Science'
Publication date: 07/05/2022
Field of study

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Read my points: Effect of animation type when speech-reading from EMA data

Author: James Kristy
Wieling Martijn
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016
Field of study

Three popular vocal-tract animation paradigms were tested for intelligibility when displaying videos of pre-recorded Electromagnetic Articulography (EMA) data in an online experiment. EMA tracks the position of sensors attached to the tongue. The conditions were dots with tails (where only the coil location is presented), 2D animation (where the dots are connected to form 2D representations of the lips, tongue surface and chin), and a 3D model with coil locations driving facial and tongue rigs. The 2D animation (recorded in VisArtico) showed the highest identification of the prompts

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen