Charles University

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

CEC6-Converter (2025-05-29)

Author: Rüdiger Jan Oliver
Publication venue: Rüdiger, Jan Oliver
Publication date: 29/05/2025

This software converts *.cec6 files into 24 formats commonly used in corpus linguistics / NLProc. It runs on all modern operating systems (Windows, Linux, macOS). The binaries are compiled for the x64 architecture. If your processor (CPU) uses an x86 or ARM architecture, please follow the instructions for "other operating systems or x86 / ARM / ARM64".

Uniform Meaning Representation 2.0

Author: Bonn Julia
Bonial Claire
Buchholz Matt
Cheng Hsiao-Jung
Chen Alvin
Chen Ching-wen
Cowell Andrew
Croft William
Denk Lukas
Elsayed Ahmed
Fučíková Eva
Gamba Federica
Gomez Carlos
Hajič Jan
Hajičová Eva
Havelka Jiří
Havenmeier Loden
Kilgore Ath
Kolářová Veronika
Kučová Lucie
Lai Kenneth
Li Bin
Li Jingyi
Lopatková Markéta
MacGregor Marie
Mikulová Marie
Mírovský Jiří
Nedoluzhko Anna
Myers Skatje
Novák Michal
O’Gorman Tim
Pajas Petr
Palmer Alexis
Palmer Martha
Panevová Jarmila
Post Benét
Pustejovsky James
Sgall Petr
Song Jialin
Song Li
Ševčíková Magda
Štěpánek Jan
Urešová Zdeňka
Sun Haibo
Sun Yao
Vallejos Yopán Rosa
Van Gysel Jens
Vigus Meagan
Wright‑Bettner Kristin
Wu Jiawei
Xue Nianwen
Xing Dan
Xu Keer
Xu Zhixing
Yue Liulu
Zeman Daniel
Zhao Jin
Zikánová Šárka
Žabokrtský Zdeněk
Publication venue: UMR Consortium
Publication date: 17/05/2025

The goal of the Uniform Meaning Representation (UMR) project is to design a meaning representation that can be used to annotate the semantic content of a text. UMR is primarily based on Abstract Meaning Representation (AMR), an annotation framework initially designed for English, but also draws from other meaning representations. UMR extends AMR to other languages, particularly morphologically complex, low-resource languages. UMR also adds features to AMR that are critical to semantic interpretation, and it enhances AMR with a companion document-level representation that captures linguistic phenomena such as coreference as well as temporal and modal dependencies that potentially go beyond sentence boundaries. UMR is intended to be scalable, learnable, and cross-linguistically plausible. It is designed to support both lexical and logical inference.

EduPo: Analysis and Generation of Czech Poetry, v0.5

Author: Rosa Rudolf
Musil Tomáš
Mareček David
Chudoba Michal
Landsperský Jakub
Plecháč Petr
Dosoudil Jiří
Publication venue: Institute of Czech Literature, Czech Academy of Sciences
Publication date: 17/03/2025

A suite of tools for the analysis and generation of Czech poetry. This is a snapshot of the public GitHub repository at https://github.com/ufal/edupo, the beta version of the tool suite, released together with a scientific paper at the NLP4DH 2025 conference.

Diadem Speech-Cognitive Dataset (DSCD-CZ)

Author: Šmídl Luboš
Krejčová Marie
Zapletalová Michaela
Polák Filip
Zajícová Lucie
Švec Jan
Víta Martin
Bartoš Aleš
Publication venue: Institute of Physics, Czech Academy of Sciences
Publication date: 29/05/2025

The dataset was created to investigate the speech and cognitive performance of people with varying degrees of cognitive impairment, primarily dementia. It contains the results of standardized neuropsychological tests (RBANS, ALBA, POBAV, MASTCZ), speech tasks focused on comprehension, memory, naming, and repetition, and demographic data (age, gender, education). Participants were divided into four groups based on clinical assessment: healthy individuals, healthy individuals with possible mild cognitive impairment, patients with mild cognitive impairment, and patients with dementia. All recordings and examinations were conducted as part of routine clinical practice in the neurological outpatient clinic (Memory Disorders Advisory Unit) at the Neurological Clinic of the Faculty Hospital Královské Vinohrady. The dataset, which contains 268 examinations, was divided into training and test parts using stratification by clinical group, age, gender, and level of education to ensure an even distribution of these key characteristics in both parts. The aim of the dataset is to support the development of methods for automated detection of cognitive disorders based on speech analysis and cognitive performance. The data are suitable for research in clinical neuropsychology, computational linguistics, and machine learning. The dataset is intended for non-commercial research purposes.
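
The stratified split described above can be illustrated with a minimal sketch. It is not the authors' procedure: the record fields (`group`, `gender`), the group labels, the 20% test fraction, and the helper name `stratified_split` are all assumptions made for illustration, and the toy data stratifies on only two of the four characteristics the description names.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records so each stratum contributes the same share to the test part.

    `key` maps a record to its stratum (hypothetical; e.g. group and gender).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data: 4 hypothetical group labels x 2 genders x 10 records each.
groups = ("healthy", "possible_mci", "mci", "dementia")
records = [
    {"group": g, "gender": s}
    for g in groups
    for s in ("f", "m")
    for _ in range(10)
]

train, test = stratified_split(records, key=lambda r: (r["group"], r["gender"]))
```

Because the split is made within each stratum, every combination of group and gender appears in the test part in the same proportion as in the full data.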

LatinISE corpus (version 5)

Author: McGillivray Barbara
Publication venue: Lexical Computing
Publication date: 25/03/2025

The LatinISE corpus is a text corpus collected from the LacusCurtius, Intratext and Musisque Deoque websites. Corpus texts have rich metadata containing information such as genre, title, and century or specific date. This Latin corpus was built by Barbara McGillivray. In version 5 of the corpus, the author names and datings of texts before 600 CE have been manually corrected, and duplicate texts have been removed. Thanks to Valentina Lunardi for this data curation.

Coreference in Universal Dependencies 1.3 (CorefUD 1.3)

Author: Novák Michal
Popel Martin
Zeman Daniel
Žabokrtský Zdeněk
Nedoluzhko Anna
Acar Kutay
Bamman David
Bourgonje Peter
Cinková Silvie
Eckhoff Hanne
Cebiroğlu Eryiğit Gülşen
Hajič Jan
Hardmeier Christian
Haug Dag
Jørgensen Tollef
Kåsen Andre
Krielke Pauline
Landragin Frédéric
Lapshinova-Koltunski Ekaterina
Mæhlum Petter
Martí M. Antònia
Mikulová Marie
Milintsevich Kirill
Mujadia Vandan
Muzerelle Judith
Nam Sangha
Nøklestad Anders
Ogrodniczuk Maciej
Øvrelid Lilja
Pamay Arslan Tuğba
Porada Ian
Recasens Marta
Solberg Per Erik
Stede Manfred
Straka Milan
Swanson Daniel
Toldova Svetlana
Vadász Noémi
Velldal Erik
Vincze Veronika
Zeldes Amir
Žitkus Voldemaras
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 17/04/2025

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.3 consists of 28 datasets for 18 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 24 datasets for 17 languages (1 dataset for Ancient Greek, 1 for Ancient Hebrew, 1 for Catalan, 2 for Czech, 3 for English, 2 for French, 2 for German, 1 for Hindi, 2 for Hungarian, 1 for Korean, 1 for Lithuanian, 2 for Norwegian, 1 for Old Church Slavonic, 1 for Polish, 1 for Russian, 1 for Spanish, and 1 for Turkish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please get acquainted with its license (placed in the same directory as the data) and cite the original data resource too. Compared to the previous version 1.2, version 1.3 adds new languages and corpora, namely French-ANCOR, Hindi-HDTB, and Korean-ECMT. In addition, English-GUM and Czech-PDT have been updated to newer versions, and the conversion of zeros in Hungarian-KorKor has been improved (a list of all changes in each dataset can be found in the corresponding README file).
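
As a rough illustration of how attribute-value pairs in the MISC column can be read, the sketch below splits a CoNLL-U token line into its ten tab-separated columns and parses the last one, where pairs are separated by `|` and written as `key=value`. The sample line, including the `Entity=(e1)` value, is made up for illustration and is not taken from the data; the helper names are hypothetical.

```python
def parse_misc(misc: str) -> dict:
    """Parse a CoNLL-U MISC field into a dict; "_" means the field is empty."""
    if misc == "_":
        return {}
    pairs = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def misc_of(token_line: str) -> dict:
    """Return the parsed MISC column (10th column) of one CoNLL-U token line."""
    columns = token_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    return parse_misc(columns[9])


# Hypothetical token line; the Entity value is illustrative only.
sample = "1\tJohn\tJohn\tPROPN\t_\t_\t2\tnsubj\t_\tEntity=(e1)|SpaceAfter=No"
attributes = misc_of(sample)
```

Comment lines (starting with `#`) and blank sentence separators would need to be skipped before calling `misc_of`; the README of each dataset remains the reference for the actual attribute names and value syntax.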

NameTag 3 Multilingual Model 250203

Author: Straková Jana
Straka Milan
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 03/02/2025

This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/). NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model was trained jointly on 21 flat NE corpora of 17 languages: Arabic, Chinese, Croatian, Czech, Danish, Dutch, English, German, Maghrebi Arabic French, Norwegian Bokmaal, Norwegian Nynorsk, Portuguese, Serbian, Slovak, Spanish, Swedish, and Ukrainian. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#multilingual

Czech Etymological Lexicon 1.0

Author: Rejzek Jiří
Papáček Aleš
Brezinová Viktória
Žabokrtský Zdeněk
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 28/01/2025

The Czech Etymological Lexicon, version 1.0, contains 10,502 Czech words, each annotated with a sequence of ISO 639-3 language codes representing its etymological origin. The dataset is provided in a simple tab-separated format, with the first column containing the lemma and the second listing the language codes separated by commas.

Example entry:

architekt	deu,lat,ell	loan

The word architekt originated from Greek, and came to Czech through Latin and German. The third column indicates whether the word is a loanword (marked as "loan") or a native word (marked as "native"). Note that "native" refers to inherited words as opposed to loanwords.

The language sequences were extracted from the printed dictionary REJZEK, Jiří. Český etymologický slovník [Czech etymological dictionary]. LEDA, 2015. The extraction of language sequences from the entries in the original dictionary was fully automated and, therefore, may contain imperfections. Please refer to the original dictionary for highly precise information.
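
A minimal sketch of reading one entry in the three-column format described above. The `Entry` type and `parse_line` helper are names introduced here for illustration and are not part of the dataset.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    lemma: str
    languages: List[str]  # ISO 639-3 codes, in the order given in the file
    is_loan: bool


def parse_line(line: str) -> Entry:
    """Parse one tab-separated line: lemma, comma-separated codes, loan/native."""
    lemma, codes, status = line.rstrip("\n").split("\t")
    return Entry(lemma, codes.split(","), status == "loan")


entry = parse_line("architekt\tdeu,lat,ell\tloan")
```

Judging by the example entry, the codes run from the most recent donor language (German) back to the ultimate source (Greek), so `entry.languages[-1]` would give the language of origin.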

Debiasing Algorithm through Model Adaptation

Author: Limisiewicz Tomasz
Mareček David
Musil Tomáš
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 31/01/2025

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is performed on specific modules prone to convey gender bias, as shown by causal tracing. Our novel method effectively reduces gender bias in LLaMA models in three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). The method does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and a diverse set of downstream tasks is almost unaffected. This package contains both the source code and the English, English-to-Czech, and English-to-German datasets.

Universal Dependencies 2.16

Author: Zeman Daniel
Nivre Joakim
Abrams Mitchell
Ackermann Elia
Adolphe Jephtey
Aepli Noëmi
Aghaei Hamid
Agić Željko
Ahmadi Amir
Ahrenberg Lars
Ajede Chika Kennedy
Akhundjanova Arofat
Akkurt Furkan
Aleksandravičiūtė Gabrielė
Alfina Ika
Algom Avner
Alnajjar Khalid
Alzetta Chiara
Anastasopoulos Antonios
Andersen Erik
Andrews Matthew
Antonsen Lene
Aoyama Tatsuya
Aplonova Katya
Aquino Angelina
Aragon Carolina
Aranes Glyd
Aranzabe Maria Jesus
Arıcan Bilge Nas
Arnardóttir Þórunn
Arutie Gashaw
Arwidarasti Jessica Naraiswari
Asahara Masayuki
Ásgeirsdóttir Katla
Aslan Deniz Baran
Asmazoğlu Cengiz
Ateyah Luma
Atmaca Furkan
Attia Mohammed
Atutxa Aitziber
Augustinus Liesbeth
Avelãs Mariana
Badmaeva Elena
Bajorat Jana
Balasubramani Keerthana
Ballesteros Miguel
Banerjee Esha
Bank Sebastian
Barbosa Bryan Khelven da Silva
Barbu Mititelu Verginica
Barkarson Starkaður
Basile Rodolfo
Basmov Victoria
Batchelor Colin
Bauer John
Bedir Seyyit Talha
Behzad Shabnam
Belieni Juan
Bémová Alevtina
Bengoetxea Kepa
Benli İbrahim
Ben Moshe Yifat
Benzerrak Marie
Berg Ansu
Berk Gözde
Bhat Riyaz Ahmad
Biagetti Erica
Bick Eckhard
Bielinskienė Agnė
Bilgin Taşdemir Esma Fatıma
Binici Helin
Bjarnadóttir Kristín
Blaschke Verena
Blokland Rogier
Böbel Nina
Bobicev Victoria
Boizou Loïc
Bompolas Stavros
Bonilla Johnatan
Borges Völker Emanuel
Börstell Carl
Bosco Cristina
Bouma Gosse
Bowman Sam
Boyd Adriane
Braggaar Anouck
Branco António
Bras Myriam
Brokaitė Kristina
Bu Lanni
Buráňová Eva
Burchardt Aljoscha
Cabeza Carmen
Cáceres Arandia Natalia
Campos Marisa
Candito Marie
Caron Bernard
Caron Gauthier
Carvalheiro Catarina
Carvalho Rita
Cassidy Lauren
Castro Maria Clara
Castro Sérgio
Cavalcanti Tatiana
Cebiroğlu Eryiğit Gülşen
Cecchini Flavio Massimiliano
Celano Giuseppe G. A.
Çepani Anila
Čéplö Slavomír
Cesur Neslihan
Cetin Savas
Çetinoğlu Özlem
Chalub Fabricio
Chamila Liyanage
Chamoreau Claudine
Chauhan Shweta
Chen Yifei
Chi Ethan
Chika Taishi
Cho Yongseok
Choi Jinho
Chontaeva Bermet
Chun Jayeol
Chung Juyeon
Cignarella Alessandra T.
Cinková Silvie
Collomb Aurélie
Çöltekin Çağrı
Connor Miriam
Corbetta Claudia
Corbetta Daniela
Costa Francisco
Courtin Marine
Crabbé Benoît
Cristescu Mihaela
Cvetkoski Vladimir
Dahan Netanel
Dale Ingerid Løyning
Daniel Philemon
Daoudi Khensa
Dash Bijayalaxmi
Dash Satya Ranjan
Davidson Elizabeth
de Alencar Leonel Figueiredo
Dehouck Mathieu
de Laurentiis Martina
de Marneffe Marie-Catherine
Demir Ahmet
de Paiva Valeria
Derin Mehmet Oguz
de Souza Elvis
Diaz de Ilarraza Arantza
Díaz Hernández Roberto Antonio
Dickerson Carly
Di Felippo Ariani
Dinakaramani Arawinda
Di Nuovo Elisa
Dione Bamba
Dirix Peter
Do Hoa
Dobrovoljc Kaja
Döhmer Caroline
Doyle Adrian
Dozat Timothy
Droganova Kira
Duran Magali Sanches
Dwivedi Puneet
Ebert Christian
Eckhoff Hanne
Eguchi Masaki
Eiche Sandra
Eiselen Roald
Eli Marhaba
Elkahky Ali
Ephrem Binyam
Erina Olga
Erjavec Tomaž
Esher Louise
Eslami Soudabeh
Essaidi Farah
Etienne Aline
Evelyn Wograine
Facundes Sidney
Farkas Richárd
Faryad Ján
Favero Federica
Ferdaousi Jannatul
Fernanda Marília
Fernandez Alcalde Hector
Fethi Amal
Foster Jennifer
Francioni Barbara
Fransen Theodorus
Freitas Cláudia
Fujita Kazunori
Gajdošová Katarína
Galbraith Daniel
Galy Edith
Gamba Federica
Garcia Marcos
García-Miguel José María
Gärdenfors Moa
Gaustad Tanja
Genç Efe Eren
Gerardi Fabrício Ferraz
Gerdes Kim
Gessler Luke
Ginter Filip
Godoy Gustavo
Goenaga Iakes
Gojenola Koldo
Gökırmak Memduh
Goldberg Yoav
Goldin Gili
Gómez Guinovart Xavier
González Saavedra Berta
Griciūtė Bernadeta
Grioni Matias
Grobol Loïc
Grūzītis Normunds
Guillaume Bruno
Guiller Kirian
Guillot-Barbance Céline
Güngör Tunga
Gurevich Vladimir
Habash Nizar
Hafsteinsson Hinrik
Hahn Michael
Hajič Jan
Hajič jr. Jan
Hajičová Eva
Hämäläinen Mika
Hà Mỹ Linh
Han Na-Rae
Hanifmuti Muhammad Yudistira
Harada Takahiro
Hardwick Sam
Harris Kim
Hassert Naïma
Haug Dag
Havelka Jiří
Heinecke Johannes
Hellwig Oliver
Hennig Felix
Hladká Barbora
Hlaváčová Jaroslava
Hociung Florinel
Hoefels Diana
Hohle Petter
Howell Nick
Huang Yidi
Huerta Mendez Marivel
Hwang Jena
Ikeda Takumi
Iliadou Inessa
Ingason Anton Karl
Ion Radu
Irimia Elena
Ishola Ọlájídé
Islamaj Artan
Ito Kaoru
Iurescia Federica
Ivani Jessica K.
Jagodzińska Sandra
Jannat Siratun
Jelínek Tomáš
Jha Apoorva
Jiang Katharine
Job Sylvanus
Jobanputra Mayank
Johannsen Anders
Jónsdóttir Hildur
Jørgensen Fredrik
Ju Zhuoxuan
Juutinen Markus
Kaşıkara Hüner
Kabaeva Nadezhda
Kahane Sylvain
Kanayama Hiroshi
Kanerva Jenna
Kara Neslihan
Karahóǧa Ritván
Kárník Jiří
Kåsen Andre
Kayadelen Tolga
Kengatharaiyer Sarveswaran
Kettnerová Václava
Kharatyan Lilit
Kirchner Jesse
Klementieva Elena
Klyachko Elena
Kocharov Petr
Köhn Arne
Köksal Abdullatif
Kolářová Veronika
Kopacewicz Kamil
Korkiakangas Timo
Köse Mehmet
Koshevoy Alexey
Kote Nelda
Kotsyba Natalia
Kovačić Barbara
Kovalevskaitė Jolanta
Kowner Emmanuelle
Krek Simon
Krishnamurthy Parameswari
Kübler Sandra
Kučová Lucie
Kuqi Adrian
Kuyrukçu Oğuzhan
Kuzgun Aslı
Kwak Sookyoung
Kyle Kris
Laan Käbi
Laippala Veronika
Lambertino Lorenzo
Landau Israel
Lando Tatiana
Larasati Septina Dian
Larrivée Pierre
Lavrentiev Alexei
Lee John
Lê Hồng Phương
Lenci Alessandro
Lertpradit Saran
Leung Herman
Levina Maria
Levine Lauren
Li Cheuk Ying
Li Josie
Li Keying
Li Yixuan
Li Yuan
Lim KyungTae
Lima Padovani Bruna
Lin Yi-Ju Jessica
Lindén Krister
Liu Yang Janet
Liu Zoey
Ljubešić Nikola
Lobzhanidze Irina
Loginova Olga
Lopatková Markéta
Lopes Lucelene
Luftiu Edita
Lukashevskyi Arsenii
Lusito Stefano
Lutgen Anne-Marie
Luthfi Andry
Luukko Mikko
Lyashevskaya Olga
Lynn Teresa
Macketanz Vivien
Mahamdi Menel
Maillard Jean
Makarchuk Ilya
Makazhanov Aibek
Mambrini Francesco
Mandl Michael
Manning Christopher
Manurung Ruli
Marşan Büşra
Mărănduc Cătălina
Mareček David
Marheinecke Katrin
Markantonatou Stella
Martínez Alonso Héctor
Martín Rodríguez Lorena
Martins André
Martins Cláudia
Mašek Jan
Matsuda Hiroshi
Matsumoto Yuji
Mazzei Alessandro
McDonald Ryan
McGuinness Sarah
Mehta Maitrey
Ménard Pierre André
Mendonça Gustavo
Merhav Hilla
Merzhevich Tatiana
Meurer Paul
Miekka Niko
Mikulová Marie
Milano Emilia
Miletić Aleksandra
Miller Aaron
Min Junghyun
Minerbi Yael
Mírovský Jiří
Mischenkova Karina
Missilä Anna
Mititelu Cătălin
Mitrofan Maria
Miyao Yusuke
Mohapatra Biswakalpita
Mojiri Foroushani AmirHossein
Molnár Judit
Moloodi Amirsaeid
Montemagni Simonetta
More Amir
Moreno Romero Laura
Moretti Giovanni
Mori Shinsuke
Morioka Tomohiko
Moro Shigeki
Mortensen Bjartur
Moskalevskyi Bohdan
Muischnek Kadri
Munro Robert
Murawaki Yugo
Mus Nikolett
Müürisep Kaili
Nainwani Pinkey
Nakhlé Mariam
Navarro Horñiacek Juan Ignacio
Nedoluzhko Anna
Nešpore-Bērzkalne Gunta
Nevaci Manuela
Nguyễn Thị Lương
Nguyễn Thị Minh Huyền
Nikaido Yoshihiro
Nikolaev Vitaly
Nitisaroj Rattima
Norrman Victor
Nourian Alireza
Novák Michal
Nunes Maria das Graças Volpe
Nurmi Hanna
Ojala Stina
Ojha Atul Kr.
Óladóttir Hulda
Olúòkun Adédayọ̀
Omura Mai
Onwuegbuzia Emeka
Ordan Noam
Osenova Petya
Östling Robert
Ott Annika
Øvrelid Lilja
Oya Masanori
Özateş Şaziye Betül
Özçelik Merve
Özgür Arzucan
Öztürk Başaran Balkız
Paccosi Teresa
Pajas Petr
Palmero Aprosio Alessio
Panevová Jarmila
Panova Anastasia
Pardo Thiago Alexandre Salgueiro
Parida Shantipriya
Park Hyunji Hayley
Partanen Niko
Pascual Elena
Passarotti Marco
Patejuk Agnieszka
Paulino-Passos Guilherme
Pedonese Giulia
Peeters Oggi
Peljak-Łapińska Angelika
Peng Siyao
Peng Siyao Logan
Pereira Rita
Pereira Sílvia
Perez Cenel-Augusto
Perkova Natalia
Perrier Guy
Petrov Slav
Petrova Daria
Peverelli Andrea
Phelan Jason
Pierre-Louis Claudel
Piitulainen Jussi
Pinter Yuval
Pinto Clara
Pintucci Rodrigo
Pirinen Tommi A
Pitler Emily
Plamada Magdalena
Plank Barbara
Plum Alistair
Poibeau Thierry
Ponomareva Larisa
Popel Martin
Poujade Clamença
Pretkalniņa Lauma
Pretorius Rigardt
Prévost Sophie
Prokopidis Prokopis
Przepiórkowski Adam
Pugh Robert
Puolakainen Tiina
Purschke Christoph
Pyysalo Sampo
Qi Peng
Querido Andreia
Rääbis Andriela
Rabinovich Ella
Rademaker Alexandre
Rahman Mutee-u
Rahoman Mizanur
Rama Taraka
Ramasamy Loganathan
Ramisch Carlos
Ramos Joana
Rashel Fam
Rasooli Mohammad Sadegh
Ravishankar Vinit
Real Livy
Rebeja Petru
Reddy Siva
Regnault Mathilde
Rehm Georg
Riabi Arij
Riabov Ivan
Rießler Michael
Rimkutė Erika
Rinaldi Larissa
Rituma Laura
Rizqiyah Putri
Rocha Luisa
Rögnvaldsson Eiríkur
Roksandic Ivan
Roman Norton Trevisan
Romanenko Mykhailo
Romanova Natalia
Rosa Rudolf
Roșca Valentin
Roulon Paulette
Rovati Davide
Rozonoyer Ben
Rudina Olga
Rueter Jack
Ruffolo Paolo
Rúnarsson Kristján
Rushiti Rozana
Sadde Shoval
Safari Pegah
Sahala Aleksi
Sahoo Kalyanamalini
Sahoo Saraswati
Saleh Shadi
Salomoni Alessio
Samardžić Tanja
Sampanis Konstantinos
Samson Stephanie
Sánchez-Rodríguez Xulia
Sanguinetti Manuela
Sanıyar Ezgi
Särg Dage
Sartor Marta
Sarymsakova Albina
Sasaki Mitsuya
Saulīte Baiba
Savary Agata
Sawanakunanon Yanin
Saxena Shefali
Scannell Kevin
Scarlata Salvatore
Schang Emmanuel
Schneider Nathan
Schuster Sebastian
Schwartz Lane
Seddah Djamé
Seeker Wolfgang
Sellmer Sven
Seraji Mojgan
Ševčíková Magda
Sgall Petr
Shahzadi Syeda
Shen Mo
Shimada Atsuko
Shin Gyu-Ho
Shirasu Hiroyuki
Shishkina Yana
Shohibussirri Muh
Shvedova Maria
Sibille Jean
Siewert Janine
Sigurðsson Einar Freyr
Silva João
Silveira Aline
Silveira Natalia
Silveira Sara
Simi Maria
Simionescu Radu
Simkó Katalin
Šimková Mária
Símonarson Haukur Barri
Simov Kiril
Sitchinava Dmitri
Sither Ted
Smith Aaron
Soares-Bastos Isabela
Solberg Per Erik
Sollberger Dolores
Sonnenhauser Barbara
Sourov Shafi
Speransky Nina
Sprugnoli Rachele
Stamou Vivian
Steingrímsson Steinþór
Stella Antonio
Štěpánek Jan
Štěpánková Barbora
Stephen Abishek
Straka Milan
Strass Omer
Strickland Emmett
Strnadová Jana
Suhr Alane
Sulestio Yogi Lesmana
Sulubacak Umut
Sung Hakyung
Suzuki Shingo
Swanson Daniel
Szántó Zsolt
Taguchi Chihiro
Taji Dima
Talamo Luigi
Tamburini Fabio
Tan Mary Ann C.
Tanaka Takaaki
Tanaya Dipta
Tavoni Mirko
Teker Nursena
Tella Samson
Tellier Isabelle
Testori Marinella
Thomas Guillaume
Tıraş Tarık Emre
Tollersrud Thea
Tonelli Sara
Torga Liisi
Toribio Lucas
Toska Marsida
Trosterud Trond
Trukhina Anna
Tsarfaty Reut
Tulchynska Kira
Türk Utku
Tyers Francis
Þórðarson Sveinbjörn
Þorsteinsson Vilhjálmur
Uematsu Sumire
Untilov Roman
Urešová Zdeňka
Uria Larraitz
Uszkoreit Hans
Utka Andrius
Vagnoni Elena
Vajjala Sowmya
Vak Socrates
Vakirtzian Socrates
van der Goot Rob
Vanhove Martine
van Niekerk Daniel
van Noord Gertjan
Varga Viktor
Vedenina Uliana
Venturi Giulia
Vergez-Couret Marianne
Vidová Hladká Barbora
Villemonte de la Clergerie Eric
Vincze Veronika
Vissamsetty Anishka
Vlasova Natalia
Vligouridou Eleni
Wakasa Aya
Wallenberg Joel C.
Wallin Lars
Walsh Abigail
Wang John
Washington Jonathan North
Weissweiler Leonie
Wendt Maximilan
Widmer Paul
Wigderson Shira
Wijono Sri Hartati
Wille Vanessa Berwanger
Williams Seyi
Winkler Miriam
Wintner Shuly
Wirén Mats
Wittern Christian
Witzlack-Makarevich Alena
Woldemariam Tsegay
Wong Tak-sum
Wróblewska Alina
Wu Qishen
Yako Mary
Yamashita Kayo
Yamazaki Naoki
Yan Chunxiao
Yang Xiulin
Yasuoka Koichi
Yavrumyan Marat M.
Yenice Arife Betül
Yılandiloğlu Enes
Yıldız Olcay Taner
Yu Zhuoran
Yuliawati Arlisa
Žabokrtský Zdeněk
Zahra Shorouq
Zeldes Amir
Zhou He
Zhu Hanzhi
Zhu Yilun
Zhuravleva Anna
Ziane Rayan
Znotiņš Artūrs
Publication venue: Universal Dependencies Consortium
Publication date: 15/05/2025

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

0 full texts, 2,527 metadata records. Updated in last 30 days.
