Verbs annotated for morphemic structure in Czech, English, German, Spanish v2

Hledíková, Hana

Search results>Research output from LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

oai:lindat.mff.cuni.cz:11234/1-6124

Verbs annotated for morphemic structure in Czech, English, German, Spanish v2

Authors: Hana Hledíková
Publication date: 12 March 2026
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)

Abstract

A sample of verb lemmas in four languages: Czech (19,040 lemmas), English (9,969 lemmas), German (27,158 lemmas), Spanish (11,768 lemmas). Each verb lemma is annotated for its morphemic structure (i.e., segmented into the prefiex(es), root(s), suffix(es) and ending(s) that the given lemma contains), classification of its root morph to a root morpheme where needed (to facilitate grouping of verbs with the same root morpheme), and its frequency of the verb in a 100 M corpus. Two versions are available for each language: one with a more coarse-grained segmentation, which captures the morphemic structure that is synchronically available, and a version with a more fine-grained segmentation, which also captures the word's etymology

Similar works

Full text

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

oai:lindat.mff.cuni.cz:11234/1...

Last time updated on 22/03/2026

This paper was published in LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.