A classification of the Celtic languages based on grapheme frequencies
Grapheme frequencies from a small parallel corpus of psalms translated into Breton, Cornish, Irish (both the Early Modern and present-day versions of the language), Manx, Scottish Gaelic, and Welsh are analyzed. As in many other languages, they can be modelled by the negative hypergeometric distribution. Using the modified Ord graph constructed from the grapheme frequencies, the Celtic languages can be divided into two groups that differ slightly from the traditional classification of the Celtic languages: Manx is placed among the P-Celtic rather than the Q-Celtic languages. This difference can be explained by the influence of English (or Scots) on Manx orthography.
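To make the Ord graph step more concrete, here is a minimal Python sketch of the classic (unmodified) Ord criterion as it is commonly applied to rank-frequency data. Graphemes are ranked r = 1, 2, 3, ... by frequency, and the distribution is summarized by I = m2/m1 and S = m3/m2, where m1 is the mean rank and m2, m3 are the second and third central moments. Each language then becomes one point (I, S) on the graph. The function name `ord_criterion` and the toy frequencies are hypothetical. The modified Ord graph used in the study may define its coordinates differently, so treat this only as an illustration of the basic idea.

```python
def ord_criterion(freqs):
    """Classic Ord criterion (I, S) for a rank-frequency distribution.

    freqs[0] is the frequency of the rank-1 grapheme, freqs[1] of rank 2, etc.
    This is the unmodified criterion; the study's modified version may differ.
    """
    n = sum(freqs)
    ranks = range(1, len(freqs) + 1)
    # Mean rank (first raw moment)
    m1 = sum(r * f for r, f in zip(ranks, freqs)) / n
    # Second and third central moments of the rank variable
    m2 = sum((r - m1) ** 2 * f for r, f in zip(ranks, freqs)) / n
    m3 = sum((r - m1) ** 3 * f for r, f in zip(ranks, freqs)) / n
    return m2 / m1, m3 / m2


# Hypothetical toy frequencies for three graphemes, not data from the study
I, S = ord_criterion([3, 2, 1])
print(I, S)  # equals (1/3, 7/15) up to floating-point rounding
```

Plotting one such point per language (or per orthography) and inspecting how the points cluster is what allows a grouping of the languages.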
Frequency and morphological behaviour of nouns in Czech and Russian
The declensional morphology of nouns in Czech and Russian is investigated and compared. It is shown that, in general, word forms that are more similar to their lemmas are preferred, but there are differences between animate and inanimate nouns and among grammatical genders. The frequency distribution of grammatical cases is also studied; animacy and gender again prove to be important factors.
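Studying the frequency distribution of grammatical cases by animacy and gender amounts, at its simplest, to tallying case labels within each (animacy, gender) group and normalizing the counts. The sketch below assumes a hypothetical input format: noun tokens already annotated as (case, animacy, gender) triples. The labels and toy tokens are illustrative only and are not data from the study.

```python
from collections import Counter, defaultdict


def case_distribution(tokens):
    """Relative frequency of each case within each (animacy, gender) group.

    tokens: iterable of (case, animacy, gender) triples (hypothetical format).
    """
    by_group = defaultdict(Counter)
    for case, animacy, gender in tokens:
        by_group[(animacy, gender)][case] += 1
    result = {}
    for group, counts in by_group.items():
        total = sum(counts.values())
        # Normalize counts to proportions, listing the most frequent case first
        result[group] = {case: n / total for case, n in counts.most_common()}
    return result


# Hypothetical toy annotations, not data from the study
toy_tokens = [
    ("nom", "anim", "masc"), ("acc", "anim", "masc"), ("nom", "anim", "masc"),
    ("nom", "inan", "fem"), ("gen", "inan", "fem"), ("gen", "inan", "fem"),
    ("loc", "inan", "fem"),
]
dist = case_distribution(toy_tokens)
print(dist[("anim", "masc")])
```

Comparing these per-group distributions across Czech and Russian is one straightforward way to see whether animacy and gender shift the case profile.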
MorfoCzech
A dictionary of morphologically segmented word forms in Czech. The rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech. Jazykovedný časopis, 72, 405–414. The dictionary is based on short stories, fairy tales, letters, and studies written by Karel Čapek.
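A segmented dictionary like this provides exactly the input the Menzerath-Altmann law needs: for each word length x (in morphemes), the mean length y of its morphemes. The law is often written y = a·x^b·e^(−cx) and predicts that y tends to decrease as x grows. The sketch below assumes a hypothetical format in which morphemes are separated by hyphens (e.g. `ne-do-stat-ek`). It also measures length in Unicode characters, so the Czech digraph "ch" counts as two. Both are assumptions, and the dictionary's actual format and unit of length may differ. The toy segmentations are illustrative only.

```python
from collections import defaultdict


def menzerath_pairs(segmented_words, sep="-"):
    """Map word length in morphemes -> mean morpheme length in characters.

    Assumes morphemes are separated by `sep` (hypothetical format) and counts
    Unicode characters, so the Czech digraph "ch" counts as two.
    """
    chars = defaultdict(int)   # total characters per word-length class
    morphs = defaultdict(int)  # total morphemes per word-length class
    for word in segmented_words:
        parts = [p for p in word.split(sep) if p]
        x = len(parts)
        if x == 0:
            continue
        chars[x] += sum(len(p) for p in parts)
        morphs[x] += x
    return {x: chars[x] / morphs[x] for x in sorted(chars)}


# Hypothetical toy segmentations, not entries from the dictionary
toy = ["pes", "ps-a", "kost-i", "ne-do-stat-ek"]
print(menzerath_pairs(toy))  # {1: 3.0, 2: 2.0, 4: 2.5}
```

The resulting (x, y) pairs are what a Menzerath-Altmann model would then be fitted to.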
MorfoCzech 1.1
A dictionary of morphologically segmented word forms in Czech. The rules of manual segmentation are described in Pelegrinová, K., Mačutek, J., Čech, R. (2021). The Menzerath-Altmann law as the relation between lengths of words and morphemes in Czech. Jazykovedný časopis, 72, 405–414. The dictionary is based on short stories, fairy tales, letters, and studies written by Karel Čapek.