Writer and journalist Edvard Valenta after 1945
This thesis examines the role of the Czech writer Edvard Valenta (1901–1978) in post-war Czechoslovak journalism. The overview part of the paper traces Valenta's texts published from 1945 to 1948 in Svobodné noviny and in the magazine Dnešek. Attention is also paid to Valenta's correspondence, not only with many Czech writers but also with his own family. Based on archival research and eyewitness accounts, the work attempts to capture the life and career of Edvard Valenta as a journalist and writer, including his imprisonment after February 1948. It focuses on previously unknown facts about Valenta, organizing them and making them available for further research or for the writing of a monograph on Valenta. The state of journalism at the time is also outlined, as are political interventions in cultural affairs. The selection part of the paper analyzes specific polemical texts exchanged among literary groups and authors with different political beliefs. The aim is to highlight Valenta's importance in the field of journalism during the Third Republic and the inextricable connection between literature and journalism. Keywords: Czech literature of the second half of..
AKCES 4
Corpus AKCES 4 includes texts written in Czech by youth growing up in locations at risk of social exclusion (part of AKCES/CLAC – Czech Language Acquisition Corpora).
AKCES 3
Corpus AKCES 3 includes texts written in Czech by non-native speakers (part of AKCES/CLAC – Czech Language Acquisition Corpora).
AKCES-GEC Grammatical Error Correction Dataset for Czech
AKCES-GEC is a grammatical error correction corpus for Czech generated from a subset of AKCES. It contains train, dev and test files annotated in the M2 format.
Compared with the CzeSL-GEC dataset, this dataset contains separate edits together with their error-type annotations in M2 format, and it has twice as many sentences.
If you use this dataset, please use the following citation:
@article{naplava2019wnut,
title={Grammatical Error Correction in Low-Resource Scenarios},
author={N{\'a}plava, Jakub and Straka, Milan},
journal={arXiv preprint arXiv:1910.00353},
year={2019}
}
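The M2 format is the plain-text annotation format used in the CoNLL-2013/2014 shared tasks: an `S` line holds the tokenized source sentence, each following `A` line holds one edit as `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`, and a blank line separates sentences. The Python sketch below assumes AKCES-GEC follows that general layout; it reads M2 text and rebuilds the corrected sentence for one annotator. The function names, the sample sentence and the error-type labels `V` and `N` are hypothetical, chosen only for illustration, and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    start: int        # first source token covered by the edit
    end: int          # one past the last covered token (start == end means insertion)
    etype: str        # error-type label as written in the file
    correction: str   # replacement tokens, space-separated ("" or "-NONE-" = nothing)
    annotator: int


def parse_m2(text):
    """Split M2 text into (source_tokens, edits) pairs, one per sentence block."""
    sentences = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith("S "):
            continue  # skip empty or malformed blocks
        tokens = lines[0][2:].split()
        edits = []
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            edits.append(Edit(start, end, fields[1], fields[2], int(fields[5])))
        sentences.append((tokens, edits))
    return sentences


def apply_edits(tokens, edits, annotator=0):
    """Return the corrected token list for one annotator, skipping noop edits."""
    out = list(tokens)
    chosen = [e for e in edits if e.annotator == annotator and e.etype != "noop"]
    # Work right to left so earlier offsets stay valid after each splice.
    for e in sorted(chosen, key=lambda e: e.start, reverse=True):
        out[e.start:e.end] = [] if e.correction == "-NONE-" else e.correction.split()
    return out


# Hypothetical sample: the sentence and the "V"/"N" labels are illustrative only.
SAMPLE = """S Já mít rád pes .
A 1 2|||V|||mám|||REQUIRED|||-NONE-|||0
A 3 4|||N|||psy|||REQUIRED|||-NONE-|||0
"""

if __name__ == "__main__":
    for tokens, edits in parse_m2(SAMPLE):
        print(" ".join(apply_edits(tokens, edits)))  # Já mám rád psy .
```

Edits are applied from right to left so that replacing or deleting tokens does not shift the offsets of edits still waiting to be applied; `noop` lines, which mark a sentence an annotator left unchanged, are skipped.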
AKCES 5 (CzeSL-SGT) Release 2
Essays written by non-native learners of Czech, part of AKCES/CLAC – Czech Language Acquisition Corpora. CzeSL-SGT stands for Czech as a Second Language with Spelling, Grammar and Tags. It extends the “foreign” (ciz) part of AKCES 3 (CzeSL-plain) with texts collected in 2013. Original forms and automatic corrections are tagged, lemmatized and assigned error labels. Most texts carry metadata (30 attributes) about the author and the text.
In addition to fixing a few minor bugs, Release 2 fixes a critical issue in Release 1: native speakers of Ukrainian (s_L1="uk") were wrongly labelled as speakers of "other European languages" (s_L1_group="IE") instead of speakers of a Slavic language (s_L1_group="S"). The file is now a regular XML document, with all annotation represented as XML attributes.
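Because Release 2 stores all annotation as XML attributes, the Ukrainian labelling fix can be checked with a few lines of Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions: only the attribute names `s_L1` and `s_L1_group` and the values `"uk"`, `"S"` and `"IE"` come from the release note; the element names, the `id` attribute, the `"de"` value and the sample document are hypothetical, so the code scans every element that carries `s_L1` instead of relying on a particular tag.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def l1_group_mismatches(root, l1="uk", expected_group="S"):
    """Return elements whose s_L1 equals `l1` but whose s_L1_group is not `expected_group`."""
    return [
        elem for elem in root.iter()
        if elem.get("s_L1") == l1 and elem.get("s_L1_group") != expected_group
    ]


def count_by_group(root):
    """Count s_L1_group values over every element that carries an s_L1 attribute."""
    return Counter(elem.get("s_L1_group") for elem in root.iter() if "s_L1" in elem.attrib)


# Hypothetical document: tag names, ids and the "de" value are made up for illustration.
SAMPLE_XML = """<corpus>
  <doc id="d1" s_L1="uk" s_L1_group="S"/>
  <doc id="d2" s_L1="de" s_L1_group="IE"/>
  <doc id="d3" s_L1="uk" s_L1_group="IE"/>
</corpus>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    print([e.get("id") for e in l1_group_mismatches(root)])  # ['d3']
    print(count_by_group(root))
```

On a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`; for a very large document, `ET.iterparse` keeps memory use lower. An empty mismatch list would be consistent with the Release 2 fix.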
CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)
CzeSL-GEC is a corpus containing pairs of original and corrected versions of Czech sentences collected from essays written both by non-native learners of Czech and by Czech pupils with a Romani background. The corpus was created from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.
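CzeSL-GEC stores whole original/corrected sentence pairs rather than the separate, typed edits that AKCES-GEC provides in M2. If token-level edits are needed from such pairs, one heuristic is to align the two token sequences with Python's `difflib.SequenceMatcher`. The sketch below assumes one whitespace-tokenized sentence per string; the function name and the example pair are hypothetical, and this is not the procedure used to build either dataset. Unlike the M2 annotations, it yields no error types.

```python
from difflib import SequenceMatcher


def pair_to_edits(original, corrected):
    """Derive (start, end, replacement) token edits from a tokenized sentence pair.

    Offsets index into the original tokens; an empty replacement means deletion,
    and start == end means insertion.
    """
    src, tgt = original.split(), corrected.split()
    # autojunk=False: never treat frequent tokens (e.g. punctuation) as junk.
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (i1, i2, " ".join(tgt[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical pair, not taken from CzeSL-GEC.
    print(pair_to_edits("Já mít rád pes .", "Já mám rád psy ."))
    # [(1, 2, 'mám'), (3, 4, 'psy')]
```

The `autojunk` heuristic only triggers for sequences of 200 or more tokens, so disabling it matters mainly for long inputs; for ordinary sentences the alignment is the same either way.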
AKCES 5 (CzeSL-SGT)
Essays written by non-native learners of Czech, part of AKCES/CLAC – Czech Language Acquisition Corpora. CzeSL-SGT stands for Czech as a Second Language with Spelling, Grammar and Tags. It extends the “foreign” (ciz) part of AKCES 3 (CzeSL-plain) with texts collected in 2013. Original forms and automatic corrections are tagged, lemmatized and assigned error labels. Most texts carry metadata (30 attributes) about the author and the text.