AKCES 4

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Petkevič Vladimír
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Šebesta Karel
Škodová Svatava
Štindlová Barbora
Šťastný Klement
Publication venue: Charles University in Prague, Karolinum Press
Publication date: 12/12/2012

The AKCES 4 corpus includes texts written in Czech by youth growing up in locations at risk of social exclusion (AKCES/CLAC - Czech Language Acquisition Corpora).

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

AKCES 3

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Petkevič Vladimír
Pierscieniak Piotr
Poláčková Marie
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Šebesta Karel
Škodová Svatava
Šormová Kateřina
Štindlová Barbora
Šťastný Klement
Publication venue: Charles University in Prague, ÚČJTK
Publication date: 12/12/2012

The AKCES 3 corpus includes texts written in Czech by non-native speakers (AKCES/CLAC - Czech Language Acquisition Corpora).

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

AKCES 5 (CzeSL-SGT) Release 2

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Petkevič Vladimír
Pierscieniak Piotr
Poláčková Marie
Richter Michal
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Straka Milan
Toufarová Dagmar
Šebesta Karel
Škodová Svatava
Šormová Kateřina
Štindlová Barbora
Publication venue: Charles University in Prague, Karolinum Press
Publication date: 27/07/2014

Essays written by non-native learners of Czech, part of AKCES/CLAC (Czech Language Acquisition Corpora). CzeSL-SGT stands for Czech as a Second Language with Spelling, Grammar and Tags. The corpus extends the "foreign" (ciz) part of AKCES 3 (CzeSL-plain) with texts collected in 2013. Original forms and automatic corrections are tagged, lemmatized and assigned error labels. Most texts have metadata attributes (30 items) about the author and the text. In addition to fixing a few minor bugs, Release 2 fixes a critical issue in Release 1: native speakers of Ukrainian (s_L1:"uk") were wrongly labelled as speakers of "other European languages" (s_L1_group="IE") instead of speakers of a Slavic language (s_L1_group="S"). The file is now a regular XML document, with all annotation represented as XML attributes.

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
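
The labelling issue described for AKCES 5 Release 2 can be illustrated with a minimal Python sketch that scans document metadata for Ukrainian speakers whose L1 group is not "S". Only the attribute names s_L1 and s_L1_group come from the description above; the <corpus> and <doc> element names, the id attribute, and the sample data are assumptions made for illustration and may not match the released file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <corpus>/<doc> element names and the "id"
# attribute are assumptions; only s_L1 and s_L1_group are named in the
# corpus description.
SAMPLE = """<corpus>
  <doc id="a" s_L1="uk" s_L1_group="IE"/>
  <doc id="b" s_L1="uk" s_L1_group="S"/>
  <doc id="c" s_L1="ru" s_L1_group="S"/>
</corpus>"""


def mislabelled_ukrainian_docs(xml_text):
    """Return ids of docs whose L1 is Ukrainian but whose group is not Slavic."""
    root = ET.fromstring(xml_text)
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if doc.get("s_L1") == "uk" and doc.get("s_L1_group") != "S"
    ]


print(mislabelled_ukrainian_docs(SAMPLE))  # ['a']
```

Run against Release 1 style metadata, a check of this kind would list the affected documents; against Release 2 it should return an empty list, assuming the attribute layout sketched here.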

AKCES-GEC Grammatical Error Correction Dataset for Czech

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Náplava Jakub
Petkevič Vladimír
Pierscieniak Piotr
Poláčková Marie
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Straka Milan
Toufarová Dagmar
Šebesta Karel
Škodová Svatava
Šormová Kateřina
Štindlová Barbora
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 27/09/2019

AKCES-GEC is a grammatical error correction corpus for Czech generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format. Compared to the CzeSL-GEC dataset, this dataset contains separated edits together with their type annotations in M2 format and has twice as many sentences. If you use this dataset, please cite: @article{naplava2019wnut, title={Grammatical Error Correction in Low-Resource Scenarios}, author={N{\'a}plava, Jakub and Straka, Milan}, journal={arXiv preprint arXiv:1910.00353}, year={2019}}

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
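
The M2 format mentioned in the AKCES-GEC description stores each source sentence on an "S" line, followed by "A" lines that give a token span, an error type and a correction, separated by "|||". The following is a minimal Python sketch of reading such a file and applying the edits. The sample sentences and the PLACEHOLDER error type are invented for illustration and are not taken from the corpus, and the sketch applies all edits regardless of annotator id, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    start: int        # index of the first source token covered by the edit
    end: int          # index one past the last covered token
    error_type: str
    correction: str


@dataclass
class Sentence:
    tokens: list
    edits: list = field(default_factory=list)


def parse_m2(text):
    """Parse M2-formatted text into Sentence objects."""
    sentences = []
    for line in text.splitlines():
        if line.startswith("S "):
            sentences.append(Sentence(tokens=line[2:].split()))
        elif line.startswith("A ") and sentences:
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            sentences[-1].edits.append(Edit(start, end, fields[1], fields[2]))
    return sentences


def apply_edits(sentence):
    """Return the corrected sentence as a space-joined string."""
    tokens = list(sentence.tokens)
    # Apply edits right to left so that earlier offsets stay valid.
    for e in sorted(sentence.edits, key=lambda e: (e.start, e.end), reverse=True):
        if e.start < 0:  # "no change" edits are conventionally given as -1 -1
            continue
        # Assumption: a correction of "-NONE-" or "" means deletion.
        replacement = [] if e.correction == "-NONE-" else e.correction.split()
        tokens[e.start:e.end] = replacement
    return " ".join(tokens)


# Invented toy data, not from AKCES-GEC.
SAMPLE_M2 = """S This are a test .
A 1 2|||PLACEHOLDER|||is|||REQUIRED|||-NONE-|||0

S She went school .
A 2 2|||PLACEHOLDER|||to|||REQUIRED|||-NONE-|||0
"""

for sent in parse_m2(SAMPLE_M2):
    print(apply_edits(sent))
```

Keeping the edits as separate objects, rather than only the corrected sentence, is what makes the per-edit type annotations usable, for example for counting error types.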

CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Náplava Jakub
Petkevič Vladimír
Pierscieniak Piotr
Poláčková Marie
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Straka Milan
Toufarová Dagmar
Šebesta Karel
Škodová Svatava
Šormová Kateřina
Štindlová Barbora
Publication venue: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication date: 30/04/2017

CzeSL-GEC is a corpus of sentence pairs, each consisting of the original and the corrected version of a Czech sentence, collected from essays written by both non-native learners of Czech and Czech pupils with Romani background. The corpus was created from the unreleased CzeSL-man corpus (http://utkl.ff.cuni.cz/learncorp/). All sentences in the corpus are word-tokenized.

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

AKCES 5 (CzeSL-SGT)

Author: Bedřichová Zuzanna
Hana Jiří
Hrdlička Milan
Hrdličková Tereza
Janeš Petr
Jelínek Tomáš
Lundáková Kateřina
Petkevič Vladimír
Pierscieniak Piotr
Poláčková Marie
Richter Michal
Rosen Alexandr
Skoumalová Hana
Sládek Šimon
Straka Milan
Toufarová Dagmar
Šebesta Karel
Škodová Svatava
Šormová Kateřina
Štindlová Barbora
Publication venue: Charles University in Prague, Karolinum Press
Publication date: 26/05/2014

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
