Slovenian parliamentary corpus SlovParl 1.0

Erjavec, Tomaž; Pančur, Andrej; Šorn, Mojca

Authors: Tomaž Erjavec, Andrej Pančur, Mojca Šorn
Publication date: 28 August 2016
Publisher: Institute of Contemporary History

Abstract

The SlovParl corpus contains the minutes of the Chamber of Associated Labour of the Assembly of the Republic of Slovenia for the legislative period 1990-1992, i.e. it covers the period before, during, and after Slovenia became an independent country in 1991. The corpus comprises 54 sessions, 13,894 speeches and almost 2.7 million words. It also contains extensive metadata about the speakers, a typology of sessions etc., as well as structural and editorial annotations.

This item comprises three datasets:
- the corpus in TEI P5 (module Transcriptions of speech);
- the corpus in TEI P5 with added automatic linguistic annotation: tokenisation, MSD tagging and lemmatisation;
- the corpus in vertical format used by various concordancers, e.g. CWB and Sketch Engine; this format is simpler and smaller but does not contain all the information from the source TEI.

The SlovParl data originally come from https://github.com/SIstory/SlovParl, but have been converted to use TEI elements for speech. This version of the corpus corresponds to commit https://github.com/DARIAH-SI/CLARIN.SI/tree/5984661e7b19e054b3fb650f4d2d5d409b3d7e3d

The resource is presented in the paper: Pančur, Andrej. "Označevanje zbirke zapisnikov sej slovenskega parlamenta s smernicami TEI." In Proceedings of the Conference on Language Technologies & Digital Humanities (Tomaž Erjavec and Darja Fišer, eds.), 142-148. Ljubljana: Znanstvena založba Filozofske fakultete v Ljubljani, 2016. http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Pancur_Oznacevanje-zbirke-zapisnikov-sej-slovenskega-parlamenta.pd
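The vertical format mentioned in the abstract is typically a plain-text file with one token per line, tab-separated token attributes, and XML-like structural tags on lines of their own. The following is a minimal sketch of reading such a file and counting tokens per speech. The tag name `speech`, the attribute `who`, the two-column layout (word, lemma) and the sample lines are assumptions made for illustration; the actual column and tag inventory of the SlovParl vertical file is not described in this record and should be checked against the distributed data.

```python
import io

# Hypothetical sample in a CWB-style vertical layout (not taken from the corpus):
# structural tags on their own lines, one token per line, tab-separated attributes.
SAMPLE = (
    '<speech who="speaker1">\n'
    "Spoštovani\tspoštovan\n"
    "delegati\tdelegat\n"
    ".\t.\n"
    "</speech>\n"
)


def tag_name(line):
    """Return the element name of a structural tag line, or None for token lines."""
    if not (line.startswith("<") and line.endswith(">")):
        return None
    parts = line.strip("</>").split()
    return parts[0] if parts else None


def count_tokens_per_speech(stream, speech_tag="speech"):
    """Return a list with the number of token lines inside each speech element."""
    counts = []
    current = None  # token count of the speech being read, None when outside one
    for raw in stream:
        line = raw.rstrip("\n")
        if not line:
            continue
        name = tag_name(line)
        if name is not None:
            if name == speech_tag:
                if line.startswith("</"):
                    # closing tag: store the finished speech
                    if current is not None:
                        counts.append(current)
                    current = None
                else:
                    # opening tag: start a new speech
                    current = 0
            continue  # other structural tags are skipped
        if current is not None:
            current += 1  # a token line inside a speech
    return counts


print(count_tokens_per_speech(io.StringIO(SAMPLE)))
```

For a real file, the same function could be called on an opened file object, e.g. `open(path, encoding="utf-8")`, with `speech_tag` set to whatever element the vertical file actually uses for speeches.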

Available Versions

Common Language Resources and Technology Infrastructure - Slovenia

oai:www.clarin.si:11356/1167

Last time updated on 07/05/2019

Common Language Resources and Technology Infrastructure - Slovenia

oai:www.clarin.si:11356/1075

Last time updated on 07/05/2019