Dansk betydningsinventar i et datalingvistisk perspektiv
In this paper we investigate the Danish sense inventory from both a paradigmatic and a syntagmatic perspective, and we present a collection of related lexical semantic resources developed in collaboration between the Society for Danish Language and Literature and the University of Copenhagen. The resources comprise a Danish wordnet (DanNet), The Danish FrameNet Lexicon, and The Danish Sentiment Lexicon. All three resources are designed to enable semantic processing for digital humanities research and, more broadly, for language-centric technology development. Finally, to illustrate how the resources are used when processing running text, we provide annotation examples for each resource.
FT Speech: Danish Parliament Speech Corpus
This paper introduces FT Speech, a new speech corpus created from the
recorded meetings of the Danish Parliament, otherwise known as the Folketing
(FT). The corpus contains over 1,800 hours of transcribed speech by a total of
434 speakers. It is significantly larger in duration, vocabulary, and amount of
spontaneous speech than the existing public speech corpora for Danish, which
are largely limited to read-aloud and dictation data. We outline design
considerations, including the preprocessing methods and the alignment
procedure. To evaluate the quality of the corpus, we train automatic speech
recognition systems on the new resource and compare them to the systems trained
on the Danish part of Språkbanken, the largest public ASR corpus for Danish
to date. Our baseline systems achieve a word error rate (WER) of 14.01 on the new
corpus. A combination of FT Speech with in-domain language data provides
comparable results to models trained specifically on Språkbanken, showing
that FT Speech transfers well to this data set. The reverse does not hold:
models trained on that data set do not transfer well to FT Speech. This shows that FT Speech
provides a valuable resource for promoting research on Danish ASR with more
spontaneous speech.
Comment: Submitted to Interspeech 202
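For readers unfamiliar with the WER metric reported in the FT Speech abstract, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, the percentage scaling is an assumption about how the reported figure is expressed, and this is not the authors' evaluation code. ASR toolkits usually aggregate errors over a whole test set rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for one reference/hypothesis pair.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Under this definition, a hypothesis with one substituted and one dropped word against a four-word reference scores 50.0, and the value can exceed 100 when the hypothesis contains many inserted words.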