The Danish Gigaword Project

Baglini, Rebekah; Christiansen, Morten H.; Ciosici, Manuel R.; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Årup; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Strømberg-Derczynski, Leon; Varab, Daniel

slides

Publication date: 1 January 2020
Abstract

Danish is a North Germanic (Scandinavian) language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. From a technological perspective, however, the Danish language has received relatively little attention. As a result, Danish language technology is hard to develop, in part because large or broad-coverage Danish corpora are lacking. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.


Available Versions

Copenhagen University Research Information System

oai:pure.atira.dk:publications...

Last time updated on 19/05/2021

Online Research Database In Technology

oai:pure.atira.dk:publications...

Last time updated on 01/01/2022