Danish is a North Germanic/Scandinavian language spoken primarily in Denmark,
a country with a tradition of technological and scientific innovation. However,
the Danish language itself has received relatively little attention from a
technological perspective. As a result, Danish language technology is hard to
develop, in part because large or broad-coverage Danish corpora are lacking. This
paper describes the Danish Gigaword project, which aims to construct a
freely available one-billion-word corpus of Danish text that represents the
breadth of the written language.