    Kollokationer som fraseologisk kategori set fra forskellige synsvinker

    The delimitation of collocations as a category in its own right can be done along different lines. The purpose of this paper is to discuss the definition and use of the term ‘collocation’ from various points of view. Collocations differ from free word combinations on the one hand and from idioms on the other, having two characteristic features: a certain degree of meaning compositionality and a certain degree of formal, structural and lexical fixedness. These features appear in several combinations, which form the basis of different classifications. First, criteria and classifications are discussed; second, the use of the term in lexicographic theory and corpus linguistics is analyzed. The discussion is illustrated by a few examples of lexicographic representation in Danish dictionaries. Finally, an outline of various descriptive approaches, each focusing on particular features, is presented.

    What do we need to know about humans? A view into the DanNet database

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 158-165. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Når maskinen tager en på ordet - ordbogsarbejde for maskinoversættelse

    This contribution deals with some aspects of dictionary work that are specific to machine translation of special-language (LSP) texts. (For brevity, the English abbreviations MT and HT are used below for machine translation and traditional (human) translation, respectively.)

    Den danske Sprogteknologiske Ordbase og dens anvendelse i værktøj til leksikografiske formål

    The Danish Lexicon for Language Technology Applications (STO) and its use in a tool for lexicographic purposes. This article deals with the largest and most comprehensive computational lexicon for Danish. Firstly, the development principles, the lexical coverage and the linguistic content of this lexicon are presented. This part focuses on the treatment of inflectional morphology by means of the Remove/Add computing method. Secondly, the development and functionalities of a flexible and effective lemmatiser program for Danish are discussed; the rules of the lemmatiser have been derived from the STO morphology data. A few examples illustrate the use of the lemmatiser in solving lexicographic tasks. Finally, the user interface for online look-ups in the STO database is described: it transforms the computational lexicon into an electronic dictionary, making it a useful source of lexical knowledge for lexicographers and other interested users. Also a number of useful web addresses, viz. to the STO database, the lemmatiser and relevant documentation, also in English, are provided.

    Dansk standard for indholds- og strukturbeskrivelse af leksikalske datasamlinger - Eksempler på anvendelser i leksikografisk arbejde

    The first version of a Proposal for Danish Standard: Lexical data collections - Description of data categories and data structure - Part 1: Taxonomy for the classification of information types has been sent out for comments by Dansk Standard (The Danish Standard Association). This contribution deals with two main applications of the taxonomy: description of the information content of an existing lexical data collection for exchange/reuse purposes, and planning of a new lexical data collection with a view to reusability. The Centre for Language Technology had the opportunity to work experimentally with both applications of the taxonomy within the framework of different projects. We provide here a brief description of the tasks where we applied the taxonomy. We conclude with some relevant aspects of our experience with the application of the taxonomy.

    Verber med fælles syntaktiske konstruktioner og semantisk slægtskab

    This paper deals with the issue of observable relationships between the syntactic behaviour and semantics of selected verb groups. The aim is to investigate the feasibility of exploiting syntactic similarity for effective uncovering of semantic kinship. The investigation is inspired by the demand for large-scale lexical resources combining morphological, syntactic and semantic descriptions on the one hand, and on the other by the fact that semantic networks like WordNet usually provide syntactic information not at all, or only to a very limited degree. Both aspects show that there is a need for effective means that combine syntactic and semantic description methods and for resources containing the two information types. The pilot project presented here discusses verbs sharing selected syntactic descriptions; they are extracted from the Danish Lexicon for Language Technology Applications (STO) and grouped together according to semantic labels, which in the first run are intuitively selected. The idea is to derive underlying semantic information systematically from the syntactic surface structures described. Although this approach starts from the ‘wrong side’ compared to other, well-known analyses of the semantics-syntax relationship, it might be exploited in prediction of verb senses.

    Om elektroniske ordbøger – til brug for mennesker

    This paper deals with products which are in general terms called ‘electronic dictionaries’. Firstly, a brief retrospective on the increasing use of computers in lexicography is given. Secondly, four selected types of electronic dictionaries and their basic features are discussed. Thirdly, some essential technical facilities and lexicographic design features are outlined. Thereafter the utility value of fully developed electronic dictionaries is sketched. An excursus briefly presents the use of a Danish computational lexicon, den Sprogteknologiske Ordbase, as an electronic dictionary for humans. Finally, visions for future developments are presented.

    Arbejdet med "Forslag om dansk standard for lagring og udveksling af leksikalske data"

    This paper gives a brief report on the work done so far within the ongoing STANLEX project, which aims at a proposal for standardization of storage and exchange of lexical data. The STANLEX group is affiliated to the Danish Standard Association. The current goal is to develop a clearly defined format (comprising both structure and content aspects) to support efficient sharing of machine-readable dictionary data. The reusability and multifunctionality aspects of these resources will also be strengthened by means of a general taxonomy covering both printed dictionaries and current requirements of lexicon modules integrated with natural language processing systems.

    Hvad får man skudt i skoene? Flerordsenheder i aktive ordbøger for mennesker og maskiner

    The work presented here grew out of a collaboration within the Danish EUROTRA group, but has developed into a collaboration between the article's two authors as representatives of EUROTRA-DK (now part of the newly established Center for Sprogteknologi) and a project group at Handelshøjskolen i København, respectively. The aim of the ongoing work is to collect and classify a body of examples of multiword units large enough for us to register the patterns that will hopefully emerge, and on that basis to develop a descriptive model that is explicit and exhaustive enough both to be implementable in NLP and to form the basis of an active dictionary for humans. The material is drawn primarily from machine-readable corpora such as DK87-90 and compared with the material described in Erik Bruun: Dansk Sprogbrug. The method and the objectives cover both lexicographic and computational aspects, and we consider the outcome of the interplay between these aspects to be the most exciting part of this project.

    Title Pages

    Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), i-ii. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209