10 research outputs found

    DeepDict — A Graphical Corpus-based Dictionary of Word Relations

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 268-271. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    DeepDict - et korpusbaseret relationelt leksikon (DeepDict: a corpus-based relational dictionary)

    DeepDict (at www.gramtrans.com) is a new type of lexical resource, built from grammatically analysed corpus data. Co-occurrence strength between mother-daughter dependency pairs is used to automatically produce dictionary entries of typical complementation patterns and collocations, in the fashion of an instant monolingual usage dictionary. DeepDict is capable of abstracting lemma relations and semantic classes from inflected surface forms, and provides concordances and statistics for the relations found. Entries are supplied to the user in a graphical interface with various thresholds for lexical frequencies as well as absolute and relative co-occurrence frequencies. DeepDict draws its data from Constraint Grammar-analysed corpora, ranging between tens and hundreds of millions of words, covering the major Germanic and Romance languages, among them both Swedish, Danish and Norwegian. Apart from its obvious lexicographical purposes, DeepDict also targets teaching environments and translators.

    Sadness-related Expressions in Danish and German: A Corpus-assisted NSM-analysis

    The study explores sadness-related expressions in two typologically closely related languages in the natural semantic metalanguage (NSM) framework. A systematic corpus inquiry revealed the syntactic patterns and helped to identify the most frequent head nouns of a number of Danish and German sadness-related expressions. German traurig, for instance, has a distribution similar to that of Danish sørgelig with semiotic products and clauses as subjects. However, when used with human subjects, its distribution aligns with the Danish multi-word expression ked af det. Semantic consultations with some speakers of Danish and German about the use of the most salient sadness adjectives revealed fine-grained differences between German traurig and trist and Danish ked af det and trist respectively. Thus, when used with a human headword, Danish trist is more trait-like while ked af det is more state-like. The concept of sadness-related emotions in Danish and German is discussed, followed by a methodological discussion about the combinability of a quantitative corpus approach, a qualitative semantic consultation approach and NSM explications. Corpus inquiry was used to chart the adjectives' polysemy, and consultation data were used to create the NSM explications.

    Reviewing Possible Extraction Tools

    Collocations are a major problem for any natural language processing task, from machine translation to summarization. With the goal of building a corpus with collocations, enriched with statistical information about them, we survey, in this paper, four tools for extracting collocations. These tools allow us to collect sentences with collocations, and also to gather statistics on this particular type of co-occurrence, such as Mutual Information and log-likelihood values. Funding reference: UID/LIN/03213/2013.
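
    The two statistics named in this abstract can be illustrated with a minimal sketch. It assumes the pointwise form of mutual information and Dunning's log-likelihood ratio over a 2x2 contingency table, which are common choices in collocation extraction; the abstract does not say which variants the surveyed tools compute. The function names and the counts in the usage note are hypothetical.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information (base 2) for a word pair.

    Assumption: the pointwise variant is used; the abstract only says
    "Mutual Information". All counts must be positive.
    """
    return math.log2((pair_count * total) / (x_count * y_count))


def log_likelihood(pair_count, x_count, y_count, total):
    """Log-likelihood ratio (G^2) over the 2x2 contingency table.

    Assumption: Dunning-style G^2; the surveyed tools may differ.
    """
    # Observed cells: (x, y), (x, not y), (not x, y), (not x, not y).
    observed = [
        pair_count,
        x_count - pair_count,
        y_count - pair_count,
        total - x_count - y_count + pair_count,
    ]
    # Expected cell counts if the two words occurred independently.
    expected = [
        x_count * y_count / total,
        x_count * (total - y_count) / total,
        (total - x_count) * y_count / total,
        (total - x_count) * (total - y_count) / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

    With hypothetical counts, a pair seen 10 times whose members each occur 100 times in 10,000 tokens scores above zero on both measures, while a pair seen exactly as often as independence predicts (once, with the same marginals) scores zero on both.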

    Conference Program

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), xi-xiv. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Contents

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), iii-vi. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Semantiske sprogressourcer - mellem sprogteknologi og leksikografi (Semantic language resources: between language technology and lexicography)

    This paper discusses the synergy between lexicography and semantic language resources meant for computational use; before, now and in the future. On the basis of a brief historical overview of the backgrounds for language technology and lexicography, respectively, I analyze why the two fields have not always cooperated as closely as one would think useful. I give a recent example of a project that has exploited the similarities between the two fields by reusing a monolingual dictionary for the compilation of a Danish wordnet for technological use: DanNet. I describe some areas where modifications have been necessary in the reuse process; this regards in particular the adjustment of hyponymy hierarchies and the spelling out of underspecified information. I conclude that the two fields will most presumably be much more connected in the future due to recent corpus and editing tools which help exploit more radically the intersection between the two areas.