
    Embeddings for word sense disambiguation: an evaluation study

    Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation (WSD), one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform an in-depth analysis of how different parameters affect performance. We show that a WSD system that makes use of word embeddings alone, if designed properly, can provide a significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features.

    Context-aware graph segmentation for graph-based translation

    In this paper, we present an improved graph-based translation model which segments an input graph into node-induced subgraphs by taking source context into consideration. Translations are generated by combining subgraph translations left-to-right using beam search. Experiments on Chinese–English and German–English demonstrate that the context-aware segmentation significantly improves over the baseline graph-based model.

    Disambiguierung deutschsprachiger Diskursmarker: Eine Pilot-Studie [Disambiguation of German discourse markers: a pilot study]

    Discourse markers such as German aber, wohl or obwohl can be regarded as valuable information for a wide range of text-linguistic applications, since they provide important cues for the interpretation of texts or text segments. Unfortunately, many of them are highly ambiguous. Thus, their use in applications such as automatic text summarization requires reliable disambiguation of discourse markers. This should be done automatically, since manual disambiguation is feasible only for small amounts of data. The aim of this pilot study, therefore, was to investigate the methodological requirements of automatic disambiguation of German discourse markers. Two methods known from word-sense disambiguation, Naive Bayes and decision lists, were applied to the highly ambiguous marker wenn. The two methods and different feature combinations were compared statistically.