
    Distributional Measures of Semantic Distance: A Survey

    The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to fleshing out the strengths and limitations of both WordNet-based and distributional measures, and to how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures.
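    As a rough illustration of what a distributional measure computes from raw text alone, the sketch below builds co-occurrence vectors from a toy corpus and compares words by cosine distance. It is not one of the specific measures the survey covers; the function names, the window size, and the corpus are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context = neighbours on both sides, excluding the word itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical toy corpus; a real study would use a large text collection.
TOY_CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]

vectors = cooccurrence_vectors(TOY_CORPUS)
cat_dog = cosine_distance(vectors["cat"], vectors["dog"])
cat_car = cosine_distance(vectors["cat"], vectors["car"])
print(f"cat-dog: {cat_dog:.3f}  cat-car: {cat_car:.3f}")
```

    On this toy corpus "cat" and "dog" share more contexts than "cat" and "car", so their distance is smaller. No lexical resource is consulted, which is why such measures remain usable for resource-poor languages.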

    Building a wordnet for Turkish

    This paper summarizes the development process of a wordnet for Turkish as part of the Balkanet project. After discussing the basic methodological issues that had to be resolved during the course of the project, the paper presents the basic steps of the construction process in chronological order. Two applications using the Turkish wordnet are summarized, and links to resources for wordnet builders are provided at the end of the paper.

    Affect Analysis of Radical Contents on Web Forums Using SentiWordNet

    The internet has become a major tool for communication, training, fundraising, media operations, and recruitment, and these processes often use web forums. This paper presents a model that was built using SentiWordNet, WordNet and NLTK to analyze selected web forums that included radical content. SentiWordNet is a lexical resource that supports opinion mining by assigning a positivity score and a negativity score to each WordNet synset. The model's approaches identify and measure the sentiment polarity and affect intensity of the content that appears in the web forums. The results show that SentiWordNet can be used for analyzing sentences that appear in web forums.

    Extending, trimming and fusing WordNet for technical documents

    This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task, showing considerable improvement (7%-11%). The condensed hierarchies can be used as browsing interfaces to the documents that complement retrieval.

    A proposal for a shallow ontologization of WordNet

    This paper presents the work carried out towards the so-called shallow ontologization of WordNet, which is argued to be a way to overcome many of the structural problems of the widely used lexical knowledge base. The expected result is a multilingual resource more suitable than existing ones for large-scale semantic processing.