
    Building Semantic Corpus from WordNet

    We propose a novel methodology for extracting semantic similarity knowledge from semi-structured sources such as WordNet. Unlike existing approaches that explore only the structured information (e.g., the hypernym relationship in WordNet), we present a framework that allows us to utilize all available information, including natural language descriptions. Our approach constructs a semantic corpus, represented as a graph whose numerically weighted edges model the relationships between phrases. The data in the semantic corpus can be used to measure the similarity between phrases, to measure the similarity between documents, or to perform semantic search over a set of documents that uses the meaning of words and phrases (i.e., search that is not keyword-based).
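
    The abstract describes the semantic corpus as a graph with numeric relatedness weights between phrases, derived from both WordNet's hypernym structure and its natural-language glosses. The following minimal sketch illustrates that general idea only; it is not the paper's implementation, and the NLTK interface, the equal weighting of structured and gloss evidence, and the toy word list are all assumptions.

        # Minimal sketch (not the paper's method): build a small weighted graph
        # over phrases from WordNet, combining structured links (hypernym paths)
        # with gloss text, then use it to score phrase relatedness.
        # Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
        from collections import defaultdict
        from nltk.corpus import wordnet as wn

        def build_graph(words):
            """Map each (phrase, phrase) pair to a numeric relatedness weight."""
            graph = defaultdict(float)
            synsets = {w: wn.synsets(w) for w in words}
            for a in words:
                for b in words:
                    if a >= b:
                        continue
                    best = 0.0
                    for sa in synsets[a]:
                        for sb in synsets[b]:
                            # Structured evidence: path similarity over the hypernym hierarchy.
                            structured = sa.path_similarity(sb) or 0.0
                            # Unstructured evidence: overlap between the glosses
                            # (natural-language definitions) of the two senses.
                            ga, gb = set(sa.definition().split()), set(sb.definition().split())
                            gloss = len(ga & gb) / max(len(ga | gb), 1)
                            # Equal weighting of the two evidence types is an assumption.
                            best = max(best, 0.5 * structured + 0.5 * gloss)
                    graph[(a, b)] = best
            return graph

        if __name__ == "__main__":
            g = build_graph(["car", "automobile", "banana"])
            print(g[("automobile", "car")])   # high weight: near-synonyms
            print(g[("banana", "car")])       # low weight: unrelated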

    Building a wordnet for Turkish

    This paper summarizes the development process of a wordnet for Turkish as part of the Balkanet project. After discussing the basic methodological issues that had to be resolved during the course of the project, the paper presents the main steps of the construction process in chronological order. Two applications using the Turkish wordnet are summarized, and links to resources for wordnet builders are provided at the end of the paper.

    Towards a Universal Wordnet by Learning from Combined Evidence

    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
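
    The abstract describes linking words of many languages to WordNet senses by combining evidence from resources such as bilingual dictionaries and scoring candidate links over a graph. The sketch below only illustrates the evidence-combination idea in the simplest form; the toy dictionary, the scoring rule, and the candidate_links helper are illustrative assumptions, not the authors' system.

        # Minimal sketch (not the authors' system): score candidate links between
        # foreign-language words and WordNet synsets using a bilingual dictionary.
        # A synset accumulates more weight when several independent translations
        # of the word point to it, mirroring the idea of combining evidence.
        from nltk.corpus import wordnet as wn

        # Toy bilingual dictionary (assumed data): German word -> English translations.
        translations = {
            "Bank": ["bank", "bench"],
            "Hund": ["dog", "hound"],
        }

        def candidate_links(word):
            """Return (synset, score) pairs for `word`, best-scored first."""
            scores = {}
            for t in translations[word]:
                for syn in wn.synsets(t):
                    scores[syn] = scores.get(syn, 0.0) + 1.0 / len(translations[word])
            return sorted(scores.items(), key=lambda kv: -kv[1])

        if __name__ == "__main__":
            for syn, score in candidate_links("Hund")[:3]:
                print(f"{score:.2f}  {syn.name()}  {syn.definition()}")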