Search CORE

2,321 research outputs found

An information retrieval approach to ontology mapping

Author: Gulla J.A.
Su X.
Publication venue: Elsevier
Publication date: 01/01/2006
Field of study

In this paper, we present a heuristic mapping method and a prototype mapping system that support the process of semi-automatic ontology mapping for the purpose of improving semantic interoperability in heterogeneous systems. The approach is based on the idea of semantic enrichment, i.e., using instance information of the ontology to enrich the original ontology and calculate similarities between concepts in two ontologies. The functional settings for the mapping system are discussed and the evaluation of the prototype implementation of the approach is reported. \ud \u

CiteSeerX

University of Twente Research Information

Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

Author: El-Yaniv Ran
Yanay David
Publication venue
Publication date: 10/11/2013
Field of study

We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of extensive range of experiments from small to large scale, indicating that the proposed method is effective and competitive with the state-of-the-art.Comment: 37 pages, 8 figures A short version of this paper was already published at ECML/PKDD 201

arXiv.org e-Print Archive

CiteSeerX

A Unified multilingual semantic representation of concepts

Author: CAMACHO COLLADOS Jose'
Navigli Roberto
Pilehvar MOHAMMED TAHER
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015
Field of study

Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively for the representation of individual word senses. We put forward a novel multilingual concept representation, called MUFFIN , which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach in two different evaluation benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art performance on several standard datasets

CiteSeerX

Crossref

Online Research @ Cardiff

Archivio della ricerca- Università di Roma La Sapienza

NASARI: a novel approach to a Semantically-Aware Representation of items

Author: CAMACHO COLLADOS Jose'
Navigli Roberto
Pilehvar MOHAMMED TAHER
Publication venue
Publication date: 01/01/2015
Field of study

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/

CiteSeerX

Crossref

Online Research @ Cardiff

Archivio della ricerca- Università di Roma La Sapienza

Distributional Measures of Semantic Distance: A Survey

Author: Hirst Graeme
Mohammad Saif M.
Publication venue
Publication date: 01/01/2012
Field of study

The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to flesh out the strengths and limitations of both WordNet-based and distributional measures, and how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures

arXiv.org e-Print Archive

CiteSeerX

On the Effect of Semantically Enriched Context Models on Software Modularization

Author: Hage Jurriaan
Jansen Slinger
Khadka Ravi
Saeidi Amir
Publication venue: 'Aspect-Oriented Software Association (AOSA)'
Publication date: 04/08/2017
Field of study

Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used for both deriving the topics that run through the system as well as their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that inferred topics through performing topic analysis on the contextual representations are more meaningful compared to the plain representation of the documents. The proposed approach in introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

ZENODO

Utrecht University Repository

FigShare