Search CORE

30,648 research outputs found

Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts

Author: Amancio Diego R.
Costa Luciano da F.
Oliveira Jr. Osvaldo N.
Publication venue: Elsevier BV
Publication date: 01/03/2013

There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between the various metrics of the complex networks is analyzed in three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing semantics. This was expected, because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which combines semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases combining both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may considerably enhance text classification, as it combines well-established strategies.
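The Katz similarity mentioned above scores a pair of nodes by counting all walks between them, discounting longer walks by a factor beta per step. The sketch below assumes a simple word-adjacency network (one node per distinct word, an edge between words that appear side by side) and computes the standard Katz index in closed form. The preprocessing (e.g. stopword removal or lemmatization) and the way the authors turn node-level Katz scores into a text-to-text comparison are not given in the abstract, so `adjacency_network`, `katz_matrix` and the default `beta` are illustrative choices, not the paper's implementation.

```python
import numpy as np


def adjacency_network(tokens):
    """Undirected word-adjacency network (illustrative assumption):
    one node per distinct word, an edge between words that appear side by side."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops caused by immediately repeated words
            A[index[a], index[b]] = 1.0
            A[index[b], index[a]] = 1.0
    return vocab, A


def katz_matrix(A, beta=None):
    """Katz index S = sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I.

    The series converges only when beta < 1 / rho(A), where rho(A) is the
    spectral radius of A; by default beta is half that bound (an arbitrary choice).
    """
    rho = float(np.max(np.abs(np.linalg.eigvalsh(A))))  # A is symmetric
    if beta is None:
        beta = 0.5 / rho if rho > 0 else 0.5
    if rho > 0 and beta >= 1.0 / rho:
        raise ValueError("beta must be smaller than 1 / spectral radius of A")
    identity = np.eye(A.shape[0])
    return np.linalg.inv(identity - beta * A) - identity


vocab, A = adjacency_network("the cat sat on the mat".split())
S = katz_matrix(A)
pos = {w: i for i, w in enumerate(vocab)}
print(vocab)  # ['cat', 'mat', 'on', 'sat', 'the']
```

In this toy sentence, "cat" and "on" are joined by two paths of length 2 (via "the" and via "sat"), whereas "mat" and "sat" are linked only by paths of length 3 or more, so the first pair receives the higher Katz score.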

arXiv.org e-Print Archive

Elsevier - Publisher Connector

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Exploiting conceptual spaces for ontology integration

Author: Dietze Stefan
Domingue John
Publication date: 01/01/2008

The widespread use of ontologies raises the need to integrate distinct conceptualisations. Whereas the symbolic approach of established representation standards, based on first-order logic (FOL) and syllogistic reasoning, does not implicitly represent semantic similarities, ontology mapping addresses this problem by aiming to establish formal relations between a set of knowledge entities that represent the same or a similar meaning in distinct ontologies. However, identifying similarity relationships manually or semi-automatically is costly. Hence, we argue that representational facilities are required that enable similarities to be represented implicitly. Although Conceptual Spaces (CS) address similarity computation by representing concepts as vector spaces, CS provide neither an implicit representational mechanism nor a means to represent arbitrary relations between concepts or instances. To overcome these issues, we propose a hybrid knowledge representation approach that extends FOL-based ontologies with a conceptual grounding through a set of CS-based representations. Consequently, semantic similarity between instances, represented as members in CS, is indicated by means of distance metrics. Hence, automatic similarity detection across distinct ontologies is supported, facilitating ontology integration.
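The abstract states that similarity between instances represented as CS members is indicated by distance metrics, but it does not fix the metric. The sketch below assumes a weighted Euclidean distance turned into a similarity via exp(-c·d), a common choice in the conceptual-spaces literature; the `Member` type, the weights, the constant `c`, the "vehicle" dimensions and the `ontoA:`/`ontoB:` instance names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """An instance grounded as a point (member) of a conceptual space."""
    name: str
    coords: tuple[float, ...]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two points of the same space."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("points and weights must have the same dimensionality")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def similarity(a, b, weights, c=1.0):
    """Exponentially decaying similarity in (0, 1]; identical points score 1."""
    return math.exp(-c * weighted_distance(a, b, weights))


def best_match(member, candidates, weights):
    """Candidate (e.g. an instance of another ontology) most similar to `member`."""
    return max(candidates, key=lambda m: similarity(member.coords, m.coords, weights))


# Hypothetical "vehicle" space with normalised dimensions (size, speed, price).
weights = (1.0, 1.0, 1.0)
compact_car = Member("ontoA:CompactCar", (0.3, 0.6, 0.3))
candidates = [
    Member("ontoB:SmallCar", (0.25, 0.55, 0.3)),
    Member("ontoB:Truck", (0.9, 0.4, 0.8)),
]
print(best_match(compact_car, candidates, weights).name)  # ontoB:SmallCar
```

Because both ontologies are grounded in the same space, a mapping candidate falls out of a nearest-neighbour query rather than a manually curated alignment; choosing the weights per dimension is where domain knowledge would enter.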

CiteSeerX

Open Research Online (The Open University)

Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering

Author: Li Chenghua
Song Wei
Yu Wei
Zhang Chengzhi
Publication venue: IEEE Press
Publication date: 01/01/2008

As common clustering algorithms use the vector space model (VSM) to represent documents, the conceptual relationships between related terms that do not co-occur literally are ignored. To overcome this problem, this article proposes a genetic algorithm-based clustering technique, named GA clustering, used in conjunction with an ontology. In general, ontology-based similarity measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and the broad-coverage taxonomy of WordNet as the thesaurus-based ontology. However, the corpus-based method is rather complicated to handle in practical applications, so in this paper we propose a transformed latent semantic analysis (LSA) model as the corpus-based method. Moreover, two hybrid strategies, which combine the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based methods, clearly outperforms its variants using other similarity measures. Moreover, the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA is demonstrated by the improvement in clustering performance.
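The abstract does not specify the "transformed" LSA model or the exact hybrid strategies, so the sketch below uses plain LSA (a rank-k truncated SVD of a term-document matrix) and a simple weighted sum as one possible hybrid strategy. The toy matrix, `k`, `alpha` and the function names are assumptions; plain VSM cosine stands in for the WordNet-based measure only to keep the example self-contained, since a real thesaurus-based score would need WordNet.

```python
import numpy as np


def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows end up with similarity 0
    unit = vectors / norms
    return unit @ unit.T


def vsm_similarity(term_doc):
    """Plain vector space model: cosine between raw document columns."""
    return cosine_matrix(term_doc.T)


def lsa_similarity(term_doc, k):
    """Standard LSA: rank-k truncated SVD, documents compared in latent space."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (np.diag(s[:k]) @ vt[:k]).T  # one row per document
    return cosine_matrix(docs)


def hybrid_similarity(sim_a, sim_b, alpha=0.5):
    """Linear combination of two similarity matrices, alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sim_a + (1.0 - alpha) * sim_b


# Rows (terms): car, automobile, engine, flower, petal; columns: three documents.
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
vsm = vsm_similarity(X)
lsa = lsa_similarity(X, k=2)
print(round(vsm[0, 1], 2), round(lsa[0, 1], 2))  # 0.5 1.0
```

Documents 0 and 1 share only "engine" in the VSM (cosine 0.5), but LSA places "car" and "automobile" on the same latent dimension and rates the two documents as equivalent. A matrix built this way (or a hybrid of it) could then serve as the similarity input to any clustering algorithm, including a GA-based one.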

E-LIS

Crossref

Bridging between sensor measurements and symbolic ontologies through conceptual spaces

Author: Dietze Stefan
Domingue John
Publication date: 01/06/2009

The increasing availability of sensor data through a variety of sensor-driven devices raises the need to exploit the data observed by sensors with the help of formally specified knowledge representations, such as those provided by the Semantic Web. In order to facilitate such a Semantic Sensor Web, the challenge is to bridge between symbolic knowledge representations and the measured data collected by sensors. In particular, one needs to map a given set of arbitrary sensor data to a particular set of symbolic knowledge representations, e.g. ontology instances. This task is particularly challenging due to the potentially infinite variety of possible sensor measurements. Conceptual Spaces (CS) provide a means to represent knowledge in geometrical vector spaces in order to enable computation of similarities between knowledge entities by means of distance metrics. We propose an ontology for CS that allows symbolic concepts to be refined as CS and instances to be grounded to so-called prototypical members described by vectors. By computing similarities in terms of spatial distances between a given set of sensor measurements and a finite set of prototypical members, the most similar instance can be identified. In this way, we provide a means to bridge between the real world as observed by sensors and symbolic representations. We also propose an initial implementation utilizing our approach for measurement-based Semantic Web Service discovery.
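The step of identifying the most similar instance can be sketched as a nearest-prototype lookup. Everything concrete here is hypothetical: the prototype names, the two quality dimensions (temperature, relative humidity), their assumed value ranges and the min-max normalisation, which is added only because raw sensor dimensions usually have different units and would otherwise dominate the Euclidean distance unevenly.

```python
import math

# Hypothetical prototypical members: (temperature in degrees C, relative humidity in %).
PROTOTYPES = {
    "HotDry": (30.0, 40.0),
    "Cold": (0.0, 60.0),
    "HotHumid": (25.0, 90.0),
}
# Assumed value range of each quality dimension, used for min-max normalisation.
RANGES = ((-10.0, 40.0), (0.0, 100.0))


def normalise(point, ranges=RANGES):
    """Rescale each coordinate to [0, 1] so that no dimension dominates the distance."""
    if len(point) != len(ranges):
        raise ValueError("point and ranges must have the same dimensionality")
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(point, ranges))


def closest_instance(measurement, prototypes=PROTOTYPES):
    """Name of the prototypical member nearest (Euclidean) to a sensor reading."""
    m = normalise(measurement)
    return min(prototypes, key=lambda name: math.dist(m, normalise(prototypes[name])))


print(closest_instance((28.0, 85.0)))  # HotHumid
print(closest_instance((-5.0, 55.0)))  # Cold
```

Since the set of prototypes is finite, any of the potentially infinite sensor readings is mapped to exactly one symbolic instance.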

Open Research Online (The Open University)

Blending the physical and the digital through conceptual spaces

Author: Benn Neil
Dietze Stefan
Domingue John
Orthuber Wolfgang
Publication date: 01/01/2009

The rise of the Internet facilitates an ever-increasing growth of virtual, i.e. digital, spaces that co-exist with the physical environment, i.e. the physical space. This raises the question of how physical and digital space can interact synchronously. While sensors provide a means to continuously observe the physical space, several issues arise with respect to mapping sensor data streams to digital spaces, for instance structured linked data formally represented through symbolic Semantic Web (SW) standards such as OWL or RDF. The challenge is to bridge between symbolic knowledge representations and the measured data collected by sensors. In particular, one needs to map a given set of arbitrary sensor data to a particular set of symbolic knowledge representations, e.g. ontology instances. This task is particularly challenging due to the vast variety of possible sensor measurements. Conceptual Spaces (CS) provide a means to represent knowledge in geometrical vector spaces in order to enable computation of similarities between knowledge entities by means of distance metrics. We propose an approach that allows symbolic concepts to be refined as CS and ontology instances to be grounded to so-called prototypical members, which are vectors in the CS. By computing similarities in terms of spatial distances between a given set of sensor measurements and a finite set of CS members, the most similar instance can be identified. In this way, we provide a means to bridge between the physical space, as observed by sensors, and the digital space made up of symbolic representations.

Open Research Online (The Open University)