30,648 research outputs found
Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts
There are different ways to define similarity for grouping similar texts into
clusters, as the concept of similarity may depend on the purpose of the task.
For instance, in topic extraction similar texts are those within the same
semantic field, whereas in author recognition stylistic features should be
considered. In this study, we introduce ways to classify texts employing
concepts of complex networks, which may be able to capture syntactic, semantic
and even pragmatic features. The interplay between the various metrics of the
complex networks is analyzed with three applications, namely identification of
machine translation (MT) systems, evaluation of quality of machine translated
texts and authorship recognition. We shall show that topological features of
the networks representing texts can enhance the ability to identify MT systems
in particular cases. For evaluating the quality of MT texts, on the other hand,
high correlation was obtained with methods capable of capturing the semantics.
This was expected because the gold standards used are themselves based on
word co-occurrence. Nevertheless, the Katz similarity, which combines
semantics and structure in the comparison of texts, achieved the highest
correlation with the NIST measurement, indicating that in some cases the
combination of both approaches can improve the ability to quantify quality in
MT. In authorship recognition, again the topological features were relevant in
some contexts, though for the books and authors analyzed good results were
obtained with semantic features as well. Because hybrid approaches encompassing
semantic and topological features have not been extensively used, we believe
that the methodology proposed here may be useful to enhance text classification
considerably, as it combines well-established strategies.
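The core idea of representing a text as a complex network can be sketched as follows. This is a minimal illustration, not the authors' implementation: words become nodes, co-occurrence within a short window becomes edges, and simple topological statistics become features for classification. The helper names and the example sentence are invented for illustration.

```python
# Minimal sketch: a text as a word co-occurrence network, with a few
# topological features that could feed a classifier (e.g. for authorship
# recognition). Function names and the sample text are illustrative.

def cooccurrence_edges(tokens, window=2):
    """Yield (word_i, word_j) pairs that co-occur within a sliding window."""
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                yield (w, v)

def network_features(tokens, window=2):
    """Build an undirected adjacency map and compute simple topological metrics."""
    adj = {}
    for a, b in cooccurrence_edges(tokens, window):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    degrees = [len(neighbours) for neighbours in adj.values()]
    n = len(adj)
    return {
        "nodes": n,
        "edges": sum(degrees) / 2,
        "mean_degree": sum(degrees) / n if n else 0.0,
    }

tokens = "the cat sat on the mat and the cat slept".split()
print(network_features(tokens))
```

In practice one would compute richer metrics (clustering coefficient, shortest paths, Katz centrality) over such a network, but the pipeline shape stays the same: text, network, feature vector, classifier.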
Exploiting conceptual spaces for ontology integration
The widespread use of ontologies raises the need to integrate distinct conceptualisations. Whereas the symbolic approach of established representation standards, based on first-order logic (FOL) and syllogistic reasoning, does not implicitly represent semantic similarities, ontology mapping addresses this problem by aiming to establish formal relations between a set of knowledge entities which represent the same or a similar meaning in distinct ontologies. However, manually or semi-automatically identifying similarity relationships is costly. Hence, we argue that representational facilities are required which make it possible to represent similarities implicitly. Whereas Conceptual Spaces (CS) address similarity computation through the representation of concepts as vector spaces, CS provide neither an implicit representational mechanism nor a means to represent arbitrary relations between concepts or instances. In order to overcome these issues, we propose a hybrid knowledge representation approach which extends FOL-based ontologies with a conceptual grounding through a set of CS-based representations. Consequently, semantic similarity between instances, represented as members in CS, is indicated by means of distance metrics. Hence, automatic similarity detection across distinct ontologies is supported in order to facilitate ontology integration.
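The distance-based similarity idea behind this approach can be sketched in a few lines. This is a hypothetical toy, not the paper's system: instances from two ontologies are grounded as points in a shared conceptual space, and each instance in one ontology is matched to its nearest counterpart in the other. The dimension names and coordinates are invented.

```python
import math

# Toy sketch of similarity detection across ontologies via a shared
# conceptual space. Instances are grounded as (hue, size) points; the
# data and dimensions are invented for illustration.

def euclidean(p, q):
    """Distance metric over the conceptual space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

ontology_a = {"cherry": (0.95, 0.10), "apple": (0.80, 0.40)}
ontology_b = {"Kirsche": (0.93, 0.12), "Melone": (0.30, 0.90)}

def best_match(name):
    """Map an instance of ontology A to its nearest member of ontology B."""
    point = ontology_a[name]
    return min(ontology_b, key=lambda k: euclidean(point, ontology_b[k]))

print(best_match("cherry"))  # -> Kirsche
```

Small spatial distance stands in for high semantic similarity, which is exactly the implicit representation the symbolic FOL layer lacks.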
Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering
As common clustering algorithms use the vector space model (VSM) to represent documents, the conceptual relationships between related terms which do not co-occur literally are ignored. A genetic algorithm-based clustering technique, named GA clustering, in conjunction with ontology is proposed in this article to overcome this problem. In general, ontology measures can be partitioned into two categories: thesaurus-based methods and corpus-based methods. We take advantage of the hierarchical structure and the broad-coverage taxonomy of WordNet as the thesaurus-based ontology. However, the corpus-based method is rather complicated to handle in practical applications. We propose a transformed latent semantic analysis (LSA) model as the corpus-based method in this paper. Moreover, two hybrid strategies, combinations of the various similarity measures, are implemented in the clustering experiments. The results show that our GA clustering algorithm, in conjunction with the thesaurus-based and the LSA-based methods, clearly outperforms that with other similarity measures. Moreover, the superiority of the proposed GA clustering algorithm over the commonly used k-means algorithm and the standard GA is demonstrated by the improvements in clustering performance.
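One way to read the hybrid strategy is as a weighted combination of a corpus-based (LSA-style) similarity and a thesaurus-based score. The sketch below assumes that reading; the cosine over toy LSA vectors and the interpolation weight are illustrative placeholders, not the paper's actual measures.

```python
# Sketch of a hybrid similarity: interpolate a corpus-based cosine score
# (over LSA-style vectors) with a thesaurus-based score. The weighting
# scheme and inputs are assumptions for illustration.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def hybrid_similarity(lsa_u, lsa_v, thesaurus_sim, alpha=0.5):
    """alpha weights the corpus-based (LSA) term against the thesaurus term."""
    return alpha * cosine(lsa_u, lsa_v) + (1 - alpha) * thesaurus_sim

# Two documents with identical LSA vectors and a WordNet-style score of 0.8:
print(hybrid_similarity([1.0, 0.0], [1.0, 0.0], thesaurus_sim=0.8))  # -> 0.9
```

A GA-based clusterer would then evolve cluster assignments whose fitness is evaluated with such a combined measure instead of the plain VSM cosine.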
Bridging between sensor measurements and symbolic ontologies through conceptual spaces
The increasing availability of sensor data through a variety of sensor-driven devices raises the need to exploit the data observed by sensors with the help of formally specified knowledge representations, such as the ones provided by the Semantic Web. In order to facilitate such a Semantic Sensor Web, the challenge is to bridge between symbolic knowledge representations and the measured data collected by sensors. In particular, one needs to map a given set of arbitrary sensor data to a particular set of symbolic knowledge representations, e.g. ontology instances. This task is particularly challenging due to the potentially infinite variety of possible sensor measurements. Conceptual Spaces (CS) provide a means to represent knowledge in geometrical vector spaces in order to enable computation of similarities between knowledge entities by means of distance metrics. We propose an ontology for CS which allows one to refine symbolic concepts as CS and to ground instances to so-called prototypical members described by vectors. By computing similarities in terms of spatial distances between a given set of sensor measurements and a finite set of prototypical members, the most similar instance can be identified. In this way, we provide a means to bridge between the real world as observed by sensors and symbolic representations. We also propose an initial implementation utilising our approach for measurement-based Semantic Web Service discovery.
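The mapping step described here, from a raw measurement to the nearest prototypical member, amounts to a nearest-neighbour search under a distance metric. The sketch below assumes a two-dimensional (temperature, humidity) space; the prototype names and values are invented, not taken from the paper.

```python
# Sketch: map a sensor reading to the nearest prototypical member of a
# conceptual space, which then identifies the ontology instance. The
# (temperature in C, humidity in %) prototypes are illustrative.

def nearest_prototype(measurement, prototypes):
    """Return the instance whose prototype vector is closest to the reading."""
    def dist(point):
        return sum((a - b) ** 2 for a, b in zip(measurement, point)) ** 0.5
    return min(prototypes, key=lambda name: dist(prototypes[name]))

prototypes = {
    "Freezing": (-5.0, 60.0),
    "Mild": (15.0, 50.0),
    "Hot": (32.0, 30.0),
}

print(nearest_prototype((14.0, 55.0), prototypes))  # -> Mild
```

Because the prototype set is finite, every possible reading resolves to some symbolic instance, which is what lets the infinite space of measurements meet the discrete ontology.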
Blending the physical and the digital through conceptual spaces
The rise of the Internet facilitates ever-increasing growth of virtual, i.e. digital, spaces which co-exist with the physical environment, i.e. the physical space. This raises the question of how physical and digital spaces can interact synchronously. While sensors provide a means to continuously observe the physical space, several issues arise with respect to mapping sensor data streams to digital spaces, for instance structured linked data formally represented through symbolic Semantic Web (SW) standards such as OWL or RDF. The challenge is to bridge between symbolic knowledge representations and the measured data collected by sensors. In particular, one needs to map a given set of arbitrary sensor data to a particular set of symbolic knowledge representations, e.g. ontology instances. This task is particularly challenging due to the vast variety of possible sensor measurements. Conceptual Spaces (CS) provide a means to represent knowledge in geometrical vector spaces in order to enable computation of similarities between knowledge entities by means of distance metrics. We propose an approach which allows one to refine symbolic concepts as CS and to ground ontology instances to so-called prototypical members, which are vectors in the CS. By computing similarities in terms of spatial distances between a given set of sensor measurements and a finite set of CS members, the most similar instance can be identified. In this way, we provide a means to bridge between the physical space, as observed by sensors, and the digital space made up of symbolic representations.
- …