6 research outputs found

    An unsupervised data-driven method to discover equivalent relations in large linked datasets

    This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities, while relations are less well studied; 2) many build on the assumption of the ‘well-formedness’ of ontologies, which is not necessarily true in the domain of Linked Open Data; 3) few have looked at schema heterogeneity from a single source, which is also a common issue, particularly in very large Linked Datasets created automatically from heterogeneous resources or integrated from multiple datasets. We propose a domain- and language-independent, completely unsupervised method to align equivalent relations across schemata based on their shared instances. We introduce a novel similarity measure able to cope with unbalanced populations of schema elements, an unsupervised technique to automatically decide the similarity threshold at which to assert equivalence for a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. The proposed method makes significant improvements over baseline models in terms of F1 measure (mostly between 7% and 40%); it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe constitute the largest collection of gold standards for evaluating relation alignment in the LOD context.
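To illustrate why an overlap measure must cope with unbalanced populations, the sketch below contrasts a plain Jaccard baseline with a min-normalized overlap. This is a simplified illustration, not the paper's proposed measure; the relation names and instance pairs are hypothetical.

```python
def jaccard(a, b):
    """Symmetric overlap; heavily penalized when one relation is far more populated."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def min_overlap(a, b):
    """Overlap normalized by the smaller extension; tolerant of unbalanced populations."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

# Hypothetical subject-object pairs for two DBpedia-style relations.
birth_place = {("Alan_Turing", "London"), ("Ada_Lovelace", "London"),
               ("Kurt_Goedel", "Bruenn")}
place_of_birth = {("Alan_Turing", "London"), ("Ada_Lovelace", "London"),
                  ("Kurt_Goedel", "Bruenn"), ("Grace_Hopper", "New_York"),
                  ("John_von_Neumann", "Budapest")}

print(round(jaccard(birth_place, place_of_birth), 2))      # 0.6
print(round(min_overlap(birth_place, place_of_birth), 2))  # 1.0
```

The two relations are intuitively equivalent, yet Jaccard scores only 0.6 because one is more densely populated; the min-normalized variant recognizes that every instance of the smaller relation is shared.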

    Learning Class Disjointness Axioms Using Grammatical Evolution

    Today, with the development of the Semantic Web, Linked Open Data (LOD), expressed using the Resource Description Framework (RDF), has reached the status of “big data” and can be considered as a giant data resource from which knowledge can be discovered. The process of learning knowledge defined in terms of OWL 2 axioms from RDF datasets can be viewed as a special case of knowledge discovery from data or “data mining”, which can be called “RDF mining”. The approaches to automated generation of axioms from recorded RDF facts on the Web may be regarded as a case of inductive reasoning and ontology learning. The instances, represented by RDF triples, play the role of specific observations, from which axioms can be extracted by generalization. Based on the insight that discovering new knowledge is essentially an evolutionary process, whereby hypotheses are generated by some heuristic mechanism and then tested against the available evidence, so that only the best hypotheses survive, we propose the use of Grammatical Evolution, one type of evolutionary algorithm, for mining disjointness OWL 2 axioms from an RDF data repository such as DBpedia. For the evaluation of candidate axioms against the DBpedia dataset, we adopt an approach based on possibility theory.
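The core of Grammatical Evolution is the genotype-to-phenotype mapping: a linear genome of integer codons selects grammar productions via a modulo rule. The sketch below shows that mapping for a toy disjointness grammar, with a crude overlap-based fitness standing in for the paper's possibilistic evaluation; the grammar, class names, and extensions are all hypothetical.

```python
import random

# Toy grammar for OWL 2 class disjointness axioms (illustrative only).
GRAMMAR = {
    "<axiom>": [["DisjointClasses(", "<class>", ", ", "<class>", ")"]],
    "<class>": [["dbo:Person"], ["dbo:Place"], ["dbo:Work"], ["dbo:Species"]],
}

def ge_map(genome, symbol="<axiom>"):
    """Grammatical-evolution mapping: each codon picks a production via modulo."""
    genome = list(genome)
    def expand(sym):
        if sym not in GRAMMAR:
            return sym  # terminal: emit as-is
        rules = GRAMMAR[sym]
        codon = genome.pop(0) if genome else 0
        return "".join(expand(s) for s in rules[codon % len(rules)])
    return expand(symbol)

# Hypothetical class extensions standing in for RDF rdf:type assertions.
EXT = {
    "dbo:Person": {"Alan_Turing", "Ada_Lovelace"},
    "dbo:Place": {"London", "Budapest"},
    "dbo:Work": {"Hamlet"},
    "dbo:Species": {"Homo_sapiens", "Alan_Turing"},  # deliberate overlap
}

def fitness(axiom):
    """Crude stand-in for possibilistic evaluation: reward empty intersections,
    give trivial self-disjointness a zero score."""
    inner = axiom[len("DisjointClasses("):-1]
    c1, c2 = [c.strip() for c in inner.split(",")]
    if c1 == c2:
        return 0.0
    return 1.0 - len(EXT[c1] & EXT[c2]) / min(len(EXT[c1]), len(EXT[c2]))

# Generate-and-select step of an evolutionary loop (selection only, for brevity).
random.seed(0)
population = [[random.randrange(256) for _ in range(4)] for _ in range(30)]
best = max(population, key=lambda g: fitness(ge_map(g)))
print(ge_map(best))
```

A full GE run would add crossover and mutation over the genomes; the mapping and fitness above are the parts specific to mining axioms from RDF facts.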

    Hybrid fuzzy multi-objective particle swarm optimization for taxonomy extraction

    Ontology learning refers to the automatic extraction of an ontology to produce the ontology learning layer cake, which consists of five kinds of output: terms, concepts, taxonomy relations, non-taxonomy relations and axioms. Term extraction is a prerequisite for all aspects of ontology learning. It is the automatic mining of complete terms from the input document. Another important part of an ontology is its taxonomy, or hierarchy of concepts. It presents a tree view of the ontology and shows the inheritance between subconcepts and superconcepts. In this research, two methods were proposed for improving the performance of the extraction result. The first method uses particle swarm optimization to optimize the weights of features. The advantage of particle swarm optimization is that it can calculate and adjust the weight of each feature to an appropriate value, and here it is used to improve the performance of term and taxonomy extraction. The second method is a hybrid technique that uses multi-objective particle swarm optimization and fuzzy systems to ensure that the membership functions and fuzzy rule sets are optimized. The advantage of using a fuzzy system is that imprecise and uncertain values of feature weights can be tolerated during the extraction process. This method is used to improve the performance of taxonomy extraction. In the term extraction experiment, five features were extracted for each term from the document. These features were represented by feature vectors consisting of domain relevance, domain consensus, term cohesion, first occurrence and length of noun phrase. For taxonomy extraction, matches of Hearst lexico-syntactic patterns in documents and on the web, and hypernym information from WordNet, were used as the features representing each pair of terms from the texts. The two proposed methods were evaluated using a dataset that contains documents about tourism.
For term extraction, the proposed method is compared with benchmark algorithms such as Term Frequency Inverse Document Frequency, Weirdness, Glossary Extraction and Term Extractor, using the precision performance evaluation measure. For taxonomy extraction, the proposed methods are compared with the benchmark methods of Feature-based extraction and weighting by Support Vector Machine, using the f-measure, precision and recall performance evaluation measures. For the first method, the experimental results showed that using particle swarm optimization to optimize the feature weights in term and taxonomy extraction leads to improved accuracy of the extraction results compared to the benchmark algorithms. For the second method, the results showed that the hybrid technique combining multi-objective particle swarm optimization and fuzzy systems improves the performance of taxonomy extraction compared to the benchmark methods, while adjusting the fuzzy membership functions and keeping the number of fuzzy rules to a minimum with a high degree of accuracy.
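The first method can be pictured as a standard PSO loop over five-dimensional weight vectors, with precision of the resulting term ranking as the fitness. The sketch below is a minimal illustration under assumed data: the feature vectors, gold term set, and rank cutoff are all hypothetical, and the fitness is a simplified precision@K rather than the thesis's full evaluation.

```python
import random

random.seed(42)

# Hypothetical feature vectors per candidate term:
# [domain relevance, domain consensus, term cohesion, first occurrence, NP length]
FEATURES = {
    "package tour":   [0.9, 0.8, 0.7, 0.9, 0.5],
    "tourist resort": [0.8, 0.7, 0.8, 0.6, 0.5],
    "free time":      [0.3, 0.4, 0.2, 0.5, 0.5],
    "travel agency":  [0.7, 0.9, 0.6, 0.7, 0.5],
    "other hand":     [0.1, 0.2, 0.1, 0.3, 0.5],
    "hotel booking":  [0.8, 0.6, 0.7, 0.4, 0.5],
}
GOLD = {"package tour", "tourist resort", "travel agency", "hotel booking"}
K = 4  # rank cutoff

def precision_at_k(weights):
    """Rank terms by weighted feature sum; precision of the top-K ranking."""
    score = lambda t: sum(w * f for w, f in zip(weights, FEATURES[t]))
    ranked = sorted(FEATURES, key=score, reverse=True)
    return len(set(ranked[:K]) & GOLD) / K

def pso(dim=5, particles=20, iters=30, w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO over feature weights, maximizing precision@K."""
    pos = [[random.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [precision_at_k(p) for p in pos]
    g = pbest[max(range(particles), key=lambda i: pbest_fit[i])][:]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (g[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = precision_at_k(pos[i])
            if fit > pbest_fit[i]:
                pbest_fit[i], pbest[i] = fit, pos[i][:]
                if fit > precision_at_k(g):
                    g = pos[i][:]
    return g

best = pso()
print([round(x, 2) for x in best])
```

The multi-objective fuzzy variant replaces the scalar fitness with several objectives (e.g. precision and recall of taxonomy pairs) and evolves membership functions alongside the rule set, but the swarm update step is the same.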

    PERICLES Deliverable 4.3:Content Semantics and Use Context Analysis Techniques

    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be accurately interpreted as initially intended.

    Crowdsourcing semantic web

    Finding easier and less resource-intensive ways of building knowledge resources is necessary to help broaden the coverage and use of semantic web technologies. Crowdsourcing presents a means through which knowledge can be efficiently acquired to build semantic resources. Crowds can be identified that represent communities whose knowledge could be used to build domain ontologies. This work presents a knowledge acquisition approach aimed at incorporating ontology engineering tasks into community crowd activity. The success of this approach is evaluated by the degree to which a crowd consensus is reached regarding the description of the target domain. Two experiments are described which test the effectiveness of the approach. The first experiment tests the approach by using a crowd that is aware of the knowledge acquisition task. In the second experiment, the crowd is unaware of the knowledge acquisition task and is motivated to contribute through the use of an interactive map. The results of these two experiments show that a similar consensus is reached from both experiments, suggesting that the approach offered provides a valid mechanism for incorporating knowledge acquisition tasks into routine crowd activity.
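One simple way to quantify "degree of consensus" for comparing the two experiments is the fraction of contributors who agree with the modal answer per item. This is an illustrative stand-in, not necessarily the measure the work itself uses; the answers below are hypothetical.

```python
from collections import Counter

def consensus(labels):
    """Fraction of contributors agreeing with the most common answer."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical crowd answers naming the category of one map location,
# from a task-aware crowd and a task-unaware crowd respectively.
aware   = ["museum", "museum", "museum", "gallery", "museum"]
unaware = ["museum", "museum", "gallery", "museum", "museum"]
print(consensus(aware), consensus(unaware))  # 0.8 0.8
```

Equal consensus scores across the two conditions would mirror the paper's finding that the task-unaware crowd contributes knowledge of comparable coherence.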