8 research outputs found

    The potential of semantic paradigm in warehousing of big data

    Big data holds analytical potential that was hard to realize with previously available technologies. After new storage paradigms intended for big data, such as NoSQL databases, emerged, traditional systems were pushed out of focus. Current research concentrates on reconciling the two at different levels, or on replacing one paradigm with the other. Similarly, the emergence of NoSQL databases has begun to push traditional (relational) data warehouses out of research and even practical focus. Data warehousing is known for its strict modelling process, which captures the essence of the business processes. For that reason, mere integration to bridge the NoSQL gap is not enough; it is necessary to deal with this issue at a higher abstraction level, during the modelling phase. NoSQL databases generally lack a clear, unambiguous schema, making the comprehension of their contents difficult and their integration and analysis harder. This motivated involving semantic web technologies to enrich NoSQL database contents with additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential for integrating NoSQL data and traditional data warehouses, with some focus on document stores. It also proposes future research directions for the modelling phases of big data warehouses.
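
    As a minimal sketch of the kind of semantic enrichment surveyed here, the snippet below attaches a JSON-LD-style @context to a schemaless document-store record so that its fields map to shared vocabulary terms; the record layout, field names, and schema.org term choices are our illustration, not taken from the paper.

        # Illustrative only: give a schemaless document-store record shared,
        # machine-readable semantics by attaching a JSON-LD @context.
        # Field names and vocabulary URIs are assumptions, not from the paper.
        import json

        order = {  # a typical document-store (e.g. MongoDB-style) record
            "customer": "Jane Doe",
            "total": 129.90,
            "placed": "2015-06-01",
        }

        context = {  # map local field names onto schema.org terms
            "customer": "http://schema.org/customer",
            "total": "http://schema.org/totalPrice",
            "placed": "http://schema.org/orderDate",
        }

        enriched = {"@context": context, "@type": "http://schema.org/Order", **order}
        print(json.dumps(enriched, indent=2))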

    Light-Weight Ontology Alignment using Best-Match Clone Detection

    Ontologies are a key component of the Semantic Web, providing a common basis for representing and exchanging domain meaning in web documents and resources. Ontology alignment is the problem of relating the elements of two formal ontologies for a semantic domain, in order to identify common concepts and relationships represented using different terminology or language, and thus allow meaningful communication and exchange of documents and resources represented using different ontologies for the same domain. Many algorithms have been proposed for ontology alignment, each with its own strengths and weaknesses. The problem is in many ways similar to near-miss clone detection: while much of the description of concepts in two ontologies may be similar, there can be differences in structure or vocabulary that make similarity detection challenging. Based on our previous work extending clone detection to modelling languages such as WSDL using contextualization, in this work we apply near-miss clone detection to the problem of ontology alignment, and use the new notion of "best-match" clone detection to achieve results similar to many existing ontology alignment algorithms when applied to standard benchmarks.
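
    A rough sketch of the "best-match" idea as described: each element of one ontology keeps only its single most similar counterpart in the other, subject to a minimum similarity threshold. The token-based Jaccard similarity below is a stand-in for the authors' near-miss clone detector, and the toy ontology fragments are invented for illustration.

        # "Best-match" sketch: each element of ontology A is paired with its
        # single most similar element of ontology B, kept only above a threshold.
        # Token-overlap (Jaccard) similarity stands in for a real clone detector.
        import re

        def tokens(text):
            return set(re.findall(r"[a-z0-9]+", text.lower()))

        def similarity(a, b):
            ta, tb = tokens(a), tokens(b)
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def best_match(onto_a, onto_b, threshold=0.25):
            alignment = []
            for elem_a in onto_a:
                best = max(onto_b, key=lambda elem_b: similarity(elem_a, elem_b))
                score = similarity(elem_a, best)
                if score >= threshold:  # discard weak best matches
                    alignment.append((elem_a, best, round(score, 2)))
            return alignment

        a = ["class Person hasName string", "class Vehicle hasOwner Person"]
        b = ["class Human name string", "class Car owner Human"]
        print(best_match(a, b))  # keeps only sufficiently similar best pairs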

    Interactive multidimensional modeling of linked data for exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of Linked Open Data (LOD). While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies that enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and on generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness, we compare it with a related approach in the literature, propose a case study based on DBpedia, and discuss the results of a test with real users.
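
    To make the pattern-discovery step concrete, here is a hedged sketch of how one might probe a LOD endpoint for generalization-based roll-up candidates. The SPARQLWrapper calls are real, but the query, the seed resource, and the reliance on rdfs:subClassOf are our illustration of the idea, not iMOLD's actual implementation.

        # Illustrative probe for generalization-style roll-up candidates in LOD:
        # fetch the classes of a seed resource and their superclasses, each pair
        # being a candidate roll-up edge for an aggregation hierarchy.
        from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

        endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
        endpoint.setQuery("""
            PREFIX dbr:  <http://dbpedia.org/resource/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT DISTINCT ?cls ?super WHERE {
              dbr:Bologna a ?cls .
              ?cls rdfs:subClassOf ?super .
            } LIMIT 20
        """)
        endpoint.setReturnFormat(JSON)

        for row in endpoint.query().convert()["results"]["bindings"]:
            # each (cls, super) pair is a candidate roll-up edge: cls -> super
            print(row["cls"]["value"], "->", row["super"]["value"])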

    Deliverable D4.2 User profile schema and profile capturing

    This deliverable presents methods employed in LinkedTV to create, update, and formalise a semantic user model that will be used for concept and content filtering. It focuses on the extraction of lightweight and dense implicit knowledge about user preferences. This process includes the semantic interpretation of information that stems from the user's interaction with the content, together with the estimation of the impact that preferred concepts have for each specific interaction, based on the type of transaction and the user's physical reaction to the content. User preferences are then updated based on their age, frequency of appearance, and utility, while persistent associations between preferences are learnt. This information evolves into a semantic user model that is made available for predictive inference about relevant concepts and content.
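
    The update rule is described only qualitatively above; the sketch below shows one plausible reading, in which each interaction reinforces a concept's weight by an impact score and all weights decay exponentially with age. The decay constant, impact values, and class layout are invented for illustration, not taken from the deliverable.

        # One plausible reading of the preference-update rule: interactions add
        # an impact score to a concept's weight, and weights decay with age.
        # All constants here are illustrative assumptions.
        import math

        DECAY = 0.05  # per-day decay rate (assumption)

        class UserModel:
            def __init__(self):
                self.prefs = {}  # concept -> (weight, day of last update)

            def interact(self, concept, impact, day):
                weight, last = self.prefs.get(concept, (0.0, day))
                weight *= math.exp(-DECAY * (day - last))  # age-based decay
                self.prefs[concept] = (weight + impact, day)

            def top(self, day, n=3):
                now = {c: w * math.exp(-DECAY * (day - t))
                       for c, (w, t) in self.prefs.items()}
                return sorted(now.items(), key=lambda kv: -kv[1])[:n]

        model = UserModel()
        model.interact("football", impact=1.0, day=0)  # e.g. watched a full clip
        model.interact("politics", impact=0.3, day=5)  # e.g. brief skim
        print(model.top(day=10))  # concepts ranked by decayed weight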

    Instance-based Hierarchical Schema Alignment in Linked Data

    Doctoral dissertation (Ph.D.), Department of Dental Science, Medical Management and Informatics major, Graduate School, Seoul National University, August 2015. Hong-Gee Kim.

    Along with the development of the Web of documents, there is a natural need for sharing, exchanging, and merging heterogeneous data to provide more comprehensive information and to answer more complex questions. However, the data published on the Web are often raw dumps that sacrifice much of the semantics that could be used for exchanging and integrating data. The Resource Description Framework (RDF) and Linked Data are designed to expose the semantics of data by interlinking data represented with well-defined relations. With the profusion of RDF resources and Linked Data, ontology alignment has gained significance in providing highly comprehensive knowledge embedded in disparate sources. Ontology alignment in Linking Open Data (LOD), however, has traditionally focused more on the instance level than on the schema level. Linked Data supports schema-level matching, provided that instance-level matching is already established. Linked Data is thus a hotbed for instance-based schema matching, which is considered a better solution for matching classes with ambiguous or obscure names. In this dissertation, the author focuses on three issues in instance-based schema alignment for Linked Data: (1) how to align schemas based on instances, (2) how to scale the schema alignment, and (3) how to generate a hierarchical schema structure. Targeting the first issue, the author proposes an instance-based schema alignment algorithm called IUT. The IUT builds a unified taxonomy for the classes from two ontologies based on an instance-class matrix and derives the relation between two classes from their common instances. The author tested the IUT with DBpedia and YAGO2, comparing it with two state-of-the-art methods on four alignment tasks. The experiments show that the IUT outperforms these methods in terms of efficiency and effectiveness (e.g., it takes 968 ms to reach a 0.810 F-score on intra-subsumption alignment in DBpedia). Targeting the second issue, the author proposes a scaled version of the IUT called IUT(M). The IUT(M) reduces the computations of the IUT in two ways based on Locality Sensitive Hashing (LSH): (1) reducing the cost of each pairwise class similarity computation with MinHash functions, and (2) reducing the number of similarity computations with banding. The author tested the IUT(M) on the YAGO2-YAGO2 intra-subsumption alignment task and demonstrates that the running time of the IUT can be reduced by 94% with a 5% loss in F-score. Targeting the third issue, the author proposes a method to generate a faceted taxonomy based on object properties in Linked Data. A framework is proposed that builds a sub-taxonomy for each facet from sub-data extracted via an object property, using an Instance-based Concept Taxonomy generation algorithm called ICT. Two experiments demonstrate that: (1) the ICT efficiently and effectively generates a sub-taxonomy with rdf:type in DBpedia and YAGO2 (e.g., taking 49 ms and 11,790 ms to build concept taxonomies that achieve 0.917 and 0.780 Taxonomic F-score); and (2) the faceted taxonomies for Diseasome and DrugBank, efficiently generated from multiple object properties (e.g., taking 2,032 ms and 2,525 ms to build faceted taxonomies based on 6 and 16 properties), can effectively reduce the search spaces in faceted search (e.g., obtaining 1.65 and 1.03 Maximum Resolution with 2 facets).
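
    The abstract leaves the relation-discovery step at a high level; below is a minimal sketch of the common-instance idea, assuming set-valued class extensions and two containment thresholds standing in for the dissertation's χ_s and χ_e (whose actual values are not given here).

        # Minimal sketch of instance-based relation discovery in the spirit of
        # the IUT: compare class extensions (instance sets) and read equivalence
        # or subsumption off their containment ratios. The thresholds are
        # placeholders for chi_s and chi_e, whose real values are not given here.
        def relate(instances_a, instances_b, chi_s=0.9, chi_e=0.9):
            common = instances_a & instances_b
            if not common:
                return "unrelated"
            cover_a = len(common) / len(instances_a)  # fraction of A inside B
            cover_b = len(common) / len(instances_b)  # fraction of B inside A
            if cover_a >= chi_e and cover_b >= chi_e:
                return "equivalent"
            if cover_a >= chi_s:
                return "A subsumed by B"
            if cover_b >= chi_s:
                return "B subsumed by A"
            return "overlapping"

        physicists = {"Einstein", "Bohr"}
        scientists = {"Einstein", "Bohr", "Curie", "Noether"}
        print(relate(physicists, scientists))  # -> A subsumed by B

    The IUT(M) variant would replace these exact set intersections with MinHash signatures and banding, trading a small loss in F-score for far fewer and cheaper comparisons.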

    Semantic Enrichment of Ontology Mappings

    Schema and ontology matching play an important role in the fields of data integration and the Semantic Web. Given two heterogeneous data sources, metadata matching usually constitutes the first step in the data integration workflow: the analysis and comparison of two input resources such as schemas or ontologies. The result is a list of correspondences between the two schemas or ontologies, often called a mapping or alignment. Many tools and research approaches have been proposed to determine those correspondences automatically. However, most match tools do not provide any information about the relation type that holds between matching concepts, for the simple but important reason that most common match strategies are too simple and heuristic to allow any sophisticated relation-type determination. Knowing the specific type that holds between two concepts, e.g., whether they are in an equality, subsumption (is-a), or part-of relation, is very important for advanced data integration tasks such as ontology merging or ontology evolution. It is also very important for mappings in the biological or biomedical domain, where is-a and part-of relations may exceed the number of equality correspondences by far. Such more expressive mappings allow much better integration results, yet have scarcely been the focus of research so far. This doctoral thesis focuses on determining the correspondence types in a given mapping, which is referred to as semantic mapping enrichment. We introduce and present the mapping enrichment tool STROMA, which obtains a pre-calculated schema or ontology mapping and determines a semantic relation type for each correspondence. In contrast to previous approaches, we strongly focus on linguistic laws and linguistic insights. By and large, linguistics is the key to precise matching and to the determination of relation types. We introduce various strategies that make use of these linguistic laws and can calculate the semantic type between two matching concepts. The observations and insights gained from this research go far beyond the field of mapping enrichment and can also be applied to schema and ontology matching in general. Since generic strategies have certain limits and may not be able to determine the relation type between more complex concepts, such as a laptop and a personal computer, background knowledge plays an important role in this research as well. For example, a thesaurus can help to recognize that these two concepts are in an is-a relation. We show how background knowledge can be used effectively in such cases, how it is possible to draw conclusions even if a concept is not contained in it, how the relation types in complex paths can be resolved, and how time complexity can be reduced by a so-called bidirectional search. The developed techniques go far beyond the background-knowledge exploitation of previous approaches and are now part of the semantic repository SemRep, a flexible and extendable system that combines different lexicographic resources. Furthermore, we show how additional lexicographic resources can be developed automatically by parsing Wikipedia articles. The proposed Wikipedia relation extraction approach yields some millions of additional relations, which constitute significant additional knowledge for mapping enrichment. The extracted relations were also added to SemRep, which thus became a comprehensive background-knowledge resource. To augment the quality of the repository, different techniques were used to discover and delete irrelevant semantic relations. We show in several experiments that STROMA obtains very good results w.r.t. relation-type detection. In a comparative evaluation, it achieved considerably better results than related applications, corroborating the overall usefulness and strength of the implemented strategies, which were developed with particular emphasis on the principles and laws of linguistics.
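
    As a toy illustration of the linguistic strategies described above, the sketch below combines one well-known compound heuristic (an English compound is usually an is-a specialization of its head noun, e.g. "apple tree" is-a "tree") with a tiny stand-in thesaurus for non-compositional cases such as laptop/personal computer. STROMA's strategies and SemRep's resources are far richer; everything here is our simplification.

        # Toy relation typing in the spirit described above: a compound-head
        # rule ("apple tree" is-a "tree") plus a tiny stand-in thesaurus for
        # non-compositional cases. Not STROMA's actual strategy set.
        THESAURUS = {  # stand-in for a background-knowledge repository
            ("laptop", "personal computer"): "is-a",
            ("keyboard", "laptop"): "part-of",
        }

        def relation_type(a, b):
            if a == b:
                return "equal"
            if a.endswith(" " + b):  # compound-head rule: specialization of head
                return "is-a"
            if b.endswith(" " + a):
                return "inverse is-a"
            if (a, b) in THESAURUS:  # fall back to background knowledge
                return THESAURUS[(a, b)]
            if (b, a) in THESAURUS:
                return "inverse " + THESAURUS[(b, a)]
            return "related (undetermined)"

        print(relation_type("apple tree", "tree"))           # is-a (linguistic rule)
        print(relation_type("laptop", "personal computer"))  # is-a (thesaurus)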