
    XML Matchers: approaches and challenges

    Schema Matching, i.e., the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in the Database and Artificial Intelligence research areas for many years. In the past it was investigated mainly for classical database models (e.g., E/R schemas and relational databases). In recent years, however, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, which aim at finding semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not simply take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact the Schema Matching task. We then introduce a template, called the XML Matcher Template, that describes the main components of an XML Matcher and their role and behavior, and we illustrate how each of these components has been implemented in some popular XML Matchers. We regard the XML Matcher Template as a baseline for objectively comparing approaches that, at first glance, might appear unrelated, and its introduction can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers. (Comment: 34 pages, 8 tables, 7 figures)
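    The survey above emphasizes that XML Matchers exploit the hierarchical structure of DTDs/XSDs rather than only element names. A minimal Python sketch of that idea follows; the toy schemas, the path-based scoring, and the 0.6/0.4 weighting are illustrative assumptions, not any specific matcher discussed in the paper.

        # Structure-aware matching sketch: candidate correspondences are scored by
        # element-name similarity plus the similarity of their root-to-element paths.
        # Toy schemas and weights below are illustrative, not taken from the survey.
        from difflib import SequenceMatcher

        SCHEMA_A = ["order/customer/name", "order/customer/address", "order/item/price"]
        SCHEMA_B = ["purchase/client/fullName", "purchase/client/addr", "purchase/product/cost"]

        def name_similarity(a: str, b: str) -> float:
            """String similarity of two element names."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def path_similarity(path_a: str, path_b: str) -> float:
            """Average similarity of the ancestor steps, aligned from the root."""
            pairs = list(zip(path_a.split("/")[:-1], path_b.split("/")[:-1]))
            if not pairs:
                return 0.0
            return sum(name_similarity(x, y) for x, y in pairs) / len(pairs)

        def match(schema_a, schema_b, w_name=0.6, w_path=0.4):
            """For each element of schema_a, return its best-scoring counterpart in schema_b."""
            result = {}
            for pa in schema_a:
                result[pa] = max(
                    (w_name * name_similarity(pa.split("/")[-1], pb.split("/")[-1])
                     + w_path * path_similarity(pa, pb), pb)
                    for pb in schema_b
                )
            return result

        for source, (score, target) in match(SCHEMA_A, SCHEMA_B).items():
            print(f"{source:24s} -> {target:26s} ({score:.2f})")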

    A BIM - GIS Integrated Information Model Using Semantic Web and RDF Graph Databases

    In recent years, 3D virtual indoor and outdoor urban modelling has become an essential geospatial information framework for civil and engineering applications such as emergency response, evacuation planning, and facility management. Multi-sourced and multi-scale 3D urban models are in high demand among architects, engineers, and construction professionals to achieve these tasks and provide relevant information to decision support systems. Spatial modelling technologies such as Building Information Modelling (BIM) and Geographical Information Systems (GIS) are frequently used to meet such demands. However, sharing data and information between these two domains is still challenging, and current semantic and syntactic strategies for communication between BIM and GIS do not fully support rich semantic and geometric information exchange from BIM to GIS or vice versa. This research study proposes a novel approach for integrating BIM and GIS using Semantic Web technologies and Resource Description Framework (RDF) graph databases. The originality of the suggested solution lies in combining the advantages of BIM and GIS models within a semantically unified data model, built using a semantic framework and ontology engineering approaches. The new model is named the Integrated Geospatial Information Model (IGIM) and is constructed in three stages. The first stage generates BIMRDF and GISRDF graphs from BIM and GIS datasets; graph integration of the BIM and GIS semantic models then creates IGIMRDF; lastly, information from the unified IGIMRDF graph is filtered using a graph query language and graph data analytics tools. The linkage between BIMRDF and GISRDF is completed through SPARQL endpoints, with queries defined over elements and entity classes that carry similar or complementary information from properties, relationships, and geometries identified by an ontology-matching process during model construction. The resulting model (or sub-model) can be managed in a graph database system and used in the backend as a data tier serving web services that feed a front-tier, domain-oriented application. A case study was designed, developed, and tested with the semantic integrated information model to validate the newly proposed solution, its architecture, and its performance.
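    As a rough illustration of the graph-integration stages described above (BIMRDF and GISRDF generation, merging into IGIMRDF, and SPARQL-based linking), the sketch below builds two tiny RDF graphs and joins them on a shared identifier. It assumes the rdflib package; the namespaces, properties, and identifier-based matching rule are hypothetical stand-ins for the ontology-matching process of the thesis.

        # Two tiny RDF graphs stand in for BIMRDF and GISRDF; they are merged into a
        # single "IGIMRDF"-style graph and linked with a SPARQL query on a shared
        # identifier. Requires the rdflib package; all names here are hypothetical.
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import OWL, RDF

        BIM = Namespace("http://example.org/bimrdf#")  # hypothetical BIMRDF namespace
        GIS = Namespace("http://example.org/gisrdf#")  # hypothetical GISRDF namespace

        # BIM-side graph: a wall element carrying a global identifier.
        bim_graph = Graph()
        bim_graph.add((BIM.Wall_01, RDF.type, BIM.Wall))
        bim_graph.add((BIM.Wall_01, BIM.globalId, Literal("GUID-1234")))

        # GIS-side graph: a feature referring to the same physical object.
        gis_graph = Graph()
        gis_graph.add((GIS.Feature_7, RDF.type, GIS.BuildingPart))
        gis_graph.add((GIS.Feature_7, GIS.sourceId, Literal("GUID-1234")))

        # Merge and link entities that share an identifier (a stand-in for the
        # ontology-matching step that produces the BIMRDF-GISRDF linkage).
        igim = bim_graph + gis_graph
        link_query = """
        SELECT ?bimEntity ?gisEntity WHERE {
          ?bimEntity <http://example.org/bimrdf#globalId> ?id .
          ?gisEntity <http://example.org/gisrdf#sourceId> ?id .
        }
        """
        for bim_entity, gis_entity in igim.query(link_query):
            igim.add((bim_entity, OWL.sameAs, gis_entity))
            print(f"linked {bim_entity} <-> {gis_entity}")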

    Doctor of Philosophy

    Biomedical data are a rich source of information and knowledge. Not only are they useful for direct patient care, but they may also offer answers to important population-based questions. Creating an environment where advanced analytics can be performed against biomedical data is nontrivial, however. Biomedical data are currently scattered across multiple heterogeneous systems, and integrating these data is a bigger task than humans can realistically do by hand; therefore, automatic biomedical data integration is highly desirable but has never been fully achieved. This dissertation introduces new algorithms that were devised to support automatic and semiautomatic integration of heterogeneous biomedical data. The new algorithms incorporate both data mining and biomedical informatics techniques to create "concept bags" that are used to compute similarity between data elements in the same way that "word bags" are compared in data mining. Concept bags are composed of controlled medical vocabulary concept codes that are extracted from text using named-entity recognition software. To test the new algorithm, three biomedical text similarity use cases were examined: automatically aligning data elements between heterogeneous data sets, determining degrees of similarity between medical terms using a published benchmark, and determining similarity between ICU discharge summaries. The method is highly configurable, and five different configurations were tested. The concept bag method performed particularly well at aligning data elements, outperforming the compared algorithms by more than 5%. Another configuration that included hierarchical semantics performed particularly well at matching medical terms, meeting or exceeding 30 of 31 other published results on the same benchmark. Results for the third scenario, computing ICU discharge summary similarity, were less successful: correlations between the multiple methods were low, including between terminologists. Overall, the concept bag algorithms performed consistently and comparatively well and appear to be viable options for multiple scenarios. New applications of the method and ideas for improving the algorithm are discussed as future work, including several performance enhancements, configuration-based enhancements, and concept vector weighting using the TF-IDF formula.
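    A minimal sketch of the "concept bag" comparison described above follows: each data element is reduced to a set of controlled-vocabulary concept codes, and similarity is computed over those sets. The codes are placeholders rather than output of named-entity recognition software, and plain Jaccard similarity stands in for whichever of the tested configurations was used.

        # Each data element is represented by a "concept bag" (a set of vocabulary
        # concept codes); similarity is the Jaccard overlap of the two bags.
        # The codes below are placeholders, not verified vocabulary identifiers.

        def jaccard(bag_a: set[str], bag_b: set[str]) -> float:
            """Set-overlap similarity between two concept bags."""
            if not bag_a and not bag_b:
                return 0.0
            return len(bag_a & bag_b) / len(bag_a | bag_b)

        # Concept bags for two data-element descriptions, as NER software might emit.
        element_1 = {"C0005823", "C0871470", "C1272641"}
        element_2 = {"C0005823", "C0871470", "C0277814"}

        print(f"concept-bag similarity: {jaccard(element_1, element_2):.2f}")  # 0.50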

    Semantic Model Alignment for Business Process Integration

    Business process models describe an enterprise's way of conducting business and in this form provide the basis for shaping the organization and engineering the appropriate supporting, or even enabling, IT. A major task in working with models is therefore their analysis and comparison for the purpose of aligning them. As models can differ semantically not only in the modeling languages used, but even more so in the way the natural language for labeling the model elements has been applied, correctly identifying the intended meaning of a legacy model is a non-trivial task that thus far has only been solved by humans. In particular at the time of reorganizations, the set-up of B2B collaborations, or mergers and acquisitions, the semantic analysis of models of different origin that need to be consolidated is a manual effort that is not only tedious and error-prone but also time-consuming, costly, and often repetitive. To facilitate the automation of this task by means of IT, this thesis presents the new method of Semantic Model Alignment. Its application makes it possible to extract and formalize the semantics of models, relating them based on the modeling language used and determining similarities based on the natural language used in model element labels. The resulting alignment supports model-based semantic business process integration. The research follows a design-science oriented approach, and the method was developed together with all of its enabling artifacts. These results were published as the research progressed and are presented in this thesis as a selection of peer-reviewed publications that comprehensively describe the various aspects.
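    Since the thesis hinges on determining similarities from the natural language used in model element labels, the sketch below shows one naive way such label-based alignment could look: labels from two process models are tokenized and paired by token overlap. The models, the tokenizer, and the threshold are illustrative assumptions, not the Semantic Model Alignment method itself.

        # Naive label-based alignment: tokenize element labels from two process
        # models and pair labels whose token overlap exceeds a threshold.
        # Models and threshold are illustrative assumptions.

        MODEL_A = ["Check customer order", "Ship goods", "Send invoice"]
        MODEL_B = ["Verify order of customer", "Dispatch goods", "Create invoice"]

        def tokens(label: str) -> set[str]:
            return set(label.lower().split())

        def overlap(a: str, b: str) -> float:
            """Jaccard overlap of the label token sets."""
            ta, tb = tokens(a), tokens(b)
            return len(ta & tb) / len(ta | tb)

        def align(model_a, model_b, threshold=0.25):
            """Pair each label in model_a with its best-overlapping label in model_b."""
            pairs = []
            for a in model_a:
                score, best = max((overlap(a, b), b) for b in model_b)
                if score >= threshold:
                    pairs.append((a, best, round(score, 2)))
            return pairs

        print(align(MODEL_A, MODEL_B))
        # [('Check customer order', 'Verify order of customer', 0.4),
        #  ('Ship goods', 'Dispatch goods', 0.33), ('Send invoice', 'Create invoice', 0.33)]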

    Semantically-enhanced recommendations in cultural heritage

    In the Web 2.0 environment, institutes and organizations are starting to open up their previously isolated and heterogeneous collections in order to provide visitors with maximal access. Semantic Web technologies are instrumental in integrating these rich collections of metadata by defining ontologies which accommodate different representation schemata and inconsistent naming conventions across the various vocabularies. Faced with this large amount of metadata with complex semantic structures, it is becoming more and more important to support visitors with a proper selection and presentation of information. In this context, the Dutch Science Foundation (NWO) funded the Cultural Heritage Information Personalization (CHIP) project in early 2005, as part of the Continuous Access to Cultural Heritage (CATCH) program in the Netherlands. It is a collaborative project between the Rijksmuseum Amsterdam, the Eindhoven University of Technology and the Telematica Instituut. The problem statement that guides the research of this thesis is as follows: can we support visitors with personalized access to semantically-enriched collections? To study this question, we chose cultural heritage (museums) as an application domain, and the semantically rich background knowledge about the museum collection provides a basis for our research. On top of it, we deployed user modeling and recommendation technologies in order to provide personalized services for museum visitors. Our main contributions are: (i) we developed an interactive rating dialog of artworks and art concepts for a quick instantiation of the CHIP user model, which is built as a specialization of FOAF and mapped to an existing event model ontology, SEM; (ii) we proposed a hybrid recommendation algorithm combining both explicit and implicit relations from the semantic structure of the collection. On the presentation level, we developed three tools for end-users: the Art Recommender, the Tour Wizard and the Mobile Tour Guide. Following a user-centered design cycle, we performed a series of evaluations with museum visitors to test the effectiveness of recommendations based on the rating dialog, different ways to build an optimal user model, and the prediction accuracy of the hybrid algorithm. Chapter 1 introduces the research questions, our approaches and the outline of this thesis. Chapter 2 gives an overview of our work at the first stage. It includes (i) the semantic enrichment of the Rijksmuseum collection, which is mapped to three Getty vocabularies (ULAN, AAT, TGN) and the Iconclass thesaurus; (ii) the minimal user model ontology defined as a specialization of FOAF, which at that time stored only user ratings; and (iii) the first implementation of the content-based recommendation algorithm in our first tool, the CHIP Art Recommender. Chapter 3 presents the two other tools: the Tour Wizard and the Mobile Tour Guide. Based on the user's ratings, the Web-based Tour Wizard recommends museum tours consisting of recommended artworks that are currently available in museum exhibitions. The Mobile Tour Guide brings recommended tours to mobile devices (e.g. PDAs) that can be used in the physical museum space. To connect users' various interactions with these tools, we convert the online user model stored in RDF into an XML format which the mobile guide can parse, and in this way we keep the online and on-site user models dynamically synchronized. Chapter 4 presents the second generation of the Mobile Tour Guide, with a real-time routing system on different mobile devices (e.g. iPod). Compared with the first generation, it can adapt museum tours based on the user's ratings of artworks and concepts, her/his current location in the physical museum, and the coordinates of the artworks and rooms in the museum. In addition, we mapped the CHIP user model to an existing event model ontology, SEM. Besides ratings, it can store additional user activities, such as following a tour and viewing artworks. Chapter 5 identifies a number of semantic relations within one vocabulary (e.g. a concept has a broader/narrower concept) and across multiple vocabularies (e.g. an artist is associated with an art style). We applied all these relations as well as the basic artwork features in content-based recommendations and compared them in terms of usefulness. This investigation also enabled us to look at the combined use of artwork features and semantic relations in sequence and to derive user navigation patterns. Chapter 6 defines the task of personalized recommendation and decomposes it into a number of inference steps for ontology-based recommender systems, from a knowledge engineering perspective. We propose a hybrid approach combining both explicit and implicit recommendations. The explicit relations include artwork features and semantic relations with preliminary weights derived from the evaluation in Chapter 5; the implicit relations are built between art concepts based on instance-based ontology matching. Chapter 7 gives an example of reusing user interaction data generated by one application in another for providing cross-application recommendations. In this example, user tags about cultural events, gathered by iCITY, are used to enrich the user model for generating content-based recommendations in the CHIP Art Recommender. To realize full tagging interoperability, we investigated the problems that arise in mapping user tags to domain ontologies and proposed additional mechanisms, such as the use of SKOS matching operators, to deal with the possible misalignment of tags and domain-specific ontologies. Chapter 8 summarizes to what extent the problem statement and each of the research questions are answered. We also discuss a number of limitations of our research and look ahead at what may follow as future work.
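    To make the content-based side of the hybrid recommendation idea concrete, the sketch below scores unrated artworks by their weighted feature overlap (artist, style, concept) with artworks the user rated highly. The artworks, relation weights, and scoring are illustrative assumptions, not the CHIP data, the SEM/FOAF user model, or the weights derived in the Chapter 5 evaluation.

        # Content-based scoring sketch: unrated artworks are ranked by weighted
        # overlap of their features with the artworks the user rated highly.
        # Artworks, features, and weights are illustrative assumptions.

        ARTWORKS = {
            "Night Watch":      {"artist": "Rembrandt", "style": "Baroque",   "concept": "militia"},
            "Jewish Bride":     {"artist": "Rembrandt", "style": "Baroque",   "concept": "portrait"},
            "Milkmaid":         {"artist": "Vermeer",   "style": "Baroque",   "concept": "domestic life"},
            "Winter Landscape": {"artist": "Avercamp",  "style": "Landscape", "concept": "ice skating"},
        }
        RELATION_WEIGHTS = {"artist": 0.5, "style": 0.3, "concept": 0.2}  # assumed preliminary weights
        liked = {"Night Watch"}  # artworks the user rated highly

        def score(candidate: str) -> float:
            """Weighted feature overlap between a candidate and the liked artworks."""
            total = 0.0
            for liked_work in liked:
                for relation, weight in RELATION_WEIGHTS.items():
                    if ARTWORKS[candidate][relation] == ARTWORKS[liked_work][relation]:
                        total += weight
            return total

        ranked = sorted((w for w in ARTWORKS if w not in liked), key=score, reverse=True)
        print(ranked)  # ['Jewish Bride', 'Milkmaid', 'Winter Landscape']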