
    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research for many years. In the past, it was largely investigated for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their roles, and their behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers. Comment: 34 pages, 8 tables, 7 figures
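
    The survey does not prescribe a single algorithm, but a minimal element-name matcher over two XSDs illustrates the kind of component an XML Matcher Template would describe. The sketch below is hypothetical: it uses Python's standard xml.etree.ElementTree to collect element names from two XSD files (file names are placeholders) and pairs them by a simple string-similarity threshold; real XML Matchers also exploit the hierarchical structure mentioned in the abstract.

```python
# Hypothetical sketch: name-based matching of element declarations in two XSDs.
# Real XML Matchers combine this with structural (hierarchy-aware) similarity.
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

def element_names(xsd_path):
    """Collect the names of all xs:element declarations in an XSD file."""
    tree = ET.parse(xsd_path)
    return {el.get("name")
            for el in tree.iter(f"{XSD_NS}element")
            if el.get("name")}

def match_schemas(xsd_a, xsd_b, threshold=0.8):
    """Return (name_a, name_b, score) pairs whose lexical similarity exceeds threshold."""
    names_a, names_b = element_names(xsd_a), element_names(xsd_b)
    matches = []
    for a in names_a:
        for b in names_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return sorted(matches, key=lambda m: -m[2])

# Example (file names are placeholders):
# print(match_schemas("orders_v1.xsd", "orders_v2.xsd"))
```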

    A Semantic Approach to Integrating XML Schemas Using Domain Ontologies

    XML documents often conform to different schemas, even within the same application domain. To support interoperability among different IT systems, this paper proposes a method for integrating XML schemas. The proposed method determines synonym, hypernym, and holonym relationships among XML elements and attributes by using domain ontologies as well as general dictionaries. The method also takes the structural information of elements and attributes into account, and the conciseness of the integrated schema is considered as well. Experimental results with a variety of schemas show that using a domain ontology together with structural information improves the performance of schema integration
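
    The abstract does not name the dictionaries or ontologies it uses; as an illustrative stand-in, the sketch below uses WordNet (via NLTK, assuming the WordNet corpus is downloaded) to test whether two XML element names are synonyms, or stand in a hypernym or holonym relationship. The choice of WordNet and the simple lookup strategy are assumptions for illustration, not the paper's actual ontology machinery.

```python
# Illustrative sketch (assumes NLTK with the WordNet corpus downloaded):
# classify the lexical relationship between two XML element/attribute names.
from nltk.corpus import wordnet as wn

def lexical_relationship(name_a, name_b):
    """Return 'synonym', 'hypernym', 'holonym', or None for two element names."""
    syns_a, syns_b = wn.synsets(name_a), wn.synsets(name_b)
    set_b = set(syns_b)
    for sa in syns_a:
        if sa in set_b:
            return "synonym"                      # shared synset
        if set_b & set(sa.hypernyms()):
            return "hypernym"                     # name_b generalizes name_a
        holonyms = sa.part_holonyms() + sa.member_holonyms()
        if set_b & set(holonyms):
            return "holonym"                      # name_a is part/member of name_b
    return None

# Example: lexical_relationship("author", "writer") -> "synonym"
```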

    Multimodality Data Integration in Epilepsy

    An important goal of software development in the medical field is the design of methods that can integrate information obtained from various imaging and nonimaging modalities into a cohesive framework, so that the results of qualitatively different measurements can be understood in a larger context. Moreover, it is essential to assess the various features of the data quantitatively so that relationships between complementary modalities in the anatomical and functional domains can be expressed mathematically. This paper presents a clinically feasible software environment for the quantitative assessment of the relationship between biochemical function, as assessed by PET imaging, and electrophysiological parameters derived from intracranial EEG. Based on the developed software tools, quantitative results obtained from individual modalities can be merged into a data structure that provides a consistent framework for advanced data mining techniques and 3D visualization. Moreover, an effort was made to derive quantitative variables (such as the spatial proximity index, SPI) characterizing the relationship between complementary modalities at a more generic level as a prerequisite for efficient data mining strategies. We describe the implementation of this software environment in twelve children (mean age 5.2 ± 4.3 years) with medically intractable partial epilepsy who underwent both high-resolution structural MR and functional PET imaging. Our experiments demonstrate that our approach will lead to a better understanding of the mechanisms of epileptogenesis and might ultimately have an impact on treatment. Moreover, our software environment holds promise to be useful in many other neurological disorders where integration of multimodality data is crucial for a better understanding of the underlying disease mechanisms
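
    The abstract mentions a spatial proximity index (SPI) only by name; the sketch below is a hypothetical, simplified reading of such an index, computed as the mean distance from each PET-defined abnormality coordinate to its nearest intracranial EEG electrode, with all coordinates assumed to be co-registered in millimetres. The paper's actual SPI definition may differ.

```python
# Hypothetical simplification of a spatial proximity index (SPI):
# for each PET-defined abnormality location, distance to the nearest iEEG electrode.
import numpy as np

def spatial_proximity_index(pet_coords, eeg_coords):
    """pet_coords: (N, 3) and eeg_coords: (M, 3) arrays of co-registered mm coordinates.
    Returns the mean nearest-electrode distance over all PET abnormality points."""
    pet = np.asarray(pet_coords, dtype=float)
    eeg = np.asarray(eeg_coords, dtype=float)
    # Pairwise distances between every PET point and every electrode.
    d = np.linalg.norm(pet[:, None, :] - eeg[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

# Example with made-up coordinates (mm):
# spi = spatial_proximity_index([[10, 20, 30]], [[12, 21, 33], [40, 40, 40]])
```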

    A decision support system for corporations cyber security risk management

    This thesis presents a decision-aiding system named C3-SEC (Context-aware Corporative Cyber Security), developed in the context of a master program at the Polytechnic Institute of Leiria, Portugal. The research dimension and the corresponding software development process are presented and validated with an application scenario and a case study performed at Universidad de las Fuerzas Armadas ESPE, Ecuador. C3-SEC is decision-aiding software intended to support the analysis of cyber risks and cyber threats to a corporate information and communications technology infrastructure. The resulting software product will help corporations' Chief Information Security Officers (CISOs) with cyber security risk analysis, decision-making, and prevention measures for the protection of infrastructure and information assets. The work initially focuses on the evaluation of the most popular and relevant tools available for risk assessment and decision-making in the cyber security domain. Their properties, metrics, and strategies are studied, and their support for cyber security risk analysis, decision-making, and prevention is assessed with respect to the protection of an organization's information assets. A contribution to the decision support of cyber security experts is then proposed by means of the reuse and integration of existing tools with the C3-SEC software. C3-SEC extends existing tools from the data collection and data analysis (perception) level to a full context-aware reference model. The software makes use of semantic-level, ontology-based knowledge representation and inference supported by widely adopted standards, including cyber security standards (CVE, CPE, CVSS, etc.) and cyber security information sources made available by international authorities, to share and exchange information in this domain. C3-SEC development follows a context-aware systems reference model addressing the perception, comprehension, projection, and decision/action layers to create corporate-scale cyber security situation awareness
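
    The thesis mentions CVSS scores and CVE/CPE identifiers only at a high level; the short sketch below is an assumed illustration of how asset-level risk could be aggregated from published CVSS base scores. The asset names, criticality weights, and the aggregation rule are all hypothetical, not C3-SEC's actual model.

```python
# Hypothetical aggregation of asset risk from CVSS v3 base scores (0.0-10.0).
# Asset criticality weights and the aggregation rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    criticality: float          # 0.0 (low) .. 1.0 (mission critical)
    cvss_scores: list           # base scores of CVEs matched to this asset (via CPE)

def asset_risk(asset: Asset) -> float:
    """Blend the worst and the average vulnerability severity, scaled by criticality."""
    if not asset.cvss_scores:
        return 0.0
    worst = max(asset.cvss_scores)
    avg = sum(asset.cvss_scores) / len(asset.cvss_scores)
    return round(asset.criticality * (0.7 * worst + 0.3 * avg), 2)

web_server = Asset("public-web-01", criticality=0.9, cvss_scores=[9.8, 7.5, 5.3])
print(asset_risk(web_server))   # 8.21 on a 0-10 scale
```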

    Validation Framework for RDF-based Constraint Languages

    In this thesis, a validation framework is introduced that enables the consistent execution of RDF-based constraint languages on RDF data and the formulation of constraints of any type. The framework reduces the representation of constraints to the absolute minimum, is based on formal logic, and consists of a small, lightweight vocabulary; it ensures consistent validation results and enables constraint transformations for each constraint type across RDF-based constraint languages
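
    The thesis defines its own framework rather than a specific tool; purely as a general illustration of validating RDF data against declaratively expressed constraints, the sketch below uses SHACL through the pySHACL library. The shapes, the sample data, and the choice of SHACL are assumptions for illustration, not the framework described in the thesis.

```python
# Illustration only: SHACL validation of RDF data with pySHACL,
# standing in for the general idea of executing constraints on RDF data.
from rdflib import Graph
from pyshacl import validate

data_ttl = """
@prefix ex: <http://example.org/> .
ex:book1 ex:title "Validation Framework" .
ex:book2 ex:pages 220 .
"""

shapes_ttl = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:BookShape a sh:NodeShape ;
    sh:targetSubjectsOf ex:pages ;
    sh:property [ sh:path ex:title ; sh:minCount 1 ] .
"""

data_graph = Graph().parse(data=data_ttl, format="turtle")
shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")

conforms, _, report_text = validate(data_graph, shacl_graph=shapes_graph)
print(conforms)        # False: ex:book2 has pages but no title
print(report_text)     # human-readable validation report
```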

    Building an XML document warehouse

    Data Warehouses and OLAP (On-Line Analytical Processing) technologies are dedicated to analyzing structured data issued from organizations' OLTP (On-Line Transaction Processing) systems. Furthermore, in order to enhance their decision support systems, these organizations need to explore XML (eXtensible Markup Language) documents as an additional and important source of unstructured data. In this context, this paper addresses the warehousing of document-centric XML documents. More specifically, we propose a two-method approach to building Document Warehouse conceptual schemas. The first method is for the unification of XML document structures; it aims to elaborate a global and generic view for a set of XML documents belonging to the same domain. The second method is for designing multidimensional galaxy schemas for Document Warehouses
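
    The unification method itself is not detailed in the abstract; as a hypothetical first step, the sketch below merges the element paths observed in several document-centric XML files into one summary structure, a crude stand-in for the "global and generic view" the paper builds. The file names and the path-union strategy are assumptions.

```python
# Hypothetical sketch: union the element paths of several XML documents
# into a single generic structure (a crude "global view" of the collection).
import xml.etree.ElementTree as ET
from collections import defaultdict

def element_paths(xml_path):
    """Return every root-to-element path (e.g. 'article/section/title') in a document."""
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        yield path
        for child in node:
            yield from walk(child, path)
    return set(walk(ET.parse(xml_path).getroot(), ""))

def unified_structure(xml_paths):
    """Count in how many documents each element path occurs."""
    counts = defaultdict(int)
    for p in xml_paths:
        for path in element_paths(p):
            counts[path] += 1
    return dict(sorted(counts.items()))

# Example (placeholder file names):
# print(unified_structure(["paper1.xml", "paper2.xml", "report3.xml"]))
```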

    BlogForever D3.2: Interoperability Prospects

    This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed, a Delphi study to identify crucial aspects for the interoperability of web archives and digital libraries is conducted, technical interoperability standards and protocols are reviewed regarding their relevance for BlogForever, a simple approach to considering interoperability in specific usage scenarios is proposed, and a tangible approach to developing a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories is presented

    A geo-database for potentially polluting marine sites and associated risk index

    The increasing availability of geospatial marine data provides an opportunity for hydrographic offices to contribute to the identification of Potentially Polluting Marine Sites (PPMS). To adequately manage these sites, a PPMS Geospatial Database (GeoDB) application was developed to collect and store relevant information suitable for site inventory and geo-spatial analysis. The benefits of structuring the data to conform to the Universal Hydrographic Data Model (IHO S-100) and of using the Geography Markup Language (GML) for encoding are presented. A storage solution is proposed using a GML-enabled spatial relational database management system (RDBMS). In addition, an example of a risk index methodology is provided based on the defined data structure. This example was implemented as scripts containing SQL statements, executed by a cross-platform C++ application, called PPMS GeoDB Manager, built on open-source libraries
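
    The paper's risk index formula and database schema are not given in the abstract; the sketch below is an assumed toy version in which per-site risk is a weighted sum of hazard and exposure factors computed with a SQL statement. The table name, columns, and weights are all hypothetical.

```python
# Hypothetical toy risk index over a site inventory table, executed as SQL.
# Table name, columns, and the weighting are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ppms_site (
        site_id TEXT PRIMARY KEY,
        cargo_hazard REAL,      -- 0..1, e.g. fuel oil remaining
        hull_condition REAL,    -- 0..1, 1 = severely degraded
        env_sensitivity REAL    -- 0..1, sensitivity of surrounding waters
    )""")
conn.executemany(
    "INSERT INTO ppms_site VALUES (?, ?, ?, ?)",
    [("wreck-001", 0.8, 0.6, 0.9), ("wreck-002", 0.2, 0.3, 0.4)],
)

rows = conn.execute("""
    SELECT site_id,
           ROUND(0.5 * cargo_hazard + 0.3 * hull_condition + 0.2 * env_sensitivity, 2)
               AS risk_index
    FROM ppms_site
    ORDER BY risk_index DESC
""").fetchall()
print(rows)   # [('wreck-001', 0.76), ('wreck-002', 0.27)]
```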

    A conceptual framework and a risk management approach for interoperability between geospatial datacubes

    Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide an organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to this heterogeneity and to the difference in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address the specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first define interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with each other. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help the agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility
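
    The thesis's actual quality indicators are not enumerated in the abstract; the sketch below is an assumed illustration of how an agent might score a datacube's fitness for use from a few metadata-derived indicators before deciding whether to proceed with interoperability. The indicator names, weights, and threshold are hypothetical.

```python
# Hypothetical fitness-for-use score an agent could compute from datacube metadata.
# Indicator names, weights, and the acceptance threshold are illustrative assumptions.

INDICATOR_WEIGHTS = {
    "metadata_completeness": 0.4,   # share of required metadata fields present
    "temporal_coverage": 0.3,       # overlap with the epochs needed by the user
    "granularity_match": 0.3,       # how well dimension levels match the analysis
}

def fitness_for_use(indicators, weights=INDICATOR_WEIGHTS):
    """Weighted average of indicator values in [0, 1]; higher means more appropriate."""
    return round(sum(weights[k] * indicators.get(k, 0.0) for k in weights), 2)

def should_interoperate(indicators, threshold=0.6):
    """Decision an agent might take before exchanging schema content."""
    return fitness_for_use(indicators) >= threshold

candidate_cube = {"metadata_completeness": 0.9, "temporal_coverage": 0.5, "granularity_match": 0.7}
print(fitness_for_use(candidate_cube), should_interoperate(candidate_cube))  # 0.72 True
```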