    The potential of semantic paradigm in warehousing of big data

    Big data have analytical potential that was hard to realize with the available technologies. After new storage paradigms intended for big data, such as NoSQL databases, emerged, traditional systems were pushed out of focus. Current research concentrates on reconciling the two at different levels or on replacing one paradigm with the other. Similarly, the emergence of NoSQL databases has started to push traditional (relational) data warehouses out of research and even practical focus. Data warehousing is known for its strict modelling process, which captures the essence of the business processes. For that reason, a mere integration to bridge the NoSQL gap is not enough; the issue has to be dealt with at a higher abstraction level, during the modelling phase. NoSQL databases generally lack a clear, unambiguous schema, which makes their contents difficult to comprehend and harder to integrate and analyse. This motivated the use of semantic web technologies to enrich NoSQL database contents with additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential for integrating NoSQL data and traditional data warehouses, with some focus on document stores. It also proposes future directions for the big data warehouse modelling phases.

    Disaster Data Management in Cloud Environments

    Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management. In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering and sharing, 2) storing large amounts of disaster-related data from diverse sources, and 3) facilitating search and supporting interoperability and integration. Data are stored in a cloud environment taking advantage of NoSQL data stores. The proposed framework is generic, but this thesis focuses on the disaster management domain and data formats commonly present in that domain, i.e., file-style formats such as PDF, text, MS Office files, and images. The framework component responsible for addressing simulation models is SimOnto. SimOnto, as proposed in this work, transforms domain simulation models into an ontology-based representation with the goal of facilitating integration with other data sources, supporting simulation model querying, and enabling rule and constraint validation. Two case studies presented in this thesis illustrate the use of Disaster-CDM on the data collected during the Disaster Response Network Enabled Platform (DR-NEP) project. The first case study demonstrates Disaster-CDM integration capabilities by full-text search and querying services. In contrast to direct full-text search, Disaster-CDM full-text search also includes simulation model files as well as text contained in image files. Moreover, Disaster-CDM provides querying capabilities and this case study demonstrates how file-style data can be queried by taking advantage of a NoSQL document data store. The second case study focuses on simulation models and uses SimOnto to transform proprietary simulation models into ontology-based models which are then stored in a graph database. This case study demonstrates Disaster-CDM benefits by showing how simulation models can be queried and how model compliance with rules and constraints can be validated.

    Uniformly Integrated Database Approach for Heterogenous Databases

    Demands for more storage, scalability, and support for heterogeneous data when storing, analyzing, and retrieving data are rapidly increasing in today's data-centric areas such as cloud computing and big data analytics. These demands cannot be handled by relational database management systems (RDBMS) alone, because the strict relational model limits scalability and adaptability. Therefore, NoSQL (Not only SQL) databases, also called non-relational databases, were introduced to extend RDBMS and are now widely used in software development. As a result, it has become a challenge to transform relational databases into non-relational ones, or to integrate the two, to meet business needs for storage and adaptability. This paper therefore proposes a uniformly integrated database approach that integrates data extracted separately from the individual schemas of relational and NoSQL database systems. We first map the data elements in terms of their semantic meaning and structure, with the help of ontological semantic mapping and metamodeling of the extracted data. We then cover the structural, semantic, and syntactic diversity of each database schema and produce integrated database results. To demonstrate the efficiency and usefulness of the proposed system, we test it with popular datasets in BSON and traditional SQL format using MongoDB and MySQL databases. Compared with other contemporary approaches, we achieve significant results in mapping similarity, while running time and retrieval time are competitive with the others.

    Collaborative knowledge as a service applied to the disaster management domain

    Cloud computing offers services which promise to meet continuously increasing computing demands by using a large number of networked resources. However, data heterogeneity remains a major hurdle for data interoperability and data integration. In this context, a Knowledge as a Service (KaaS) approach has been proposed with the aim of generating knowledge from heterogeneous data and making it available as a service. In this paper, a Collaborative Knowledge as a Service (CKaaS) architecture is proposed, with the objective of satisfying consumer knowledge needs by integrating disparate cloud knowledge through collaboration among distributed KaaS entities. The NIST cloud computing reference architecture is extended by adding a KaaS layer that integrates diverse sources of data stored in a cloud environment. CKaaS implementation is domain-specific; therefore, this paper presents its application to the disaster management domain. A use case demonstrates collaboration of knowledge providers and shows how CKaaS operates with simulation models.

    An ontology-based secure design framework for graph-based databases

    Graph-based databases are concerned with performance and flexibility. Most of the existing approaches used to design secure NoSQL databases are limited to the final implementation stage, and do not involve the design of security and access control issues at higher abstraction levels. Ensuring security and access control for graph-based databases is difficult, as each approach differs significantly depending on the technology employed. In this paper, we propose the first technology-agnostic framework with which to design secure graph-based databases. Our proposal raises the abstraction level by using ontologies to model database and security requirements simultaneously. This is supported by the TITAN framework, which facilitates the way in which both aspects are dealt with. The great advantages of our approach are, therefore, that it: allows database designers to focus on the simultaneous protection of security and data while ignoring the implementation details; facilitates the secure design and rapid migration of security rules by deriving specific security measures for each underlying technology; and enables database designers to employ ontology reasoning in order to verify whether the security rules are consistent. We show the applicability of our proposal by applying it to a case study based on hospital data access control. This work has been developed within the AETHER-UA (PID2020-112540RB-C43), AETHER-UMA (PID2020-112540RB-C41) and AETHER-UCLM (PID2020-112540RB-C42), ALBA (TED2021-130355B-C31, TED2021-130355B-C33), PRESECREL (PID2021-124502OB-C42) projects funded by the “Ministerio de Ciencia e Innovación”, Andalusian PAIDI program with grant (P18-RT-2799) and the BALLADER Project (PROMETEO/2021/088) funded by the “Consellería de Innovación, Universidades, Ciencia Sociedad Digital”, Generalitat Valenciana.

    Knowledge Guided Integration of Structured and Unstructured Data in Health Decision Process

    Data in the health domain is continuously increasing. It is collected from several sources, comes in several formats, and is characterized by its sensitivity (protection of personal health data). These characteristics make managing the collected data, and supporting expert interaction with it to facilitate decision-making in Health Information Systems (HIS), a challenging field. In this paper, we propose a knowledge-guided integration of structured and unstructured data for the health decision process. The knowledge is represented by a domain ontology, which allows the integration of structured and unstructured data stored in NoSQL format. Our motivation is to combine the confirmed advantages of ontologies and NoSQL databases in both data integration and decision-aid processes. The proposed ontology has been implemented and evaluated using quality metrics. The approach was evaluated, and the results show response time optimization compared with traditional approaches, as well as improved data relevance.

    Collaboration through Patient Data Access and sharing in the cloud

    (c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. There have been many socio-political and technological developments in the area of Electronic Patient Records (EPR). The technological aspects include EPR implemented using Online Transaction Processing (OTP) over the Internet and Internet-based systems, and more recently via Cloud-Based Systems (CBS) exploiting Cloud Service Models (CSM). Additionally, there are many socio-political considerations comprising: (1) political moves, including UK Government policy, which aims to deliver for patients 24/7 online access to their patient record, (2) considerations around ethical issues and informed permission and acceptance by the public and non-governmental organizations (NGO), (3) technological considerations about identification of suitable CBS and data structures in distributed systems characterized by unstructured data and, finally, (4) sharing and collaboration as means of increasing efficiency, security, privacy, etc. In all, the aim is to provide professionals in the medical domain with advanced platforms not only to access but also, most importantly, to share and collaborate on a wide scale (e.g. at national level). Addressing these aspects of EPR requires collaboration between all stakeholders in EPR; this paper considers these aspects and concludes that such collaboration is essential if EPR are ever to become a reality. Peer reviewed. Postprint (author's final draft).

    METADATA MANAGEMENT FOR CLINICAL DATA INTEGRATION

    Clinical data have been continuously collected and have grown with the wide adoption of electronic health records (EHR). Clinical data provide the foundation for state-of-the-art research such as artificial intelligence in medicine. At the same time, it has become a challenge to integrate, access, and explore study-level patient data from large volumes of data in heterogeneous databases. Effective, fine-grained, cross-cohort data exploration and semantically enabled approaches and systems are needed. To build semantically enabled systems, we need to leverage existing terminology systems and ontologies. Numerous ontologies have been developed recently, and they play an important role in semantically enabled applications. Because they contain valuable codified knowledge, the management of these ontologies, as metadata, also requires systematic approaches. Moreover, in most clinical settings, patient data are collected with the help of a data dictionary. Knowledge of the relationships between an ontology and a related data dictionary is important for semantic interoperability. Such relationships are represented and maintained by mappings. Mappings store how data source elements and domain ontology concepts are linked, as well as how domain ontology concepts are linked between different ontologies. While mappings are crucial to the maintenance of relationships between an ontology and a related data dictionary, they are commonly captured in CSV files with limited capabilities for sharing, tracking, and visualization. The management of mappings requires an innovative, interactive, and collaborative approach. Metadata management serves to organize data that describes other data. In computer science and information science, an ontology is metadata consisting of the representation, naming, and definition of the hierarchies, properties, and relations between concepts. A structured, scalable, and computer-understandable way of managing metadata is critical to developing systems with fine-grained data exploration capabilities. This dissertation presents a systematic approach called MetaSphere that uses metadata and ontologies to support the management and integration of clinical research data through an ontology-based metadata management system for multiple domains. MetaSphere is a general framework that aims to manage domain-specific metadata, provide a fine-grained data exploration interface, and store patient data in data warehouses. Moreover, MetaSphere provides a dedicated mapping interface called the Interactive Mapping Interface (IMI) to map the data dictionary to well-recognized and standardized ontologies. MetaSphere has been applied successfully to three domains: sleep (X-search), pressure ulcer injuries and deep tissue pressure (SCIPUDSphere), and cancer. Specifically, MetaSphere stores domain ontologies structurally in databases. Patient data in the corresponding domains are also stored in databases as data warehouses. MetaSphere provides a powerful query interface to enable interaction between humans and actual patient data. The query interface is a mechanism that allows researchers to compose complex queries to pinpoint specific cohorts within a large amount of patient data. The MetaSphere framework has been instantiated in three domains, and the detailed results are as follows. X-search is publicly available at https://www.x-search.net with nine sleep domain datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. SCIPUDSphere has integrated a total of 268,562 records containing 282 ICD9 codes related to pressure ulcer injuries among 36,626 individuals with spinal cord injuries. IMI is publicly available at http://epi-tome.com/. Using IMI, we have successfully mapped the North American Association of Central Cancer Registries (NAACCR) data dictionary to the National Cancer Institute Thesaurus (NCIt) concepts.