
    Developing Virtual Data Warehouse for Rehabilitation Registry in Sabah, Borneo: Towards Big Data Analytics and Geomapping

    A clinical registry, defined as an organised system for the collection, storage, retrieval, analysis, and dissemination of information on individuals with a condition that predisposes them to a health-related event, is designed around a data repository or data warehouse. A data repository is a real-time database that consolidates data from a variety of clinical sources, offering a comprehensive store for retrieving the relevant clinical information needed. A data warehouse, in contrast, is a data repository oriented towards data queries and data analytics. The rehabilitation registry in Malaysia is still in its infancy, with little data sharing and integration. As rehabilitation is a subspecialty concerned with the prevention, diagnosis, and rehabilitation of disabling conditions, a registry would allow identification of patients’ demographics, improvement of clinical and functional outcomes, benchmarking of rehabilitation service delivery, and research. Virtual data warehouses, cloud computing, big data analytics and geomapping have been applied successfully to clinical registries in countries such as China and the United Kingdom. The main objectives of this research-in-progress paper are to demonstrate the feasibility of developing and designing a virtual data warehouse framework based on cloud computing technology, working towards big data analytics and geomapping implementation for an inpatient rehabilitation registry in Sabah, Malaysia.

    BigDedup: a Big Data Integration toolkit for Duplicate Detection in Industrial Scenarios

    Duplicate detection aims to identify different records in data sources that refer to the same real-world entity. It is a fundamental task in item-catalogue fusion, customer-database integration, fraud detection, and more. In this work we present BigDedup, a toolkit that detects duplicate records in Big Data sources efficiently. BigDedup makes state-of-the-art duplicate detection techniques available on Apache Spark, a modern framework for distributed computing in Big Data scenarios. It can be used in two ways: (i) through a simple graphical interface that lets the user process structured and unstructured data quickly and effectively; (ii) as a library providing components that can be easily extended and customized. In the paper we show how to use BigDedup and demonstrate its usefulness through industrial examples.
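    The core idea behind duplicate detection systems like the one described above can be sketched in a few lines: group records into blocks with a cheap key, then compare only within-block pairs using a string similarity measure. This is a generic illustration of the technique, not BigDedup's actual API; the records, field names, and threshold are assumptions.

    ```python
    from itertools import combinations

    def blocking_key(record):
        # Group candidate records by a cheap key (first 3 chars of name)
        # so we avoid comparing every pair in the data source.
        return record["name"][:3].lower()

    def jaccard(a, b):
        # Token-set similarity between two strings.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def find_duplicates(records, threshold=0.5):
        blocks = {}
        for r in records:
            blocks.setdefault(blocking_key(r), []).append(r)
        pairs = []
        for block in blocks.values():
            for a, b in combinations(block, 2):
                if jaccard(a["name"], b["name"]) >= threshold:
                    pairs.append((a["id"], b["id"]))
        return pairs

    records = [
        {"id": 1, "name": "Acme Corporation Ltd"},
        {"id": 2, "name": "ACME corporation limited"},
        {"id": 3, "name": "Globex Inc"},
    ]
    print(find_duplicates(records))  # -> [(1, 2)]
    ```

    Frameworks such as Spark distribute exactly this pattern: blocking corresponds to a shuffle by key, and pairwise comparison runs independently per block.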

    Designing an Educational Data Warehouse Architecture with Intelligent Multidimensional Modelling

    An Educational Data Warehouse (EDW) is a system built to archive the historical data of an educational organisation; this historical data can serve as a source for data mining, analysis, forecasting, business intelligence (BI), OLAP, and more. Sekolah Tinggi Teknik Ibnu Sina Batam uses information systems across all of its activities, and these systems rely on relational databases. However, many analytical problems at the decision-maker level cannot be accommodated by those information systems and relational databases, so an EDW is needed. An EDW database is generally kept separate from the operational database so that it does not degrade operational performance. This research aims to design an EDW architecture using a dimensional modelling approach; the scope of the study covers all academic activities at STT Ibnu Sina Batam. It is hoped that this EDW architecture can serve as a blueprint for a future EDW system implementation.

    Case-based decision support system for breast cancer management

    Breast cancer is the most common type of cancer in women worldwide, with 1.6 million women diagnosed every year. This has prompted many active areas of research into better ways to prevent, detect, and treat breast cancer. DESIREE is a European Union funded project that aims to develop a web-based software ecosystem for the multidisciplinary management of primary breast cancer. The development of an intelligent clinical decision support system offering various modalities of decision support is one of the key objectives of the project. This paper explores case-based reasoning as a problem-solving paradigm and discusses the use of an explicit domain knowledge ontology in the development of a knowledge-intensive case-based decision support system for breast cancer management.
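    The retrieve-and-reuse cycle at the heart of case-based reasoning can be illustrated with a minimal nearest-neighbour sketch. The case base, features, and solutions below are entirely hypothetical and do not reflect DESIREE's ontology or clinical guidance; the example only shows the reasoning pattern.

    ```python
    import math

    # Each case pairs a problem description (feature vector) with a known solution.
    # Features and solutions are illustrative placeholders only.
    CASE_BASE = [
        ({"tumor_size_mm": 12, "age": 45, "er_positive": 1}, "treatment A"),
        ({"tumor_size_mm": 30, "age": 62, "er_positive": 0}, "treatment B"),
        ({"tumor_size_mm": 8,  "age": 38, "er_positive": 1}, "treatment A"),
    ]

    def distance(a, b):
        # Euclidean distance over the shared numeric features.
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

    def retrieve(query, k=1):
        # Retrieve: rank stored cases by similarity to the new problem.
        ranked = sorted(CASE_BASE, key=lambda case: distance(query, case[0]))
        return ranked[:k]

    # Reuse: adopt the solution of the most similar past case.
    query = {"tumor_size_mm": 10, "age": 41, "er_positive": 1}
    best_case, best_solution = retrieve(query)[0]
    print(best_solution)  # -> treatment A
    ```

    A knowledge-intensive variant, as described in the abstract, would replace the flat feature vectors with concepts from a domain ontology, so similarity can be computed over semantic relationships rather than raw numbers.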

    A unified ontology-based data integration approach for the internet of things

    Data integration enables combining data from various data sources in a standard format. Internet of things (IoT) applications use ontology approaches to provide a machine-understandable conceptualization of a domain. We propose a unified ontology schema approach to solve all IoT integration problems at once. The data unification layer maps data from different formats to data patterns based on the unified ontology model. This paper proposes a middleware consisting of an ontology-based approach that collects data from different devices. IoT middleware requires an additional semantic layer for cloud-based IoT platforms to build a schema for data generated from diverse sources. We tested the proposed model on real data consisting of approximately 160,000 readings from various sources in different formats such as CSV, JSON, raw data, and XML. The data were collected through the file transfer protocol (FTP) and generated 960,000 resource description framework (RDF) triples. We evaluated the proposed approach by running different queries against SPARQL Protocol and RDF Query Language (SPARQL) endpoints on different machines to check query processing time, validation of integration, and performance of the unified ontology model. The average response times for query execution on the generated RDF triples on the three servers were approximately 0.144, 0.070, and 0.062 seconds, respectively.
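    The unification step described above, mapping heterogeneous source formats onto one shared vocabulary of RDF-style triples, can be sketched without any RDF library. The vocabulary, field names, and prefixes below are invented for illustration and are not the paper's actual ontology.

    ```python
    import csv, io, json

    # Hypothetical unified vocabulary: every source-specific field name is
    # mapped to one shared predicate, so queries no longer depend on format.
    VOCAB = {"temp": "iot:temperature", "temperature": "iot:temperature",
             "dev": "iot:device", "device_id": "iot:device"}

    def to_triples(record, reading_id):
        # Emit (subject, predicate, object) triples for one reading.
        subject = f"iot:reading/{reading_id}"
        triples = []
        for field, value in record.items():
            predicate = VOCAB.get(field)
            if predicate:  # drop fields outside the unified schema
                triples.append((subject, predicate, str(value)))
        return triples

    # Two sources, two formats, one graph.
    csv_data = next(csv.DictReader(io.StringIO("dev,temp\nsensor-1,21.5")))
    json_data = json.loads('{"device_id": "sensor-2", "temperature": 19.0}')
    graph = to_triples(csv_data, "r1") + to_triples(json_data, "r2")

    # A trivial SPARQL-like query: all temperature readings, any source.
    temps = [o for (s, p, o) in graph if p == "iot:temperature"]
    print(temps)  # -> ['21.5', '19.0']
    ```

    In a real deployment the triples would be loaded into a triple store and queried through a SPARQL endpoint, as the paper's evaluation does.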

    DTRM: A new reputation mechanism to enhance data trustworthiness for high-performance cloud computing

    This is the author accepted manuscript; the final version is available from Elsevier via the DOI in this record. Cloud computing and the mobile Internet have been the two most influential information technology revolutions, and they intersect in mobile cloud computing (MCC). The burgeoning MCC enables the large-scale collection and processing of big data, which demands trusted, authentic, and accurate data to ensure an important but often overlooked aspect of big data: data veracity. Troublesome internal attacks launched by internal malicious users are one key problem that reduces data veracity and remains difficult to handle. To enhance data veracity and thus improve the performance of big data computing in MCC, this paper proposes a Data Trustworthiness enhanced Reputation Mechanism (DTRM) that can be used to defend against internal attacks. In the DTRM, sensitivity-level based data categories, Metagraph theory based user group division, and reputation transferring methods are integrated into the reputation query and evaluation process. Extensive simulation results based on real datasets show that the DTRM outperforms existing classic reputation mechanisms under bad-mouthing attacks and mobile attacks. This work was supported by the National Natural Science Foundation of China (61602360, 61772008, 61472121), the Pilot Project of Fujian Province (formal industry key project) (2016Y0031), the Foundation of Science and Technology on Information Assurance Laboratory (KJ-14-109) and the Fujian Provincial Key Lab of Network Security and Cryptology Research Fund (15012).
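    The general shape of a sensitivity-aware reputation update can be sketched as follows. This is a toy illustration of the idea that feedback on higher-sensitivity data categories moves a user's reputation more, not the paper's actual DTRM equations; the weights, rate, and feedback sequence are assumptions.

    ```python
    # Toy sensitivity-weighted reputation update (illustrative only):
    # positive/negative feedback adjusts a user's score, weighted by the
    # sensitivity level of the data category involved.
    SENSITIVITY_WEIGHT = {"low": 0.5, "medium": 1.0, "high": 2.0}

    def update_reputation(score, positive, sensitivity, rate=0.1):
        weight = SENSITIVITY_WEIGHT[sensitivity]
        delta = rate * weight * (1.0 if positive else -1.0)
        # Clamp to [0, 1] so repeated reports cannot push the score out of range.
        return min(1.0, max(0.0, score + delta))

    score = 0.5  # new users start with a neutral reputation
    for positive, sensitivity in [(True, "low"), (True, "high"), (False, "medium")]:
        score = update_reputation(score, positive, sensitivity)
    print(round(score, 2))  # -> 0.65
    ```

    Defending against bad-mouthing attacks, as the paper evaluates, additionally requires weighting the *reporter's* own reputation, so that feedback from untrusted users carries little force.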

    Ontology: Core Process Mining and Querying Enabling Tool

    Ontology permits the addition of semantics to process models derived from mining the various data stored in many information systems. The ontological schema enables automated querying and inference of useful knowledge from the different domain processes. Such conceptualization methods, particularly ontologies for process management, are currently allied to semantic process mining, which aims to combine process models with ontologies and has gained increasing attention in recent years. In view of that, this chapter introduces an ontology-based mining approach that makes use of concepts within the extracted event logs about domain processes to propose a method that allows for effective querying and improved analysis of the resulting models through semantic labelling (annotation), semantic representation (ontology) and semantic reasoning (reasoner). The proposed method is a semantic-based process mining approach that is able to induce new knowledge based on previously unobserved behaviours, and offers a more intuitive way to represent and query the datasets and the discovered models than other standard logical procedures. To this end, the study claims that it is possible to apply effective reasoning methods to make inferences over a process knowledge base (e.g. the learning process) that lead to automated discovery of learning patterns and/or behaviour.
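    The annotation-ontology-reasoner pipeline described above can be illustrated with a minimal sketch: raw event labels are linked to ontology concepts, and queries are answered over concepts (including their superclasses) rather than over raw labels. The ontology, labels, and event log below are invented for illustration, not the chapter's actual data.

    ```python
    # Semantic representation: a tiny hand-made ontology (concept -> superclass).
    ONTOLOGY = {
        "SubmitAssignment": "Assessment",
        "TakeQuiz": "Assessment",
        "WatchLecture": "LearningActivity",
        "Assessment": "LearningActivity",
    }

    # Semantic labelling: each raw event label is annotated with a concept.
    ANNOTATION = {
        "submit_hw_v2": "SubmitAssignment",
        "quiz_attempt": "TakeQuiz",
        "video_play": "WatchLecture",
    }

    def is_a(concept, target):
        # Semantic reasoning: walk the superclass chain to answer
        # subsumption queries ("is this concept a kind of target?").
        while concept is not None:
            if concept == target:
                return True
            concept = ONTOLOGY.get(concept)
        return False

    # Query the event log at the concept level rather than by raw label.
    log = ["video_play", "submit_hw_v2", "quiz_attempt", "video_play"]
    assessments = [e for e in log if is_a(ANNOTATION[e], "Assessment")]
    print(assessments)  # -> ['submit_hw_v2', 'quiz_attempt']
    ```

    This is the payoff the chapter describes: the query "find all assessment events" matches heterogeneous raw labels it has never seen, because reasoning happens over the ontology rather than the log's vocabulary.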