183 research outputs found

    Data governance in the health industry: investigating data quality dimensions within a big data context

    In the health industry, the use of data (including Big Data) is of growing importance. The term ‘Big Data’ characterizes data not only by its volume, but also by its velocity, variety, and veracity. Big Data requires effective data governance, which includes measures to manage and control the use of data and to enhance data quality, availability, and integrity. The type and description of data quality can be expressed in terms of the dimensions of data quality; well-known dimensions include accuracy, completeness, and consistency, amongst others. Since data quality depends on how the data is expected to be used, the most important data quality dimensions depend on the context of use and industry needs. There is a lack of current research focusing on data quality dimensions for Big Data within the health industry; this paper therefore investigates the most important data quality dimensions for Big Data in this context. An inner hermeneutic cycle research approach was used to review the literature on data quality for big health datasets in a systematic way and to produce a list of the most important data quality dimensions. Based on a hierarchical framework for organizing data quality dimensions, the highest ranked category of dimensions was determined.
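
    The dimensions named above (completeness, accuracy, consistency) can each be expressed as a simple measurable score. A minimal sketch, not taken from the paper, using an invented toy health dataset and illustrative validity rules:

```python
# Hypothetical sketch: scoring two well-known data quality dimensions
# (completeness, accuracy) over a toy health dataset. Field names and
# the plausibility range for blood pressure are illustrative assumptions.

records = [
    {"patient_id": 1, "age": 54, "systolic_bp": 130},
    {"patient_id": 2, "age": None, "systolic_bp": 128},   # missing age
    {"patient_id": 3, "age": 47, "systolic_bp": 310},     # implausible reading
]
fields = ["patient_id", "age", "systolic_bp"]

def completeness(rows):
    """Fraction of non-missing values across all fields and rows."""
    total = len(rows) * len(fields)
    present = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return present / total

def accuracy(rows):
    """Fraction of systolic_bp readings inside an assumed plausible range."""
    vals = [r["systolic_bp"] for r in rows if r.get("systolic_bp") is not None]
    return sum(60 <= v <= 250 for v in vals) / len(vals)

print(round(completeness(records), 3))  # 0.889  (8 of 9 values present)
print(round(accuracy(records), 3))      # 0.667  (2 of 3 readings plausible)
```

    Context-dependence, as the abstract stresses, enters through the rules: the same dataset scores differently once the plausibility ranges or mandatory fields change with the intended use.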


    BIG DATA MINING TOOLS FOR UNSTRUCTURED DATA: A REVIEW

    Big data is a buzzword used for large datasets that include structured, semi-structured and unstructured data. The size of big data is so large that it is nearly impossible to collect, process and store it using traditional database management systems and software techniques. Therefore, big data requires different approaches and tools for analysis. The process of collecting, storing and analyzing large amounts of data to find unknown patterns is called big data analytics. The information and patterns found by the analysis process are used by large enterprises and companies to gain deeper knowledge and to make better decisions faster, giving them an advantage over the competition. Better techniques and tools must therefore be developed to analyze and process big data. Big data mining is used to extract useful information from large datasets, which are mostly unstructured. Unstructured data has no particular structure and can take any form; today, high-dimensional data is often stored without a standard structure or schema, which is where this problem arises. This paper gives an overview of big data sources, challenges, scope and unstructured data mining techniques that can be used for big data.
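
    A common first step in mining unstructured text, before any pattern discovery, is turning free text into term counts. A minimal sketch with invented sample documents:

```python
# Minimal illustration of one unstructured-data mining stage: tokenizing
# free text into term frequencies. The sample documents are invented.
import re
from collections import Counter

docs = [
    "Big data requires new tools to analyze data.",
    "Unstructured data has no particular structure.",
]

def term_frequencies(text):
    """Lowercase, extract alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

tf = term_frequencies(" ".join(docs))
print(tf["data"])  # 3
```

    Real pipelines add stop-word removal, stemming, and weighting (e.g. TF-IDF) on top of this counting step, but the structure-imposing move is the same.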

    A Cloud-to-Edge Approach to Support Predictive Analytics in Robotics Industry

    Data management and processing to enable predictive analytics in cyber-physical systems holds the promise of creating insight into underlying processes, discovering anomalous behaviours and predicting imminent failures threatening a normal and smooth production process. In this context, proactive strategies can be adopted, as enabled by predictive analytics. Predictive analytics, in turn, can shift traditional maintenance approaches to more effective ones, optimising their cost and transforming maintenance from a necessary evil into a strategic business factor. Motivated by these points, this paper discusses a novel methodology for remaining useful life (RUL) estimation enabling predictive maintenance of industrial equipment, using partial knowledge of its degradation function and the parameters that affect it. Moreover, the design and prototype implementation of a plug-n-play end-to-end cloud architecture supporting predictive maintenance of industrial equipment is presented, integrating the aforementioned concept as a service. This is achieved by integrating edge gateways, data stores at both the edge and the cloud, and various applications, such as predictive analytics, visualization and scheduling, integrated as services in the cloud system. The proposed approach has been implemented in a prototype and tested in an industrial use case related to the maintenance of a robotic arm. The obtained results show the effectiveness and efficiency of the proposed methodology in supporting predictive analytics in the era of Industry 4.0.
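
    To make the RUL idea concrete: the simplest family of such estimators fits a trend to an observed health indicator and extrapolates to a failure threshold. This is an illustrative sketch only, not the paper's methodology; the readings, threshold, and linear degradation assumption are all invented:

```python
# Illustrative RUL sketch (not the paper's method): fit a least-squares
# linear degradation trend to health-indicator readings, then extrapolate
# to an assumed failure threshold. All values here are made up.

t      = [0.0, 1.0, 2.0, 3.0, 4.0]        # operating hours of observations
health = [1.00, 0.95, 0.91, 0.85, 0.80]   # degradation indicator (1 = healthy)
failure_threshold = 0.2                   # indicator level treated as failure

n = len(t)
mean_t = sum(t) / n
mean_h = sum(health) / n
# Least-squares slope and intercept of health as a function of time
slope = (sum((ti - mean_t) * (hi - mean_h) for ti, hi in zip(t, health))
         / sum((ti - mean_t) ** 2 for ti in t))
intercept = mean_h - slope * mean_t

t_fail = (failure_threshold - intercept) / slope  # predicted crossing time
rul = t_fail - t[-1]                              # hours of life remaining
print(round(rul, 2))  # 12.04
```

    Partial knowledge of the degradation function, as the abstract describes, would replace this generic linear fit with the known functional form, leaving only its unknown parameters to be estimated from the monitored data.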

    Investigating Bias in Facial Analysis Systems: A Systematic Review

    Recent studies have demonstrated that most commercial facial analysis systems are biased against certain categories of race, ethnicity, culture, age and gender. In some cases the bias can be traced to the algorithms used, in others to insufficient training of the algorithms, and in still others to insufficient databases. To date, no comprehensive literature review exists that systematically investigates bias and discrimination in currently available facial analysis software. To address this gap, this study conducts a systematic literature review (SLR) in which the context of facial analysis system bias is investigated in detail. The review, covering 24 studies, additionally aims to identify (a) facial analysis databases that were created to alleviate bias, (b) the full range of bias in facial analysis software and (c) algorithms and techniques implemented to mitigate bias in facial analysis.

    Runtime Model Checking for SLA Compliance Monitoring and QoS Prediction

    Sophisticated workflows, where multiple parties cooperate towards the achievement of a shared goal, are common today. In a market-oriented setup, it is key that effective mechanisms be available for providing accountability within the business process. The challenge is to continuously monitor the progress of the business process, ideally anticipating contract breaches and triggering corrective actions. In this paper we propose a novel QoS prediction approach which combines runtime monitoring of the real system with probabilistic model checking on a parametric system model. To cope with the huge amount of data generated by the monitored system, while ensuring that parameters are extracted in a timely fashion, we rely on big data analytics solutions. To validate the proposed approach, a prototype of the QoS prediction framework has been developed, and an experimental campaign has been conducted on a case study in the field of Smart Grids.
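
    The probabilistic-model-checking ingredient can be illustrated on a tiny discrete-time Markov chain: the question "what is the probability of eventually reaching a breach state?" reduces to a fixed-point computation. This toy chain and its transition probabilities are invented; in the paper's setting the probabilities would be the parameters estimated at runtime from monitoring data:

```python
# Toy sketch of probabilistic model checking on a DTMC (invented chain):
# value iteration for the probability of eventually reaching a Breach state.
# States: 0 = OK, 1 = Degraded, 2 = Breach (absorbing), 3 = Done (absorbing).

P = {
    0: {0: 0.7, 1: 0.2, 3: 0.1},   # OK -> OK / Degraded / Done
    1: {0: 0.3, 1: 0.4, 2: 0.3},   # Degraded -> OK / Degraded / Breach
    2: {2: 1.0},                   # Breach stays Breach
    3: {3: 1.0},                   # Done stays Done
}

def breach_probability(P, breach_state, n_iter=2000):
    """Iterate x_s <- sum_t P[s][t] * x_t from the indicator of the target."""
    x = {s: (1.0 if s == breach_state else 0.0) for s in P}
    for _ in range(n_iter):
        x = {s: sum(p * x[t] for t, p in P[s].items()) for s in P}
    return x

x = breach_probability(P, breach_state=2)
print(round(x[0], 3), round(x[1], 3))  # 0.5 0.75
```

    A parametric model, as used in the paper, keeps entries of P symbolic and substitutes the values extracted from the big-data monitoring pipeline before each prediction.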

    A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

    In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and user querying patterns. A wrong design of the OLAP cube significantly alters several key performance metrics, including: (i) the analytic capabilities of the cube (time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to aid Big Data OLAP designers in choosing the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
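
    The pre-aggregation idea behind systems such as Kylin can be sketched in a few lines: materialise aggregates for every subset of dimensions (each subset is a "cuboid"), so queries read precomputed totals instead of scanning the fact table. The schema and rows below are invented; real engines prune cuboids precisely because the full 2^n set drives up cube size and build time, the trade-off the abstract describes:

```python
# Hedged sketch of OLAP data pre-aggregation: build SUM(sales) cuboids for
# every subset of the dimensions. Fact-table schema and rows are invented.
from itertools import combinations
from collections import defaultdict

fact_rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]
dimensions = ["region", "product"]

def build_cube(rows, dims):
    """Precompute SUM(sales) for all 2^n dimension subsets (cuboids)."""
    cube = {}
    for r in range(len(dims) + 1):
        for dim_subset in combinations(dims, r):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dim_subset)
                agg[key] += row["sales"]
            cube[dim_subset] = dict(agg)
    return cube

cube = build_cube(fact_rows, dimensions)
print(cube[("region",)][("EU",)])  # 15 -- answered from the pre-aggregate
print(cube[()][()])                # 22 -- grand total cuboid
```

    Choosing which cuboids to materialise, given the expected query patterns, is exactly the design decision the proposed benchmark is meant to support.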

    Real-time Intraday Traffic Volume Forecasting – A Hybrid Application Using Singular Spectrum Analysis and Artificial Neural Networks

    The present paper provides a comparative evaluation of a hybrid Singular Spectrum Analysis (SSA) and Artificial Neural Network (ANN) approach against a conventional ANN, applied to real-time intraday traffic volume forecasting. The main research objective was to assess the applicability and functionality of intraday traffic volume forecasting based on toll station measurements. The proposed methodology was implemented and evaluated in a custom-developed forecasting software toolbox built on Mathworks MatLab, using real data from the Iasmos (Greece) toll station. Experimental results demonstrated a superior ex post forecasting accuracy of the proposed hybrid forecasting methodology against the conventional ANN, as measured by the usual statistical criteria (Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, Coefficient of Determination R2, Theil's inequality coefficient). The obtained results revealed that the hybrid model can improve the forecasting accuracy of a conventional ANN model in intraday traffic volume forecasting, while embedding the hybrid forecasting algorithm in an Intelligent Transport System could provide an advanced decision support module for transportation system maintenance, operation and management.
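
    The statistical criteria listed above have standard definitions and are straightforward to compute; a minimal sketch with made-up actual/forecast series (not the paper's data):

```python
# Computing MAE, MSE, RMSE and R2 for a forecast against observations.
# The two series below are invented for illustration.
import math

actual   = [100.0, 120.0, 110.0, 130.0]
forecast = [ 98.0, 125.0, 108.0, 128.0]

n = len(actual)
errors = [a - f for a, f in zip(actual, forecast)]
mae  = sum(abs(e) for e in errors) / n            # mean absolute error
mse  = sum(e * e for e in errors) / n             # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error
mean_a = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_a) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot                          # coefficient of determination

print(mae, round(rmse, 3), round(r2, 3))  # 2.75 3.041 0.926
```

    A comparative evaluation such as the paper's computes these criteria for each model on the same held-out series and compares the resulting scores.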

    A review of the internet of floods: near real-time detection of a flood event and its impact

    Worldwide, flood events frequently have a dramatic impact on urban societies. Time is key during a flood event in order to evacuate vulnerable people at risk, minimize the socio-economic, ecologic and cultural impact of the event, and restore society from this hazard as quickly as possible. Detecting a flood in near real-time and assessing the associated risks on the fly is therefore of great importance, and there is a need to find the optimal way to collect data in order to detect floods in real time. The Internet of Things (IoT) is the ideal method to bring together data from sensing equipment or identifying tools with networking and processing capabilities, allowing them to communicate with one another and with other devices and services over the Internet to accomplish the detection of floods in near real-time. The main objective of this paper is to report on the current state of research on the IoT in the domain of flood detection. Current trends in IoT are identified, and the academic literature is examined. The integration of IoT would greatly enhance disaster management and will, therefore, be of greater importance in the future.
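
    The simplest near real-time detection logic over an IoT sensor stream is a debounced threshold: raise an event only when the water level exceeds a limit for several consecutive readings, to filter out single-sample sensor noise. A minimal sketch with invented threshold and readings:

```python
# Illustrative flood-event detector over a stream of water-level readings.
# Threshold, debounce length, and the sample stream are assumed values.
FLOOD_LEVEL_CM = 150        # water level treated as flooding
CONSECUTIVE_REQUIRED = 3    # readings needed to confirm (debounce)

def detect_flood(readings):
    """Return the index at which a flood event is confirmed, or None."""
    streak = 0
    for i, level in enumerate(readings):
        streak = streak + 1 if level > FLOOD_LEVEL_CM else 0
        if streak >= CONSECUTIVE_REQUIRED:
            return i
    return None

stream = [120, 155, 140, 160, 162, 170, 130]
print(detect_flood(stream))  # 5 -- third consecutive reading above threshold
```

    Deployed systems reviewed in this domain typically layer richer signals on top of such rules, such as rainfall sensors, river gauges and hydrological models fused over the network.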