35,542 research outputs found

    A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

    In today’s scenario, extraction-transformation-loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard, and time-consuming. Organisations nowadays are concerned about vast quantities of data, and data quality is concerned with technical issues in the data warehouse environment. Research in the last few decades has laid increasing stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data prior to loading it into the warehouse. Since the data is collected from various sources, it arrives in various formats; standardising these formats and cleaning the data are prerequisites for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency, and timeliness are required for a knowledge discovery process. The purpose of this state-of-the-art research work is to address data quality issues at all stages of data warehousing, namely 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair data deficiencies. This work proposes a framework for the quality of extraction, transformation, and loading of data into a warehouse.
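
    As a toy illustration of the cleaning-before-loading step the abstract argues for, the sketch below standardises heterogeneous date formats and drops records that violate simple completeness and accuracy rules. The field names, source formats, and rules are invented for illustration and are not taken from the paper.

```python
from datetime import datetime

# Toy cleaning step: standardise source date formats and drop records that
# fail simple quality rules before they are loaded into the warehouse.
# Field names, formats, and rules are illustrative assumptions.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")

def standardise_date(value: str) -> str | None:
    """Try each known source format; return an ISO-8601 date or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_record(record: dict) -> dict | None:
    """Return a standardised record, or None if a quality rule is violated."""
    order_date = standardise_date(record.get("order_date", ""))
    if order_date is None:                    # completeness / correctness
        return None
    try:
        amount = float(record.get("amount", ""))
    except (TypeError, ValueError):
        return None
    if amount < 0:                            # accuracy / consistency
        return None
    return {
        "customer_id": str(record.get("customer_id", "")).strip().upper(),
        "order_date": order_date,
        "amount": round(amount, 2),
    }

if __name__ == "__main__":
    raw = [
        {"customer_id": " c042 ", "order_date": "12/01/2023", "amount": "19.9"},
        {"customer_id": "C043", "order_date": "not a date", "amount": "5.0"},
    ]
    cleaned = [clean_record(r) for r in raw]
    print([c for c in cleaned if c is not None])
```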

    A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

    In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting the data from the data sources that will be contained in the data warehouse. Their design and maintenance is thus a cornerstone in any data warehouse development project. Due to their relevance, the quality of these processes should be formally assessed early in development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. The study is accompanied by the application of formal frameworks and by a family of experiments whose aims are, respectively, to theoretically and empirically validate the proposed measures. Our experiments show that the use of these measures can help designers predict the effort associated with the maintenance tasks of ETL processes and make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) for the definition and validation of the measures.
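
    As a rough illustration of what count-based structural complexity measures over a conceptual ETL model might look like, the sketch below computes simple node and edge counts for a toy activity-diagram representation. The measure names (NA, ND, NE), the data structure, and the example diagram are assumptions for illustration, not the measures defined in the paper or in FMESP.

```python
from dataclasses import dataclass

@dataclass
class ActivityDiagram:
    """Toy stand-in for a UML activity diagram of an ETL process."""
    activities: set[str]
    decisions: set[str]                  # decision/merge nodes
    edges: set[tuple[str, str]]          # control/object flows

def structural_measures(d: ActivityDiagram) -> dict[str, int]:
    """Return simple size-based complexity counts for the model."""
    return {
        "NA": len(d.activities),         # number of activity nodes
        "ND": len(d.decisions),          # number of decision nodes
        "NE": len(d.edges),              # number of flow edges
    }

if __name__ == "__main__":
    etl = ActivityDiagram(
        activities={"Extract orders", "Filter nulls", "Aggregate", "Load fact table"},
        decisions={"Valid row?"},
        edges={
            ("Extract orders", "Filter nulls"),
            ("Filter nulls", "Valid row?"),
            ("Valid row?", "Aggregate"),
            ("Aggregate", "Load fact table"),
        },
    )
    print(structural_measures(etl))
```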

    An i2b2-based, generalizable, open source, self-scaling chronic disease registry

    Objective: Registries are a well-established mechanism for obtaining high-quality, disease-specific data, but are often highly project-specific in their design, implementation, and policies for data use. In contrast to the conventional model of centralized data contribution, warehousing, and control, we design a self-scaling registry technology for collaborative data sharing, based upon the widely adopted Integrating Biology & the Bedside (i2b2) data warehousing framework and the Shared Health Research Information Network (SHRINE) peer-to-peer networking software. Materials and methods: Focusing our design around the creation of a scalable solution for collaboration within multi-site disease registries, we leverage the i2b2 and SHRINE open source software to create a modular, ontology-based, federated infrastructure that provides research investigators full ownership of and access to their contributed data while supporting permissioned yet robust data sharing. We accomplish these objectives via web services supporting peer-group overlays, group-aware data aggregation, and administrative functions. Results: The 56-site Childhood Arthritis & Rheumatology Research Alliance (CARRA) Registry and the 3-site Harvard Inflammatory Bowel Diseases Longitudinal Data Repository now utilize i2b2 self-scaling registry technology (i2b2-SSR). This platform, extensible to the federation of multiple projects within and between research networks, encompasses >6000 subjects at sites throughout the USA. Discussion: We utilize the i2b2-SSR platform to minimize technical barriers to collaboration while enabling fine-grained control over data sharing. Conclusions: The implementation of i2b2-SSR for the multi-site, multi-stakeholder CARRA Registry has established a digital infrastructure for community-driven research data sharing in pediatric rheumatology in the USA. We envision i2b2-SSR as a scalable, reusable solution facilitating interdisciplinary research across diseases.
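
    The sketch below gives a deliberately simplified picture of group-aware data aggregation across federated registry sites, where each site answers count queries only for peer groups it has joined. It does not use the real i2b2 or SHRINE web-service APIs; all class names, group names, and counts are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """Toy federated registry site with permissioned peer-group membership."""
    name: str
    peer_groups: set[str]
    cohort_counts: dict[str, int]   # query id -> local patient count

    def answer(self, group: str, query_id: str) -> int | None:
        """Return the local count only if the site belongs to the peer group."""
        if group not in self.peer_groups:
            return None
        return self.cohort_counts.get(query_id, 0)

def aggregate(sites: list[Site], group: str, query_id: str) -> int:
    """Sum counts over the sites that participate in the given peer group."""
    answers = (s.answer(group, query_id) for s in sites)
    return sum(a for a in answers if a is not None)

if __name__ == "__main__":
    sites = [
        Site("Site A", {"carra"}, {"jia_cohort": 120}),
        Site("Site B", {"carra", "ibd"}, {"jia_cohort": 85}),
        Site("Site C", {"ibd"}, {"jia_cohort": 40}),   # not in the CARRA group
    ]
    print(aggregate(sites, "carra", "jia_cohort"))      # 205
```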

    Big Data guided Digital Petroleum Ecosystems for Visual Analytics and Knowledge Management

    The North West Shelf (NWS), interpreted as a Total Petroleum System (TPS), is a Westralian super basin with active onshore and offshore basins through which shelf, slope, and deep-oceanic geological events are construed. In addition to their data associativity, TPSs emerge with geographic connectivity through the phenomenon of a digital petroleum ecosystem. The super basin comprises a multitude of sub-basins; each basin is associated with several petroleum systems, and each system comprises multiple oil and gas fields with either known or unknown areal extents. Such hierarchical ontologies make connections between the attribute relationships of diverse petroleum systems. Moreover, the NWS offers scope for storing volumes of instances in a data-warehousing environment for analysis and for motivating new business opportunities. Furthermore, the big exploration data, characterized as heterogeneous and multidimensional, can complicate the data integration process, precluding the interpretation of data views drawn from TPS metadata in new knowledge domains. The research objective is to develop an integrated framework that can unify the exploration and other interrelated multidisciplinary data into holistic TPS metadata for visualization and valued interpretation. The digital petroleum ecosystem is prototyped as a digital oil field solution with a multitude of big data tools. Big data associated with the elements and processes of petroleum systems are examined using prototype solutions. With the conceptual framework of Digital Petroleum Ecosystems and Technologies (DPEST), we manage the interconnectivity between diverse petroleum systems and their linked basins. The ontology-based data warehousing and mining articulations ascertain collaboration through data artefacts and the coexistence between different petroleum systems and their linked oil and gas fields, which benefits explorers. The connectivity between systems further provides presentable exploration data views, improving visualization and interpretation. The metadata, with meta-knowledge in diverse knowledge domains of digital petroleum ecosystems, ensures the quality of untapped reservoirs and their associativity between Westralian basins.
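
    A minimal sketch of the basin, petroleum system, and field hierarchy described above, with a simple roll-up over known field extents. The classes, the basin name used in the example, and the figures are illustrative assumptions rather than the paper's actual ontology or data.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    discovered: bool
    area_km2: float | None    # None when the areal extent is unknown

@dataclass
class PetroleumSystem:
    name: str
    fields: list[Field]

@dataclass
class Basin:
    name: str
    systems: list[PetroleumSystem]

def known_area(basin: Basin) -> float:
    """Roll up the known areal extent of all fields in a basin."""
    return sum(
        f.area_km2
        for s in basin.systems
        for f in s.fields
        if f.area_km2 is not None
    )

if __name__ == "__main__":
    example_basin = Basin("Example sub-basin", [
        PetroleumSystem("System-1", [
            Field("Field-A", True, 32.5),
            Field("Field-B", False, None),
        ]),
    ])
    print(known_area(example_basin))   # 32.5
```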

    Eco-efficient supply chain networks: Development of a design framework and application to a real case study

    © 2015 Taylor & Francis. This paper presents a supply chain network design framework that is based on multi-objective mathematical programming and that can identify 'eco-efficient' configuration alternatives that are both efficient and ecologically sound. This work is original in that it encompasses the environmental impact of both transportation and warehousing activities. We apply the proposed framework to a real-life case study (i.e. Lindt & Sprüngli) for the distribution of chocolate products. The results show that cost-driven network optimisation may lead to beneficial effects for the environment and that a minor increase in distribution costs can be offset by a major improvement in environmental performance. This paper contributes to the body of knowledge on eco-efficient supply chain design and closes the missing link between model-based methods and empirical applied research. It also generates insights into the growing debate on the trade-off between the economic and environmental performance of supply chains, supporting organisations in the eco-efficient configuration of their supply chains.
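
    As a simplified illustration of the eco-efficiency trade-off described above, the sketch below filters candidate network configurations down to the set that is Pareto-optimal on cost and emissions. The configurations and figures are invented, and the paper itself uses multi-objective mathematical programming rather than enumeration of a fixed candidate list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Configuration:
    """Candidate supply chain network design with two objective values."""
    name: str
    cost: float        # distribution cost, e.g. EUR/year
    emissions: float   # transport + warehousing impact, e.g. tonnes CO2e/year

def pareto_front(candidates: list[Configuration]) -> list[Configuration]:
    """Keep configurations that are not dominated on both cost and emissions."""
    front = []
    for c in candidates:
        dominated = any(
            o.cost <= c.cost and o.emissions <= c.emissions
            and (o.cost < c.cost or o.emissions < c.emissions)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

if __name__ == "__main__":
    options = [
        Configuration("Centralised, 1 warehouse", 9.0e6, 4200),
        Configuration("Regional, 3 warehouses", 9.3e6, 3100),
        Configuration("Local, 8 warehouses", 11.5e6, 3300),   # dominated
    ]
    for cfg in pareto_front(options):
        print(cfg.name, cfg.cost, cfg.emissions)
```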