30 research outputs found

    Effective corporate data quality management: systematic literature review

    As the entire world is in transition toward becoming more and more data-driven, data quality has become a major issue for individuals, organizations, governments, and societies. The vast amount of data created every day has opened up various business opportunities, but the ability to exploit that data still varies because of data quality problems. The next crucial step in creating a more intelligent society is to standardize and develop effective corporate data quality management. This thesis reviews previous studies on data quality management in order to examine how an organization should manage its data quality. The focus is on business organizations, but the reviewed material consists of case studies from various organizations (e.g. military, government), indicating a society-wide issue. The research conducts a systematic literature review of the existing material on data quality and data quality management. The goal of the systematic literature review is to examine the material in such a way that the review can be repeated according to explicit criteria. Originating from the natural sciences, the systematic literature review is meant to reduce the personal bias of the researchers and to increase thoroughness and critical assessment. Another method used in the study is the snowballing method. The study reviews the existing literature on managing strategic data assets. The focus points of the research are the definition and assessment of organizational data quality, current issues in data quality management, and a model for data quality management. The results of the literature review are then discussed, with focus on the findings, the possible limitations of the research, and directions for further study.

    Warehousing and Analyzing Streaming Data Quality Information

    The development of integrative IS architectures typically focuses on solving problems related to the functionality of the system: designers attempt to build optimally flexible interfaces in order to achieve the most agile architecture. The quality of the data exchanged across these interfaces is often disregarded, implicitly or explicitly. The result is distributed applications that are functionally correct but cannot be deployed due to the low quality of the data involved. To avoid wrong business decisions caused by ‘dirty data’, quality characteristics have to be captured, processed, and provided to the respective business task. However, how to efficiently provide applications with information about data quality is still an open research problem. Our approach tackles the problems posed by data quality deficiencies by presenting a novel concept for streaming and warehousing data together with its describing data quality information.
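
    A minimal sketch of the underlying idea, shipping each data item together with describing quality information so the warehouse can store both side by side; the quality dimensions, field names, and the flattening into a single fact-table row are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QualityTag:
    """Quality characteristics attached to a value (illustrative dimensions)."""
    completeness: float      # 0..1
    timeliness_s: float      # age of the value in seconds
    source: str


@dataclass
class QualifiedReading:
    """A streamed data item shipped together with its describing quality info."""
    key: str
    value: float
    quality: QualityTag
    observed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def to_warehouse_row(r: QualifiedReading) -> dict:
    """Flatten the value and its quality metadata into one fact-table row."""
    return {
        "key": r.key,
        "value": r.value,
        "observed_at": r.observed_at.isoformat(),
        "dq_completeness": r.quality.completeness,
        "dq_timeliness_s": r.quality.timeliness_s,
        "dq_source": r.quality.source,
    }


# Hypothetical reading flowing from a stream into the warehouse.
reading = QualifiedReading("sensor-42", 21.7,
                           QualityTag(completeness=0.98, timeliness_s=3.0, source="ERP"))
print(to_warehouse_row(reading))
```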

    A Framework for Classification of the Data and Information Quality Literature and Preliminary Results (1996-2007)

    The value of management decisions, the security of our nation, and the very foundations of our business integrity all depend on the quality of data and information. However, the quality of data and information depends on how that data or information will be used. This paper proposes a theory of data quality based on the five principles defined by J. M. Juran for product and service quality and extends Wang et al.'s 1995 framework for data quality research. It then examines the data and information quality literature from journals within the context of this framework.

    Evaluating the Semantic and Representational Consistency of Interconnected Structured and Unstructured Data

    In this paper we present research in progress that aims to develop a set of data quality metrics for two aspects of the consistency dimension, the semantic and representational aspects. In the literature, metrics for these two aspects are relatively unexplored, especially in comparison with the data integrity aspect. Our goal is to apply these data quality metrics to interconnected structured and unstructured data. Because of the prevalence of unstructured data in organizations today, many strive for “content convergence” by interconnecting structured and unstructured data. The literature offers few data quality metrics for this type of data, despite the growing recognition of its potential value. We are developing our metrics in the context of data mining, and evaluating their utility using data mining outcomes in an economic context. If our metric development is successful, a well-defined economic utility function for data quality metrics can be of direct use to managers making decisions.
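
    As an illustration of the representational aspect of consistency, the sketch below scores a column by the share of values that conform to one canonical format; this stand-in metric and the regular-expression approach are assumptions for illustration, not the metrics under development in the paper.

```python
import re


def representational_consistency(values, pattern: str) -> float:
    """Share of values that match one canonical representation (e.g. a date format).
    1.0 means every value uses the same representation; lower means mixed formats."""
    if not values:
        return 1.0
    regex = re.compile(pattern)
    conforming = sum(1 for v in values if regex.fullmatch(str(v)))
    return conforming / len(values)


# Hypothetical column mixing ISO dates with another date representation.
dates = ["2013-05-01", "2013-05-02", "05/03/2013", "2013-05-04"]
print(representational_consistency(dates, r"\d{4}-\d{2}-\d{2}"))  # 0.75
```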

    Calculating with Unreliable Data in Business Analytics Applications

    The success of operational and managerial decisions depends on the reliability of the information provided to decision makers by the respective business analytics applications. Thus, in this research-in-progress paper, we explain how the mathematical foundations of the Algebra of Random Variables (AoRV) can be used to extend the capability of business analytics applications to process and report unreliable data. First, we present the theoretical foundations of the AoRV in a concise way that is tailored to business analytics. Second, we present and discuss two example cases in which we evaluate an application of the AoRV to real-world business analytics scenarios. Initial results from this first design-and-evaluate feedback loop show that the additional reliability information provided by the AoRV is of high value for decision makers, since it makes it possible to predict how uncertainties in complex business analytics scenarios will interact. As the next step of this research project, we plan to test the potential of the AoRV to extend business analytics applications through another evaluation loop in a fully natural setting.
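
    A minimal sketch of the kind of calculation an algebra of random variables enables: propagating means and variances through sums and products of independent quantities, so that a computed result carries its own reliability information. The class, the variable names, and the independence assumption are illustrative; the paper's actual formalism is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """A quantity reported together with its reliability: mean and variance."""
    mean: float
    var: float

    def __add__(self, other: "UncertainValue") -> "UncertainValue":
        # Sum of independent random variables: means and variances add.
        return UncertainValue(self.mean + other.mean, self.var + other.var)

    def __mul__(self, other: "UncertainValue") -> "UncertainValue":
        # Product of independent random variables:
        # Var(XY) = Var(X)Var(Y) + Var(X)E[Y]^2 + Var(Y)E[X]^2
        mean = self.mean * other.mean
        var = (self.var * other.var
               + self.var * other.mean ** 2
               + other.var * self.mean ** 2)
        return UncertainValue(mean, var)


# Hypothetical scenario: revenue = price * units, both reported with uncertainty.
price = UncertainValue(mean=19.90, var=0.25)
units = UncertainValue(mean=1200.0, var=400.0)
revenue = price * units
print(f"revenue ~ {revenue.mean:.1f} (std {revenue.var ** 0.5:.1f})")
```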

    Assessing Accuracy with Locality-Sensitive Hashing in Multiple Source Environment

    Accuracy assessment is a key issue in data quality management. Most current studies focus on how to qualitatively analyze the accuracy dimension, and the analysis depends heavily on experts’ knowledge; little work addresses how to automatically quantify the accuracy dimension. Based on the Jensen-Shannon Divergence (JSD) measure, we propose that the accuracy of data can be automatically quantified by comparing the data with its entity’s closest approximation in the available context. To quickly identify the closest approximation in large-scale data sources, Locality-Sensitive Hashing (LSH) is employed to extract the closest approximation at multiple levels, namely the column, record, and field levels. Our approach can not only give each data source an objective accuracy score very quickly, as long as context members are available, but also avoid laborious human interaction. Theory and experiments show that our approach performs well in deriving metadata for the accuracy dimension.
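
    A small sketch of the Jensen-Shannon Divergence comparison that such an accuracy score builds on, applied to the value distributions of two columns; the LSH candidate search and the column/record/field levels are omitted, and the function names and the 1 - JSD scoring are illustrative assumptions.

```python
import math
from collections import Counter


def _kl(p, q):
    """Kullback-Leibler divergence for two aligned discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(values_a, values_b):
    """JSD between the value distributions of two columns (0 = identical)."""
    support = sorted(set(values_a) | set(values_b))
    ca, cb = Counter(values_a), Counter(values_b)
    p = [ca[v] / len(values_a) for v in support]
    q = [cb[v] / len(values_b) for v in support]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Hypothetical use: score a source column against its closest peer column,
# e.g. one surfaced by an LSH candidate search (not shown here).
source = ["DE", "DE", "FR", "FR", "XX"]
reference = ["DE", "DE", "FR", "FR", "FR"]
accuracy_score = 1.0 - jensen_shannon_divergence(source, reference)
print(round(accuracy_score, 3))
```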

    A Model of Error Propagation in Satisficing Decisions and its Application to Database Quality Management

    This study centers on the accuracy dimension of information quality and models the relationship between input accuracy and output accuracy in a popular class of applications. Such applications consist of dichotomous decisions or judgments that are implemented through a conjunction of selected criteria. Initially, this paper introduces a model that addresses a single decision rule employing a single binary conjunction operation. This model is then extended to handle multiple, related decision rules that consist of any number of binary conjunction operations. Finally, application of the extended model is illustrated through the example of an online hotel reservation database. This example demonstrates how the new model can be used to rank and quantify the damage that errors in different database attributes inflict. Numerical estimates of the model can be integrated into cost-benefit analyses that assess alternative data accuracy enhancements or process or system designs.
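
    A simplified illustration of error propagation through a conjunctive decision rule: if the attributes feeding the conjunction are accurate with independent probabilities, the chance that the decision rests on error-free inputs is their product, and attributes can be ranked by how much fixing each one alone would raise that chance. The independence assumption, function names, and hotel-reservation figures are hypothetical and much coarser than the paper's model.

```python
from math import prod


def prob_all_inputs_correct(accuracies):
    """Probability that every attribute feeding a conjunctive rule is error-free,
    assuming independent errors (a simplification for illustration)."""
    return prod(accuracies)


def marginal_damage(accuracies):
    """Rank attributes by how much fixing each one alone would raise the
    probability of an error-free conjunction."""
    base = prob_all_inputs_correct(accuracies)
    gains = {i: prob_all_inputs_correct(a if j != i else 1.0
                                         for j, a in enumerate(accuracies)) - base
             for i in range(len(accuracies))}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hotel-reservation filter: price <= budget AND has_wifi AND is_available.
attribute_accuracy = [0.99, 0.95, 0.90]   # per-attribute accuracy estimates
print(prob_all_inputs_correct(attribute_accuracy))   # ~0.846
print(marginal_damage(attribute_accuracy))           # availability errors dominate
```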

    Data Collection Interfaces in Online Communities: The Impact of Data Structuredness and Nature of Shared Content on Perceived Information Quality

    The growth of online communities has resulted in an increased availability of user-generated content (UGC). Given the varied sources of UGC, the quality of the information it provides is a growing challenge. While many aspects of UGC have been studied, the role of data structuredness in gathering UGC and the nature of to-be-shared content have yet to receive attention. UGC is created on online platforms with varying degrees of data structure, ranging from unstructured to highly structured formats. These platforms are often designed without regard to how the structure of the input format affects the quality of the outcome. In this study, we investigate the impact of the degree of data structure on the perceived quality of information from the novel perspective of data creators. We also propose and evaluate a novel moderating effect due to the nature of the content online users wish to share. The preliminary findings support our claims about the importance of these factors for information quality. We conclude the paper with directions for future research and expected contributions to theory and practice.

    A risk based model for quantifying the impact of information quality

    Information quality is one of the key determinants of information system success. When information quality is poor, it can cause a variety of risks in an organization. To manage resources for information quality improvement effectively, it is necessary to understand where, how, and how much information quality impacts an organization's ability to successfully deliver its objectives. So far, existing approaches have mostly focused on the measurement of information quality but not adequately on the impact that poor information quality causes. This paper presents a model to quantify the business impact that arises through poor information quality in an organization by using a risk-based approach. It hence addresses the inherent uncertainty in the relationship between information quality and organizational impact. The model can help information managers to obtain quantitative figures which can be used to build reliable and convincing business cases for information quality improvement.
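
    A minimal sketch of the risk-based idea of expressing impact as expected loss, summing probability times cost over the ways poor information quality can hurt business processes; the risk register, probabilities, and costs below are hypothetical, and the paper's actual model is richer.

```python
from dataclasses import dataclass


@dataclass
class IQRisk:
    """One way poor information quality can hurt a business process."""
    process: str
    failure_probability: float   # chance per period that bad data triggers the failure
    impact_cost: float           # monetary loss if the failure occurs


def expected_annual_impact(risks: list[IQRisk]) -> float:
    """Expected loss = sum of probability * cost over all identified risks."""
    return sum(r.failure_probability * r.impact_cost for r in risks)


# Hypothetical risk register used to build a business case for IQ improvement.
register = [
    IQRisk("order fulfilment", failure_probability=0.10, impact_cost=50_000),
    IQRisk("regulatory reporting", failure_probability=0.02, impact_cost=250_000),
]
print(expected_annual_impact(register))  # 10000.0
```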

    Strategic Decision Support For Information Protection: A Facilitation Framework For Small And Medium Enterprises

    Information security seriously concerns Corporate America, but the soaring cost of protecting information assets raises equal concern. These concerns appear more threatening to small and medium enterprises (SMEs), as the percentage of their IT budgets spent on information security protection sharply surpasses the percentages budgeted by large enterprises. In light of these concerns, we propose an integrated and attainable framework that can heuristically promote strategic decision thinking about protecting information assets in SMEs. In comparison to other approaches that aim to reach an optimal decision through complex mathematical models, our framework requires no such computations. The goal of our approach is to help an SME reach such decisions with a framework that takes business, technological, and managerial issues into account. The proposed framework fosters strategic thinking about security issues with simple and practical steps to achieve balanced, consistent, and efficient protection with total involvement from all stakeholders of the information assets that need to be protected.