23 research outputs found

    Handling Failures in Data Quality Measures

    Get PDF
    Successful data quality (DQ) measurement is important for many data consumers (or data guardians) when deciding on the acceptability of the data concerned. Nevertheless, little is known about how “failures” of DQ measures can be handled by data guardians in the presence of the factor(s) that contribute to those failures. This paper presents a review of failure handling mechanisms for DQ measures. The failure factors faced by existing DQ measures are presented, together with the research gaps with respect to failure handling mechanisms in DQ frameworks. In particular, by comparing existing DQ frameworks in terms of the inputs used to measure DQ, the way DQ scores are computed, and the way DQ scores are stored, we identified failure factors inherent within the frameworks. Understanding how failures can be handled will lead to the design of a systematic failure handling mechanism for robust DQ measures

    A Framework for Classification of the Data and Information Quality Literature and Preliminary Results (1996-2007)

    Get PDF
    The value of management decisions, the security of our nation, and the very foundations of our business integrity are all dependent on the quality of data and information. However, the quality of the data and information is dependent on how that data or information will be used. This paper proposes a theory of data quality based on the five principles defined by J. M. Juran for product and service quality and extends Wang et al’s 1995 framework for data quality research. It then examines the data and information quality literature from journals within the context of this framework

    Handling Failures in Data Quality Measures

    Get PDF
    Successful data quality (DQ) measurement is important for many data consumers (or data guardians) when deciding on the acceptability of the data concerned. Nevertheless, little is known about how “failures” of DQ measures can be handled by data guardians in the presence of the factor(s) that contribute to those failures. This paper presents a review of failure handling mechanisms for DQ measures. The failure factors faced by existing DQ measures are presented, together with the research gaps with respect to failure handling mechanisms in DQ frameworks. We propose ways to maximise the situations in which data quality scores can be produced when factors that would cause the failure of currently proposed scoring mechanisms are present. By understanding how failures can be handled, a systematic failure handling mechanism for robust DQ measures can be designed
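    To make the idea of producing a score despite failure factors concrete, the following is a minimal, hypothetical Python sketch (not the paper's actual mechanism; the field names and scoring rule are invented): it computes a completeness-style DQ score from whatever inputs are available and falls back to a partial score with an explicit warning when an expected input is missing, rather than failing outright.

        # Hypothetical illustration of a failure-tolerant DQ score; names and
        # the scoring rule are assumptions, not the mechanism from the paper.
        from typing import Dict, List, Optional, Tuple

        def dq_score(record: Dict[str, Optional[str]],
                     expected_fields: List[str]) -> Tuple[float, List[str]]:
            """Return (score, warnings). A missing field is a failure factor:
            instead of aborting, score only the fields that are present."""
            warnings = []
            available = [f for f in expected_fields if f in record]
            for f in expected_fields:
                if f not in record:
                    warnings.append(f"input '{f}' unavailable; scored without it")
            if not available:  # total failure: no usable inputs at all
                return 0.0, warnings + ["no inputs available; score is a default"]
            filled = sum(1 for f in available if record[f] not in (None, ""))
            return filled / len(available), warnings

        score, notes = dq_score({"name": "Ada", "email": ""}, ["name", "email", "phone"])
        print(score, notes)  # 0.5, with a warning that 'phone' was unavailable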

    Model-Driven Component Generation for Families of Completeness Measures

    Get PDF
    Completeness is a well-understood dimension of data quality. In particular, measures of coverage can be used to assess the completeness of a data source relative to some universe, for instance a collection of reference databases. We observe that this definition is inherently and implicitly multidimensional: in principle, one can compute measures of coverage over any subset of the attributes in the data source schema. This generalization can be useful in several application domains, notably in the life sciences. This leads to the idea of domain-specific families of completeness measures that users can choose from. Furthermore, individual measures in a family can be specified as OLAP-type queries on a dimensional schema. In this paper we describe an initial data architecture to support and validate the idea, and show how dimensional completeness measures can be supported in practice by extending the Quality View model [11]
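    As a rough illustration of attribute-subset coverage (a sketch only: the attribute names and data are invented, and this is not the Quality View extension described in the paper), coverage of a source relative to a reference universe can be computed per attribute subset as the fraction of reference projections found in the source.

        # Hypothetical sketch: coverage of a data source against a reference set,
        # computed per attribute subset (projection). Data and attribute names
        # are made up for illustration.
        from itertools import combinations
        from typing import Dict, List, Tuple

        def coverage(source: List[Dict], reference: List[Dict],
                     attrs: Tuple[str, ...]) -> float:
            """|project(source) & project(reference)| / |project(reference)|."""
            proj = lambda rows: {tuple(r.get(a) for a in attrs) for r in rows}
            ref = proj(reference)
            return len(proj(source) & ref) / len(ref) if ref else 1.0

        source = [{"gene": "TP53", "species": "human"}]
        reference = [{"gene": "TP53", "species": "human"},
                     {"gene": "BRCA1", "species": "human"}]

        all_attrs = ("gene", "species")
        for k in range(1, len(all_attrs) + 1):  # every non-empty attribute subset
            for subset in combinations(all_attrs, k):
                print(subset, coverage(source, reference, subset))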

    Viewpoints on emergent semantics

    Get PDF
    Authors include: Philippe Cudré-Mauroux and Karl Aberer (editors), Alia I. Abdelmoty, Tiziana Catarci, Ernesto Damiani, Arantxa Illaramendi, Robert Meersman, Erich J. Neuhold, Christine Parent, Kai-Uwe Sattler, Monica Scannapieco, Stefano Spaccapietra, Peter Spyns, and Guy De Tré. We introduce a novel view on how to deal with the problems of semantic interoperability in distributed systems. This view is based on the concept of emergent semantics, which sees both the representation of semantics and the discovery of the proper interpretation of symbols as the result of a self-organizing process performed by distributed agents exchanging symbols and having utilities dependent on the proper interpretation of the symbols. This is a complex systems perspective on the problem of dealing with semantics. We highlight some of the distinctive features of our vision and point out preliminary examples of its application

    A measure-theoretic foundation for data quality

    Get PDF

    Data and Information Quality Framework Development: Proposed for Indonesia Higher Education

    Get PDF
    The main objective of this research is to find solutions to improve the quality of data and information on higher education in Indonesia. The target is a structured and comprehensive framework and strategy, accompanied by a manual for adoption and implementation, which is expected to encourage efforts to obtain quality data and information (accurate, complete, timely, and consistent) by utilizing information technology in support of good governance. In the long run, it is expected that all higher education institutions in Indonesia will have an information system organized around valid and reliable data. This quality data and information is used for policy formulation, clear process procedures, data cleansing approaches, adequate planning, appropriate decision making, and the management of higher education institutions at all levels of management (institutions, faculties, departments and study programs), towards the realization of the national education industry in Indonesia. To achieve this goal, this study uses a mixed-method approach, combining quantitative methods (national studies and surveys) and qualitative methods (case studies). Delphi studies were conducted over three rounds involving experts in data/information quality and higher education. National surveys are carried out using online questionnaires, while case studies are conducted with semi-structured interview techniques. Framework development is carried out by triangulation between the Delphi findings, the survey data, and the case studies. The research population is state and private higher education institutions (universities, institutes, colleges, academies) spread across all major islands in Indonesia (Sumatra, Java, Bali, Kalimantan, Sulawesi, Maluku and Papua)

    Data Incompleteness Due to the Limited Modelling Power of Currently Dominant Data Models

    Get PDF
    In the most widely used data models, especially the relational data model, information about the values of individual object properties is stored in attributes. In many cases (e.g. with partial information), however, representing this information by single elements of the attribute's associated value domain is not possible and requires the use of special concepts (e.g. null values). In currently used models, these concepts are insufficiently tailored to the actual requirements, so the originally available information often cannot be recovered from the stored data. Previous approaches to solving this problem have failed to gain acceptance for various reasons. The work described here therefore presents a proposal intended both to reduce the information loss during data storage and to avoid the weaknesses of previous approaches with regard to their lack of adoption. The former is achieved by using several kinds of null values; the latter rests mainly on avoiding major deviations from the currently prevailing models. Since this in turn requires retaining important and fundamental concepts, the evaluation of the different null values must take place in three-valued logic. Besides compatibility with the prevailing models, this approach offers the advantage of low model complexity and allows an intuitive handling of systems based on the newly designed model
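    As a rough illustration of evaluating several kinds of null values under three-valued (Kleene) logic, the following Python sketch uses two invented null markers and propagates an "unknown" truth value through comparisons and conjunction; it is an assumption-laden illustration, not the model proposed in the thesis.

        # Hypothetical sketch: two kinds of null markers evaluated under
        # three-valued (Kleene) logic. Marker names are invented for illustration.
        from enum import Enum

        class Null(Enum):
            UNKNOWN = "value exists but is unknown"
            INAPPLICABLE = "attribute does not apply"

        TRUE, FALSE, MAYBE = True, False, None  # MAYBE is the third truth value

        def eq(a, b):
            """Comparison yields MAYBE if either operand is a null marker."""
            if isinstance(a, Null) or isinstance(b, Null):
                return MAYBE
            return a == b

        def and3(x, y):
            """Kleene AND: FALSE dominates, otherwise MAYBE propagates."""
            if x is FALSE or y is FALSE:
                return FALSE
            if x is MAYBE or y is MAYBE:
                return MAYBE
            return TRUE

        row = {"name": "Meier", "phone": Null.UNKNOWN}
        print(and3(eq(row["name"], "Meier"), eq(row["phone"], "12345")))  # MAYBE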

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Get PDF