5 research outputs found

    Model-Driven Component Generation for Families of Completeness Measures

    Completeness is a well-understood dimension of data quality. In particular, measures of coverage can be used to assess the completeness of a data source, relative to some universe, for instance a collection of reference databases. We observe that this definition is inherently and implicitly multidimensional: in principle, one can compute measures of coverage that are expressed as a combination of subsets of the attributes in the data source schema. This generalization can be useful in several application domains, notably in the life sciences. This leads to the idea of domain-specific families of completeness measures that users can choose from. Furthermore, individuals in the family can be specified as OLAP-type queries on a dimensional schema. In this paper we describe an initial data architecture to support and validate the idea, and show how dimensional completeness measures can be supported in practice by extending the Quality View model [11].
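
The idea of one coverage measure per subset of attributes can be illustrated with a minimal sketch. It assumes coverage means the fraction of distinct value combinations in the universe that also appear in the source; the abstract does not give a formula, so this definition, the function names, and the toy data are all hypothetical and are not the paper's architecture or its OLAP formulation.

```python
from itertools import combinations


def coverage(source_rows, universe_rows, attributes):
    """Fraction of distinct universe value combinations, projected onto
    `attributes`, that also occur in the source (assumed definition)."""
    def project(rows):
        return {tuple(row[a] for a in attributes) for row in rows}

    universe = project(universe_rows)
    if not universe:
        return None  # coverage is undefined against an empty universe
    return len(project(source_rows) & universe) / len(universe)


def coverage_family(source_rows, universe_rows, attributes):
    """One coverage measure per non-empty subset of the attributes."""
    return {
        subset: coverage(source_rows, universe_rows, subset)
        for size in range(1, len(attributes) + 1)
        for subset in combinations(attributes, size)
    }


# Hypothetical toy data: a reference universe and a smaller data source.
universe = [
    {"organism": "human", "protein": "P1"},
    {"organism": "human", "protein": "P2"},
    {"organism": "mouse", "protein": "P1"},
    {"organism": "mouse", "protein": "P3"},
]
source = [
    {"organism": "human", "protein": "P1"},
    {"organism": "mouse", "protein": "P3"},
]

family = coverage_family(source, universe, ("organism", "protein"))
for subset, value in family.items():
    print(subset, value)
```

In this toy case the source covers every organism but only half of the organism/protein pairs, which is the kind of difference a single overall completeness score would hide.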

    An Assessment Of Open Data Sets Completeness

    The rapid growth of open data sources is driven by free-of-charge content and ease of access. While it is convenient for public data consumers to use data sets extracted from open data sources, the decision to use these data sets should be based on their quality. Several data quality dimensions, such as completeness, accuracy, and timeliness, are common requirements for making data fit for use. More importantly, in many cases high-quality data sets are needed to ensure reliable outcomes of reports and analytics. Even though many open data sources provide data quality guidelines, ensuring high-quality data requires commitment from data contributors. In this paper, an initial investigation of the quality of open data sets in terms of the completeness dimension was conducted. In particular, missing-value measurements were extracted for 20 data sets from the open data sources. The analysis covered all representations of missing values, which are not limited to nulls or blank spaces. The results exhibited a range of missing-value ratios that indicated the level of completeness of the data sets. The limited coverage of this analysis does not hinder understanding of the current level of data completeness of open data sets. The findings may motivate open data providers to design initiatives that will strengthen data quality policies and guidelines for data contributors. In addition, this analysis may assist public data users in deciding on the acceptability of open data sets by applying the simple methods proposed in this paper or by performing data cleaning actions to improve the completeness of the data sets concerned.
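
A missing-value ratio that counts more than nulls and blanks can be sketched as follows. The abstract does not list which representations the authors counted or how they computed the ratio, so the token set, the function names, and the sample rows below are assumptions made for illustration only.

```python
# Assumed set of strings treated as "missing"; the paper's actual list is
# not given in the abstract.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-", "?"}


def is_missing(value, tokens=MISSING_TOKENS):
    """True if the value is None or matches a missing-value token
    after trimming whitespace and lowercasing."""
    if value is None:
        return True
    return str(value).strip().lower() in tokens


def missing_ratio(rows, tokens=MISSING_TOKENS):
    """Share of cells in `rows` that count as missing (0.0 if no cells)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    missing = sum(is_missing(value, tokens) for row in rows for value in row)
    return missing / total


# Hypothetical sample: 4 of the 9 cells are missing in some representation.
rows = [
    ["Alice", "42", "N/A"],
    ["Bob", "", "London"],
    ["?", "17", None],
]

ratio = missing_ratio(rows)
print(f"missing ratio: {ratio:.3f}")
```

Passing a different `tokens` set lets the same check be adapted to a data source that uses its own placeholder, such as a sentinel number or a local-language word for "unknown".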

    Model-driven component generation for families of completeness measures

    No full text
    Completeness is a well-understood dimension of data quality. In particular, measures of coverage can be used to assess the completeness of a data source, relative to some universe, for instance a collection of reference databases. We observe that this definition is inherently and implicitly multidimensional: in principle, one can compute measures of coverage that are expressed as a combination of subsets of the attributes in the data source schema. This generalization can be useful in several application domains, notably in the life sciences. This leads to the idea of domain-specific families of completeness measures that users can choose from. Furthermore, individuals in the family can be specified as OLAP-type queries on a dimensional schema. In this paper we describe an initial data architecture to support and validate the idea, and show how dimensional completeness measures can be supported in practice by extending the Quality View model [11].

    Model-driven component generation for families of completeness measures

    No full text
    Completeness is a well-understood dimension of data quality. In particular, measures of coverage can be used to assess the completeness of a data source, relative to some universe, for instance a collection of reference databases. We observe that this definition is inherently and implicitly multidimensional: in principle, one can compute measures of coverage that are expressed as a combination of subsets of the attributes in the data source schema. This generalization can be useful in several application domains, notably in the life sciences. This leads to the idea of domain-specific families of completeness measures that users can choose from. Furthermore, individuals in the family can be specified as OLAP-type queries on a dimensional schema. In this paper we describe an initial data architecture to support and validate the idea, and show how dimensional completeness measures can be supported in practice by extending the Quality View model [11].

    Model-driven component generation for families of completeness measures

    No full text
    Completeness is a well-understood dimension of data quality. In particular, measures of coverage can be used to assess the completeness of a data source, relative to some universe, for instance a collection of reference databases. We observe that this definition is inherently and implicitly multidimensional: in principle, one can compute measures of coverage that are expressed as a combination of subsets of the attributes in the data source schema. This generalization can be useful in several application domains, notably in the life sciences. This leads to the idea of domain-specific families of completeness measures that users can choose from. Furthermore, individuals in the family can be specified as OLAP-type queries on a dimensional schema. In this paper we describe an initial data architecture to support and validate the idea, and show how dimensional completeness measures can be supported in practice by extending the Quality View model [11].