    Mining data quality rules based on T-dependence

    Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to find edit rules automatically using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show a weak to medium correlation in the rank order of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.
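As a rough illustration of the idea in this abstract, the sketch below assumes that T-lift divides the observed joint relative frequency of two values by a t-norm T applied to their marginal relative frequencies. Under that assumption the product t-norm T_P gives ordinary lift and T_M uses the minimum. The definition, the function names, and the toy records are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical sketch: lift generalized by a t-norm (assumed definition,
# not the paper's exact formulation).

def t_product(a, b):
    """Product t-norm T_P; yields ordinary lift."""
    return a * b

def t_minimum(a, b):
    """Minimum t-norm T_M."""
    return min(a, b)

def t_lift(rows, attr_a, val_a, attr_b, val_b, t_norm):
    """Joint relative frequency divided by t_norm of the marginals."""
    n = len(rows)
    p_a = sum(r[attr_a] == val_a for r in rows) / n
    p_b = sum(r[attr_b] == val_b for r in rows) / n
    p_ab = sum(r[attr_a] == val_a and r[attr_b] == val_b for r in rows) / n
    denom = t_norm(p_a, p_b)
    # Undefined when either value never occurs.
    return p_ab / denom if denom > 0 else float("nan")

# Toy records: one suspicious (child, married) row among ten.
rows = (
    [{"age": "child", "status": "single"}] * 3
    + [{"age": "adult", "status": "married"}] * 4
    + [{"age": "adult", "status": "single"}] * 2
    + [{"age": "child", "status": "married"}] * 1
)

lift_p = t_lift(rows, "age", "child", "status", "married", t_product)
lift_m = t_lift(rows, "age", "child", "status", "married", t_minimum)
print(f"T_P lift: {lift_p:.2f}, T_M lift: {lift_m:.2f}")
```

In this sketch a value well below 1 marks a combination that occurs less often than the chosen t-norm would suggest, which makes it a candidate edit rule. The threshold for "strong negative correlation" is left open here because the abstract does not state one.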

    The LSST Data Mining Research Agenda

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more. Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

    Publications on Chronic Disease in Coal Dependent Communities in Central Appalachia

    CONTEXT: Agency and nonprofit reports have traditionally been the source of health information in Appalachia. Recently, publications have appeared in the literature associating coal mining, specifically mountaintop mining, with numerous chronic health conditions, spurring debate among environmental and industry interest groups. Publication quantity and quality were objectively assessed. This article reports on a literature review and analysis of publications on chronic disease in coal-dependent communities in Appalachia. OBJECTIVE: To conduct a review and analysis of original, peer-reviewed research publications on chronic health conditions in communities dependent on coal mining, with a focus on central Appalachia, and to report on publication and research quantity and quality. DATA SOURCES: Thorough searches were conducted using the PubMed, EBSCO, and CINAHL computerized databases to identify original, peer-reviewed research articles addressing ‘Appalachia’, ‘health’ and ‘coal’. STUDY SELECTION: The computerized database search identified original research publications relevant to chronic health conditions (heart disease, lung disease, kidney disease, cancers, diabetes, obesity, etc.) and coal mining in central Appalachia. DATA EXTRACTION: Quantitative measures of the literature review provided information on author collaborations, year of publication, frequency of publication by contributing authors, etc. Journal impact factors were noted and other objective qualitative criteria were considered. DATA SYNTHESIS: Over 60 publications relevant to mining, with 38 publications specific to Appalachia and health, were identified. The publications were reviewed for relevance and article quality, i.e., current, original research, application to central Appalachia, and discussion of chronic human health and coal mining. Over the past five years, most of the publications relevant to chronic disease and coal mining in central Appalachia resulted from a research group with a single common author. CONCLUSIONS: Science-based evidence is needed, and data must be provided by independent researchers from various disciplines of study to share different perspectives on how to alleviate the longstanding health disparities in central Appalachia. Studies will require the application of sound methodologies to validate the findings and support future interventions.