Mining data quality rules based on T-dependence

Author: Boeckling Toon
Bronselaer Antoon
De Tré Guy
Publication venue: 'Atlantis Press'
Publication date: 01/01/2019

Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to find edit rules automatically using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show a weak to medium correlation in the rank order of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.
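The idea of T-lift can be illustrated with a small sketch. The abstract does not state the formula, so the code below assumes that T-lift replaces the product in the classical lift, P(a, b) / (P(a) · P(b)), with a t-norm T applied to the marginal frequencies. The function names, the toy records, and the reading of "strong negative correlation" as a T-lift near zero are all illustrative assumptions, not the authors' definitions.

```python
# Sketch only: T-lift is ASSUMED to be P(a, b) / T(P(a), P(b)).
# All names and the toy data below are hypothetical.

def t_min(a, b):
    """Minimum t-norm, written T_M in the abstract."""
    return min(a, b)

def t_prod(a, b):
    """Product t-norm, written T_P; yields the classical lift."""
    return a * b

def t_lift(rows, attr_a, val_a, attr_b, val_b, t_norm):
    """Relative joint frequency divided by the t-norm of the marginals."""
    n = len(rows)
    p_a = sum(1 for r in rows if r[attr_a] == val_a) / n
    p_b = sum(1 for r in rows if r[attr_b] == val_b) / n
    p_ab = sum(
        1 for r in rows if r[attr_a] == val_a and r[attr_b] == val_b
    ) / n
    denom = t_norm(p_a, p_b)
    if denom == 0:
        return None  # undefined when a value never occurs
    return p_ab / denom

# Toy dataset: "child" never co-occurs with "married".
rows = (
    [{"age": "child", "status": "single"}] * 3
    + [{"age": "adult", "status": "married"}] * 4
    + [{"age": "adult", "status": "single"}] * 3
)

for name, t in (("T_P", t_prod), ("T_M", t_min)):
    print(name, t_lift(rows, "age", "child", "status", "married", t))
    print(name, t_lift(rows, "age", "adult", "status", "married", t))
```

In this toy data the combination (child, married) scores zero under both t-norms, which is the kind of value combination an edit rule would forbid, while (adult, married) scores differently under T_P and T_M.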

Crossref

Ghent University Academic Bibliography

New advances in aircraft MRO services: data mining enhancement

Author: Gulliver Stephen
Ke Lisheng
Tang Yinshan
Yu Jun
Publication date: 01/10/2011

Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on row-data based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality become key issues in determining the success of an MRO job, since cost and quality benchmarks must be met. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality and enables significantly more precise MRO job quotations. Regular expressions were used to analyse descriptive textual feedback (i.e. engineers' reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
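As a rough illustration of extracting normalised fields from free-text engineers' reports with regular expressions, consider the sketch below. The abstract does not describe the report format or the patterns actually used, so the "P/N" part-number convention, the defect vocabulary, and the sample text are hypothetical assumptions.

```python
import re

# Hypothetical patterns: the paper's real report format is not given.
PART_RE = re.compile(r"\bP/N\s*([A-Z0-9-]+)", re.IGNORECASE)
DEFECT_RE = re.compile(r"\b(crack|corrosion|leak|wear)\w*", re.IGNORECASE)

def normalise_report(text):
    """Pull part numbers and normalised defect keywords out of free text."""
    parts = PART_RE.findall(text)
    # Lower-case and de-duplicate defect stems so "Cracked" and "crack" agree.
    defects = sorted({stem.lower() for stem in DEFECT_RE.findall(text)})
    return {"part_numbers": parts, "defects": defects}

# Invented sample report, for illustration only.
sample = "Found cracked blade, P/N 1234-56. Minor corrosion on housing P/N AB-77."
print(normalise_report(sample))
```

Mapping inflected forms such as "cracked" to a single stem is one simple way to obtain the kind of normalised values that a quotation query could filter on.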

Central Archive at the University of Reading

The LSST Data Mining Research Agenda

Author: A. Szalay
Coryn A.L. Bailer-Jones
I. Davidson
J. A. Tyson
J. Becla
K. Borne
Publication venue: 'AIP Publishing'
Publication date: 01/01/2008

We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more. Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

arXiv.org e-Print Archive

Crossref

Publications on Chronic Disease in Coal Dependent Communities in Central Appalachia

Author: Meacham, Susan, PhD
Metzger Diana
Sukpraput, Suporn, PhD
Taber Thomas, OMS I
Publication venue: Digital Scholarship@UNLV
Publication date: 09/01/2013

CONTEXT: Agency and nonprofit reports have traditionally been the source of health information in Appalachia. Recently, publications have appeared in the literature associating coal mining, specifically mountaintop mining, with numerous chronic health conditions, spurring debate among environmental and industry interest groups. Publication quantity and quality were objectively assessed. This article reports on a literature review and analysis of publications on chronic disease in coal-dependent communities in Appalachia. OBJECTIVE: To conduct a review and analysis of original, peer-reviewed research publications on chronic health conditions in communities dependent on coal mining, with a focus on central Appalachia, and to report on publication and research quantity and quality. DATA SOURCES: Thorough searches were conducted using the PubMed, EBSCO, and CINAHL computerized databases to identify original, peer-reviewed research articles addressing ‘Appalachia’, ‘health’ and ‘coal’. STUDY SELECTION: The computerized database search identified original research publications relevant to chronic health conditions (heart disease, lung disease, kidney disease, cancers, diabetes, obesity, etc.) and coal mining in central Appalachia. DATA EXTRACTION: Quantitative measures of the literature review provided information on author collaborations, year of publication, frequency of publication by contributing authors, etc. Journal impact factors were noted and other objective qualitative criteria were considered. DATA SYNTHESIS: Over 60 publications relevant to mining, with 38 publications specific to Appalachia and health, were identified. The publications were reviewed for relevance and article quality, i.e., current, original research, application to central Appalachia, and discussion of chronic human health and coal mining. Over the past five years, most of the publications relevant to chronic disease and coal mining in central Appalachia resulted from a research group with a single common author. CONCLUSIONS: Science-based evidence is needed, and data must be provided by independent researchers from various disciplines of study to share different perspectives on how to alleviate the longstanding health disparities in central Appalachia. Studies will require the application of sound methodologies to validate the findings and support future interventions.

University of Nevada, Las Vegas Repository