Mining data quality rules based on T-dependence
Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Basically, edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to automatically find edit rules using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently by use of frequent pattern trees. Experimental results show that there is a weak to medium correlation in the rank order of edit rules obtained under the minimum t-norm T_M and the product t-norm T_P, indicating that the semantics of these kinds of dependencies are different.
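The idea of a lift measure parameterized by a t-norm can be illustrated with a minimal sketch. It assumes that T-lift is the joint relative frequency of two values divided by the t-norm of their marginal frequencies, which reduces to ordinary lift under the product t-norm; the abstract does not state the definition, so this formula, the function names, and the toy records are illustrative assumptions rather than the authors' method.

```python
from typing import Callable, Optional, Sequence, Set

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm (T_M)."""
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    """Product t-norm (T_P); yields the traditional lift."""
    return a * b


def t_lift(records: Sequence[Set[str]], item_a: str, item_b: str,
           t_norm: TNorm) -> Optional[float]:
    """Assumed T-lift: P(a and b) / T(P(a), P(b)).

    Returns None when the denominator is zero (a value never occurs).
    """
    n = len(records)
    if n == 0:
        return None
    p_a = sum(1 for r in records if item_a in r) / n
    p_b = sum(1 for r in records if item_b in r) / n
    p_ab = sum(1 for r in records if item_a in r and item_b in r) / n
    denom = t_norm(p_a, p_b)
    return None if denom == 0 else p_ab / denom


# Hypothetical toy dataset: each record is a set of attribute=value strings.
records = [
    {"sex=M", "pregnant=no"},
    {"sex=F", "pregnant=yes"},
    {"sex=F", "pregnant=no"},
    {"sex=M", "pregnant=no"},
]

# A combination that never occurs scores 0 under both t-norms,
# making it a candidate edit rule.
print(t_lift(records, "sex=M", "pregnant=yes", t_prod))
print(t_lift(records, "sex=M", "pregnant=yes", t_min))
```

Values well below 1 indicate a negative association; how low the score must be before a combination is declared an edit rule is a threshold choice that this sketch leaves open.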
New advances in aircraft MRO services: data mining enhancement
Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on row-data based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality are therefore key to the success of an MRO job, since cost and quality benchmarks must be met. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality and enables significantly more precise MRO job quotations.
Regular expressions were used to analyse descriptive textual feedback (i.e. engineers' reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
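The regular-expression step described above might look roughly like the following sketch. The report layout (labelled "Part:", "Defect:" and "Action:" fields), the pattern, and the function name are hypothetical; the abstract does not describe the actual report format or the expressions used.

```python
import re
from typing import Dict

# Hypothetical pattern: a field label, a colon, then text up to the next
# semicolon, full stop, or line break.
FIELD_PATTERN = re.compile(
    r"\b(?P<field>part|defect|action)\s*:\s*(?P<value>[^;.\n]+)",
    re.IGNORECASE,
)


def extract_fields(report: str) -> Dict[str, str]:
    """Pull labelled fields out of free text and normalise them to lowercase."""
    fields: Dict[str, str] = {}
    for match in FIELD_PATTERN.finditer(report):
        fields[match.group("field").lower()] = match.group("value").strip().lower()
    return fields


# Made-up example report, not taken from the paper.
report = "Part: Fuel Pump; Defect: corrosion on housing. Action: Replaced seal"
print(extract_fields(report))
```

Normalised fields of this kind can then be stored alongside the structured quotation data and queried by sub-part, defect, or solution.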
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabyte scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; designing a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.
Comment: 5 pages, Presented at the "Classification and Discovery in Large
Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
Publications on Chronic Disease in Coal Dependent Communities in Central Appalachia
CONTEXT:
Agency and nonprofit reports have traditionally been the source of health information in Appalachia. Recently, publications have appeared in the literature associating coal mining, specifically mountaintop mining, with numerous chronic health conditions, spurring debate among environmental and industry interest groups. Publication quantity and quality were objectively assessed. This article reports on a literature review and analysis of publications on chronic disease in coal-dependent communities in Appalachia.
OBJECTIVE:
To conduct a review and analysis of original, peer-reviewed research publications on chronic health conditions in communities dependent on coal mining, with a focus on central Appalachia, and to report on publication and research quantity and quality.
DATA SOURCES:
Thorough searches were conducted using PubMed, EBSCO, and CINAHL computerized databases to identify original, peer-reviewed research articles addressing ‘Appalachia’, ‘health’ and ‘coal’.
STUDY SELECTION:
The computerized database search identified original research publications relevant to chronic health conditions (heart disease, lung disease, kidney disease, cancers, diabetes, obesity, etc.) and coal mining in central Appalachia.
DATA EXTRACTION:
Quantitative measures of the literature review provided information on author collaborations, year of publication, frequency of publication by contributing authors, etc. Journal impact factors were noted and other objective qualitative criteria were considered.
DATA SYNTHESIS:
Over 60 publications relevant to mining were identified, with 38 publications specific to Appalachia and health. The publications were reviewed for relevance and article quality, i.e., currency, original research, application to central Appalachia, and discussion of chronic human health and coal mining. Over the past five years, most of the publications relevant to chronic disease and coal mining in central Appalachia resulted from a research group with a single common author.
CONCLUSIONS:
Science-based evidence is needed, and data must be provided by independent researchers from various disciplines of study to share different perspectives on how to alleviate the longstanding health disparities in central Appalachia. Studies will require the application of sound methodologies to validate the findings and support future interventions.