
    Finnim Iterative Imputation Of Missing Values In Dissolved Gas Analysis Dataset

    Missing values are a common occurrence in many real-world databases, and statistical methods have been developed to deal with this problem, referred to as missing data imputation. In the detection and prediction of incipient faults in power transformers using Dissolved Gas Analysis (DGA), the problem of missing values is significant and has resulted in inconclusive decision making. This study proposes an efficient non-parametric iterative imputation method, named FINNIM, which comprises three components: the imputation ordering, the imputation estimator and the iterative imputation. The relationship between gases and faults and the percentage of missing values in an instance are used as the basis for the imputation ordering; plausible values for the missing entries are estimated from k-nearest neighbour instances in the imputation estimator; and the iterative imputation allows complete and incomplete instances in a DGA dataset to be used iteratively for imputing all the missing values. Experimental results on both artificially inserted and actual missing values found in several DGA datasets demonstrate that the proposed method outperforms existing methods in imputation accuracy, classification performance and convergence criteria at different missing percentages.
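    The abstract's three components (ordering by missing percentage, a k-nearest-neighbour estimator, and iterative refinement) can be sketched roughly as follows. This is a simplified illustration of the general kNN iterative-imputation idea, not the authors' FINNIM implementation; the function name and parameters are hypothetical.

    ```python
    import numpy as np

    def knn_impute_iterative(X, k=3, n_iter=5):
        """Iteratively impute NaN entries using the mean of the k nearest
        neighbours. Simplified sketch: ordering uses only the per-row
        missing percentage, not the gas-fault relationships of FINNIM."""
        X = X.astype(float).copy()
        missing = np.isnan(X)
        # Initial fill with column means so every instance can act as a neighbour
        col_means = np.nanmean(X, axis=0)
        X[missing] = np.take(col_means, np.where(missing)[1])
        # Imputation ordering: rows with fewer missing values first
        order = np.argsort(missing.sum(axis=1))
        for _ in range(n_iter):
            for i in order:
                if not missing[i].any():
                    continue
                d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
                d[i] = np.inf  # exclude the instance itself
                nn = np.argsort(d)[:k]
                # Estimator: average the neighbours' values at the missing positions
                X[i, missing[i]] = X[nn][:, missing[i]].mean(axis=0)
        return X
    ```

    Each pass re-estimates the missing entries from the current (partially imputed) dataset, so both complete and incomplete instances contribute to later iterations, as the abstract describes.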

    Estimation of the Distribution of Hourly Pay from Household Survey Data: The Use of Missing Data Methods to Handle Measurement Error

    Measurement errors in survey data on hourly pay may lead to serious upward bias in low pay estimates. We consider how to correct for this bias when auxiliary, accurately measured data are available for a subsample. An application to the UK Labour Force Survey is described. Fractional imputation, nearest neighbour imputation, predictive mean matching and propensity score weighting are considered. Properties of point estimators are compared both theoretically and by simulation. A fractional predictive mean matching imputation approach is advocated. It performs similarly to propensity score weighting, but displays slight advantages of robustness and efficiency.
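    Predictive mean matching, one of the methods compared above, fits a regression on the accurately measured cases and then imputes each missing value by copying the observed value of a donor whose predicted mean is closest. A minimal sketch, assuming a single imputation (not the fractional variant advocated in the paper); all names are illustrative:

    ```python
    import numpy as np

    def pmm_impute(y, X, missing, n_donors=5, seed=0):
        """Predictive mean matching: OLS fit on observed cases, then each
        missing y gets a randomly chosen donor among the n_donors observed
        cases with the closest predicted mean."""
        rng = np.random.default_rng(seed)
        obs = ~missing
        Xd = np.column_stack([np.ones(len(X)), X])  # add intercept
        beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
        yhat = Xd @ beta  # predicted means for all cases
        y_out = y.copy()
        y_obs = y[obs]
        for i in np.where(missing)[0]:
            d = np.abs(yhat[obs] - yhat[i])
            donors = np.argsort(d)[:n_donors]
            # Copy an *observed* value, so imputations stay in the support of the data
            y_out[i] = y_obs[rng.choice(donors)]
        return y_out
    ```

    Because imputed values are always real observed values, PMM avoids implausible imputations (e.g. negative hourly pay), which is one reason matching-type methods are attractive for pay distributions.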

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organisations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; nevertheless, much of this data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and present our vision for future research.
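    The defining constraint above — learning a boundary from positive examples only — can be illustrated with a simple distance-based novelty detector. This is a generic sketch of one common OCC approach (kNN distance thresholding), not any specific method from the survey; the class name and parameters are illustrative.

    ```python
    import numpy as np

    class KNNNoveltyDetector:
        """One-class classifier trained only on target-class samples.
        A test point is accepted if its k-th nearest training distance
        falls below a threshold derived from the training set itself."""
        def __init__(self, k=3, quantile=0.95):
            self.k, self.quantile = k, quantile

        def fit(self, X):
            self.X_ = np.asarray(X, float)
            # Pairwise distances among training points
            d = np.linalg.norm(self.X_[:, None] - self.X_[None, :], axis=2)
            # Column 0 of the sorted rows is the self-distance (0), so
            # column k is each point's k-th neighbour excluding itself
            kth = np.sort(d, axis=1)[:, self.k]
            self.threshold_ = np.quantile(kth, self.quantile)
            return self

        def predict(self, X):
            X = np.asarray(X, float)
            d = np.linalg.norm(X[:, None] - self.X_[None, :], axis=2)
            kth = np.sort(d, axis=1)[:, self.k - 1]
            return kth <= self.threshold_  # True = target class, False = outlier
    ```

    The boundary here depends only on positive samples, which is exactly what distinguishes OCC from binary classification: no negative examples are needed at training time.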

    Imputing income in Longitudinal Study of Australian Children

    This paper explains why imputing missing income data benefits the Longitudinal Study of Australian Children and shows how two techniques have been used in combination to impute this income data.