    Comment on "Spatio-temporal filling of missing points in geophysical data sets" by D. Kondrashov and M. Ghil, Nonlin. Processes Geophys., 13, 151–159, 2006

    Kondrashov and Ghil (2006) (KG hereafter) describe a method for imputing missing values in incomplete datasets that can exploit both spatial and temporal covariability to estimate missing values from available values. Temporal covariability has not been exploited as widely as spatial covariability in imputing missing values in geophysical datasets, but, as KG show, doing so can improve estimates of missing values. However, there are several inaccuracies in KG's paper. Since similar inaccuracies have surfaced in other recent papers, for example, in the literature on paleoclimate reconstructions, I would like to point them out here.

    Missing Value Imputation With Unsupervised Backpropagation

    Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) the missing values. In this paper, we present a technique for unsupervised learning called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit the manifold sampled by a set of observed point-vectors. We evaluate UBP on the task of imputing missing values in datasets, and show that UBP is able to predict missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate with 24 datasets and 9 supervised learning algorithms that classification accuracy is usually higher when randomly withheld values are imputed using UBP rather than with other methods.

    Approximating Clustering of Fingerprint Vectors with Missing Values

    The problem of clustering fingerprint vectors is an interesting problem in Computational Biology that was proposed in (Figueroa et al. 2004). In this paper we show some improvements in closing the gaps between the known lower bounds and upper bounds on the approximability of some variants of the biological problem. Namely, we are able to prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we have studied some variants of the original problem, and we give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries for each vector is at most a constant. Comment: 13 pages, 4 figures

    Generalized canonical correlation analysis with missing values

    Two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods. We compare the results with those of two available methods: the missing-data passive method, introduced in Gifi's homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. Keywords: generalized canonical correlation analysis; missing values