Comment on "Spatio-temporal filling of missing points in geophysical data sets" by D. Kondrashov and M. Ghil, Nonlin. Processes Geophys., 13, 151–159, 2006
Kondrashov and Ghil (2006) (KG hereafter) describe a method for imputing missing values in incomplete datasets that can exploit both spatial and temporal covariability to estimate missing values from available values. Temporal covariability has not been exploited as widely as spatial covariability in imputing missing values in geophysical datasets, but, as KG show, doing so can improve estimates of missing values. However, there are several inaccuracies in KG’s paper. Since similar inaccuracies have surfaced in other recent papers, for example in the literature on paleoclimate reconstructions, I would like to point them out here.
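The idea that lagged copies of a series carry temporal covariability usable for filling gaps can be illustrated with a single-channel sketch. The hypothetical function `ssa_fill` (with assumed parameters `window` and `rank`) embeds the series in a trajectory matrix of lagged copies, keeps a low-rank SVD reconstruction, averages it back into a series, and updates only the missing points until they stop changing. This is an iterative, singular-spectrum-style gap filler written under those assumptions. It is not a reproduction of KG's procedure, which also exploits spatial covariability across channels.

```python
import numpy as np

# Hypothetical helpers: a single-channel, SSA-style gap filler, not KG's method.

def hankel_embed(x, window):
    """Stack lagged copies of x into a (len(x) - window + 1) x window matrix."""
    rows = len(x) - window + 1
    return np.array([x[i:i + window] for i in range(rows)])


def diagonal_average(traj, length):
    """Map a trajectory matrix back to a series by averaging anti-diagonals."""
    rows, window = traj.shape
    total = np.zeros(length)
    counts = np.zeros(length)
    for i in range(rows):
        total[i:i + window] += traj[i]
        counts[i:i + window] += 1
    return total / counts


def ssa_fill(x, window=10, rank=2, n_iter=500, tol=1e-8):
    """Fill NaNs in x by iterating a rank-`rank` reconstruction of its lags."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = x.copy()
    filled[missing] = np.nanmean(x)  # initial guess: mean of observed values
    for _ in range(n_iter):
        traj = hankel_embed(filled, window)
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]  # keep leading components
        recon = diagonal_average(low_rank, len(filled))
        change = np.max(np.abs(recon[missing] - filled[missing]), initial=0.0)
        filled[missing] = recon[missing]  # observed values are never altered
        if change < tol:
            break
    return filled
```

For a sinusoid with a few gaps, `rank=2` suffices, because a pure sinusoid produces a rank-2 trajectory matrix; noisier or multi-component signals would need a larger rank and a principled way of choosing it.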
Missing Value Imputation With Unsupervised Backpropagation
Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) the missing values. In this paper, we present a technique for unsupervised learning called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit the manifold sampled by a set of observed point-vectors. We evaluate UBP on the task of imputing missing values in datasets and show that UBP predicts missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate with 24 datasets and 9 supervised learning algorithms that classification accuracy is usually higher when randomly withheld values are imputed using UBP rather than with other methods.
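To make the idea of fitting a multi-layer perceptron to the sampled manifold concrete, here is a minimal NumPy sketch under simplifying assumptions. Each row gets its own latent input vector, and the latent vectors and network weights are trained together by gradient descent on the squared error of observed entries only; the network's outputs then fill the missing entries. The name `ubp_impute`, the single tanh hidden layer, the hyperparameters, and the single full-batch training loop are assumptions for illustration; the paper's training schedule may differ.

```python
import numpy as np


def ubp_impute(X, latent_dim=2, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Hypothetical sketch: learn latent inputs V and an MLP f so f(V) ~ X on observed cells."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    observed = ~np.isnan(X)
    target = np.where(observed, X, 0.0)
    n_obs = observed.sum()

    V = rng.normal(scale=0.1, size=(n, latent_dim))  # one latent input per row
    W1 = rng.normal(scale=0.3, size=(latent_dim, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.3, size=(hidden, d))
    b2 = np.zeros(d)

    history = []
    for _ in range(epochs):
        H = np.tanh(V @ W1 + b1)
        Y = H @ W2 + b2
        R = np.where(observed, Y - target, 0.0)  # residuals on observed entries only
        history.append(float(np.sum(R ** 2) / n_obs))
        G = 2.0 * R / n_obs                      # dLoss/dY
        GH = (G @ W2.T) * (1.0 - H ** 2)         # backpropagate through tanh
        gV = GH @ W1.T                           # gradient w.r.t. latent inputs
        gW1 = V.T @ GH
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * GH.sum(axis=0)
        V -= lr * n * gV                         # larger step for per-row latents

    Y = np.tanh(V @ W1 + b1) @ W2 + b2
    return np.where(observed, X, Y), history    # keep observed values as given
```

The latent step is scaled by the number of rows because each latent vector receives gradient only from its own row's observed entries, whereas the shared weights accumulate gradient from every row.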
Approximating Clustering of Fingerprint Vectors with Missing Values
The problem of clustering fingerprint vectors is an interesting problem in computational biology that was proposed by Figueroa et al. (2004). In this paper we narrow the gaps between the known lower and upper bounds on the approximability of some variants of this biological problem. Namely, we prove that the problem is APX-hard even when each fingerprint contains only two unknown positions. Moreover, we study some variants of the original problem and give two 2-approximation algorithms for the IECMV and OECMV problems when the number of unknown entries in each vector is at most a constant.
Comment: 13 pages, 4 figures
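A small sketch of what missing values mean for fingerprint vectors may help readers new to the setting. It assumes a formulation in which each vector is a string over {0, 1, N}, with N marking an unknown position, and two vectors are compatible when they agree at every position where both are known. The helpers `compatible`, `greedy_clusters`, and `completion` are hypothetical illustrations: the grouping is a simple first-fit heuristic, not the paper's 2-approximation algorithms, and it does not implement the IECMV or OECMV objectives.

```python
from typing import List


def compatible(a: str, b: str) -> bool:
    """True if a and b never disagree where both are known ('N' = unknown).
    Assumes equal-length vectors."""
    return all(x == y or x == "N" or y == "N" for x, y in zip(a, b))


def greedy_clusters(vectors: List[str]) -> List[List[str]]:
    """Hypothetical first-fit heuristic: join the first pairwise-compatible cluster."""
    clusters: List[List[str]] = []
    for v in vectors:
        for cluster in clusters:
            if all(compatible(v, w) for w in cluster):
                cluster.append(v)
                break
        else:
            clusters.append([v])  # no compatible cluster: open a new one
    return clusters


def completion(cluster: List[str], default: str = "0") -> str:
    """A common fill for a pairwise-compatible cluster; `default` is an arbitrary choice."""
    out = []
    for column in zip(*cluster):
        known = {c for c in column if c != "N"}
        if len(known) > 1:
            raise ValueError("cluster is not pairwise compatible")
        out.append(known.pop() if known else default)
    return "".join(out)
```

With a binary alphabet, pairwise compatibility means no position in a cluster holds both a known 0 and a known 1, so every cluster produced this way has at least one common completion.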
Generalized canonical correlation analysis with missing values
Two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods. We compare the results with those of two available methods: the missing-data passive method, introduced in Gifi's homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll.
Keywords: generalized canonical correlation analysis; missing values
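The second approach, imputation steps that never increase the objective, can be sketched as an alternating scheme. The sketch assumes a homogeneity-analysis style loss in which object scores X (n x p, with X^T X = I) are fitted by each variable set through least squares, so the loss is the sum over sets k of ||X - Z_k W_k||^2. For fixed data, the optimal X consists of the leading eigenvectors of the sum of the projectors onto the column spaces of the Z_k; for fixed X and weights, each row's missing entries are chosen by least squares. Neither step can increase the loss, and iteration stops once it no longer decreases. The function names are hypothetical, variables are assumed to be pre-centered (nothing is re-centered after imputation), and this is not the authors' algorithm.

```python
import numpy as np


def fit_scores(Zs, p):
    """Optimal object scores X (X^T X = I) and weights W_k for fixed, complete Z_k."""
    n = Zs[0].shape[0]
    S = np.zeros((n, n))
    for Z in Zs:
        S += Z @ np.linalg.pinv(Z)             # orthogonal projector onto col(Z_k)
    _, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
    X = vecs[:, ::-1][:, :p]                   # leading p eigenvectors
    Ws = [np.linalg.pinv(Z) @ X for Z in Zs]   # least-squares weights per set
    loss = sum(float(np.sum((X - Z @ W) ** 2)) for Z, W in zip(Zs, Ws))
    return X, Ws, loss


def impute_rows(Zs, masks, X, Ws):
    """Set missing entries so each row best reproduces its object score (in place)."""
    for Z, M, W in zip(Zs, masks, Ws):
        for i in np.flatnonzero(M.any(axis=1)):
            miss = M[i]
            rhs = X[i] - Z[i, ~miss] @ W[~miss]
            sol, *_ = np.linalg.lstsq(W[miss].T, rhs, rcond=None)
            Z[i, miss] = sol


def gcca_impute(Zs, p=1, n_iter=100, tol=1e-10):
    """Hypothetical alternating scheme; returns filled sets, scores, loss history."""
    Zs = [np.array(Z, dtype=float) for Z in Zs]
    masks = [np.isnan(Z) for Z in Zs]
    for Z, M in zip(Zs, masks):
        col_means = np.nanmean(Z, axis=0)
        Z[M] = col_means[np.nonzero(M)[1]]     # start from column means
    X, Ws, loss = fit_scores(Zs, p)
    history = [loss]
    for _ in range(n_iter):
        impute_rows(Zs, masks, X, Ws)
        X, Ws, loss = fit_scores(Zs, p)
        history.append(loss)
        if history[-2] - loss < tol:           # objective no longer decreasing
            break
    return Zs, X, history
```

Refitting X and the weights after every imputation is what makes the loss history non-increasing: the imputation lowers the loss for the current X and weights, and the refit is a global minimum over X and the weights for the updated data.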
