
    Unsupervised Metric Learning in Presence of Missing Data

    For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high-dimensional space, and because of this high-dimensional structure, most algorithms are inefficient. The typical solution is to reduce the dimension of the input data using standard dimension reduction algorithms such as ISOMAP, Laplacian Eigenmaps, or LLE. This approach, however, does not always work in practice, as these algorithms require somewhat ideal data. Unfortunately, most data sets either have missing entries or unacceptably noisy values; that is, real data are far from ideal and we cannot use these algorithms directly. In this paper, we focus on the case of missing data. Some techniques, such as matrix completion, can be used to fill in missing data, but these methods do not capture the non-linear structure of the manifold. Here, we present a new algorithm, MR-MISSING, that extends these previous algorithms and can be used to compute low-dimensional representations of data sets with missing entries. We demonstrate the effectiveness of our algorithm with three experiments: we visually verify it on synthetic manifolds; we numerically compare our projections against those computed by first filling in the data using nlPCA and mDRUR on the MNIST data set; and we show that we can perform classification on MNIST with missing data. We also provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.
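    The abstract notes that ISOMAP-style methods cannot be applied directly when entries are missing, because their first step needs pairwise distances between complete vectors. As a minimal sketch of that obstacle (not of MR-MISSING itself, whose details are in the full text), the Python below uses one common heuristic: estimate each distance from the coordinates observed in both vectors and rescale by the fraction observed, the same weighting scikit-learn's `nan_euclidean_distances` applies. Missing entries are represented as `None`, and the function names are hypothetical.

```python
import math


def nan_euclidean(x, y):
    """Estimate the Euclidean distance between two vectors with missing entries
    (None), using only coordinates observed in both and scaling the squared
    distance up by (total coordinates) / (shared observed coordinates)."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return float("nan")  # no overlap: the distance cannot be estimated
    sq = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(len(x) / len(shared) * sq)


def pairwise_nan_euclidean(points):
    """Full matrix of estimated distances, e.g. as input to an ISOMAP-style pipeline."""
    return [[nan_euclidean(p, q) for q in points] for p in points]


# Usage: three partially observed points in R^3.
a = (0.0, None, 0.0)
b = (0.0, 0.0, None)
c = (None, 0.0, 5.0)
D = pairwise_nan_euclidean([a, b, c])
print(D[0][1], D[1][2], D[0][2])  # d(a, b) = d(b, c) = 0, d(a, c) = sqrt(75)
```

    Because each pair is compared on a different subset of coordinates, the estimates need not form a metric: in the usage example, d(a, b) = d(b, c) = 0 while d(a, c) = sqrt(75), which violates the triangle inequality. Violations like this are one reason to repair estimated distances before embedding them.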

    Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition

    Dealing with time series with missing values, including values afflicted by low quality or over-saturation, presents a significant signal processing challenge. The task of recovering these missing values, known as imputation, has led to the development of several algorithms. However, we have observed that the efficacy of these algorithms tends to diminish when the time series exhibit non-stationary oscillatory behavior. In this paper, we introduce a novel algorithm, called Harmonic Level Interpolation (HaLI), which enhances the performance of existing imputation algorithms for oscillatory time series. After any chosen imputation algorithm has been run, HaLI applies a harmonic decomposition, based on the adaptive nonharmonic model, to the initial imputation in order to improve its accuracy for oscillatory time series. Experimental assessments on synthetic and real signals consistently show that HaLI enhances the performance of existing imputation algorithms. The algorithm is publicly available as ready-to-use MATLAB code for other researchers.
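    HaLI's two-stage structure (run any imputer first, then refine its output through a harmonic representation) can be illustrated with a deliberately simplified stand-in. The sketch below is not HaLI: instead of the adaptive nonharmonic model, it assumes a stationary signal whose harmonics sit at known integer frequencies `freqs` on the sampling grid, uses linear interpolation as the initial imputer, and then alternates between projecting onto those harmonics and restoring the observed samples. All names are hypothetical.

```python
import math


def linear_impute(x, mask):
    """Initial imputation: fill missing samples (mask[n] is False) by linear
    interpolation between the nearest observed neighbours; samples before the
    first or after the last observation copy the nearest observed value."""
    observed = [n for n, m in enumerate(mask) if m]
    y = list(x)
    for n in range(len(x)):
        if mask[n]:
            continue
        left = max((i for i in observed if i < n), default=None)
        right = min((i for i in observed if i > n), default=None)
        if left is None:
            y[n] = x[right]
        elif right is None:
            y[n] = x[left]
        else:
            t = (n - left) / (right - left)
            y[n] = (1 - t) * x[left] + t * x[right]
    return y


def harmonic_projection(x, freqs):
    """Orthogonal projection of x onto span{1, cos(2*pi*k*n/N), sin(2*pi*k*n/N)}
    for integer k in freqs with 0 < k < N/2 (these basis vectors are orthogonal)."""
    num = len(x)
    y = [sum(x) / num] * num  # constant (mean) component
    for k in freqs:
        c = [math.cos(2 * math.pi * k * n / num) for n in range(num)]
        s = [math.sin(2 * math.pi * k * n / num) for n in range(num)]
        a = 2.0 / num * sum(xi * ci for xi, ci in zip(x, c))
        b = 2.0 / num * sum(xi * si for xi, si in zip(x, s))
        for n in range(num):
            y[n] += a * c[n] + b * s[n]
    return y


def harmonic_refine(x, mask, freqs, n_iter=50):
    """Refine an initial imputation by alternating between projecting onto the
    harmonic subspace and putting the observed samples back."""
    y = linear_impute(x, mask)
    for _ in range(n_iter):
        fit = harmonic_projection(y, freqs)
        y = [y[n] if mask[n] else fit[n] for n in range(len(y))]
    return y


# Usage: components at 5 and 10 cycles per record, with a 20-sample gap.
N = 400
truth = [math.cos(2 * math.pi * 5 * n / N) + 0.5 * math.cos(2 * math.pi * 10 * n / N + 0.3)
         for n in range(N)]
mask = [not (150 <= n < 170) for n in range(N)]
x = [v if m else None for v, m in zip(truth, mask)]


def gap_error(y):
    """Root of the summed squared error over the missing samples only."""
    return math.sqrt(sum((y[n] - truth[n]) ** 2 for n in range(N) if not mask[n]))


err_linear = gap_error(linear_impute(x, mask))
err_refined = gap_error(harmonic_refine(x, mask, freqs=[5, 10]))
print(err_linear, err_refined)
```

    Because the true signal lies in the harmonic subspace and agrees with the observed samples, each refinement step can only shrink the error on the missing samples: both steps are orthogonal projections onto sets that contain the true signal. Time-varying amplitudes and frequencies, which the adaptive nonharmonic model is designed to handle, need considerably more machinery than this sketch.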

    Metric and Representation Learning

    All data have some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data in order to design effective algorithms and tools that can be applied to machine learning and across all branches of science. The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework with which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons. It may be done to repair corrupted data, or to learn a low-dimensional or simpler representation of high-dimensional data or of a very complex representation of the data. It may also be that the current representation of the data does not capture the important geometric features of the data. One of the many challenges in representation learning is determining how to judge the quality of the learned representation. In many cases, the consensus is that if $d$ is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems first learn a representation in a metric space and then extract a metric. A large part of my research explores what happens if the order is switched, that is, if we learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data indicates the type of space we should seek for the representation, which in turn gives us more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data.
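    The "metric first, embedding later" order can be made concrete with classical multidimensional scaling (cMDS): given a distance matrix, it double-centres the squared distances and reads an embedding off the top eigenvectors. The NumPy sketch below is the textbook construction, included only to illustrate the second step of that pipeline; it is not the thesis's code, and `classical_mds` is a hypothetical name.

```python
import numpy as np


def classical_mds(D, dim):
    """Classical MDS: embed n points in R^dim from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                      # Gram matrix of the centred points
    evals, evecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]              # indices of the dim largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # drop negative (non-Euclidean) parts
    return evecs[:, top] * scale


# Usage: the corners of a unit square are recovered up to rotation/reflection.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=-1)
Y = classical_mds(D, dim=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))
```

    The clipping step discards negative eigenvalues, which appear when the input dissimilarities are not Euclidean distances.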
My projects fall into three areas of representation learning: 1) geometric and probabilistic analysis of representation learning methods; 2) developing methods to learn optimal metrics on large data sets; and 3) applications. In the category of geometric and probabilistic analysis of representation learning methods, we have three projects: first, designing optimal training data for denoising autoencoders; second, formulating a new optimal transport problem and understanding its geometric structure; and third, analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem. For learning an optimal metric, we are given a dissimilarity matrix $\hat{D}$, some function $f$, and a subset $S$ of the space of all metrics, and we want to find $D \in S$ that minimizes $f(D, \hat{D})$. In this thesis, we consider the version of the problem in which $S$ is the space of metrics defined on a fixed graph; that is, given a graph $G$, we let $S$ be the space of all metrics defined via $G$. For this $S$, we consider the sparse objective function as well as convex objective functions. We also look at the problem in which we want to learn a tree. We also show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data. Finally, we look at an application to real-world data: specifically, reconstructing ancient Greek text.
PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd
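    To make the space $S$ of metrics defined via a graph $G$ more concrete, one common reading (an assumption here, following the metric-repair literature) is that edge weights on $G$ form a graph metric when no edge weight exceeds the length of any path between its endpoints. Under that reading, the shortest-path lengths form the largest graph metric that never increases any weight, so replacing each weight by its shortest-path length is the "decrease-only" repair. The sketch below computes that repair with Floyd–Warshall and evaluates a squared-error choice of $f$. It is a special case for illustration, not the thesis's algorithm for sparse or general convex objectives, and all function names are hypothetical.

```python
def graph_metric_repair(n, weights):
    """Decrease-only repair on a fixed graph with vertices 0..n-1.

    weights maps each edge (u, v) to a nonnegative dissimilarity.
    Returns the shortest-path length for every edge of the graph."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return {(u, v): dist[u][v] for (u, v) in weights}


def is_graph_metric(n, weights, tol=1e-12):
    """True if no edge weight exceeds the shortest path between its endpoints."""
    repaired = graph_metric_repair(n, weights)
    return all(weights[e] <= repaired[e] + tol for e in weights)


def squared_error(d, d_hat):
    """One possible objective f(D, D_hat): sum of squared edge differences."""
    return sum((d[e] - d_hat[e]) ** 2 for e in d_hat)


# Usage: a triangle whose long edge violates the triangle inequality.
d_hat = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0}
d = graph_metric_repair(3, d_hat)
print(d)                                                 # {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
print(is_graph_metric(3, d_hat), is_graph_metric(3, d))  # False True
print(squared_error(d, d_hat))                           # 9.0
```

    Because every graph metric that lies below the input weights is also below the shortest-path lengths, no decrease-only solution can be closer to $\hat{D}$ edge by edge; repairs that may also increase weights, as in the general problem above, can do better under a given $f$.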