Unsupervised Metric Learning in Presence of Missing Data
For many machine learning tasks, the input data lie on a low-dimensional
manifold embedded in a high-dimensional space and, because of this
high-dimensional structure, most algorithms are inefficient. The typical
solution is to reduce the dimension of the input data using standard dimension
reduction algorithms such as ISOMAP, LAPLACIAN EIGENMAPS, or LLE. This
approach, however, does not always work in practice as these algorithms require
that we have somewhat ideal data. Unfortunately, most data sets either have
missing entries or unacceptably noisy values. That is, real data are far from
ideal and we cannot use these algorithms directly. In this paper, we focus on
the case when we have missing data. Some techniques, such as matrix completion,
can be used to fill in missing data, but these methods do not capture the
non-linear structure of the manifold. Here, we present a new algorithm,
MR-MISSING, that extends these previous algorithms and can be used to compute
low-dimensional representations of data sets with missing entries. We
demonstrate the effectiveness of our algorithm by running three different
experiments. We visually verify the effectiveness of our algorithm on synthetic
manifolds, we numerically compare our projections against those computed by
first filling in data using nlPCA and mDRUR on the MNIST data set, and we
show that we can perform classification on MNIST with missing data. We also
provide a theoretical guarantee for MR-MISSING under some simplifying assumptions.
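The abstract does not spell out MR-MISSING's steps. As a minimal sketch of the problem it targets, assuming Euclidean data with missing entries marked as NaN, the Python below computes each pairwise distance from the coordinates two points share (rescaled by p/k, an assumed convention) and counts triangle-inequality violations. The helper names `masked_distances` and `triangle_violations` are hypothetical and this is not the authors' code; it only shows that distances computed from partially observed data need not form a metric, which is the kind of defect a metric-learning step could address before an ISOMAP-style embedding.

```python
import numpy as np

def masked_distances(X):
    """Pairwise Euclidean distances using only coordinates observed in both rows.

    X is an (n, p) array with np.nan marking missing entries. Each squared
    distance is rescaled by p / k, where k is the number of shared observed
    coordinates (an assumed convention, so pairs with fewer shared coordinates
    are not systematically shorter). Pairs with no shared coordinate get NaN.
    """
    n, p = X.shape
    observed = ~np.isnan(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = observed[i] & observed[j]
            k = shared.sum()
            if k == 0:
                D[i, j] = D[j, i] = np.nan  # no overlap: distance undefined
                continue
            diff = X[i, shared] - X[j, shared]
            D[i, j] = D[j, i] = np.sqrt(diff @ diff * p / k)
    return D

def triangle_violations(D, tol=1e-12):
    """Count ordered triples of distinct (i, j, k) with D[i, j] > D[i, k] + D[k, j].

    Comparisons involving NaN evaluate to False, so undefined distances are skipped.
    """
    n = D.shape[0]
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and j != k and i != k and D[i, j] > D[i, k] + D[k, j] + tol:
                    count += 1
    return count

# Point 1 is missing its first coordinate and agrees with points 0 and 2 on
# the rest, so it looks "close" to both even though they are far apart.
X = np.array([[0.0, 0.0, 5.0],
              [np.nan, 0.0, 5.0],
              [10.0, 0.0, 5.0]])
D = masked_distances(X)
```

Here `D[0, 2]` is 10 while `D[0, 1]` and `D[1, 2]` are both 0, so the triples (0, 2, 1) and (2, 0, 1) violate the triangle inequality. With fully observed data the same helpers report no violations.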
Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition
Dealing with time series with missing values, including those afflicted by
low quality or over-saturation, presents a significant signal processing
challenge. The task of recovering these missing values, known as imputation,
has led to the development of several algorithms. However, we have observed
that the efficacy of these algorithms tends to diminish when the time series
exhibit non-stationary oscillatory behavior. In this paper, we introduce a
novel algorithm, called Harmonic Level Interpolation (HaLI), that enhances the
performance of existing imputation algorithms for oscillatory time series.
After running any chosen imputation algorithm, HaLI applies a harmonic
decomposition, based on the adaptive non-harmonic model, to the initial imputation
to improve the imputation accuracy for oscillatory time series. Experimental
assessments conducted on synthetic and real signals consistently show that
HaLI enhances the performance of existing imputation algorithms. The algorithm
is publicly available as ready-to-use MATLAB code for other
researchers.
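The two-stage structure described above (run any imputation first, then refine it with a model of the oscillation) can be sketched in Python under strong simplifying assumptions. Stage 1 here is plain linear interpolation. Stage 2 is a deliberately simple stand-in, not HaLI and not the adaptive non-harmonic model: it estimates one dominant frequency from the FFT of the initial imputation, fits a single fixed-frequency sinusoid to the observed samples by least squares, and overwrites only the missing samples. Uniform sampling is assumed, and `linear_impute` and `single_tone_refine` are hypothetical names.

```python
import numpy as np

def linear_impute(t, y):
    """Stage 1: fill NaN samples of y by linear interpolation over time t."""
    y = y.copy()
    missing = np.isnan(y)
    y[missing] = np.interp(t[missing], t[~missing], y[~missing])
    return y

def single_tone_refine(t, y_obs, y_init):
    """Stage 2 (toy stand-in, not HaLI): estimate a dominant frequency from the
    initial imputation, fit a*cos + b*sin + c to the observed samples by least
    squares, and replace only the missing samples with the fitted model."""
    dt = t[1] - t[0]                              # assumes uniform sampling
    spectrum = np.abs(np.fft.rfft(y_init - y_init.mean()))
    freqs = np.fft.rfftfreq(len(t), d=dt)
    f0 = freqs[np.argmax(spectrum[1:]) + 1]       # skip the DC bin
    A = np.column_stack([np.cos(2 * np.pi * f0 * t),
                         np.sin(2 * np.pi * f0 * t),
                         np.ones_like(t)])
    observed = ~np.isnan(y_obs)
    coef, *_ = np.linalg.lstsq(A[observed], y_obs[observed], rcond=None)
    refined = y_init.copy()
    refined[~observed] = A[~observed] @ coef      # observed samples are kept as-is
    return refined

# A 5 Hz sine sampled at 100 Hz with one full period (20 samples) missing.
t = np.arange(200) / 100.0
clean = np.sin(2 * np.pi * 5 * t)
y_obs = clean.copy()
y_obs[40:60] = np.nan
y_init = linear_impute(t, y_obs)
y_ref = single_tone_refine(t, y_obs, y_init)
err_init = np.max(np.abs(y_init - clean))
err_ref = np.max(np.abs(y_ref - clean))
```

Linear interpolation flattens the missing period, while the refinement recovers it almost exactly on this noiseless single-tone signal. Real non-stationary signals, with time-varying amplitude and frequency, are exactly where such a fixed-frequency fit breaks down and an adaptive model is needed.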
Metric and Representation Learning
All data has some inherent mathematical structure. I am interested in understanding the intrinsic geometric and probabilistic structure of data to design effective algorithms and tools that can be applied to machine learning and across all branches of science.
The focus of this thesis is to increase the effectiveness of machine learning techniques by developing a mathematical and algorithmic framework with which, given any type of data, we can learn an optimal representation. Representation learning is done for many reasons. It can be used to repair corrupted data, or to learn a lower-dimensional or simpler representation of high-dimensional or very complex data. It may also be needed because the current representation of the data does not capture the important geometric features of the data.
One of the many challenges in representation learning is determining how to judge the quality of the learned representation. In many cases, the consensus is that if $d$ is the natural metric on the representation, then this metric should provide meaningful information about the data. Many examples of this can be seen in areas such as metric learning, manifold learning, and graph embedding. However, most algorithms that solve these problems first learn a representation in a metric space and then extract a metric.
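One concrete way to make "the metric should provide meaningful information about the data" measurable is multiplicative distortion, offered here as an assumption rather than as the criterion the thesis uses. Given input dissimilarities and a representation whose natural metric is Euclidean, distortion is the largest expansion ratio times the largest contraction ratio over all pairs, so a value of 1 means every pairwise ratio is preserved. The sketch below is in Python; `distortion` and `pairwise_euclidean` are hypothetical helpers.

```python
import numpy as np

def pairwise_euclidean(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distortion(D_input, Z):
    """Multiplicative distortion of representing points with positive input
    dissimilarities D_input by the rows of Z under the Euclidean metric:
    max(d_rep / d_in) * max(d_in / d_rep) over distinct pairs."""
    D_rep = pairwise_euclidean(Z)
    iu = np.triu_indices(D_input.shape[0], k=1)   # each unordered pair once
    ratio = D_rep[iu] / D_input[iu]
    return ratio.max() / ratio.min()

# Four points on a line at positions 0, 1, 2, 3.
x = np.arange(4.0)
D = np.abs(x[:, None] - x[None, :])
uniform_scaling = distortion(D, 2 * x[:, None])                     # every distance doubled
stretched_end = distortion(D, np.array([[0.0], [1.0], [2.0], [6.0]]))
```

A uniform rescaling has distortion 1, while moving the last point from 3 to 6 stretches the pair (2, 3) by a factor of 4 and leaves the pair (0, 1) unchanged, giving distortion 4.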
A large part of my research explores what happens if the order is switched, that is, if we learn the appropriate metric first and the embedding later. The philosophy behind this approach is that understanding the inherent geometry of the data is the most crucial part of representation learning. Often, studying the properties of the appropriate metric on the input data indicates the type of space in which we should seek the representation, which gives us more robust representations. Optimizing for the appropriate metric can also help overcome issues such as missing and noisy data. My projects fall into three different areas of representation learning.
1) Geometric and probabilistic analysis of representation learning methods.
2) Developing methods to learn optimal metrics on large datasets.
3) Applications.
For the category of geometric and probabilistic analysis of representation learning methods, we have three projects: first, designing optimal training data for denoising autoencoders; second, formulating a new optimal transport problem and understanding its geometric structure; and third, analyzing the robustness to perturbations of the solutions obtained from the classical multidimensional scaling algorithm versus that of the true solutions to the multidimensional scaling problem.
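For reference, the classical multidimensional scaling algorithm mentioned in the third project can be sketched in a few lines of Python. It double-centres the squared dissimilarities and keeps the top eigenpairs, which recovers a Euclidean configuration exactly (up to rotation, reflection, and translation) when the input is a Euclidean distance matrix. The perturbation experiment afterwards, with a symmetric noise level of 1e-3, is an illustrative assumption. It only probes classical MDS and does not compute the true, stress-minimizing MDS solution that the thesis compares it against.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed n points in R^k from an n x n
    dissimilarity matrix D using the k largest eigenpairs of the
    double-centred Gram matrix B = -1/2 J D^2 J, where J = I - (1/n) 11^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]           # indices of the k largest
    lam = np.clip(evals[idx], 0.0, None)        # negative parts signal non-Euclidean D
    return evecs[:, idx] * np.sqrt(lam)

def pairwise_euclidean(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
D = pairwise_euclidean(X)

# Symmetric perturbation of the dissimilarities, zero on the diagonal.
noise = rng.normal(scale=1e-3, size=D.shape)
D_noisy = np.abs(D + (noise + noise.T) / 2)
np.fill_diagonal(D_noisy, 0.0)

# Compare recovered distances with the true ones.
err_exact = np.abs(pairwise_euclidean(classical_mds(D, 2)) - D).max()
err_noisy = np.abs(pairwise_euclidean(classical_mds(D_noisy, 2)) - D).max()
```

On exact Euclidean input the reconstruction error is at floating-point level. With perturbed input it grows with the noise, and characterizing how that growth compares with the true MDS solutions is the question the project studies.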
For learning an optimal metric, we are given a dissimilarity matrix $\hat{D}$, some function $f$, and a subset $C$ of the space of all metrics, and we want to find $D \in C$ that minimizes $f(D, \hat{D})$. In this thesis, we consider the version of the problem in which $C$ is the space of metrics defined on a fixed graph. That is, given a graph $G$, we let $C$ be the space of all metrics defined via $G$. For this $C$, we consider a sparse objective function as well as convex objective functions. We also consider the problem in which we want to learn a tree. In addition, we show how the ideas behind learning the optimal metric can be applied to dimensionality reduction in the presence of missing data.
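The constraint set of metrics defined on a fixed graph can be made concrete with a small Python sketch. Assume, as an interpretation rather than a definition quoted from the thesis, that a metric defined via a graph $G$ is a set of edge weights on $G$ that satisfies the triangle inequality along every path. Replacing each edge weight by its shortest-path length in $G$ (Floyd-Warshall) then yields one such metric. This is the simplest decrease-only repair: it returns a feasible point, not the minimizer of a sparse or convex objective of the kind studied in the thesis. `shortest_path_repair` is a hypothetical name.

```python
import numpy as np

def shortest_path_repair(W):
    """Given a symmetric weight matrix W for a graph G (np.inf = no edge),
    return the edge weights replaced by shortest-path lengths in G.

    Every repaired edge weight is at most the length of any path between its
    endpoints, so the triangle inequality holds along all paths of G.
    Weights only decrease; non-edges stay np.inf and the diagonal is 0.
    """
    n = W.shape[0]
    P = W.astype(float).copy()
    np.fill_diagonal(P, 0.0)
    for k in range(n):                          # Floyd-Warshall relaxation via node k
        P = np.minimum(P, P[:, [k]] + P[[k], :])
    edges = np.isfinite(W) & ~np.eye(n, dtype=bool)
    R = np.where(edges, P, np.inf)              # keep the sparsity pattern of G
    np.fill_diagonal(R, 0.0)
    return R

inf = np.inf
# Edge (0, 2) has weight 5 but the path 0-1-2 has length 2: a violation.
W = np.array([[0.0, 1.0, 5.0, inf],
              [1.0, 0.0, 1.0, inf],
              [5.0, 1.0, 0.0, 1.0],
              [inf, inf, 1.0, 0.0]])
R = shortest_path_repair(W)
```

Only the violating edge changes (from 5 to 2); the other edges keep their weights and the missing edges remain absent.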
Finally, we look at an application to real-world data: reconstructing ancient Greek text.
PhD, Applied and Interdisciplinary Mathematics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169738/1/rsonthal_1.pd