6,871 research outputs found
Masking Strategies for Image Manifolds
We consider the problem of selecting an optimal mask for an image manifold,
i.e., choosing a subset of the pixels of the image that preserves the
manifold's geometric structure present in the original data. Such masking
implements a form of compressive sensing through emerging imaging sensor
platforms for which the power expense grows with the number of pixels acquired.
Our goal is for the manifold learned from masked images to resemble its full
image counterpart as closely as possible. More precisely, we show that one can
indeed accurately learn an image manifold without having to consider a large
majority of the image pixels. In doing so, we consider two masking methods that
preserve the local and global geometric structure of the manifold,
respectively. In each case, the process of finding the optimal masking pattern
can be cast as a binary integer program, which is computationally expensive but
can be approximated by a fast greedy algorithm. Numerical experiments show that
the relevant manifold structure is preserved through the data-dependent masking
process, even for modest mask sizes
An Efficient Dual Approach to Distance Metric Learning
Distance metric learning is of fundamental interest in machine learning
because the distance metric employed can significantly affect the performance
of many learning methods. Quadratic Mahalanobis metric learning is a popular
approach to the problem, but typically requires solving a semidefinite
programming (SDP) problem, which is computationally expensive. Standard
interior-point SDP solvers typically have a complexity of (with
the dimension of input data), and can thus only practically solve problems
exhibiting less than a few thousand variables. Since the number of variables is
, this implies a limit upon the size of problem that can
practically be solved of around a few hundred dimensions. The complexity of the
popular quadratic Mahalanobis metric learning approach thus limits the size of
problem to which metric learning can be applied. Here we propose a
significantly more efficient approach to the metric learning problem based on
the Lagrange dual formulation of the problem. The proposed formulation is much
simpler to implement, and therefore allows much larger Mahalanobis metric
learning problems to be solved. The time complexity of the proposed method is
, which is significantly lower than that of the SDP approach.
Experiments on a variety of datasets demonstrate that the proposed method
achieves an accuracy comparable to the state-of-the-art, but is applicable to
significantly larger problems. We also show that the proposed method can be
applied to solve more general Frobenius-norm regularized SDP problems
approximately
Dimensionality Reduction Mappings
A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a priorly given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions, and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization taking the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
Network Topology Mapping from Partial Virtual Coordinates and Graph Geodesics
For many important network types (e.g., sensor networks in complex harsh
environments and social networks) physical coordinate systems (e.g.,
Cartesian), and physical distances (e.g., Euclidean), are either difficult to
discern or inapplicable. Accordingly, coordinate systems and characterizations
based on hop-distance measurements, such as Topology Preserving Maps (TPMs) and
Virtual-Coordinate (VC) systems are attractive alternatives to Cartesian
coordinates for many network algorithms. Herein, we present an approach to
recover geometric and topological properties of a network with a small set of
distance measurements. In particular, our approach is a combination of shortest
path (often called geodesic) recovery concepts and low-rank matrix completion,
generalized to the case of hop-distances in graphs. Results for sensor networks
embedded in 2-D and 3-D spaces, as well as a social networks, indicates that
the method can accurately capture the network connectivity with a small set of
measurements. TPM generation can now also be based on various context
appropriate measurements or VC systems, as long as they characterize different
nodes by distances to small sets of random nodes (instead of a set of global
anchors). The proposed method is a significant generalization that allows the
topology to be extracted from a random set of graph shortest paths, making it
applicable in contexts such as social networks where VC generation may not be
possible.Comment: 17 pages, 9 figures. arXiv admin note: substantial text overlap with
arXiv:1712.1006
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the
detection of important biological signals. A common approach is to apply a
dimension reduction method, such as principal component analysis. Typically,
after application of such a method the data is projected and visualized in the
new coordinate system, using scatter plots or profile plots. These methods
provide good results if the data have certain properties which become visible
in the new coordinate system and which were hard to detect in the original
coordinate system. Often however, the application of only one method does not
suffice to capture all important signals. Therefore several methods addressing
different aspects of the data need to be applied. We have developed a framework
for linear and non-linear dimension reduction methods within our visual
analytics pipeline SpRay. This includes measures that assist the interpretation
of the factorization result. Different visualizations of these measures can be
combined with functional annotations that support the interpretation of the
results. We show an application to high-resolution time series microarray data
in the antibiotic-producing organism Streptomyces coelicolor as well as to
microarray data measuring expression of cells with normal karyotype and cells
with trisomies of human chromosomes 13 and 21
- …