225,297 research outputs found
Recommended from our members
Geometric Sparsity in High Dimension
While typically complex and high-dimensional, modern data sets often have a concise underlying structure. This thesis explores the sparsity inherent in the geometric structure of many high-dimensional data sets.
Constructing an efficient parametrization of a large data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach, guided by geometry, consists in recovering a local parametrization (a chart) using the local tangent plane. In practice, the data are noisy and the estimation of a low-dimensional tangent plane in high dimension becomes ill posed. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noise-free samples from a linear subspace. To process noisy data, PCA must be applied locally, at a scale small enough such that the manifold is approximately linear, but at a scale large enough such that structure may be discerned from noise.
We present an approach that uses the geometry of the data to guide our definition of locality, discovering the optimal balance of this noise-curvature trade-off. Using eigenspace perturbation theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals the optimal scale for local tangent plane recovery. Additionally, we are able to accurately and efficiently estimate the curvature of the local neighborhood, and we introduce a geometric uncertainty principle quantifying the limits of noise-curvature perturbation for tangent plane recovery. An algorithm for partitioning a noisy data set is then studied, yielding an appropriate scale for practical tangent plane estimation.
Next, we study the interaction of sparsity, scale, and noise from a signal decomposition perspective. Empirical Mode Decomposition is a time-frequency analysis tool for nonstationary data that adaptively defines modes based on the intrinsic frequency scales of a signal. A novel understanding of the scales at which noise corrupts the otherwise sparse frequency decomposition is presented. The thesis concludes with a discussion of future work, including applications to image processing and the continued development of sparse representation from a geometric perspective
Compressive Sensing and Recovery of Structured Sparse Signals
In the recent years, numerous disciplines including telecommunications, medical imaging, computational biology, and neuroscience benefited from increasing applications of high dimensional datasets. This calls for efficient ways of data capturing and data processing. Compressive sensing (CS), which is introduced as an efficient sampling (data capturing) method, is addressing this need. It is well-known that the signals, which belong to an ambient high-dimensional space, have much smaller dimensionality in an appropriate domain. CS taps into this principle and dramatically reduces the number of samples that is required to be captured to avoid any distortion in the information content of the data. This reduction in the required number of samples enables many new applications that were previously infeasible using classical sampling techniques. Most CS-based approaches take advantage of the inherent low-dimensionality in many datasets. They try to determine a sparse representation of the data, in an appropriately chosen basis using only a few significant elements. These approaches make no extra assumptions regarding possible relationships among the significant elements of that basis. In this dissertation, different ways of incorporating the knowledge about such relationships are integrated into the data sampling and the processing schemes. We first consider the recovery of temporally correlated sparse signals and show that using the time correlation model. The recovery performance can be significantly improved. Next, we modify the sampling process of sparse signals to incorporate the signal structure in a more efficient way. In the image processing application, we show that exploiting the structure information in both signal sampling and signal recovery improves the efficiency of the algorithm. In addition, we show that region-of-interest information can be included in the CS sampling and recovery steps to provide a much better quality for the region-of-interest area compared the rest of the image or video. In spectrum sensing applications, CS can dramatically improve the sensing efficiency by facilitating the coordination among spectrum sensors. A cluster-based spectrum sensing with coordination among spectrum sensors is proposed for geographically disperse cognitive radio networks. Further, CS has been exploited in this problem for simultaneous sensing and localization. Having access to this information dramatically facilitates the implementation of advanced communication technologies as required by 5G communication networks
Methods for Bayesian power spectrum inference with galaxy surveys
We derive and implement a full Bayesian large scale structure inference
method aiming at precision recovery of the cosmological power spectrum from
galaxy redshift surveys. Our approach improves over previous Bayesian methods
by performing a joint inference of the three dimensional density field, the
cosmological power spectrum, luminosity dependent galaxy biases and
corresponding normalizations. We account for all joint and correlated
uncertainties between all inferred quantities. Classes of galaxies with
different biases are treated as separate sub samples. The method therefore also
allows the combined analysis of more than one galaxy survey.
In particular, it solves the problem of inferring the power spectrum from
galaxy surveys with non-trivial survey geometries by exploring the joint
posterior distribution with efficient implementations of multiple block Markov
chain and Hybrid Monte Carlo methods. Our Markov sampler achieves high
statistical efficiency in low signal to noise regimes by using a deterministic
reversible jump algorithm. We test our method on an artificial mock galaxy
survey, emulating characteristic features of the Sloan Digital Sky Survey data
release 7, such as its survey geometry and luminosity dependent biases. These
tests demonstrate the numerical feasibility of our large scale Bayesian
inference frame work when the parameter space has millions of dimensions.
The method reveals and correctly treats the anti-correlation between bias
amplitudes and power spectrum, which are not taken into account in current
approaches to power spectrum estimation, a 20 percent effect across large
ranges in k-space. In addition, the method results in constrained realizations
of density fields obtained without assuming the power spectrum or bias
parameters in advance
High Dimensional Low Rank plus Sparse Matrix Decomposition
This paper is concerned with the problem of low rank plus sparse matrix
decomposition for big data. Conventional algorithms for matrix decomposition
use the entire data to extract the low-rank and sparse components, and are
based on optimization problems with complexity that scales with the dimension
of the data, which limits their scalability. Furthermore, existing randomized
approaches mostly rely on uniform random sampling, which is quite inefficient
for many real world data matrices that exhibit additional structures (e.g.
clustering). In this paper, a scalable subspace-pursuit approach that
transforms the decomposition problem to a subspace learning problem is
proposed. The decomposition is carried out using a small data sketch formed
from sampled columns/rows. Even when the data is sampled uniformly at random,
it is shown that the sufficient number of sampled columns/rows is roughly
O(r\mu), where \mu is the coherency parameter and r the rank of the low rank
component. In addition, adaptive sampling algorithms are proposed to address
the problem of column/row sampling from structured data. We provide an analysis
of the proposed method with adaptive sampling and show that adaptive sampling
makes the required number of sampled columns/rows invariant to the distribution
of the data. The proposed approach is amenable to online implementation and an
online scheme is proposed.Comment: IEEE Transactions on Signal Processin
Low-Rank Matrices on Graphs: Generalized Recovery & Applications
Many real world datasets subsume a linear or non-linear low-rank structure in
a very low-dimensional space. Unfortunately, one often has very little or no
information about the geometry of the space, resulting in a highly
under-determined recovery problem. Under certain circumstances,
state-of-the-art algorithms provide an exact recovery for linear low-rank
structures but at the expense of highly inscalable algorithms which use nuclear
norm. However, the case of non-linear structures remains unresolved. We revisit
the problem of low-rank recovery from a totally different perspective,
involving graphs which encode pairwise similarity between the data samples and
features. Surprisingly, our analysis confirms that it is possible to recover
many approximate linear and non-linear low-rank structures with recovery
guarantees with a set of highly scalable and efficient algorithms. We call such
data matrices as \textit{Low-Rank matrices on graphs} and show that many real
world datasets satisfy this assumption approximately due to underlying
stationarity. Our detailed theoretical and experimental analysis unveils the
power of the simple, yet very novel recovery framework \textit{Fast Robust PCA
on Graphs
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international - Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
- …