A Neuron as a Signal Processing Device
A neuron is a basic physiological and computational unit of the brain. While
much is known about the physiological properties of a neuron, its computational
role is poorly understood. Here we propose to view a neuron as a signal
processing device that represents the incoming streaming data matrix as a
sparse vector of synaptic weights scaled by an outgoing sparse activity vector.
Formally, a neuron minimizes a cost function comprising a cumulative squared
representation error and regularization terms. We derive an online algorithm
that minimizes such cost function by alternating between the minimization with
respect to activity and with respect to synaptic weights. The steps of this
algorithm reproduce well-known physiological properties of a neuron, such as
weighted summation and leaky integration of synaptic inputs, as well as an
Oja-like, but parameter-free, synaptic learning rule. Our theoretical framework
makes several predictions, some of which can be verified with existing data, while
others require further experiments. Such a framework should allow modeling the
function of neuronal circuits without necessarily measuring all the microscopic
biophysical parameters, and should facilitate the design of neuromorphic
electronics.
Comment: 2013 Asilomar Conference on Signals, Systems and Computers, see http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=681029
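The Oja-like learning rule mentioned above can be illustrated with the classic Oja update — a sketch of the textbook formulation, not the paper's parameter-free derivation; the explicit learning rate and the toy covariance below are assumptions of this illustration:

```python
import numpy as np

def oja_step(w, x, lr=0.01):
    """One step of Oja's rule: a Hebbian update with implicit weight decay
    that keeps the synaptic weight vector approximately unit-norm.
    (The paper derives a parameter-free variant; the learning rate here
    is an assumption of this classic formulation.)"""
    y = w @ x                      # weighted summation of synaptic inputs
    w = w + lr * y * (x - y * w)   # Hebbian term minus decay term
    return w

# With repeated presentations, w converges to the leading principal
# direction of the input covariance.
rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0], [1.0, 1.0]])           # toy input covariance
X = rng.multivariate_normal([0.0, 0.0], C, size=5000)
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for x in X:
    w = oja_step(w, x, lr=0.005)

top_pc = np.linalg.eigh(C)[1][:, -1]             # true leading eigenvector
print(abs(w @ top_pc))                           # alignment close to 1
```

This mirrors the abstract's claim that the weight vector acts as a learned sparse representation of the incoming data stream, with the activity `y` produced by weighted summation of inputs.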
Provable Dynamic Robust PCA or Robust Subspace Tracking
Dynamic robust PCA refers to the dynamic (time-varying) extension of robust
PCA (RPCA). It assumes that the true (uncorrupted) data lies in a
low-dimensional subspace that can change with time, albeit slowly. The goal is
to track this changing subspace over time in the presence of sparse outliers.
We develop and study a novel algorithm, which we call simple-ReProCS, based on
the recently introduced Recursive Projected Compressive Sensing (ReProCS)
framework. Our work provides the first guarantee for dynamic RPCA that holds
under weakened versions of standard RPCA assumptions, slow subspace change and
a lower bound assumption on most outlier magnitudes. Our result is significant
because (i) it removes the strong assumptions needed by the two previous
complete guarantees for ReProCS-based algorithms; (ii) it shows that it is
possible to achieve significantly improved outlier tolerance, compared with all
existing RPCA or dynamic RPCA solutions, by exploiting the above two simple
extra assumptions; and (iii) it proves that simple-ReProCS is online (after
initialization), fast, and has near-optimal memory complexity.
Comment: Minor writing edits. The paper has been accepted to IEEE Transactions on Information Theory.
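The projection idea at the heart of ReProCS-style algorithms can be sketched as follows. This is an illustrative simplification: the compressive-sensing recovery used by simple-ReProCS is replaced here with plain hard thresholding plus least squares, and the threshold and toy dimensions are assumptions:

```python
import numpy as np

def reprocs_step(y, U, thresh):
    """One simplified projection + sparse-recovery step.

    Given an observation y = ell + s (low-rank part ell in span(U) plus a
    sparse outlier s), project onto the orthogonal complement of the
    current subspace estimate to nullify ell, then estimate s.
    simple-ReProCS uses compressive-sensing recovery here; hard
    thresholding is a simplifying assumption of this sketch."""
    P = np.eye(len(y)) - U @ U.T          # projector onto span(U)'s complement
    y_perp = P @ y                        # removes the low-rank component
    support = np.abs(y_perp) > thresh     # estimate the outlier support
    s_hat = np.zeros_like(y)
    if support.any():
        # Least-squares estimate of the outlier values on that support:
        s_hat[support] = np.linalg.lstsq(P[:, support], y_perp, rcond=None)[0]
    ell_hat = y - s_hat                   # de-corrupted low-rank component
    return ell_hat, s_hat

# Toy check: data in a known 2-D subspace of R^20 plus one large outlier.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(20, 2)))   # orthonormal subspace basis
ell = U @ rng.normal(size=2)
s = np.zeros(20)
s[5] = 10.0                                     # sparse corruption
ell_hat, s_hat = reprocs_step(ell + s, U, thresh=1.0)
print(np.linalg.norm(ell_hat - ell))            # small recovery error
```

The full algorithm additionally tracks the slowly changing subspace by updating `U` over time; that update is omitted from this single-step sketch.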
Online Product Quantization
Approximate nearest neighbor (ANN) search has achieved great success in many
tasks. However, popular existing methods for ANN search, such as hashing and
quantization, are designed for static databases only. They cannot handle
databases whose data distribution evolves dynamically, because of the high
computational cost of retraining the model on the new database. In this paper,
we address this problem by developing an online product quantization (online
PQ) model that incrementally updates the quantization codebook to accommodate
incoming streaming data. Moreover, to further
alleviate the large-scale computation of the online PQ update, we design two
budget constraints that allow the model to update only part of the PQ codebook
instead of all of it. We derive a loss bound that guarantees the performance of our
online PQ model. Furthermore, we develop an online PQ model over a sliding
window with both data insertion and deletion supported, to reflect the
real-time behaviour of the data. The experiments demonstrate that, compared
with baseline methods, our online PQ model is both time-efficient and effective
for ANN search in dynamic large-scale databases, and that the partial PQ
codebook update further reduces the update cost.
Comment: To appear in IEEE Transactions on Knowledge and Data Engineering (DOI: 10.1109/TKDE.2018.2817526).
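The idea of incrementally updating only part of a PQ codebook per incoming point can be sketched as a streaming k-means-style update. The class layout, the count-based step size, and all parameter names below are illustrative assumptions, not the paper's exact model or budget constraints:

```python
import numpy as np

class OnlinePQ:
    """Minimal online product-quantization sketch: split each vector into
    M sub-vectors, quantize each against its own K-centroid sub-codebook,
    and move only the winning centroids toward each new point."""

    def __init__(self, dim, n_subspaces, n_centroids, rng):
        assert dim % n_subspaces == 0
        self.M, self.K = n_subspaces, n_centroids
        self.d = dim // n_subspaces                  # sub-vector length
        self.codebooks = rng.normal(size=(self.M, self.K, self.d))
        self.counts = np.ones((self.M, self.K))      # for a decaying step size

    def encode(self, x):
        """Return one centroid index per subspace (the PQ code of x)."""
        subs = x.reshape(self.M, self.d)
        return np.array([
            np.argmin(np.linalg.norm(self.codebooks[m] - subs[m], axis=1))
            for m in range(self.M)
        ])

    def partial_update(self, x):
        """Update only the M winning centroids (one per sub-codebook),
        mirroring the idea of a partial codebook update per new point."""
        subs = x.reshape(self.M, self.d)
        code = self.encode(x)
        for m, k in enumerate(code):
            self.counts[m, k] += 1
            lr = 1.0 / self.counts[m, k]             # streaming-mean step size
            self.codebooks[m, k] += lr * (subs[m] - self.codebooks[m, k])
        return code

rng = np.random.default_rng(0)
pq = OnlinePQ(dim=8, n_subspaces=2, n_centroids=4, rng=rng)
stream = rng.normal(size=(200, 8))
for x in stream:
    pq.partial_update(x)
code = pq.encode(stream[0])
print(code.shape)    # one centroid index per subspace
```

Because each point touches only M centroids rather than the whole codebook, the per-point update cost stays small even as the database grows, which is the intuition behind the budget-constrained partial update.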
Dynamic Algorithms and Asymptotic Theory for Lp-norm Data Analysis
The focus of this dissertation is the development of outlier-resistant stochastic algorithms for Principal Component Analysis (PCA) and the derivation of novel asymptotic theory for Lp-norm Principal Component Analysis (Lp-PCA). Modern machine learning and signal processing applications employ sensors that collect large volumes of measurements, stored as data matrices that are often massive and must be processed efficiently for machine learning algorithms to discover the underlying patterns. One such commonly used matrix analysis technique is PCA. Over the past century, PCA has been used extensively in areas such as machine learning, deep learning, pattern recognition, and computer vision, to name a few. PCA's popularity can be attributed to its intuitive L2-norm formulation, the availability of an elegant solution via the singular value decomposition (SVD), and asymptotic convergence guarantees. However, PCA has been shown to be highly sensitive to faulty measurements (outliers) because of its reliance on the outlier-sensitive L2-norm. Arguably, the most straightforward way to impart robustness against outliers is to replace the outlier-sensitive L2-norm with the outlier-resistant L1-norm, yielding what is known as L1-PCA. Exact and approximate solvers for L1-PCA have been proposed in the literature.
On the other hand, in this big-data era, the data matrix may be very large and/or the measurements may arrive in streaming fashion; traditional L1-PCA algorithms are not suitable in this setting. To efficiently process streaming data while remaining resistant to outliers, we propose a stochastic L1-PCA algorithm that computes the dominant principal component (PC) with formal convergence guarantees. We further generalize our stochastic L1-PCA algorithm to find multiple components by proposing a new PCA framework that maximizes the recently proposed Barron loss. Leveraging the Barron loss yields a stochastic algorithm with a tunable robustness parameter that allows the user to control the amount of outlier-resistance required in a given application. We demonstrate the efficacy and robustness of our stochastic algorithms on synthetic and real-world datasets. Our experimental studies include online subspace estimation, classification, video surveillance, and image conditioning, among others.
Last, we focus on the development of asymptotic theory for Lp-PCA. In general, Lp-PCA for p < 2 has been shown to outperform PCA in the presence of outliers owing to its outlier resistance. However, unlike PCA, Lp-PCA is perceived as a "robust heuristic" by the research community due to the lack of theoretical asymptotic convergence guarantees. In this work, we strive to shed light on the topic by developing asymptotic theory for Lp-PCA. Specifically, we show that, for a broad class of data distributions, the Lp-PCs asymptotically span the same subspace as the standard PCs; moreover, we prove that the Lp-PCs are specific rotated versions of the PCs. Finally, we demonstrate the asymptotic equivalence of PCA and Lp-PCA in a wide variety of experimental studies.
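The outlier-resistance of L1-PCA described above can be illustrated with the classic fixed-point approximate solver — not the dissertation's stochastic algorithm, which handles streaming data and uses the Barron loss. The toy data and all numerical choices below are assumptions of this illustration:

```python
import numpy as np

def l1_pca_component(X, n_iter=100, seed=0):
    """Leading L1-norm principal component via the well-known fixed-point
    iteration: w <- X^T sign(X w), renormalized. This maximizes ||X w||_1
    over unit vectors w (an approximate solver; local maximum only)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        # For fixed signs, the maximizing direction is the signed data sum.
        w_new = X.T @ np.sign(X @ w)
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

# Robustness check: nominal data along e1, a few gross outliers along e2.
rng = np.random.default_rng(2)
X = np.outer(rng.normal(size=200), [1.0, 0.0]) + 0.1 * rng.normal(size=(200, 2))
X[:5] += np.array([0.0, 10.0])                       # 5 gross outliers
w_l1 = l1_pca_component(X)
w_l2 = np.linalg.svd(X, full_matrices=False)[2][0]   # L2-PCA (SVD) component
print(abs(w_l1[0]), abs(w_l2[0]))   # L1 stays aligned with the nominal e1
                                    # direction; L2 is pulled toward outliers
```

Because the L2-norm squares the outlier magnitudes while the L1-norm does not, a handful of gross outliers dominates the SVD solution but barely tilts the L1 component — the sensitivity gap that motivates both the L1-PCA algorithms and the Lp-PCA theory in this dissertation.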