1,291 research outputs found
Low-Rank Matrices on Graphs: Generalized Recovery & Applications
Many real world datasets subsume a linear or non-linear low-rank structure in
a very low-dimensional space. Unfortunately, one often has very little or no
information about the geometry of the space, resulting in a highly
under-determined recovery problem. Under certain circumstances,
state-of-the-art algorithms provide an exact recovery for linear low-rank
structures, but at the expense of poorly scalable algorithms that rely on the nuclear
norm. However, the case of non-linear structures remains unresolved. We revisit
the problem of low-rank recovery from a totally different perspective,
involving graphs which encode pairwise similarity between the data samples and
features. Surprisingly, our analysis confirms that it is possible to recover
many approximate linear and non-linear low-rank structures with recovery
guarantees with a set of highly scalable and efficient algorithms. We call such
data matrices \textit{Low-Rank matrices on graphs} and show that many real
world datasets satisfy this assumption approximately due to underlying
stationarity. Our detailed theoretical and experimental analysis unveils the
power of the simple, yet very novel recovery framework \textit{Fast Robust PCA
on Graphs
Compressive PCA for Low-Rank Matrices on Graphs
We introduce a novel framework for the approximate recovery of data matrices
which are low-rank on graphs, from sampled measurements. The rows and columns
of such matrices belong to the span of the first few eigenvectors of the graphs
constructed between their rows and columns. We leverage this property to
recover the non-linear low-rank structures efficiently from sampled data
measurements, with a low cost (linear in n). First, a Restricted Isometry
Property (RIP) condition is introduced for efficient uniform sampling of the
rows and columns of such matrices based on the cumulative coherence of graph
eigenvectors. Secondly, a state-of-the-art fast low-rank recovery method is
suggested for the sampled data. Finally, several efficient, parallel and
parameter-free decoders are presented along with their theoretical analysis for
decoding the low-rank and cluster indicators for the full data matrix. Thus, we
overcome the computational limitations of the standard linear low-rank recovery
methods for big datasets. Our method can also be seen as a major step towards
efficient recovery of non-linear low-rank structures. For a matrix of size n ×
p, on a single core machine, our method gains a speed up of over Robust
Principal Component Analysis (RPCA), where k << p is the subspace dimension.
Numerically, we can recover a low-rank matrix of size 10304 × 1000, 100 times
faster than Robust PCA
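
The defining property stated in this abstract (rows and columns lying in the span of the first few eigenvectors of the row and column graphs) can be illustrated with a minimal NumPy sketch. It is not the paper's sampling or recovery method: the path graphs, the sizes `n`, `p`, `k`, and the helper names `path_graph_laplacian` and `first_eigenvectors` are assumptions chosen only for illustration.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n nodes
    (a stand-in for a similarity graph; assumption for illustration)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def first_eigenvectors(L, k):
    """Eigenvectors belonging to the k smallest eigenvalues of a symmetric Laplacian."""
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

rng = np.random.default_rng(0)
n, p, k = 30, 20, 3
P = first_eigenvectors(path_graph_laplacian(n), k)  # basis from the graph on rows, n x k
Q = first_eigenvectors(path_graph_laplacian(p), k)  # basis from the graph on columns, p x k

# Build a matrix that is low-rank on both graphs: X = P C Q^T.
C = rng.standard_normal((k, k))
X = P @ C @ Q.T

# Projecting X onto the two eigenvector spans leaves it unchanged.
X_proj = P @ (P.T @ X @ Q) @ Q.T
residual = np.linalg.norm(X - X_proj)
rank = np.linalg.matrix_rank(X)
```

Here `residual` is at the level of floating-point round-off and `rank` is at most `k`, which is the structure the framework exploits when it works with a few graph eigenvectors instead of the full matrix.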
Time-frequency detection algorithm for gravitational wave bursts
An efficient algorithm is presented for the identification of short bursts of
gravitational radiation in the data from broad-band interferometric detectors.
The algorithm consists of three steps: pixels of the time-frequency
representation of the data that have power above a fixed threshold are first
identified. Clusters of such pixels that conform to a set of rules on their
size and their proximity to other clusters are formed, and a final threshold is
applied on the power integrated over all pixels in such clusters. Formal
arguments are given to support the conjecture that this algorithm is very
efficient for a wide class of signals. A precise model for the false alarm rate
of this algorithm is presented, and it is shown using a number of
representative numerical simulations to be accurate at the 1% level for most
values of the parameters, with maximal error around 10%.
Comment: 26 pages, 15 figures, to appear in PR
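
The three steps described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function name `detect_bursts`, its parameters, the use of 4-connectivity, and a single minimum-size rule are all assumptions, and the rules on proximity between clusters are omitted.

```python
from collections import deque

def detect_bursts(tf_power, pixel_threshold, min_cluster_size, cluster_threshold):
    """Return (pixels, integrated_power) for each cluster that passes all three steps."""
    rows, cols = len(tf_power), len(tf_power[0])
    # Step 1: mark pixels whose power exceeds the fixed threshold.
    hot = [[tf_power[r][c] > pixel_threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    events = []
    for r in range(rows):
        for c in range(cols):
            if not hot[r][c] or seen[r][c]:
                continue
            # Step 2: grow a cluster of 4-connected hot pixels (breadth-first search).
            cluster = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                cluster.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols and hot[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        queue.append((a, b))
            # Assumed size rule: discard clusters that are too small.
            if len(cluster) < min_cluster_size:
                continue
            # Step 3: threshold the power integrated over the cluster.
            total = sum(tf_power[i][j] for i, j in cluster)
            if total > cluster_threshold:
                events.append((cluster, total))
    return events

# Toy time-frequency map (rows = frequency bins, columns = time bins).
tf_power = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 3.0, 4.0, 0.1],
    [0.1, 2.5, 0.2, 0.1],
    [0.3, 0.1, 0.2, 2.2],
]
events = detect_bursts(tf_power, pixel_threshold=2.0,
                       min_cluster_size=2, cluster_threshold=8.0)
```

In this toy map, the three adjacent hot pixels form one candidate event, while the isolated hot pixel in the corner is rejected by the size rule.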
ARDA: Automatic Relational Data Augmentation for Machine Learning
Automatic machine learning (AutoML) is a family of techniques to automate the
process of training predictive models, aiming to both improve performance and
make machine learning more accessible. While many recent works have focused on
aspects of the machine learning pipeline like model selection, hyperparameter
tuning, and feature selection, relatively few works have focused on automatic
data augmentation. Automatic data augmentation involves finding new features
relevant to the user's predictive task with minimal "human-in-the-loop"
involvement.
We present ARDA, an end-to-end system that takes as input a dataset and a
data repository, and outputs an augmented data set such that training a
predictive model on this augmented dataset results in improved performance. Our
system has two distinct components: (1) a framework to search and join data
with the input data, based on various attributes of the input, and (2) an
efficient feature selection algorithm that prunes out noisy or irrelevant
features from the resulting join. We perform an extensive empirical evaluation
of different system components and benchmark our feature selection algorithm on
real-world datasets
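
The two components named in this abstract (joining candidate data onto the input, then pruning irrelevant features) can be illustrated with a small plain-Python sketch. It does not reproduce ARDA's join search or its feature selection algorithm: a simple Pearson-correlation filter is swapped in, and the names `left_join`, `pearson`, `select_features`, the toy tables, and the 0.5 cutoff are all hypothetical.

```python
import math

def left_join(base_rows, extra_rows, key):
    """Attach columns from extra_rows to base_rows wherever the key matches."""
    lookup = {row[key]: row for row in extra_rows}
    joined = []
    for row in base_rows:
        merged = dict(row)
        for name, value in lookup.get(row[key], {}).items():
            if name != key:
                merged[name] = value
        joined.append(merged)
    return joined

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_features(rows, target, candidates, min_abs_corr):
    """Keep candidate features whose |correlation| with the target meets the cutoff."""
    ys = [row[target] for row in rows]
    return [name for name in candidates
            if abs(pearson([row[name] for row in rows], ys)) >= min_abs_corr]

# Input dataset and one table from a (hypothetical) data repository.
base = [{"id": 1, "price": 10.0}, {"id": 2, "price": 20.0},
        {"id": 3, "price": 30.0}, {"id": 4, "price": 40.0}]
extra = [{"id": 1, "area": 1.0, "noise": 5.0}, {"id": 2, "area": 2.1, "noise": -3.0},
         {"id": 3, "area": 2.9, "noise": -3.0}, {"id": 4, "area": 4.0, "noise": 5.0}]

augmented = left_join(base, extra, key="id")
kept = select_features(augmented, target="price",
                       candidates=["area", "noise"], min_abs_corr=0.5)
```

In this toy case `area` tracks `price` closely and is kept, while `noise` is uncorrelated with the target and is pruned from the joined table.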
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since the
class boundary must be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
the algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figure