1,291 research outputs found

Low-Rank Matrices on Graphs: Generalized Recovery & Applications

Author: Perraudin Nathanael
Shahid Nauman
Vandergheynst Pierre
Publication date: 18/05/2016

Many real-world datasets have a linear or non-linear low-rank structure in a very low-dimensional space. Unfortunately, one often has little or no information about the geometry of that space, which makes the recovery problem highly under-determined. Under certain circumstances, state-of-the-art algorithms provide exact recovery for linear low-rank structures, but at the expense of poorly scalable algorithms that rely on the nuclear norm. The case of non-linear structures, however, remains unresolved. We revisit the problem of low-rank recovery from a totally different perspective, involving graphs that encode pairwise similarity between the data samples and features. Surprisingly, our analysis confirms that it is possible to recover many approximate linear and non-linear low-rank structures, with recovery guarantees, using a set of highly scalable and efficient algorithms. We call such data matrices "Low-Rank matrices on graphs" and show that many real-world datasets satisfy this assumption approximately due to underlying stationarity. Our detailed theoretical and experimental analysis unveils the power of the simple, yet very novel recovery framework Fast Robust PCA on Graphs

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Compressive PCA for Low-Rank Matrices on Graphs

Author: Perraudin Nathanael
Puy Gilles
Shahid Nauman
Vandergheynst Pierre
Publication date: 01/01/2016

We introduce a novel framework for the approximate recovery of data matrices that are low-rank on graphs from sampled measurements. The rows and columns of such matrices belong to the span of the first few eigenvectors of the graphs constructed between their rows and columns. We leverage this property to recover non-linear low-rank structures efficiently from sampled data measurements at a low cost (linear in n). First, a Restricted Isometry Property (RIP) condition is introduced for efficient uniform sampling of the rows and columns of such matrices, based on the cumulative coherence of graph eigenvectors. Second, a state-of-the-art fast low-rank recovery method is suggested for the sampled data. Finally, several efficient, parallel and parameter-free decoders are presented, along with their theoretical analysis, for decoding the low-rank and cluster indicators for the full data matrix. Thus, we overcome the computational limitations of the standard linear low-rank recovery methods for big datasets. Our method can also be seen as a major step towards efficient recovery of non-linear low-rank structures. For a matrix of size n × p, on a single-core machine, our method gains a speed-up of p^2/k over Robust Principal Component Analysis (RPCA), where k << p is the subspace dimension. Numerically, we can recover a low-rank matrix of size 10304 × 1000, 100 times faster than Robust PCA
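
The span property stated in this abstract can be written compactly. The following is a sketch under assumed notation, none of which is taken from the abstract: P_k holds the first k eigenvectors of the Laplacian of the graph whose n nodes are the rows of X, Q_k holds the first k eigenvectors of the Laplacian of the graph whose p nodes are the columns of X, and A is an arbitrary coefficient matrix. It covers the exact case only, whereas the abstract targets approximate recovery.

```latex
X = P_k \, A \, Q_k^{\top},
\qquad P_k \in \mathbb{R}^{n \times k},\quad
Q_k \in \mathbb{R}^{p \times k},\quad
A \in \mathbb{R}^{k \times k}
\quad\Longrightarrow\quad
\operatorname{rank}(X) \le k .
```

Under this factorization every column of X lies in the span of P_k and every row lies in the span of Q_k, so the matrix is low-rank in the usual sense as well.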

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Time-frequency detection algorithm for gravitational wave bursts

Author: A. Abramovici
A. Viceré
A.R. Conway
B.J. Owen
C. Bradachia
C.L. Fryer
D.L. Donoho
E.J. Groth
E.W. Leaver
J. Neyman
J.A. Faber
Julien Sylvestre
L. Blanchet
N. Arnaud
S. Mertens
S.D. Mohanty
T. Kailath
T. Pradier
T. Zwerger
W.G. Anderson
W.H. Lee
Y.I. Ingster
É.É. Flanagan
Publication venue: 'American Physical Society (APS)'
Publication date: 14/10/2002

An efficient algorithm is presented for the identification of short bursts of gravitational radiation in the data from broad-band interferometric detectors. The algorithm consists of three steps. First, pixels of the time-frequency representation of the data that have power above a fixed threshold are identified. Second, clusters of such pixels that conform to a set of rules on their size and their proximity to other clusters are formed. Finally, a threshold is applied to the power integrated over all pixels in such clusters. Formal arguments are given to support the conjecture that this algorithm is very efficient for a wide class of signals. A precise model for the false alarm rate of this algorithm is presented, and a number of representative numerical simulations show it to be accurate at the 1% level for most values of the parameters, with a maximal error of around 10%. Comment: 26 pages, 15 figures, to appear in PR
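
The three steps described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `detect_bursts`, its parameters, the use of 4-connected clustering, and the single `min_size` rule are all assumptions made for illustration, and the paper's rules on proximity between clusters are omitted.

```python
from collections import deque

def detect_bursts(power, pixel_threshold, cluster_threshold, min_size=2):
    """Toy burst detector on a 2D grid power[time][frequency].

    Hypothetical helper: names and clustering rules are illustrative only.
    Returns a list of (sorted pixel list, integrated power) per kept cluster.
    """
    n_t = len(power)
    n_f = len(power[0]) if n_t else 0
    # Step 1: flag pixels whose power exceeds the fixed per-pixel threshold.
    hot = [[power[t][f] > pixel_threshold for f in range(n_f)] for t in range(n_t)]
    seen = [[False] * n_f for _ in range(n_t)]
    events = []
    for t0 in range(n_t):
        for f0 in range(n_f):
            if not hot[t0][f0] or seen[t0][f0]:
                continue
            # Step 2: grow a 4-connected cluster of flagged pixels (breadth-first).
            queue = deque([(t0, f0)])
            seen[t0][f0] = True
            cluster = []
            while queue:
                t, f = queue.popleft()
                cluster.append((t, f))
                for dt, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, v = t + dt, f + df
                    if 0 <= u < n_t and 0 <= v < n_f and hot[u][v] and not seen[u][v]:
                        seen[u][v] = True
                        queue.append((u, v))
            # Size rule (assumed): discard clusters smaller than min_size.
            if len(cluster) < min_size:
                continue
            # Step 3: threshold the power integrated over the whole cluster.
            total = sum(power[t][f] for t, f in cluster)
            if total > cluster_threshold:
                events.append((sorted(cluster), total))
    return events
```

For example, calling `detect_bursts(grid, pixel_threshold=4, cluster_threshold=15)` on a small grid keeps only clusters that pass both the size rule and the integrated-power threshold; isolated bright pixels are dropped.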

arXiv.org e-Print Archive

Crossref

CERN Document Server

ARDA: Automatic Relational Data Augmentation for Machine Learning

Author: Chepurko Nadiia
Fernandez Raul Castro
Karger David
Kraska Tim
Marcus Ryan
Zgraggen Emanuel
Publication date: 21/03/2020

Automatic machine learning (AutoML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation. Automatic data augmentation involves finding new features relevant to the user's predictive task with minimal "human-in-the-loop" involvement. We present ARDA, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented dataset such that training a predictive model on this augmented dataset results in improved performance. Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join. We perform an extensive empirical evaluation of different system components and benchmark our feature selection algorithm on real-world datasets
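
The two components named in this abstract, a join followed by feature selection, can be sketched as a toy pipeline. This is not ARDA's algorithm: the function names `pearson` and `augment`, the exact-match left join on a single key, and the Pearson-correlation filter are all stand-ins chosen for illustration, and the sketch assumes numeric columns whose names do not collide with those of the input dataset.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def augment(base_rows, repo_rows, key, target, min_abs_corr=0.5):
    """Hypothetical helper: left-join repo_rows onto base_rows, then prune features.

    Returns (augmented rows, names of the candidate columns that were kept).
    """
    index = {row[key]: row for row in repo_rows}
    candidates = sorted({col for row in repo_rows for col in row if col != key})
    # Component 1 (simplified): left join on the key; missing matches become None.
    joined = [
        {**row, **{col: index.get(row[key], {}).get(col) for col in candidates}}
        for row in base_rows
    ]
    # Component 2 (stand-in): keep candidates correlated with the target column.
    kept = []
    for col in candidates:
        pairs = [(r[col], r[target]) for r in joined if r[col] is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        if abs(pearson(xs, ys)) >= min_abs_corr:
            kept.append(col)
    dropped = set(candidates) - set(kept)
    rows = [{c: v for c, v in r.items() if c not in dropped} for r in joined]
    return rows, kept
```

A correlation filter is used here only because it is short and easy to follow; any feature selection method could take its place in the second step.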

arXiv.org e-Print Archive

DSpace@MIT

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined with knowledge of the positive class only. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figure
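
The idea of defining a class boundary from positive examples alone can be shown with a minimal sketch. It is not one of the techniques reviewed in the paper: the centroid-plus-radius rule and the names `fit_one_class` and `is_positive` are assumptions chosen for brevity.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a spherical boundary around positive examples only.

    Hypothetical helper: returns (centroid, radius), where the radius is the
    distance from the centroid at the given quantile of the training points.
    """
    dim = len(positives[0])
    n = len(positives)
    centroid = [sum(p[i] for p in positives) / n for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    idx = min(n - 1, int(quantile * n))
    return centroid, dists[idx]

def is_positive(model, x):
    """Accept x if it falls inside the fitted sphere."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius
```

No negative examples are used at any point: the radius is set purely from how the positive points spread around their centroid, so lowering `quantile` tightens the boundary at the cost of rejecting more genuine positives.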

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway