44,861 research outputs found
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much of recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size is fixed, and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche but only the latter regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that are of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks
Learning and comparing functional connectomes across subjects
Functional connectomes capture brain interactions via synchronized
fluctuations in the functional magnetic resonance imaging signal. If measured
during rest, they map the intrinsic functional architecture of the brain. With
task-driven experiments they represent integration mechanisms between
specialized brain areas. Analyzing their variability across subjects and
conditions can reveal markers of brain pathologies and mechanisms underlying
cognition. Methods of estimating functional connectomes from the imaging signal
have undergone rapid developments and the literature is full of diverse
strategies for comparing them. This review aims to clarify links across
functional-connectivity methods as well as to expose different steps to perform
a group study of functional connectomes
Low-complexity Multiclass Encryption by Compressed Sensing
The idea that compressed sensing may be used to encrypt information from
unauthorised receivers has already been envisioned, but never explored in depth
since its security may seem compromised by the linearity of its encoding
process. In this paper we apply this simple encoding to define a general
private-key encryption scheme in which a transmitter distributes the same
encoded measurements to receivers of different classes, which are provided
partially corrupted encoding matrices and are thus allowed to decode the
acquired signal at provably different levels of recovery quality.
The security properties of this scheme are thoroughly analysed: firstly, the
properties of our multiclass encryption are theoretically investigated by
deriving performance bounds on the recovery quality attained by lower-class
receivers with respect to high-class ones. Then we perform a statistical
analysis of the measurements to show that, although not perfectly secure,
compressed sensing grants some level of security that comes at almost-zero cost
and thus may benefit resource-limited applications.
In addition to this we report some exemplary applications of multiclass
encryption by compressed sensing of speech signals, electrocardiographic tracks
and images, in which quality degradation is quantified as the impossibility of
some feature extraction algorithms to obtain sensitive information from
suitably degraded signal recoveries.Comment: IEEE Transactions on Signal Processing, accepted for publication.
Article in pres
- …