Learning Laplacian Matrix in Smooth Graph Signal Representations
The construction of a meaningful graph plays a crucial role in the success of
many graph-based representations and algorithms for handling structured data,
especially in the emerging field of graph signal processing. However, a
meaningful graph is not always readily available from the data, nor easy to
define depending on the application domain. In particular, it is often
desirable in graph signal processing applications that a graph is chosen such
that the data admit certain regularity or smoothness on the graph. In this
paper, we address the problem of learning graph Laplacians, which is equivalent
to learning graph topologies, such that the input data form graph signals with
smooth variations on the resulting topology. To this end, we adopt a factor
analysis model for the graph signals and impose a Gaussian probabilistic prior
on the latent variables that control these signals. We show that the Gaussian
prior leads to an efficient representation that favors the smoothness property
of the graph signals. We then propose an algorithm for learning graphs that
enforces such property and is based on minimizing the variations of the signals
on the learned graph. Experiments on both synthetic and real world data
demonstrate that the proposed graph learning framework can efficiently infer
meaningful graph topologies from signal observations under the smoothness
prior.
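The smoothness notion in the abstract above is commonly measured by the Laplacian quadratic form x^T L x, which equals the sum over edges of w_ij (x_i - x_j)^2. The following is a minimal sketch of that measure only, not the paper's learning algorithm; the 4-node path graph and the two test signals are hypothetical values chosen for illustration.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry is the node degree (self-loops are ignored).
        lap[i][i] = sum(weights[i][j] for j in range(n) if j != i)
    return lap


def smoothness(lap, signal):
    """Quadratic form x^T L x; small values mean little variation across edges."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j]
               for i in range(n) for j in range(n))


# Hypothetical path graph 0-1-2-3 with unit edge weights.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(W)

smooth_signal = [1.0, 1.1, 1.2, 1.3]   # changes slowly along the path
rough_signal = [1.0, -1.0, 1.0, -1.0]  # flips sign on every edge

smooth_score = smoothness(L, smooth_signal)
rough_score = smoothness(L, rough_signal)
```

A graph-learning method driven by this criterion would search over valid Laplacians for one that makes the observed signals score low, subject to constraints that rule out the trivial empty graph.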
A comparative study of covariance selection models for the inference of gene regulatory networks
Highlights: Three different models for inferring gene networks from microarray data are proposed. The most sensitive approach is selected by an exhaustive simulation study. The method reveals a cross-talk between the isoprenoid biosynthesis pathways in Arabidopsis thaliana. The method highlights 9 genes in the HRAS signature regulated by the transcription factor RREB1.
Motivation: The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology.
Results: In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method performs correlation between regression residuals and (c) the 'ℓ2C' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ2C outperformed the other two methods, having the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value < 0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling.
Availability: Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip
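Covariance selection reads conditional dependencies off the inverse covariance (precision) matrix Ω: the partial correlation between variables i and j is -Ω_ij / sqrt(Ω_ii Ω_jj). The sketch below illustrates that relationship with an optional generic ridge term added to the diagonal before inversion. It is an assumption-laden illustration, not a reimplementation of any of the three methods compared in the abstract; the function names and the example covariance matrix are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(covariance, ridge=0.0):
    """Partial correlations from the inverse of (covariance + ridge * I).

    The ridge term is a generic regularizer (an assumption of this sketch);
    it keeps the matrix invertible when there are fewer samples than variables.
    """
    n = len(covariance)
    regularized = [[covariance[i][j] + (ridge if i == j else 0.0)
                    for j in range(n)] for i in range(n)]
    omega = invert(regularized)
    return [[1.0 if i == j
             else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical chain X1 - X2 - X3: X1 and X3 are correlated (0.25),
# but only through X2, so their partial correlation should vanish.
chain_cov = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
pcor = partial_correlations(chain_cov)
```

In this example the marginal correlation between X1 and X3 is nonzero, while the corresponding partial correlation is zero up to rounding, which is the distinction between correlation and conditional dependence that covariance selection exploits.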
State-space solutions to the dynamic magnetoencephalography inverse problem using high performance computing
Determining the magnitude and location of neural sources within the brain
that are responsible for generating magnetoencephalography (MEG) signals
measured on the surface of the head is a challenging problem in functional
neuroimaging. The number of potential sources within the brain exceeds by an
order of magnitude the number of recording sites. As a consequence, the
estimates for the magnitude and location of the neural sources will be
ill-conditioned because of the underdetermined nature of the problem. One
well-known technique designed to address this imbalance is the minimum norm
estimator (MNE). This approach imposes an L2 regularization constraint that
serves to stabilize and condition the source parameter estimates. However,
these classes of regularizer are static in time and do not consider the
temporal constraints inherent to the biophysics of the MEG experiment. In this
paper we propose a dynamic state-space model that accounts for both spatial and
temporal correlations within and across candidate intracortical sources. In our
model, the observation model is derived from the steady-state solution to
Maxwell's equations while the latent model representing neural dynamics is
given by a random walk process. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS483 by the Institute of
Mathematical Statistics (http://www.imstat.org
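A state-space model with a random-walk latent process and a linear Gaussian observation model can be filtered with the standard Kalman recursions. The sketch below shows the scalar case only, as an illustration of the idea; the abstract's model is high-dimensional and spatially coupled, and every parameter name and value here (gain, process_var, obs_var, the constant observations) is a hypothetical choice rather than something taken from the paper.

```python
def kalman_random_walk(observations, gain, process_var, obs_var,
                       init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = gain * x_t + v_t.

    w_t ~ N(0, process_var) is the random-walk innovation and
    v_t ~ N(0, obs_var) is the sensor noise.
    Returns a list of (posterior mean, posterior variance) pairs.
    """
    mean, var = init_mean, init_var
    estimates = []
    for y in observations:
        # Predict: a random walk leaves the mean unchanged and inflates the variance.
        var_pred = var + process_var
        # Update: blend the prediction with the new observation.
        innovation = y - gain * mean
        innovation_var = gain * gain * var_pred + obs_var
        kalman_gain = var_pred * gain / innovation_var
        mean = mean + kalman_gain * innovation
        var = (1.0 - kalman_gain * gain) * var_pred
        estimates.append((mean, var))
    return estimates


# Hypothetical noise-free sensor readings of a constant source amplitude of 2.0.
readings = [2.0] * 50
track = kalman_random_walk(readings, gain=1.0, process_var=0.01, obs_var=1.0)
final_mean, final_var = track[-1]
```

Unlike a static regularizer, the filter carries information forward in time: each estimate depends on all earlier observations through the predicted mean and variance.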
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively little attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
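The difficulty of the fixed-sample, growing-dimension regime can be illustrated with a small simulation: when the variables are truly independent, the largest sample correlation among them still grows as more variables are screened with the same number of samples. This is a hedged sketch using only the Python standard library; the function names, sample size, and variable counts are hypothetical choices, not values from the paper.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(num_variables, num_samples, seed=0):
    """Largest |correlation| among independent Gaussian variables.

    Variables are drawn one after another from a single seeded generator, so a
    run with more variables contains the variables of a smaller run as a prefix.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)]
            for _ in range(num_variables)]
    return max(abs(sample_correlation(data[i], data[j]))
               for i in range(num_variables)
               for j in range(i + 1, num_variables))


# Fixed sample size n = 10, growing number of variables p.
few_variables = max_spurious_correlation(num_variables=5, num_samples=10)
many_variables = max_spurious_correlation(num_variables=200, num_samples=10)
```

Because every true correlation here is zero, any large value is spurious, which is why thresholds for declaring a discovery must account for the number of variables as well as the number of samples.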