205 research outputs found
Overlapping stochastic block models with application to the French political blogosphere
Complex systems in nature and in society are often represented as networks,
describing the rich set of interactions between objects of interest. Many
deterministic and probabilistic clustering methods have been developed to
analyze such structures. Given a network, almost all of them partition the
vertices into disjoint clusters, according to their connection profile.
However, recent studies have shown that these techniques were too restrictive
and that most of the existing networks contained overlapping clusters. To
tackle this issue, we present in this paper the Overlapping Stochastic Block
Model. Our approach allows the vertices to belong to multiple clusters, and, to
some extent, generalizes the well-known Stochastic Block Model [Nowicki and
Snijders (2001)]. We show that the model is generically identifiable within
classes of equivalence and we propose an approximate inference procedure, based
on global and local variational techniques. Using toy data sets as well as the
French Political Blogosphere network and the transcriptional network of
Saccharomyces cerevisiae, we compare our work with other approaches.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS382 by the
Institute of Mathematical Statistics (http://www.imstat.org)
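To make the idea of overlapping clusters concrete, here is a minimal generative sketch in Python. It is an illustration under stated assumptions, not the model specification from the paper: each vertex draws an independent binary membership for every cluster, and a directed edge appears with a probability given by a logistic function of a bilinear score. The names `sample_overlapping_sbm`, `alpha`, `W` and `bias`, and all numerical values, are hypothetical.

```python
import math
import random


def sample_overlapping_sbm(n, alpha, W, bias, rng):
    """Sample memberships Z and a directed adjacency matrix X.

    Assumed (simplified) model: Z[i][q] ~ Bernoulli(alpha[q]) independently,
    so a vertex may belong to zero, one or several clusters, and
    P(X[i][j] = 1) = logistic(bias + sum_{q,l} Z[i][q] * W[q][l] * Z[j][l]).
    """
    Q = len(alpha)
    Z = [[1 if rng.random() < alpha[q] else 0 for q in range(Q)]
         for _ in range(n)]
    X = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            score = bias + sum(Z[i][q] * W[q][l] * Z[j][l]
                               for q in range(Q) for l in range(Q))
            p_edge = 1.0 / (1.0 + math.exp(-score))
            X[i][j] = 1 if rng.random() < p_edge else 0
    return Z, X


# Hypothetical parameters: two clusters, strong within-cluster affinity,
# and a negative bias that keeps the background edge density low.
rng = random.Random(0)
alpha = [0.4, 0.4]
W = [[4.0, 0.0],
     [0.0, 4.0]]
Z, X = sample_overlapping_sbm(n=10, alpha=alpha, W=W, bias=-3.0, rng=rng)
for i, z in enumerate(Z):
    print(i, z)  # a row [1, 1] is a vertex that belongs to both clusters
```

Because the memberships are independent across clusters, a row of `Z` can contain several ones, which is exactly what a disjoint partition cannot express.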
Defining a robust biological prior from Pathway Analysis to drive Network Inference
Inferring genetic networks from gene expression data is one of the most
challenging tasks in the post-genomic era, partly due to the vast space of
possible networks and the relatively small amount of data available. In this
field, the Gaussian Graphical Model (GGM) provides a convenient framework for the
discovery of biological networks. In this paper, we propose an original
approach for inferring gene regulation networks using a robust biological prior
on their structure in order to limit the set of candidate networks.
Pathways, which represent biological knowledge on the regulatory networks,
will be used as informative prior knowledge to drive Network Inference. This
approach is based on the selection of a relevant set of genes, called the
"molecular signature", associated with a condition of interest (for instance,
the genes involved in disease development). In this context, differential
expression analysis is a well-established strategy. However, the resulting signatures
are often inconsistent and show little overlap between studies. Thus, we will
dedicate the first part of our work to the improvement of the standard process
of biomarker identification to guarantee the robustness and reproducibility of
the molecular signature.
Our approach makes it possible to compare the networks inferred under two conditions
of interest (for instance, case and control networks) and helps with the
biological interpretation of results. It thus allows the identification of differential
regulations that occur in these conditions. We illustrate the proposed approach
by applying our method to a study of breast cancer's response to treatment.
Incomplete graphical model inference via latent tree aggregation
Graphical network inference is used in many fields such as genomics or
ecology to infer the conditional independence structure between variables, from
measurements of gene expression or species abundances for instance. In many
practical cases, not all variables involved in the network have been observed,
and the samples are actually drawn from a distribution where some variables
have been marginalized out. This challenges the sparsity assumption commonly
made in graphical model inference, since marginalization yields locally dense
structures, even when the original network is sparse. We present a procedure
for inferring Gaussian graphical models when some variables are unobserved,
that accounts both for the influence of missing variables and the low density
of the original network. Our model is based on the aggregation of spanning
trees, and the estimation procedure on the Expectation-Maximization algorithm.
We treat the graph structure and the unobserved nodes as missing variables and
compute posterior probabilities of edge appearance. To provide a complete
methodology, we also propose several model selection criteria to estimate the
number of missing nodes. A simulation study and an illustration on flow cytometry
data reveal that our method has favorable edge detection properties compared to
existing graph inference techniques. The methods are implemented in an R
package.
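The claim that marginalization yields locally dense structures can be checked on a toy example. The Python sketch below is not the inference procedure of the paper; it only uses the standard fact that, for a Gaussian vector with precision matrix K, the precision matrix of the observed variables after removing one hidden variable h is the Schur complement K_OO - K_Oh K_hO / K_hh. The star-shaped matrix `K` and the helper names are hypothetical.

```python
def marginal_precision(K, hidden):
    """Precision matrix of the observed variables once the variable
    `hidden` is marginalized out (Schur complement of K[hidden][hidden])."""
    observed = [i for i in range(len(K)) if i != hidden]
    k_hh = K[hidden][hidden]
    return [[K[i][j] - K[i][hidden] * K[hidden][j] / k_hh for j in observed]
            for i in observed]


def count_edges(K, tol=1e-12):
    """Number of nonzero off-diagonal pairs, i.e. edges of the graph."""
    n = len(K)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(K[i][j]) > tol)


# Hypothetical sparse network: node 0 is a hub linked to four leaves,
# and the leaves share no direct edge with each other.
K = [
    [2.0, 0.5, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 1.0],
]

observed_block = [row[1:] for row in K[1:]]  # leaves in the full model
K_obs = marginal_precision(K, hidden=0)      # leaves when the hub is unobserved

print("edges among leaves, hub observed:  ", count_edges(observed_block))
print("edges among leaves, hub unobserved:", count_edges(K_obs))
```

With the hub observed, the leaves are conditionally independent of each other. Once the hub is marginalized out, every pair of leaves acquires a nonzero partial correlation, so the graph on the observed variables becomes a clique even though the original network is a sparse star.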
Inferring Multiple Graphical Structures
Gaussian Graphical Models provide a convenient framework for representing
dependencies between variables. Recently, this tool has attracted considerable
interest for the discovery of biological networks. The literature focuses on
the case where a single network is inferred from a set of measurements, but, as
wet-lab data is typically scarce, several assays, where the experimental
conditions affect interactions, are usually merged to infer a single network.
In this paper, we propose two approaches for estimating multiple related
graphs, by rendering the closeness assumption into an empirical prior or group
penalties. We provide quantitative results demonstrating the benefits of the
proposed approaches. The methods presented in this paper are embedded in the R
package 'simone' from version 1.0-0 onward.
Clustering based on Random Graph Model embedding Vertex Features
Large datasets with interactions between objects are common to numerous
scientific fields (e.g. social science, the internet, biology). The interactions
naturally define a graph, and a common way to explore or summarize such a dataset
is graph clustering. Most techniques for clustering graph vertices use only the
topology of connections, ignoring information in the vertex features. In this
paper, we provide a clustering algorithm exploiting both types of data, based on
a statistical model with latent structure characterizing each vertex both by a
vector of features and by its connectivity. We perform simulations to
compare our algorithm with existing approaches, and also evaluate our method
with real datasets based on hyper-textual documents. We find that our algorithm
successfully exploits whatever information is found both in the connectivity
pattern and in the features.
Weighted-Lasso for Structured Network Inference from Time Course Data
We present a weighted-Lasso method to infer the parameters of a first-order
vector auto-regressive model that describes time course expression data
generated by directed gene-to-gene regulation networks. These networks are
assumed to have a prior internal structure of connectivity which drives the
inference method. This prior structure can be either derived from prior
biological knowledge or inferred by the method itself. We illustrate the
performance of this structure-based penalization both on synthetic data and on
two canonical regulatory networks: the yeast cell cycle regulation network,
using the dataset of Spellman et al., and the E. coli S.O.S. DNA repair network,
using data from U. Alon's lab.
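As a rough illustration of the weighted-Lasso idea for a first-order vector auto-regressive model x_t = A x_{t-1} + noise, the Python sketch below fits each row of A by coordinate descent on (1/2n) * sum of squared residuals + lam * sum_j w_ij |a_ij|. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the simulated data, the weight values and the choice of coordinate descent as solver are all hypothetical. A large weight stands in for a prior structure that discourages an edge.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_var1(series, weights, lam, n_iter=200):
    """Estimate A in x_t = A x_{t-1} + noise, one row (target gene) at a time.

    series: list of T vectors of length p; weights[i][j] scales the L1
    penalty on A[i][j], so a large weight discourages the edge j -> i.
    """
    p = len(series[0])
    X, Y = series[:-1], series[1:]  # predictors x_{t-1}, responses x_t
    n = len(X)
    col_sq = [sum(x[j] ** 2 for x in X) / n for j in range(p)]
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for _ in range(n_iter):
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue
                # Correlation of predictor j with the partial residual
                # that leaves out the current contribution of j.
                rho = sum(
                    X[t][j] * (Y[t][i] - sum(A[i][k] * X[t][k]
                                             for k in range(p) if k != j))
                    for t in range(n)) / n
                A[i][j] = soft_threshold(rho, lam * weights[i][j]) / col_sq[j]
    return A


# Hypothetical 3-gene network and simulated time course.
random.seed(0)
A_true = [[0.8, 0.0, 0.0],
          [0.5, 0.0, 0.0],
          [0.0, -0.6, 0.0]]
x = [1.0, 0.0, 0.0]
series = [x]
for _ in range(60):
    x = [sum(A_true[i][j] * x[j] for j in range(3)) + random.gauss(0, 0.5)
         for i in range(3)]
    series.append(x)

# Hypothetical prior structure: weight 1 on plausible edges and a very
# large weight on edges the prior rules out.
weights = [[1.0 if A_true[i][j] != 0 else 1e6 for j in range(3)]
           for i in range(3)]
A_hat = weighted_lasso_var1(series, weights, lam=0.05)
for row in A_hat:
    print([round(a, 3) for a in row])
```

Entries with a very large weight are set exactly to zero by the soft-thresholding step, while entries with weight 1 are only mildly shrunk. In this sketch the weights are taken from the simulated truth purely for demonstration; in practice they would come from prior biological knowledge or from a structure inferred from the data, as the abstract describes.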