Modeling heterogeneity in random graphs through latent space models: a selective review
We present a selective review on probabilistic modeling of heterogeneity in
random graphs. We focus on latent space models and more particularly on
stochastic block models and their extensions that have undergone major
developments in the last five years.
Statistical clustering of temporal networks through a dynamic stochastic block model
Statistical node clustering in discrete time dynamic networks is an emerging
field that raises many challenges. Here, we explore statistical properties and
frequentist inference in a model that combines a stochastic block model (SBM)
for its static part with independent Markov chains for the evolution of the
node groups through time. We model binary data as well as weighted dynamic
random graphs (with discrete or continuous edge values). Our approach,
motivated by the importance of controlling for label switching issues across
the different time steps, focuses on detecting groups characterized by a stable
within-group connectivity behavior. We study identifiability of the model
parameters, propose an inference procedure based on a variational
expectation-maximization algorithm, and give a model selection criterion for
choosing the number of groups. We carefully discuss our initialization strategy, which plays
an important role in the method and compare our procedure with existing ones on
synthetic datasets. We also illustrate our approach on dynamic contact
networks: one of encounters among high school students and two others of animal
interactions. An implementation of the method is available as an R package
called dynsbm.
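As an illustration of the generative structure described in this abstract, here is a minimal simulation sketch for the binary case only. It assumes the standard SBM convention that edges are drawn independently given the group memberships, and that each node's group follows its own Markov chain with a shared transition matrix. The function name `simulate_dynamic_sbm` and all parameter names are hypothetical; this is not the API of the dynsbm package, and it does not cover inference or the weighted variants.

```python
import random


def simulate_dynamic_sbm(n_nodes, n_steps, init_probs, transition, connect, seed=0):
    """Simulate undirected binary graphs from a dynamic SBM (illustrative sketch).

    init_probs : initial group distribution, length Q (hypothetical name)
    transition : Q x Q row-stochastic matrix driving each node's Markov chain
    connect    : Q x Q symmetric matrix of edge probabilities between groups
    Returns (memberships, graphs): one membership list and one edge set per step.
    """
    rng = random.Random(seed)
    groups = list(range(len(init_probs)))

    # Draw each node's initial group independently.
    z = rng.choices(groups, weights=init_probs, k=n_nodes)

    memberships, graphs = [], []
    for t in range(n_steps):
        if t > 0:
            # Each node moves according to its own independent Markov chain.
            z = [rng.choices(groups, weights=transition[q])[0] for q in z]

        # Static part: given the groups, edges are independent Bernoulli draws.
        edges = set()
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                if rng.random() < connect[z[i]][z[j]]:
                    edges.add((i, j))

        memberships.append(list(z))
        graphs.append(edges)
    return memberships, graphs


if __name__ == "__main__":
    # Two groups with stable membership and denser within-group connectivity.
    z_seq, g_seq = simulate_dynamic_sbm(
        n_nodes=10,
        n_steps=3,
        init_probs=[0.5, 0.5],
        transition=[[0.9, 0.1], [0.1, 0.9]],
        connect=[[0.8, 0.1], [0.1, 0.8]],
    )
    for t, (z, g) in enumerate(zip(z_seq, g_seq)):
        print(t, z, len(g))
```

Keeping the connectivity matrix fixed across time steps while only the memberships move is a simplification chosen here for brevity; it loosely mirrors the abstract's emphasis on groups with stable within-group connectivity.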
Model Selection in Overlapping Stochastic Block Models
Networks are a commonly used mathematical model to describe the rich set of
interactions between objects of interest. Many clustering methods have been
developed in order to partition such structures, among which several rely on
underlying probabilistic models, typically mixture models. The relevant hidden
structure may however show overlapping groups in several applications. The
Overlapping Stochastic Block Model (2011) has been developed to take this
phenomenon into account. Nevertheless, the problem of the choice of the number
of classes in the inference step is still open. To tackle this issue, we
consider the proposed model in a Bayesian framework and develop a new criterion
based on a non-asymptotic approximation of the marginal log-likelihood. We
describe how the criterion can be computed through a variational Bayes EM
algorithm, and demonstrate its efficiency by running it on both simulated and
real data.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.