659 research outputs found

### Statistical clustering of temporal networks through a dynamic stochastic block model

Statistical node clustering in discrete time dynamic networks is an emerging
field that raises many challenges. Here, we explore statistical properties and
frequentist inference in a model that combines a stochastic block model (SBM)
for its static part with independent Markov chains for the evolution of the
node groups through time. We model binary data as well as weighted dynamic
random graphs (with discrete or continuous edge values). Our approach,
motivated by the importance of controlling for label switching issues across
the different time steps, focuses on detecting groups characterized by a stable
within-group connectivity behavior. We study identifiability of the model
parameters, propose an inference procedure based on a variational
expectation-maximization algorithm as well as a model selection criterion to
select the number of groups. We carefully discuss our initialization strategy which plays
an important role in the method and compare our procedure with existing ones on
synthetic datasets. We also illustrate our approach on dynamic contact
networks, one of encounters among high school students and two others on animal
interactions. An implementation of the method is available as an R package
called dynsbm.
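
As a rough illustration of the generative model this abstract describes, the sketch below simulates a binary dynamic SBM in plain Python: each node's group follows its own Markov chain, and at every time step the edges are independent Bernoulli draws given the current groups. The function name `simulate_dynsbm`, its parameter names, and all numeric values are illustrative assumptions; nothing here is taken from the paper or from the dynsbm package, and no inference is performed.

```python
import random

def simulate_dynsbm(n, T, init, trans, conn, seed=0):
    """Simulate a binary dynamic SBM (illustrative sketch only).

    init  : initial group probabilities, length Q
    trans : Q x Q Markov transition matrix for group memberships
    conn  : Q x Q symmetric matrix of connection probabilities
    Returns (groups, graphs): groups[t][i] is node i's group at time t,
    graphs[t] is a symmetric 0/1 adjacency matrix without self-loops.
    """
    rng = random.Random(seed)
    labels = list(range(len(init)))
    groups, graphs = [], []
    for t in range(T):
        if t == 0:
            # Initial memberships are drawn independently from `init`.
            z = rng.choices(labels, weights=init, k=n)
        else:
            # Each node moves according to its own Markov chain.
            prev = groups[-1]
            z = [rng.choices(labels, weights=trans[prev[i]])[0] for i in range(n)]
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                # Edge probability depends only on the two current groups.
                if rng.random() < conn[z[i]][z[j]]:
                    adj[i][j] = adj[j][i] = 1
        groups.append(z)
        graphs.append(adj)
    return groups, graphs

# Hypothetical parameter values: two fairly stable, assortative groups.
groups, graphs = simulate_dynsbm(
    n=20, T=4,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    conn=[[0.6, 0.05], [0.05, 0.5]],
)
```

Keeping the connection matrix fixed over time, as done here, is one simple way to obtain groups with a stable within-group connectivity; the model in the paper may allow more general time dependence.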

### Convergence of the groups posterior distribution in latent or stochastic block models

We propose a unified framework for studying both latent and stochastic block
models, which are used to cluster simultaneously rows and columns of a data
matrix. In this new framework, we study the behaviour of the groups posterior
distribution, given the data. We characterize whether it is possible to
asymptotically recover the actual groups on the rows and columns of the matrix,
relying on a consistent estimate of the parameter. In other words, we establish
sufficient conditions for the groups posterior distribution to converge (as the
size of the data increases) to a Dirac mass located at the actual (random)
groups configuration. In particular, we highlight some cases where the model
assumes symmetries in the matrix of connection probabilities that prevent
recovery of the original groups. We also discuss the validity of these results
when the proportion of non-null entries in the data matrix converges to zero.

Comment: Published at http://dx.doi.org/10.3150/13-BEJ579 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

### Modeling heterogeneity in random graphs through latent space models: a selective review

We present a selective review on probabilistic modeling of heterogeneity in
random graphs. We focus on latent space models and more particularly on
stochastic block models and their extensions that have undergone major
developments in the last five years.

### On efficient estimators of the proportion of true null hypotheses in a multiple testing setup

We consider the problem of estimating the proportion $\theta$ of true null
hypotheses in a multiple testing context. The setup is classically modeled
through a semiparametric mixture with two components: a uniform distribution on
interval $[0,1]$ with prior probability $\theta$ and a nonparametric density
$f$. We discuss asymptotic efficiency results and establish that two different
cases occur, depending on whether or not $f$ vanishes on a set of non-null
Lebesgue measure. In the first case, we exhibit estimators converging at parametric rate,
compute the optimal asymptotic variance and conjecture that no estimator is
asymptotically efficient (i.e. attains the optimal asymptotic variance). In the
second case, we prove that the quadratic risk of any estimator does not
converge at parametric rate. We illustrate these results on simulated data.
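
To make the first case concrete, here is a minimal sketch under the assumption that $f$ vanishes on a known interval $(\lambda, 1]$. Then $P(p > \lambda) = \theta(1 - \lambda)$, so rescaling the fraction of $p$-values above $\lambda$ gives a simple estimator of $\theta$ with parametric rate. This is a classical histogram-type estimator used for illustration; it is not claimed to be one of the estimators studied in the paper, and the function name, the threshold `lam`, and the simulated alternative are all assumptions.

```python
import random

def estimate_null_proportion(pvalues, lam):
    """Fraction of p-values above lam, rescaled by the null mass of (lam, 1]."""
    above = sum(1 for p in pvalues if p > lam)
    return above / (len(pvalues) * (1.0 - lam))

rng = random.Random(1)
theta, n = 0.7, 20000  # hypothetical true proportion and sample size
pvalues = [
    rng.random() if rng.random() < theta  # true null: uniform on [0, 1]
    else 0.5 * rng.random()               # alternative: f vanishes above 0.5
    for _ in range(n)
]
theta_hat = estimate_null_proportion(pvalues, lam=0.5)
print(f"true theta = {theta}, estimate = {theta_hat:.3f}")
```

When $f$ is positive on all of $[0,1]$, the same formula overestimates $\theta$ for any fixed `lam`, which is in line with the abstract's distinction between the two cases.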

### Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation

In a multiple testing context, we consider a semiparametric mixture model
with two components where one component is known and corresponds to the
distribution of $p$-values under the null hypothesis and the other component
$f$ is nonparametric and stands for the distribution under the alternative
hypothesis. Motivated by the issue of local false discovery rate estimation, we
focus here on the estimation of the nonparametric unknown component $f$ in the
mixture, relying on a preliminary estimator of the unknown proportion $\theta$
of true null hypotheses. We propose and study the asymptotic properties of two
different estimators for this unknown component. The first estimator is a
randomly weighted kernel estimator. We establish an upper bound for its
pointwise quadratic risk, exhibiting the classical nonparametric rate of
convergence over a class of Hölder densities. To our knowledge, this is the
first result establishing convergence as well as corresponding rate for the
estimation of the unknown component in this nonparametric mixture. The second
estimator is a maximum smoothed likelihood estimator. It is computed through an
iterative algorithm, for which we establish a descent property. In addition,
these estimators are used in a multiple testing procedure in order to estimate
the local false discovery rate. Their respective performances are then compared
on synthetic data.
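
For orientation, the quantities involved can be written out as follows. The symbol $h$ for the mixture density and the hat notation for the plug-in version are notational assumptions introduced here, not taken from the abstract. With $p$-values uniform on $[0,1]$ under the null, the mixture density and the local false discovery rate at $x \in [0,1]$ are

$$
h(x) = \theta + (1-\theta)\, f(x),
\qquad
\ell\mathrm{FDR}(x) = P(H_0 \text{ true} \mid p = x) = \frac{\theta}{\theta + (1-\theta)\, f(x)} .
$$

One natural plug-in estimate, given a preliminary estimator $\hat\theta$ and an estimator $\hat f$ of the alternative density, is

$$
\widehat{\ell\mathrm{FDR}}(x) = \frac{\hat\theta}{\hat\theta + (1-\hat\theta)\, \hat f(x)} .
$$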

### Adaptive procedures in convolution models with known or partially known noise distribution

In a convolution model, we observe random variables whose distribution is the
convolution of some unknown density $f$ and some known or partially known noise
density $g$. In this paper, we focus on statistical procedures that are
adaptive with respect to the smoothness parameter $\tau$ of the unknown density
$f$, and also (in some cases) to some unknown parameter of the noise density
$g$. In the first part, we assume that $g$ is known and polynomially smooth. We
provide goodness-of-fit procedures for the test $H_0: f = f_0$, where the
alternative $H_1$ is expressed with respect to the $L_2$-norm. Our adaptive
(w.r.t. $\tau$) procedure behaves differently according to whether $f_0$ is
polynomially or exponentially smooth. A payment for adaptation is noted in both
cases and, to compute it, we provide a non-uniform Berry-Esseen type theorem
for degenerate U-statistics. In the first case we prove that the payment for
adaptation is optimal (thus unavoidable). In the second part, we study a wider
framework: a semiparametric model, where $g$ is exponentially smooth and
stable, and its self-similarity index $s$ is unknown. In order to ensure
identifiability, we restrict our attention to polynomially smooth, Sobolev-type
densities $f$. In this context, we provide a consistent estimation procedure
for $s$. This estimator is then plugged into three different procedures:
estimation of the unknown density $f$, estimation of the functional
$\int f^2$, and a test of the hypothesis $H_0$. These procedures are adaptive
with respect to both $s$ and $\tau$ and attain the rates that are known to be
optimal for known values of $s$ and $\tau$. As a by-product, when the noise is
known and exponentially smooth, our testing procedure is adaptive for testing
Sobolev-type densities.

Comment: 35 pages + 8-page appendix

### Asymptotic normality and efficiency of the maximum likelihood estimator for the parameter of a ballistic random walk in a random environment

We consider a one dimensional ballistic random walk evolving in a parametric
independent and identically distributed random environment. We study the
asymptotic properties of the maximum likelihood estimator of the parameter
based on a single observation of the path till the time it reaches a distant
site. We prove an asymptotic normality result for this consistent estimator as
the distant site tends to infinity and establish that it achieves the
Cramér-Rao bound. We also explore in a simulation setting the numerical
behaviour of asymptotic confidence regions for the parameter value.
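
The observation scheme in this abstract (a single path followed until it first reaches a distant site) can be sketched as below. The Beta environment is an illustrative assumption, as are the function name and the parameter values: with $\omega_x \sim \mathrm{Beta}(a, b)$ the probability of stepping right from site $x$, we have $E[(1-\omega)/\omega] = b/(a-1)$ for $a > 1$, so choosing $b < a - 1$ gives a walk that is ballistic to the right. The code only simulates the path and records per-site step counts; it does not implement the maximum likelihood estimator.

```python
import random

def simulate_rwre(target, a, b, seed=0):
    """Run a nearest-neighbour walk in an i.i.d. Beta(a, b) environment
    from site 0 until it first hits `target`.

    Returns (steps, right, left), where right[x] and left[x] count the
    right and left steps taken from site x along the observed path.
    """
    rng = random.Random(seed)
    env = {}              # environment, generated lazily site by site
    right, left = {}, {}
    x, steps = 0, 0
    while x < target:
        if x not in env:
            env[x] = rng.betavariate(a, b)  # P(step right) at site x
        if rng.random() < env[x]:
            right[x] = right.get(x, 0) + 1
            x += 1
        else:
            left[x] = left.get(x, 0) + 1
            x -= 1
        steps += 1
    return steps, right, left

# Hypothetical values: b / (a - 1) = 0.25 < 1, so the walk is ballistic.
steps, right, left = simulate_rwre(target=50, a=5.0, b=1.0)
print(f"hit site 50 after {steps} steps")
```

The per-site counts are returned because the path's likelihood depends on the environment only through how often each site was left to the right and to the left.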

### A semiparametric extension of the stochastic block model for longitudinal networks

To model recurrent interaction events in continuous time, an extension of the
stochastic block model is proposed where every individual belongs to a latent
group and interactions between two individuals follow a conditional
inhomogeneous Poisson process with intensity driven by the individuals' latent
groups. The model is shown to be identifiable and its estimation is based on a
semiparametric variational expectation-maximization algorithm. Two versions of
the method are developed, using either a nonparametric histogram approach (with
an adaptive choice of the partition size) or kernel intensity estimators. The
number of latent groups can be selected by an integrated classification
likelihood criterion. Finally, we demonstrate the performance of our procedure
on synthetic experiments, analyse two datasets to illustrate the utility of our
approach and comment on competing methods.

### Identifiability of parameters in latent structure models with many observed variables

While hidden class models of various types arise in many statistical
applications, it is often difficult to establish the identifiability of their
parameters. Focusing on models in which there is some structure of independence
of some of the observed variables conditioned on hidden ones, we demonstrate a
general approach for establishing identifiability utilizing algebraic
arguments. A theorem of J. Kruskal for a simple latent-class model with finite
state space lies at the core of our results, though we apply it to a diverse
set of models. These include mixtures of both finite and nonparametric product
distributions, hidden Markov models and random graph mixture models, and lead
to a number of new results and improvements to old ones. In the parametric
setting, this approach indicates that for such models, the classical definition
of identifiability is typically too strong. Instead generic identifiability
holds, which implies that the set of nonidentifiable parameters has measure
zero, so that parameter inference is still meaningful. In particular, this
sheds light on the properties of finite mixtures of Bernoulli products, which
have been used for decades despite being known to have nonidentifiable
parameters. In the nonparametric setting, we again obtain identifiability only
when certain restrictions are placed on the distributions that are mixed, but
we explicitly describe the conditions.

Comment: Published at http://dx.doi.org/10.1214/09-AOS689 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
