Sparse adaptive Dirichlet-multinomial-like processes
Online estimation and modelling of i.i.d. data for short
sequences over large or complex "alphabets" is a ubiquitous
(sub)problem in machine learning, information theory, data
compression, statistical language processing, and document
analysis. The Dirichlet-Multinomial distribution (also called
Polya urn scheme) and extensions thereof are widely applied for
online i.i.d. estimation. Good a-priori choices for the
parameters in this regime are difficult to obtain though. I
derive an optimal adaptive choice for the main parameter via
tight, data-dependent redundancy bounds for a related model. The
one-line recommendation is to set the 'total mass' = 'precision' =
'concentration' parameter to m/(2 ln[(n+1)/m]), where n
is the (past) sample size and m the number of different symbols
observed (so far). The resulting estimator is simple, online, and
fast, and its experimental performance is superb.
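
The one-line recommendation in the abstract above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names are hypothetical, the formula is read as m divided by 2 ln[(n+1)/m], and the predictive rule (each seen symbol i gets n_i/(n+beta), with the remaining beta/(n+beta) reserved collectively for unseen symbols) is an assumed Dirichlet-multinomial-like form.

```python
import math
from collections import Counter


def adaptive_concentration(n, m):
    """Adaptive total mass beta = m / (2 ln((n+1)/m)).

    n: number of symbols seen so far; m: number of distinct symbols.
    Requires 1 <= m <= n so that the logarithm is strictly positive.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return m / (2.0 * math.log((n + 1) / m))


def predictive(history):
    """Return (probabilities of seen symbols, total mass left for unseen ones).

    Assumed form: P(i) = n_i / (n + beta) for seen symbols and
    beta / (n + beta) shared by all symbols not yet observed.
    """
    counts = Counter(history)
    n = len(history)
    m = len(counts)
    beta = adaptive_concentration(n, m)
    probs = {sym: c / (n + beta) for sym, c in counts.items()}
    escape = beta / (n + beta)
    return probs, escape


if __name__ == "__main__":
    probs, escape = predictive(list("abracadabra"))
    print(probs, escape)
```

Because beta is recomputed from n and m at every step, no a-priori tuning is needed; how the escape mass is divided among individual unseen symbols is left open in this sketch.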
Bayesian anomaly detection methods for social networks
Learning the network structure of a large graph is computationally demanding,
and dynamically monitoring the network over time for any changes in structure
threatens to be more challenging still. This paper presents a two-stage method
for anomaly detection in dynamic graphs: the first stage uses simple, conjugate
Bayesian models for discrete time counting processes to track the pairwise
links of all nodes in the graph to assess normality of behavior; the second
stage applies standard network inference tools on a greatly reduced subset of
potentially anomalous nodes. The utility of the method is demonstrated on
simulated and real data sets.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS329 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
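
As a rough illustration of the first stage described in the abstract above, the sketch below tracks one pairwise link with a conjugate Gamma-Poisson model and flags time steps whose count is surprisingly large under the posterior predictive distribution. The Gamma-Poisson choice, the prior values, the threshold, and all function names are assumptions made for illustration; the abstract only says "simple, conjugate Bayesian models for discrete time counting processes".

```python
import math


def nb_log_pmf(k, a, b):
    """Log posterior-predictive pmf of count k for Poisson data with a
    Gamma(shape=a, rate=b) prior on the rate (a negative binomial)."""
    return (math.lgamma(k + a) - math.lgamma(a) - math.lgamma(k + 1)
            + a * math.log(b / (b + 1.0)) - k * math.log(b + 1.0))


def upper_tail_p_value(k, a, b):
    """Predictive probability of observing a count of at least k."""
    below = sum(math.exp(nb_log_pmf(j, a, b)) for j in range(k))
    return max(0.0, 1.0 - below)  # guard against rounding below zero


def monitor(counts, a=1.0, b=1.0, threshold=0.01):
    """Return the time indices whose count is anomalously high.

    Each count is scored before it is absorbed into the posterior, and
    (a simplification) flagged counts are still used in the update.
    """
    flagged = []
    for t, k in enumerate(counts):
        if upper_tail_p_value(k, a, b) < threshold:
            flagged.append(t)
        a += k      # conjugate update: shape gains the observed count
        b += 1.0    # rate gains one unit of exposure time
    return flagged


if __name__ == "__main__":
    print(monitor([2, 1, 3, 2, 2, 1, 30, 2]))
```

In a two-stage scheme of the kind the abstract describes, the node pairs flagged this way would form the reduced subset passed on to the more expensive network inference tools.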