11,235 research outputs found
Nonparametric Detection of Geometric Structures over Networks
Nonparametric detection of existence of an anomalous structure over a network
is investigated. Nodes corresponding to the anomalous structure (if one exists)
receive samples generated by a distribution q, which is different from a
distribution p generating samples for other nodes. If an anomalous structure
does not exist, all nodes receive samples generated by p. It is assumed that
the distributions p and q are arbitrary and unknown. The goal is to design
statistically consistent tests with probability of errors converging to zero as
the network size becomes asymptotically large. Kernel-based tests are proposed
based on maximum mean discrepancy that measures the distance between mean
embeddings of distributions into a reproducing kernel Hilbert space. Detection
of an anomalous interval over a line network is first studied. Sufficient
conditions on minimum and maximum sizes of candidate anomalous intervals are
characterized in order to guarantee the proposed test to be consistent. It is
also shown that certain necessary conditions must hold to guarantee any test to
be universally consistent. Comparison of sufficient and necessary conditions
yields that the proposed test is order-level optimal and nearly optimal
respectively in terms of minimum and maximum sizes of candidate anomalous
intervals. Generalization of the results to other networks is further
developed. Numerical results are provided to demonstrate the performance of the
proposed tests.Comment: Submitted for journal publication in November 2015. arXiv admin note:
text overlap with arXiv:1404.029
Detection of an anomalous cluster in a network
We consider the problem of detecting whether or not, in a given sensor
network, there is a cluster of sensors which exhibit an "unusual behavior."
Formally, suppose we are given a set of nodes and attach a random variable to
each node. We observe a realization of this process and want to decide between
the following two hypotheses: under the null, the variables are i.i.d. standard
normal; under the alternative, there is a cluster of variables that are i.i.d.
normal with positive mean and unit variance, while the rest are i.i.d. standard
normal. We also address surveillance settings where each sensor in the network
collects information over time. The resulting model is similar, now with a time
series attached to each node. We again observe the process over time and want
to decide between the null, where all the variables are i.i.d. standard normal,
and the alternative, where there is an emerging cluster of i.i.d. normal
variables with positive mean and unit variance. The growth models used to
represent the emerging cluster are quite general and, in particular, include
cellular automata used in modeling epidemics. In both settings, we consider
classes of clusters that are quite general, for which we obtain a lower bound
on their respective minimax detection rate and show that some form of scan
statistic, by far the most popular method in practice, achieves that same rate
to within a logarithmic factor. Our results are not limited to the normal
location model, but generalize to any one-parameter exponential family when the
anomalous clusters are large enough.Comment: Published in at http://dx.doi.org/10.1214/10-AOS839 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
A network approach to topic models
One of the main computational and scientific challenges in the modern age is
to extract useful information from unstructured texts. Topic models are one
popular machine-learning approach which infers the latent topical structure of
a collection of documents. Despite their success --- in particular of its most
widely used variant called Latent Dirichlet Allocation (LDA) --- and numerous
applications in sociology, history, and linguistics, topic models are known to
suffer from severe conceptual and practical problems, e.g. a lack of
justification for the Bayesian priors, discrepancies with statistical
properties of real texts, and the inability to properly choose the number of
topics. Here we obtain a fresh view on the problem of identifying topical
structures by relating it to the problem of finding communities in complex
networks. This is achieved by representing text corpora as bipartite networks
of documents and words. By adapting existing community-detection methods --
using a stochastic block model (SBM) with non-parametric priors -- we obtain a
more versatile and principled framework for topic modeling (e.g., it
automatically detects the number of topics and hierarchically clusters both the
words and documents). The analysis of artificial and real corpora demonstrates
that our SBM approach leads to better topic models than LDA in terms of
statistical model selection. More importantly, our work shows how to formally
relate methods from community detection and topic modeling, opening the
possibility of cross-fertilization between these two fields.Comment: 22 pages, 10 figures, code available at https://topsbm.github.io
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for {exploratory
data analysis} are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19
Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations
Human brain anatomy and function display a combination of modular and
hierarchical organization, suggesting the importance of both cohesive
structures and variable resolutions in the facilitation of healthy cognitive
processes. However, tools to simultaneously probe these features of brain
architecture require further development. We propose and apply a set of methods
to extract cohesive structures in network representations of brain connectivity
using multi-resolution techniques. We employ a combination of soft
thresholding, windowed thresholding, and resolution in community detection,
that enable us to identify and isolate structures associated with different
weights. One such mesoscale structure is bipartivity, which quantifies the
extent to which the brain is divided into two partitions with high connectivity
between partitions and low connectivity within partitions. A second,
complementary mesoscale structure is modularity, which quantifies the extent to
which the brain is divided into multiple communities with strong connectivity
within each community and weak connectivity between communities. Our methods
lead to multi-resolution curves of these network diagnostics over a range of
spatial, geometric, and structural scales. For statistical comparison, we
contrast our results with those obtained for several benchmark null models. Our
work demonstrates that multi-resolution diagnostic curves capture complex
organizational profiles in weighted graphs. We apply these methods to the
identification of resolution-specific characteristics of healthy weighted graph
architecture and altered connectivity profiles in psychiatric disease.Comment: Comments welcom
- …