The Role of Normalization in the Belief Propagation Algorithm
An important class of problems in statistical physics and computer science can
be expressed as the computation of marginal probabilities over a Markov random
field. The belief propagation algorithm, which is an exact procedure to compute
these marginals when the underlying graph is a tree, has gained popularity
as an efficient way to approximate them in the more general case. In this
paper, we focus on an aspect of the algorithm that has received little
attention in the literature: the effect of the normalization of the
messages. We show in particular that, for a large class of normalization
strategies, it is possible to focus only on belief convergence. Following this,
we express the necessary and sufficient conditions for local stability of a
fixed point in terms of the graph structure and the belief values at the fixed
point. We also make explicit some connections between the normalization
constants and the underlying Bethe free energy.
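The effect of a normalization strategy is easiest to see in a single message update. The sketch below is our own minimal illustration (the two-state factors `psi`, `phi` and the sum-to-one strategy are assumptions, not the paper's setup); normalizing each outgoing message changes only a multiplicative constant, so the beliefs are unaffected:

```python
import numpy as np

def update_message(psi_ij, phi_i, incoming):
    """Message i -> j: m(x_j) proportional to sum_{x_i} psi(x_i,x_j) phi(x_i) prod_k m_k(x_i)."""
    prod = phi_i * np.prod(incoming, axis=0)  # local field times incoming messages
    m = psi_ij.T @ prod                       # sum over x_i
    return m / m.sum()                        # normalization: sum-to-one strategy

# toy two-state factors (illustrative values)
psi = np.array([[1.2, 0.8], [0.8, 1.2]])  # pairwise coupling psi(x_i, x_j)
phi = np.array([0.6, 0.4])                # local evidence phi(x_i)
incoming = np.array([[0.5, 0.5]])         # one incoming message

m = update_message(psi, phi, incoming)
print(m, m.sum())  # a normalized message; beliefs do not depend on the constant chosen
```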
Local stability of Belief Propagation algorithm with multiple fixed points
A number of problems in statistical physics and computer science can be
expressed as the computation of marginal probabilities over a Markov random
field. Belief propagation, an iterative message-passing algorithm, computes
such marginals exactly when the underlying graph is a tree, but it has gained
popularity as an efficient way to approximate them in the more general case,
even though it can exhibit multiple fixed points and is not guaranteed to
converge. In this paper, we express a new sufficient condition for local
stability of a belief propagation fixed point in terms of the graph structure
and the belief values at the fixed point. This gives credence to the usual
understanding that belief propagation performs better on sparse graphs.
Comment: arXiv admin note: substantial text overlap with arXiv:1101.417
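The flavour of such a local-stability check can be reproduced numerically: linearize the message-update map at a fixed point and test whether the spectral radius of its Jacobian is below one. A minimal sketch under our own assumptions (zero-field Ising messages on a 3-cycle in the log-odds parametrization, with coupling J chosen arbitrarily; this is not the paper's general criterion):

```python
import numpy as np

def bp_map(u, J):
    # directed cavity messages on a 3-cycle, zero external field;
    # each message depends only on the one arriving from the previous edge
    return np.arctanh(np.tanh(J) * np.tanh(u[[2, 0, 1]]))

def spectral_radius(u_star, J, eps=1e-6):
    # finite-difference Jacobian of the update map at the fixed point
    n = len(u_star)
    Jac = np.zeros((n, n))
    for k in range(n):
        e = np.zeros(n); e[k] = eps
        Jac[:, k] = (bp_map(u_star + e, J) - bp_map(u_star - e, J)) / (2 * eps)
    return max(abs(np.linalg.eigvals(Jac)))

u_star = np.zeros(3)               # paramagnetic fixed point
rho = spectral_radius(u_star, J=0.3)
print(rho < 1.0)                   # tanh(0.3) < 1: this fixed point is locally stable
```

Here the Jacobian is a cyclic permutation scaled by tanh(J), so the fixed point destabilizes only as the coupling grows.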
Belief-propagation algorithm and the Ising model on networks with arbitrary distributions of motifs
We generalize the belief-propagation algorithm to sparse random networks with
arbitrary distributions of motifs (triangles, loops, etc.). Each vertex in
these networks belongs to a given set of motifs (generalization of the
configuration model). These networks can be treated as sparse uncorrelated
hypergraphs in which hyperedges represent motifs. Here a hypergraph is a
generalization of a graph, where a hyperedge can connect any number of
vertices. These uncorrelated hypergraphs are tree-like (hypertrees), which
crucially simplify the problem and allow us to apply the belief-propagation
algorithm to these loopy networks with arbitrary motifs. As natural examples,
we consider motifs in the form of finite loops and cliques. We apply the
belief-propagation algorithm to the ferromagnetic Ising model on the resulting
random networks. We obtain an exact solution of this model on networks with
finite loops or cliques as motifs. We find an exact critical temperature of the
ferromagnetic phase transition and demonstrate that, as the clustering
coefficient and the loop size increase, the critical temperature rises
compared to ordinary tree-like complex networks. Our solution also gives the
birth point of the giant connected component in these loopy networks.
Comment: 9 pages, 4 figures
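As a point of comparison for the motif case, the motif-free baseline can be sketched directly: on a tree-like random k-regular graph, the BP/cavity recursion for the ferromagnetic Ising model gives a closed-form critical temperature. This is our own illustration of the standard recursion (the paper's generalization replaces edges with motif hyperedges, which this sketch does not implement):

```python
import numpy as np

def cavity_fixed_point(beta, J=1.0, k=3, iters=2000, u0=1.0):
    # iterate the cavity-field recursion u -> (k-1) * atanh(tanh(beta*J) * tanh(u));
    # a nonzero fixed point signals the ferromagnetic phase
    u = u0
    for _ in range(iters):
        u = (k - 1) * np.arctanh(np.tanh(beta * J) * np.tanh(u))
    return u

k, J = 3, 1.0
# instability of u = 0 when (k-1)*tanh(J/T) > 1, i.e. T_c = J / atanh(1/(k-1))
Tc = J / np.arctanh(1.0 / (k - 1))
u_below = cavity_fixed_point(beta=1 / (0.9 * Tc))  # T below T_c
u_above = cavity_fixed_point(beta=1 / (1.1 * Tc))  # T above T_c
print(Tc, u_below, u_above)  # u_below stays nonzero, u_above flows to zero
```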
Clustering by soft-constraint affinity propagation: Applications to gene-expression data
Motivation: Similarity-measure based clustering is a crucial problem
appearing throughout scientific data analysis. Recently, a powerful new
algorithm called Affinity Propagation (AP) based on message-passing techniques
was proposed by Frey and Dueck \cite{Frey07}. In AP, each cluster is identified
by a common exemplar to which all other data points of the same cluster refer,
and exemplars must refer to themselves. Despite its proven power, AP in its
present form suffers from a number of drawbacks. The hard constraint of having
exactly one exemplar per cluster restricts AP to classes of regularly shaped
clusters and leads to suboptimal performance, {\it e.g.}, in analyzing gene
expression data. Results: This limitation can be overcome by relaxing the AP
hard constraints. A new parameter controls the importance of the constraints
compared to the aim of maximizing the overall similarity, and allows
interpolation between the simple case where each data point selects its closest
neighbor as an exemplar and the original AP. The resulting soft-constraint
affinity propagation (SCAP) becomes more informative and accurate, and leads to
more stable clustering. Even though a new {\it a priori} free parameter is
introduced, the overall dependence of the algorithm on external tuning is
reduced, as robustness is increased and an optimal strategy for parameter
selection emerges more naturally. SCAP is tested on biological benchmark data,
including in particular microarray data related to various cancer types. We
show that the algorithm efficiently unveils the hierarchical cluster structure
present in the data sets. Furthermore, it allows the extraction of sparse gene
expression signatures for each cluster.
Comment: 11 pages, supplementary material:
http://isiosf.isi.it/~weigt/scap_supplement.pd
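For reference, the original hard-constraint AP updates that SCAP relaxes can be sketched in a few lines. This is a generic sketch of the Frey-Dueck message updates (the toy 1-D data and the hand-set preference of -10 are our assumptions; SCAP itself modifies the constraint term and is not implemented here):

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    # hard-constraint AP message passing: responsibilities R, availabilities A
    n = S.shape[0]
    R = np.zeros((n, n)); A = np.zeros((n, n))
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        top = np.argmax(AS, axis=1)
        first = AS[np.arange(n), top]
        AS[np.arange(n), top] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), top] = S[np.arange(n), top] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = np.diag(Anew).copy()
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    return np.argmax(A + R, axis=1)  # exemplar chosen by each data point

# two well-separated 1-D groups; similarity = negative squared distance,
# common preference on the diagonal set by hand
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(x[:, None] - x[None, :]) ** 2
np.fill_diagonal(S, -10.0)
labels = affinity_propagation(S)
print(labels)  # points 0-2 share one exemplar, points 3-5 another
```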
Investigation of commuting Hamiltonian in quantum Markov network
Graphical models have various applications in science and engineering,
including physics, bioinformatics, and telecommunications. Using graphical
models requires complex computations to evaluate marginal functions, so
several powerful methods have been developed, including the mean-field
approximation and the belief propagation algorithm. Quantum graphical models
have recently been developed in the context of quantum information and
computation and quantum statistical physics, made possible by generalizing
classical probability theory to quantum theory. The main goal of this paper is
to present a preliminary generalization of Markov networks, a type of graphical
model, to the quantum case and to apply it in quantum statistical physics. We
investigate the Markov network and the role of commuting Hamiltonian terms in
conditional independence with simple examples from quantum statistical physics.
Comment: 11 pages, 8 figures
Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data
Recent years have seen the rise of more sophisticated attacks including
advanced persistent threats (APTs) which pose severe risks to organizations and
governments by targeting confidential proprietary information. Additionally,
new malware strains are appearing at a higher rate than ever before. Since many
of these strains are designed to evade existing security products, traditional
defenses deployed by most enterprises today, e.g., anti-virus, firewalls,
intrusion detection systems, often fail at detecting infections at an early
stage.
We address the problem of detecting early-stage infection in an enterprise
setting by proposing a new framework based on belief propagation inspired from
graph theory. Belief propagation can be used either with "seeds" of compromised
hosts or malicious domains (provided by the enterprise security operation
center -- SOC) or without any seeds. In the latter case we develop a detector
of C&C communication particularly tailored to enterprises which can detect a
stealthy compromise of only a single host communicating with the C&C server.
We demonstrate that our techniques perform well on detecting enterprise
infections. We achieve high accuracy with low false detection and false
negative rates on two months of anonymized DNS logs released by Los Alamos
National Lab (LANL), which include APT infection attacks simulated by LANL
domain experts. We also apply our algorithms to 38TB of real-world web proxy
logs collected at the border of a large enterprise. Through careful manual
investigation in collaboration with the enterprise SOC, we show that our
techniques identified hundreds of malicious domains overlooked by
state-of-the-art security products.
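The seeded mode of operation can be caricatured with a much simpler score-propagation scheme on a host-domain contact graph. This sketch is purely illustrative (the tiny graph, the damping constant, and the update rule are our inventions, and it is a diffusion-style simplification rather than the paper's actual belief propagation framework):

```python
import numpy as np

# bipartite host-domain contacts observed in logs (hypothetical names)
edges = [("h1", "d_bad"), ("h2", "d_bad"), ("h2", "d_ok"), ("h3", "d_ok")]
nodes = sorted({u for e in edges for u in e})
idx = {name: i for i, name in enumerate(nodes)}
n = len(nodes)
A = np.zeros((n, n))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
deg = A.sum(axis=1)

seed = np.zeros(n)
seed[idx["d_bad"]] = 1.0  # SOC-provided seed: one known malicious domain

score = seed.copy()
for _ in range(50):       # propagate suspicion along observed contacts
    score = 0.85 * (A @ score) / deg + 0.15 * seed

for name in sorted(nodes, key=lambda v: -score[idx[v]]):
    print(name, round(score[idx[name]], 3))
# hosts that contacted the seed domain rank above unrelated hosts and domains
```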
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to be able to find good
minimizers without getting stuck in local critical points, and that such
minimizers are often satisfactory at avoiding overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
contains a small number of extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data.
Comment: 37 pages (16 main text), 10 figures (7 main text)
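Flatness of a minimizer can be probed numerically by asking what fraction of random weight perturbations keep the training error at zero. The toy construction below is entirely our own (a single perceptron storing random patterns at low load, where zero-error solutions are plentiful; it implements neither the paper's models nor its entropy-driven algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)

# random binary patterns at load alpha = P/N = 0.2
N, P = 200, 40
X = rng.choice([-1.0, 1.0], size=(P, N))
y = rng.choice([-1.0, 1.0], size=P)

def train_perceptron(X, y, epochs=200):
    # classic perceptron rule; returns a zero-error weight vector
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
    return w

def flat_fraction(w, X, y, sigma, trials=200):
    # fraction of Gaussian weight perturbations of scale sigma that keep zero
    # training error: a crude probe of the local flat volume around w
    w = w / np.linalg.norm(w)
    hits = 0
    for _ in range(trials):
        wp = w + sigma * rng.standard_normal(len(w)) / np.sqrt(len(w))
        hits += bool(np.all(y * (X @ wp) > 0))
    return hits / trials

w = train_perceptron(X, y)
print(flat_fraction(w, X, y, sigma=0.001))  # small perturbations stay in the basin
print(flat_fraction(w, X, y, sigma=2.0))    # large perturbations leave it
```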
Cluster Variation Method in Statistical Physics and Probabilistic Graphical Models
The cluster variation method (CVM) is a hierarchy of approximate variational
techniques for discrete (Ising-like) models in equilibrium statistical
mechanics, improving on the mean-field approximation and the Bethe-Peierls
approximation, which can be regarded as the lowest level of the CVM. In recent
years it has been applied both in statistical physics and to inference and
optimization problems formulated in terms of probabilistic graphical models.
The foundations of the CVM are briefly reviewed, and the relations with
similar techniques are discussed. The main properties of the method are
considered, with emphasis on its exactness for particular models and on its
asymptotic properties.
The problem of the minimization of the variational free energy, which arises
in the CVM, is also addressed, and recent results about both provably
convergent and message-passing algorithms are discussed.
Comment: 36 pages, 17 figures
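The exactness of the lowest CVM level on particular models can be verified directly: on a tree, the Bethe free energy evaluated at the exact marginals coincides with -ln Z. A self-contained check on a chain of three binary variables (the random positive factor values are our own arbitrary choice):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
phi = [rng.uniform(0.5, 1.5, size=2) for _ in range(3)]       # unary factors
psi = [rng.uniform(0.5, 1.5, size=(2, 2)) for _ in range(2)]  # edge factors (0-1, 1-2)

# exact joint distribution by enumeration
p = np.zeros((2, 2, 2))
for x in product(range(2), repeat=3):
    p[x] = (phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
            * psi[0][x[0], x[1]] * psi[1][x[1], x[2]])
Z = p.sum(); p /= Z

b1 = [p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))]  # node beliefs
b2 = [p.sum(axis=2), p.sum(axis=0)]                                 # edge beliefs

# Bethe free energy: sum over factors of b ln(b/f), plus (1 - d_i) node terms,
# where d_i counts the factors touching variable i (its unary factor + incident edges)
F = 0.0
for i in range(3):
    F += np.sum(b1[i] * np.log(b1[i] / phi[i]))
for e, b in enumerate(b2):
    F += np.sum(b * np.log(b / psi[e]))
d = [2, 3, 2]
for i in range(3):
    F += (1 - d[i]) * np.sum(b1[i] * np.log(b1[i]))

print(F, -np.log(Z))  # equal on a tree: the Bethe level of the CVM is exact
```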