Improving variational methods via pairwise linear response identities
Inference methods are often formulated as variational approximations: these approximations allow easy evaluation of statistics by marginalization or linear response, but these estimates can be inconsistent. We show that by introducing constraints on covariance, one can ensure consistency of linear response with the variational parameters, and in so doing inference of marginal probability distributions is improved. For the Bethe approximation and its generalizations, improvements are achieved with simple choices of the constraints. The approximations are presented as variational frameworks; iterative procedures related to message passing are provided for finding the minima.
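To make the linear-response notion concrete: for an Ising-type model, differentiating the variational magnetizations with respect to external fields yields covariance estimates, and consistency requires these to agree with the covariances implied by the variational marginals themselves. The identities below are a sketch in generic Ising notation, not necessarily the paper's exact constraints.

```latex
% Generic Ising notation (m_i = magnetizations, h_j = external fields);
% a sketch of the consistency idea, not the paper's exact constraints.
\begin{align}
  C^{\mathrm{LR}}_{ij} &= \frac{\partial m_i}{\partial h_j}
    && \text{covariance estimated by linear response,} \\
  C^{\mathrm{LR}}_{ii} &\overset{!}{=} 1 - m_i^2
    && \text{consistency with the marginals for } s_i = \pm 1 .
\end{align}
```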
Message Passing for Optimization and Control of Power Grid: Model of Distribution System with Redundancy
We use a power grid model with generators and consumption units to optimize the grid and its control. Each consumer demand is drawn from a predefined distribution with finite support, thus simulating instantaneous load fluctuations. Each generator has a maximum power capability, and a generator is not overloaded if the sum of the loads of the consumers connected to it does not exceed its maximum production. In the standard grid each consumer is connected only to its designated generator; we consider a more general organization of the grid in which each consumer may select one generator, depending on the load, from a predefined, consumer-dependent, and sufficiently small set of generators that can all serve the load. The model grid is interconnected in a graph with loops, drawn from an ensemble of random bipartite graphs, and each allowed configuration of loaded links represents a set of graph-covering trees. Losses, the reactive character of the grid, and the transmission-level connections between generators (among many other details relevant to realistic power grids) are ignored in this proof-of-principle study. We focus on the asymptotic limit and show that the interconnects allow a significant expansion of the parameter domains for which the probability of a generator overload is asymptotically zero. Our construction explores the formal relation between the problem of grid optimization and the modern theory of sparse graphical models. We also design heuristic algorithms that achieve the asymptotically optimal selection of loaded links. We conclude by discussing the ability of this approach to include other effects, such as more realistic modeling of the power grid and related optimization and control algorithms.
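A minimal sketch of the combinatorial core of this model, assuming a naive greedy heuristic for illustration (the paper's message-passing algorithms are more sophisticated); all names here are made up for the example:

```python
import random

def assign_loads(consumers, generators, allowed, capacity):
    """Route each consumer's load to one of its allowed generators,
    greedily preferring the least-loaded one (relative to capacity).

    consumers: dict consumer -> load drawn from its demand distribution
    allowed:   dict consumer -> small set of generators able to serve it
    capacity:  dict generator -> maximum production
    Returns per-generator loads, or None if some generator overloads.
    """
    load = {g: 0.0 for g in generators}
    for c, demand in consumers.items():
        g = min(allowed[c], key=lambda h: load[h] / capacity[h])
        load[g] += demand
        if load[g] > capacity[g]:
            return None   # overload: this link configuration fails
    return load

# Toy instance: three consumers, two generators, one shared interconnect.
cons = {f"c{i}": random.uniform(0.0, 1.0) for i in range(3)}
print(assign_loads(cons, ["g1", "g2"],
                   {"c0": {"g1"}, "c1": {"g1", "g2"}, "c2": {"g2"}},
                   {"g1": 1.5, "g2": 1.5}))
```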
Using Constraint Satisfaction Techniques and Variational Methods for Probabilistic Reasoning
This thesis presents a number of research contributions pertaining to the theme of creating efficient probabilistic reasoning systems based on graphical models of real-world problems from relational domains. These models arise in a variety of scientific and engineering applications. Thus, the theme impacts several sub-disciplines of Artificial Intelligence. Commonly, most of these problems have expressive graphical models that translate into large probabilistic networks involving determinism and cycles. Such graphical models frequently represent a bottleneck for any probabilistic inference system and weaken its accuracy and scalability.
Conceptually, our research hypothesizes and confirms the following. First, constraint satisfaction techniques and variational methods can be exploited to yield accurate and scalable algorithms for probabilistic inference in the presence of cycles and determinism. Second, some intrinsic parts of the structure of the graphical model can turn out to be beneficial to probabilistic inference on large networks, instead of posing a significant challenge to it. Third, the proper re-parameterization of the graphical model can provide its structure with characteristics that we can use to improve probabilistic inference.
The first major contribution of this thesis is the formulation of a novel message-passing approach to inference in an extended factor graph that combines constraint satisfaction techniques with variational methods. In contrast to standard message passing, it formulates the message-passing structure as steps of variational expectation maximization. It thus has new marginal update rules that increase a lower bound at each update in a way that avoids overshooting a fixed point. Moreover, in its expectation step, we leverage local structures in the factor graph by using generalized arc consistency to perform a variational mean-field approximation.
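As a point of reference for the bound-increasing property mentioned above, plain mean-field coordinate ascent has exactly this guarantee: each closed-form marginal update cannot decrease the variational lower bound. The sketch below illustrates that baseline on a pairwise binary MRF; it is not the thesis's extended factor-graph algorithm, and all names in it are illustrative.

```python
import numpy as np

def mean_field_sweep(theta, W, q):
    """One sweep of coordinate ascent on the mean-field lower bound for
    a pairwise binary MRF p(x) proportional to exp(theta.x + x'Wx/2),
    x in {0,1}^n. Each coordinate update is the closed-form optimum,
    so the bound never decreases (the 'no overshooting' property)."""
    for i in range(len(q)):
        q[i] = 1.0 / (1.0 + np.exp(-(theta[i] + W[i] @ q)))
    return q

rng = np.random.default_rng(0)
n = 5
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=n)
q = np.full(n, 0.5)
for _ in range(50):
    q = mean_field_sweep(theta, W, q)
print(np.round(q, 3))   # approximate marginals q_i ~ p(x_i = 1)
```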
The second major contribution is the formulation of a novel two-stage strategy that uses the determinism present in the graphical model's structure to improve the scalability of probabilistic inference. In this strategy, we take into account the fact that if the underlying model involves mandatory constraints as well as preferences then it is potentially wasteful to allocate memory for all constraints in advance when performing inference. To avoid this, we start by relaxing preferences and performing inference with hard constraints only. This helps avoid irrelevant computations involving preferences, and reduces the effective size of the graphical network.
Finally, we develop a novel family of message-passing algorithms for inference in an extended factor graph, parameterized by a smoothing parameter. This family allows one to find the "backbones" of a cluster that contains potentially optimal solutions. The cluster's backbones are not only portions of the optimal solutions; they can also be exploited to scale MAP inference by iteratively fixing them so as to reduce the complex parts, until the network is simplified into one that can be solved accurately using any conventional MAP inference method. We then describe lazy variants of this family of algorithms.
One limiting case of our approach corresponds to lazy survey propagation, which is in itself a novel method that can yield state-of-the-art performance.
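The fix-and-simplify loop used for scaling MAP inference can be sketched as follows; `marginal_oracle`, `solve_exact`, and the 0.9 confidence threshold are illustrative placeholders rather than the smoothed message-passing family itself.

```python
def decimate(variables, marginal_oracle, solve_exact, threshold=0.9):
    """Iteratively fix near-certain 'backbone' variables and shrink the
    problem until a conventional MAP solver can finish it off.

    marginal_oracle(vs, fixed) -> {v: (value, confidence)} and
    solve_exact(vs, fixed) -> {v: value} are placeholders standing in
    for the smoothed message-passing family and the final MAP solver.
    """
    fixed = {}
    while variables:
        marginals = marginal_oracle(variables, fixed)
        backbone = {v: val for v, (val, p) in marginals.items()
                    if p >= threshold}
        if not backbone:       # nothing is near-certain any more:
            break              # hand the residual network to the solver
        fixed.update(backbone)
        variables = [v for v in variables if v not in backbone]
    fixed.update(solve_exact(variables, fixed))
    return fixed

# Dummy demo: an oracle that is always 99% sure every variable is 1.
print(decimate(["x", "y"],
               lambda vs, fx: {v: (1, 0.99) for v in vs},
               lambda vs, fx: {}))   # -> {'x': 1, 'y': 1}
```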
We provide a thorough empirical evaluation using real-world applications. Our experiments demonstrate improvements in the accuracy, convergence, and scalability of all our proposed algorithms and strategies over existing state-of-the-art inference algorithms.
Belief Propagation for Min-Cost Network Flow: Convergence and Correctness
Distributed, iterative algorithms operating with minimal data structure while performing little computation per iteration are popularly known as message passing in the recent literature. Belief propagation (BP), a prototypical message-passing algorithm, has gained a lot of attention across disciplines, including communications, statistics, signal processing, and machine learning as an attractive, scalable, general-purpose heuristic for a wide class of optimization and statistical inference problems. Despite its empirical success, the theoretical understanding of BP is far from complete.
With the goal of advancing the state of the art in our understanding of BP, we study the performance of BP in the context of the capacitated minimum-cost network flow problem, a cornerstone in the development of the theory of polynomial-time algorithms for optimization problems that is widely used in the practice of operations research. As the main result of this paper, we prove that BP converges to the optimal solution in pseudopolynomial time, provided that the optimal solution of the underlying network flow problem instance is unique and the problem parameters are integral. We further provide a simple modification of BP that yields a fully polynomial-time randomized approximation scheme (FPRAS) without requiring uniqueness of the optimal solution. This is the first instance in which BP is proved to have a fully polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method for solving an important class of optimization problems.
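For readers unfamiliar with the min-sum form of BP that such analyses build on, the following sketch shows the generic factor-to-variable update; the cost matrix and toy instance are illustrative, not the paper's network-flow formulation.

```python
import numpy as np

def min_sum_message(cost, incoming):
    """Generic min-sum BP update: the message a pairwise factor with
    cost matrix cost[x_i, x_j] sends to variable j, given the summed
    messages incoming[x_i] that variable i has received from elsewhere:
    m_j(x_j) = min over x_i of (cost[x_i, x_j] + incoming[x_i]).
    """
    return (cost + incoming[:, None]).min(axis=0)

# Toy instance: a 3-state variable pair with random integer costs.
rng = np.random.default_rng(1)
cost = rng.integers(0, 10, size=(3, 3)).astype(float)
print(min_sum_message(cost, np.zeros(3)))
```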
Graphical models beyond standard settings: lifted decimation, labeling, and counting
With increasing complexity and growing problem sizes in AI and Machine Learning, inference and learning remain major issues in Probabilistic Graphical Models (PGMs). On the other hand, many problems are specified in such a way that symmetries arise from the underlying model structure. Exploiting these symmetries during inference, which is referred to as "lifted inference", has led to significant efficiency gains. This thesis provides several enhanced versions of known algorithms that turn out to be liftable as well, and thereby applies lifting in "non-standard" settings. In doing so, it extends the understanding of the applicability of lifted inference and of lifting in general. Among various other experiments, it is shown how lifted inference, in combination with an innovative Web-based data-harvesting pipeline, is used to label author-paper pairs with geographic information in online bibliographies. This results in a large-scale transnational bibliography containing affiliation information over time for roughly one million authors. Analyzing this dataset reveals the importance of understanding count data. Although counting is done literally everywhere, mainstream PGMs have largely neglected count data. Where the ranges of the random variables are defined over the natural numbers, crude approximations to the true distribution are often made by discretization or a Gaussian assumption. To handle count data, Poisson Dependency Networks (PDNs) are introduced, a new class of non-standard PGMs that naturally handles count data.
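As a rough illustration of the idea behind PDNs, each count variable can be given a conditional Poisson distribution whose rate depends log-linearly on its neighbours; the parameterization below is a common choice assumed here for illustration, not taken from the thesis.

```python
import numpy as np

def pdn_rate(weights, bias, neighbour_counts):
    """Conditional Poisson rate of one count variable given its
    neighbours: lambda = exp(bias + weights . neighbour_counts).
    A log-linear form assumed here for illustration."""
    return np.exp(bias + weights @ neighbour_counts)

rng = np.random.default_rng(2)
lam = pdn_rate(np.array([0.1, -0.05]), 0.3, np.array([4.0, 2.0]))
print(lam, rng.poisson(lam))   # rate and one sampled count
```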
Statistical physics of neural systems
The ability to process and store information is considered a characteristic trait of intelligent systems. In biological neural networks, learning is strongly believed to take place at the synaptic level, in terms of modulation of synaptic efficacy. It can thus be interpreted as the expression of a collective phenomenon, emerging when neurons connect to each other to constitute a complex network of interactions. In this work, we represent learning as an optimization problem, implementing a local search, in the synaptic space, for specific configurations, known as solutions, that make a neural network able to accomplish a series of different tasks. For instance, we would like the network to adapt the strength of its synaptic connections so as to be capable of classifying a series of objects, assigning to each object its corresponding class label. Supported by a series of experiments, it has been suggested that synapses may exploit a very small number of synaptic states for encoding information. It is known that this feature makes learning in neural networks a challenging task. Extending the large-deviation analysis performed in the extreme case of binary synaptic couplings, in this work we prove the existence of regions of the phase space where solutions are organized in extremely dense clusters. This picture turns out to be invariant under tuning of all the parameters of the model. Solutions within the clusters are more robust to noise, thus enhancing learning performance. This has inspired the design of new learning algorithms, and it has clarified the effectiveness of previously proposed ones. We further provide quantitative evidence that the gain achievable by considering a greater number of available synaptic states for encoding information is substantial only up to a very small number of bits, in line with the above-mentioned experimental results. Besides the challenging aspect of low-precision synaptic connections, it is also known that the neuronal environment is extremely noisy. Whether stochasticity can enhance or worsen learning performance is currently a matter of debate. In this work, we consider a neural network model in which the synaptic connections are random variables, sampled according to a parameterized probability distribution. We prove that this source of stochasticity naturally drives the system towards regions of the phase space with high densities of solutions. These regions are directly accessible by means of gradient-descent strategies over the parameters of the synaptic coupling distribution. We further set up a statistical physics analysis through which we show that solutions in the dense regions are characterized by robustness and good generalization performance. Stochastic neural networks are also capable of building abstract representations of input stimuli and then generating new input samples according to the inferred statistics of the input signal. In this regard, we propose a new learning rule, called Delayed Correlation Matching (DCM), which, by relying on the matching of time-delayed activity correlations, makes a neural network able to store patterns of neuronal activity. When hidden neuronal states are considered, the DCM learning rule is also able to train Restricted Boltzmann Machines as generative models. In this work, we further require the DCM learning rule to fulfil biological constraints such as locality, sparseness of the neural coding, and Dale's principle. While retaining all these biological requirements, the DCM learning rule has proved effective for different network topologies, in online learning regimes, and in the presence of correlated patterns. We further show that it is also able to prevent the creation of spurious attractor states.
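A minimal sketch of the correlation-matching principle behind DCM, assuming a simple coupling update proportional to the mismatch of time-delayed correlations (the thesis's exact rule may differ):

```python
import numpy as np

def dcm_update(W, s_data, s_model, lr=0.01, tau=1):
    """One correlation-matching step: nudge the couplings W so that the
    model's time-delayed correlations C_ij(tau) = <s_i(t+tau) s_j(t)>
    move towards those measured in the data.
    Activity arrays have shape (time, neurons), entries +/-1."""
    def delayed_corr(s):
        return s[tau:].T @ s[:-tau] / (len(s) - tau)
    W = W + lr * (delayed_corr(s_data) - delayed_corr(s_model))
    np.fill_diagonal(W, 0.0)   # no self-couplings (locality-friendly)
    return W

rng = np.random.default_rng(3)
s_data = rng.choice([-1.0, 1.0], size=(200, 10))
s_model = rng.choice([-1.0, 1.0], size=(200, 10))
W = dcm_update(np.zeros((10, 10)), s_data, s_model)
```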
Out of equilibrium Statistical Physics of learning
In the study of hard optimization problems, it is often unfeasible to achieve full analytic control of the dynamics of the algorithmic processes that find solutions efficiently. In many cases, a static approach is able to provide considerable insight into the dynamical properties of these algorithms: in fact, the geometrical structures found in the energy landscape can strongly affect the stationary states and the optimal configurations reached by the solvers. In this context, a classical Statistical Mechanics approach, relying on the assumption of the asymptotic realization of a Boltzmann-Gibbs equilibrium, can yield misleading predictions when the studied algorithms comprise stochastic components that effectively drive these processes out of equilibrium. It thus becomes necessary to develop some intuition on the relevant features of the studied phenomena and to build an ad hoc Large Deviation analysis, providing a more targeted and richer description of the geometrical properties of the landscape. The present thesis focuses on the study of learning processes in Artificial Neural Networks, with the aim of introducing an out-of-equilibrium statistical physics framework, based on the introduction of a local entropy potential, for supporting and inspiring algorithmic improvements in the field of Deep Learning, and for developing models of neural computation that can carry both biological and engineering interest.
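For reference, a local entropy potential of the kind alluded to above is commonly defined as the log count of solutions at a fixed distance from a reference configuration; the notation below is generic rather than taken from the thesis.

```latex
% Generic notation: \mathbb{X}(w) = 1 iff w solves the learning problem,
% d(w, \tilde{w}) a distance in synaptic space; a sketch, not the
% thesis's exact definition.
\begin{equation}
  S_{\mathrm{loc}}(\tilde{w}, d)
    = \frac{1}{N}\,\log \sum_{w} \mathbb{X}(w)\,
      \delta\bigl(d(w, \tilde{w}) - d\bigr)
\end{equation}
% High S_loc at small d marks the dense, flat regions of solutions.
```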