    Providing Information by Resource-Constrained Data Analysis

    The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Databases, Machine Learning, Statistics) and embedded systems, and enhances their methods so that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms that respect the resource constraints of the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

    Unsupervised Induction of Frame-Based Linguistic Forms

    This thesis studies the use of bulk, structured, linguistic annotations in order to perform unsupervised induction of meaning for three kinds of linguistic forms: words, sentences, and documents. The primary linguistic annotations I consider throughout this thesis are frames, which encode the core linguistic, background, or societal knowledge necessary to understand abstract concepts and real-world situations. I begin with an overview of linguistically based structured meaning representation; I then analyze available large-scale natural language processing (NLP) and linguistic resources and corpora for their ability to accommodate bulk, automatically obtained frame annotations. I then proceed to induce meanings of the different forms, progressing from the word level, to the sentence level, and finally to the document level. I first show how to use these bulk annotations to better encode semantic expectations, backed by linguistics and cognitive science, within word forms. I then demonstrate a straightforward approach for learning large lexicalized and refined syntactic fragments, which encode and memoize commonly used phrases and linguistic constructions. Next, I consider two unsupervised models for document and discourse understanding: one is a purely generative approach that naturally accommodates layer annotations and is the first to capture and unify a complete frame hierarchy; the other conditions on limited amounts of external annotations, imputing missing values when necessary, and can more readily scale to large corpora. These discourse models help improve document understanding and type-level understanding.

    Learning with Graphs using Kernels from Propagated Information

    Traditional machine learning approaches are designed to learn from independent vector-valued data points. The assumption that instances are independent, however, is not always true. On the contrary, there are numerous domains where data points are cross-linked, for example social networks, where persons are linked by friendship relations. These relations among data points make traditional machine learning difficult and often insufficient. Furthermore, data points themselves can have complex structure, for example molecules or proteins constructed from various bindings of different atoms. Networked and structured data are naturally represented by graphs, and for learning we aim to exploit their structure to improve upon non-graph-based methods. However, graphs encountered in real-world applications often come with rich additional information. This implies many challenges for representation and learning: node information is likely to be incomplete, leading to partially labeled graphs; information can be aggregated from multiple sources and can therefore be uncertain; and additional information on nodes and edges can be derived from complex sensor measurements, thus being naturally continuous. Although learning with graphs is an active research area, existing work leaves gaps. Learning with structured data, which models structural similarities of graphs in depth, mostly assumes fully labeled graphs of reasonable size with discrete and certain node and edge information. Learning with networked data, which naturally deals with missing information and huge graphs, mostly assumes homophily and neglects structural similarity. To close these gaps, we present a novel paradigm for learning with graphs that exploits the intermediate results of iterative information propagation schemes on graphs.
Originally developed for within-network relational and semi-supervised learning, these propagation schemes have two desirable properties: they capture structural information, and they can naturally adapt to the aforementioned issues of real-world graph data. Additionally, information propagation can be efficiently realized by random walks, leading to fast, flexible, and scalable feature and kernel computations. Further, by considering intermediate random walk distributions, we can model structural similarity for learning with structured and networked data. We develop several approaches based on this paradigm. In particular, we introduce propagation kernels for learning on the graph level, and coinciding walk kernels and Markov logic sets for learning on the node level. Finally, we present two application domains where kernels from propagated information successfully tackle real-world problems.
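
As a rough illustration of the paradigm described above, the following sketch propagates node label distributions with a random walk and sums a simple kernel over the intermediate distributions. It is a simplified stand-in and not the propagation kernel defined in the thesis: the function names, the linear base kernel on summed label distributions, and the two toy graphs are all assumptions made for illustration.

```python
import numpy as np

def random_walk_matrix(adj):
    """Row-normalize an adjacency matrix into a transition matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes keep an all-zero row
    return adj / deg

def propagated_features(adj, labels, steps):
    """Return one graph-level feature vector per propagation step.

    labels is an (n_nodes, n_classes) matrix of label distributions.
    Each feature vector is the sum of the node distributions at that step.
    """
    T = random_walk_matrix(adj)
    P = np.asarray(labels, dtype=float)
    feats = []
    for _ in range(steps + 1):
        feats.append(P.sum(axis=0))
        P = T @ P  # one step of information propagation
    return feats

def toy_propagation_kernel(g1, g2, steps=2):
    """Sum of linear kernels between per-step graph features (illustrative)."""
    f1 = propagated_features(*g1, steps)
    f2 = propagated_features(*g2, steps)
    return float(sum(a @ b for a, b in zip(f1, f2)))

# Hypothetical toy graphs: a 3-node path and a triangle with the same labels.
labels = [[1, 0], [0, 1], [1, 0]]
path = ([[0, 1, 0], [1, 0, 1], [0, 1, 0]], labels)
triangle = ([[0, 1, 1], [1, 0, 1], [1, 1, 0]], labels)

k_path_triangle = toy_propagation_kernel(path, triangle, steps=2)
```

Because every step contributes its own term, two graphs with identical initial labels can still receive different kernel values once their structure changes how the labels spread.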

    Implementing Bayesian Inference with Neural Networks

    Embodied agents, be they animals or robots, acquire information about the world through their senses. Embodied agents, however, do not simply lose this information once it passes by, but rather process and store it for future use. The most general theory of how an agent can combine stored knowledge with new observations is Bayesian inference. In this dissertation I present a theory of how embodied agents can learn to implement Bayesian inference with neural networks. By neural network I mean both artificial and biological neural networks, and in my dissertation I address both kinds. On one hand, I develop theory for implementing Bayesian inference in deep generative models, and I show how to train multilayer perceptrons to compute approximate predictions for Bayesian filtering. On the other hand, I show that several models in computational neuroscience are special cases of the general theory that I develop in this dissertation, and I use this theory to model and explain several phenomena in neuroscience. The key contributions of this dissertation can be summarized as follows:
    - I develop a class of graphical models called nth-order harmoniums. An nth-order harmonium is an n-tuple of random variables, where the conditional distribution of each variable given all the others is always an element of the same exponential family. I show that harmoniums have a recursive structure which allows them to be analyzed at coarser and finer levels of detail.
    - I define a class of harmoniums called rectified harmoniums, which are constrained to have priors that are conjugate to their posteriors. As a consequence, rectified harmoniums afford efficient sampling and learning.
    - I develop deep harmoniums, which are harmoniums that can be represented by hierarchical, undirected graphs. I develop the theory of rectification for deep harmoniums, and develop a novel algorithm for training deep generative models.
    - I show how to implement a variety of optimal and near-optimal Bayes filters by combining the solution to Bayes' rule provided by rectified harmoniums with predictions computed by a recurrent neural network. I then show how to train a neural network to implement Bayesian filtering when the transition and emission distributions are unknown.
    - I show how some well-established models of neural activity are special cases of the theory I present in this dissertation, and how these models can be generalized with the theory of rectification.
    - I show how the theory that I present can model several neural phenomena, including proprioception and gain-field modulation of tuning curves.
    - I introduce a library for the programming language Haskell, within which I have implemented all the simulations presented in this dissertation. This library uses concepts from Riemannian geometry to provide a rigorous and efficient environment for implementing complex numerical simulations.
    I also use the results presented in this dissertation to argue for the fundamental role of neural computation in embodied cognition. I argue, in other words, that before we will be able to build truly intelligent robots, we will need to truly understand biological brains.
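
For readers unfamiliar with Bayesian filtering, the predict-then-update recursion that the abstract refers to can be sketched for a finite state space with known transition and emission distributions. This is only the textbook discrete filter, not the harmonium-based or neural-network construction of the dissertation, and it is written in Python rather than the dissertation's Haskell library; the function name and the toy matrices are assumptions.

```python
import numpy as np

def bayes_filter_step(belief, transition, emission, observation):
    """One predict/update step of a discrete Bayes filter.

    belief[i]        : P(state = i | past observations)
    transition[i, j] : P(next state = j | state = i)
    emission[j, o]   : P(observation = o | state = j)
    """
    # Prediction: push the current belief through the transition model.
    predicted = belief @ transition
    # Update: multiply by the likelihood of the observation (Bayes' rule numerator).
    unnormalized = predicted * emission[:, observation]
    # Normalize to obtain the posterior over the next state.
    return unnormalized / unnormalized.sum()

# Hypothetical two-state example.
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
emission = np.array([[0.7, 0.3],
                     [0.1, 0.9]])

belief = np.array([0.5, 0.5])
for obs in [1, 1, 0]:
    belief = bayes_filter_step(belief, transition, emission, obs)
```

When the transition and emission distributions are unknown, as in the setting the abstract describes, the prediction step is the part that would have to be learned.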

    Computer Aided Verification

    This open access two-volume set, LNCS 11561 and 11562, constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers, presented together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems, runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.

    Using Constraint Satisfaction Techniques and Variational Methods for Probabilistic Reasoning

    This thesis presents a number of research contributions pertaining to the theme of creating efficient probabilistic reasoning systems based on graphical models of real-world problems from relational domains. These models arise in a variety of scientific and engineering applications; thus, the theme impacts several sub-disciplines of Artificial Intelligence. Commonly, most of these problems have expressive graphical models that translate into large probabilistic networks involving determinism and cycles. Such graphical models frequently represent a bottleneck for any probabilistic inference system and weaken its accuracy and scalability. Conceptually, our research hypothesizes and confirms three claims. First, constraint satisfaction techniques and variational methods can be exploited to yield accurate and scalable algorithms for probabilistic inference in the presence of cycles and determinism. Second, some intrinsic parts of the structure of the graphical model can turn out to be beneficial to probabilistic inference on large networks, instead of posing a significant challenge to it. Third, the proper re-parameterization of the graphical model can provide its structure with characteristics that we can use to improve probabilistic inference. The first major contribution of this thesis is the formulation of a novel message-passing approach to inference in an extended factor graph that combines constraint satisfaction techniques with variational methods. In contrast to standard message-passing, it formulates the message-passing structure as steps of variational expectation maximization. Thus it has new marginal update rules that increase a lower bound at each marginal update in a way that avoids overshooting a fixed point. Moreover, in its expectation step, we leverage the local structures in the factor graph by using generalized arc consistency to perform a variational mean-field approximation. The second major contribution is the formulation of a novel two-stage strategy that uses the determinism present in the graphical model's structure to improve the scalability of probabilistic inference. In this strategy, we take into account the fact that if the underlying model involves mandatory constraints as well as preferences, then it is potentially wasteful to allocate memory for all constraints in advance when performing inference. To avoid this, we start by relaxing preferences and performing inference with hard constraints only. This helps avoid irrelevant computations involving preferences and reduces the effective size of the graphical network. Finally, we develop a novel family of message-passing algorithms for inference in an extended factor graph, parameterized by a smoothing parameter. This family allows one to find the "backbones" of a cluster that involves potentially optimal solutions. The cluster's backbones are not only portions of the optimal solutions; they can also be exploited for scaling MAP inference by iteratively fixing them to reduce the complex parts until the network is simplified into one that can be solved accurately using any conventional MAP inference method. We then describe lazy variants of this family of algorithms. One limiting case of our approach corresponds to lazy survey propagation, which is itself a novel method that can yield state-of-the-art performance. We provide a thorough empirical evaluation using real-world applications. Our experiments demonstrate improvements to the accuracy, convergence, and scalability of all our proposed algorithms and strategies over existing state-of-the-art inference algorithms.

    Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications

    The last decade has seen a revolution in the theory and application of machine learning and pattern recognition. Through these advancements, variable ranking has emerged as an active and growing research area, and it is now beginning to be applied to many new problems. The rationale is that many pattern recognition problems are by nature ranking problems. The main objective of a ranking algorithm is to sort objects according to some criteria, so that the most relevant items appear early in the produced result list. Ranking methods can be analyzed from two different methodological perspectives: ranking to learn and learning to rank. The former aims at studying methods and techniques to sort objects in order to improve the accuracy of a machine learning model. Enhancing a model's performance can be challenging at times. For example, in pattern classification tasks, different data representations can complicate and hide the different explanatory factors of variation behind the data. In particular, hand-crafted features contain many cues that are either redundant or irrelevant, which reduce the overall accuracy of the classifier. In such a case feature selection is used, which, by producing ranked lists of features, helps to filter out the unwanted information. Moreover, in real-time systems (e.g., visual trackers), ranking approaches are used as optimization procedures that improve the robustness of a system dealing with the high variability of image streams that change over time. Conversely, learning to rank is necessary in the construction of ranking models for information retrieval, biometric authentication, re-identification, and recommender systems. In this context, the ranking model's purpose is to sort objects according to their degrees of relevance, importance, or preference as defined in the specific application.
    Comment: European PhD Thesis. arXiv admin note: text overlap with arXiv:1601.06615, arXiv:1505.06821, arXiv:1704.02665 by other authors.
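
The "ranking to learn" idea of filtering features through a ranked list can be sketched as follows. The relevance criterion used here, absolute Pearson correlation with the target, is an assumption chosen for brevity and is not claimed to be the method of the thesis; the function name and toy data are likewise hypothetical.

```python
import numpy as np

def rank_features(X, y):
    """Return (order, scores): feature indices from most to least relevant.

    Relevance is the absolute Pearson correlation between each feature
    column and the target, an illustrative choice of ranking criterion.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get score 0
    scores = np.abs(Xc.T @ yc) / denom
    order = np.argsort(-scores, kind="stable")
    return order, scores

# Toy data: feature 0 tracks the target, feature 1 is constant,
# feature 2 is only weakly related to the target.
X = [[0, 5, 1],
     [1, 5, 0],
     [2, 5, 1],
     [3, 5, 0]]
y = [0, 1, 2, 3]

order, scores = rank_features(X, y)
top_k = order[:2]  # keep the two highest-ranked features
```

Selecting `top_k` columns before training is the filtering step; the classifier itself never sees the discarded features.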

    Agoric computation: trust and cyber-physical systems

    In the past two decades, advances in miniaturisation and economies of scale have led to the emergence of billions of connected components that have provided both a spur and a blueprint for the development of smart products acting in specialised environments which are uniquely identifiable, localisable, and capable of autonomy. Adopting the computational perspective of multi-agent systems (MAS) as a technological abstraction, married with the engineering perspective of cyber-physical systems (CPS), has provided fertile ground for designing, developing and deploying software applications in smart automated contexts such as manufacturing, power grids, avionics, healthcare and logistics, capable of being decentralised, intelligent, reconfigurable, modular, flexible, robust, adaptive and responsive. Current agent technologies are, however, ill-suited for information-based environments, making it difficult to formalise and implement multi-agent systems based on inherently dynamical functional concepts such as trust and reliability, which present special challenges when scaling from small to large systems of agents. To overcome such challenges, it is useful to adopt a unified approach which we term agoric computation, integrating logical, mathematical and programming concepts towards the development of agent-based solutions based on recursive, compositional principles, where smaller systems feed via directed information flows into larger hierarchical systems that define their global environment. Considering information as an integral part of the environment naturally defines a web of operations where components of a system are wired in some way and each set of inputs and outputs is allowed to carry some value.
These operations are stateless abstractions and procedures that act on stateful cells that accumulate partial information, and it is possible to compose such abstractions into higher-level ones, using a publish-and-subscribe interaction model that keeps track of update messages between abstractions and values in the data. In this thesis we review the logical and mathematical basis of such abstractions and take steps towards the software implementation of agoric modelling as a framework for simulation and verification of the reliability of increasingly complex systems. We report on experimental results related to a few select applications, such as stigmergic interaction in mobile robotics, integrating raw data into agent perceptions, trust and trustworthiness in orchestrated open systems, computing the epistemic cost of trust when reasoning in networks of agents seeded with contradictory information, and trust models for distributed ledgers in the Internet of Things (IoT), and we provide a roadmap for future developments of our research.

    Graphical models beyond standard settings: lifted decimation, labeling, and counting

    With increasing complexity and growing problem sizes in AI and Machine Learning, inference and learning are still major issues in Probabilistic Graphical Models (PGMs). On the other hand, many problems are specified in such a way that symmetries arise from the underlying model structure. Exploiting these symmetries during inference, which is referred to as "lifted inference", has led to significant efficiency gains. This thesis provides several enhanced versions of known algorithms that are shown to be liftable as well, and thereby applies lifting in "non-standard" settings. By doing so, the understanding of the applicability of lifted inference, and of lifting in general, is extended. Among various other experiments, it is shown how lifted inference in combination with an innovative Web-based data harvesting pipeline is used to label author-paper pairs with geographic information in online bibliographies. This results in a large-scale transnational bibliography containing affiliation information over time for roughly one million authors. Analyzing this dataset reveals the importance of understanding count data. Although counting is done literally everywhere, mainstream PGMs have largely neglected count data. In the case where the ranges of the random variables are defined over the natural numbers, crude approximations to the true distribution are often made by discretization or a Gaussian assumption. To handle count data, Poisson Dependency Networks (PDNs) are introduced, which present a new class of non-standard PGMs that naturally handle count data.
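
One way to see why a Gaussian assumption is a crude fit for count data is to compare it with a Poisson distribution at a small rate. The sketch below is only a numerical illustration of that remark and does not implement Poisson Dependency Networks; the rate of 1.0, the moment-matched Gaussian, and the function names are assumptions.

```python
import math

def poisson_pmf(k, rate):
    """P(K = k) for a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

rate = 1.0
# Gaussian approximation with matched mean and variance (both equal the rate).
# Mass below -0.5 corresponds to negative counts, which cannot occur.
mass_below_zero = gaussian_cdf(-0.5, rate, math.sqrt(rate))
# The Poisson distribution instead puts substantial mass exactly at zero.
poisson_at_zero = poisson_pmf(0, rate)
```

At this rate the matched Gaussian assigns several percent of its mass to impossible negative counts, while the Poisson distribution is supported on the natural numbers by construction.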