    Variational Counterfactual Prediction under Runtime Domain Corruption

    To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.

    Sampling-Based Optimization for Multi-Agent Model Predictive Control

    We systematically review the Variational Optimization, Variational Inference and Stochastic Search perspectives on sampling-based dynamic optimization and discuss their connections to state-of-the-art optimizers and Stochastic Optimal Control (SOC) theory. A general convergence and sample complexity analysis of the three perspectives is provided through the unifying Stochastic Search perspective. We then extend these frameworks to their distributed versions for multi-agent control by combining them with consensus Alternating Direction Method of Multipliers (ADMM) to decouple the full problem into local neighborhood-level ones that can be solved in parallel. Model Predictive Control (MPC) algorithms are then developed based on these frameworks, leading to fully decentralized sampling-based dynamic optimizers. The capabilities of the proposed algorithmic framework are demonstrated on multiple complex multi-agent tasks for vehicle and quadcopter systems in simulation. The results compare different distributed sampling-based optimizers and their centralized counterparts using unimodal Gaussian, mixture of Gaussians, and Stein variational policies. The scalability of the proposed distributed algorithms is demonstrated on a 196-vehicle scenario where a direct application of centralized sampling-based methods is shown to be prohibitive.
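
As a rough illustration of what a sampling-based optimizer does, the sketch below minimises a toy one-dimensional cost by drawing Gaussian samples around the current estimate and averaging them with exponentiated-cost weights. It is not the paper's algorithm: the function name `stochastic_search`, the fixed sampling standard deviation, the `temperature` parameter, and the quadratic test cost are all assumptions chosen for the example, and there is no dynamics model, ADMM consensus step, or MPC loop.

```python
import math
import random


def stochastic_search(cost, mean, std=0.5, temperature=0.1,
                      num_samples=256, iterations=30, seed=0):
    """Minimise `cost` by repeatedly re-weighting Gaussian samples.

    Hypothetical helper for illustration: only the mean of the sampling
    distribution is updated; the standard deviation stays fixed.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        # Draw candidate solutions around the current mean.
        samples = [rng.gauss(mean, std) for _ in range(num_samples)]
        costs = [cost(x) for x in samples]
        # Subtract the best cost before exponentiating for numerical stability.
        best = min(costs)
        weights = [math.exp(-(c - best) / temperature) for c in costs]
        total = sum(weights)
        # The new mean is the weighted average of the samples.
        mean = sum(w * x for w, x in zip(weights, samples)) / total
    return mean


def quadratic_cost(x):
    """Toy cost with its minimum at x = 3 (an assumption for the demo)."""
    return (x - 3.0) ** 2


estimate = stochastic_search(quadratic_cost, mean=0.0)
print(f"estimated minimiser: {estimate:.3f}")
```

Lower-cost samples receive exponentially larger weights, so the mean drifts toward the minimiser without any gradient information; a smaller `temperature` makes the update greedier.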

    Probabilistic Models of Motor Production

    N. Bernstein defined the ability of the central nervous system (CNS) to control many degrees of freedom of a physical body, with all its redundancy and flexibility, as the main problem in motor control. He pointed out that man-made mechanisms usually have one, sometimes two degrees of freedom (DOF); when the number of DOF increases further, it becomes prohibitively hard to control them. The brain, however, seems to perform such control effortlessly. He suggested a way the brain might deal with it: when a motor skill is being acquired, the brain artificially limits the degrees of freedom, leaving only one or two. As the skill level increases, the brain gradually "frees" the previously fixed DOF, applying control when needed and in directions which have to be corrected, eventually arriving at a control scheme where all the DOF are "free". This approach of reducing the dimensionality of motor control remains relevant even today. One of the possible solutions to Bernstein's problem is the hypothesis of motor primitives (MPs): small building blocks that constitute complex movements and facilitate motor learning and task completion. Just as in the visual system, having a homogeneous hierarchical architecture built of similar computational elements may be beneficial. When studying such a complicated object as the brain, it is important to define at which level of detail one works and which questions one aims to answer. David Marr suggested three levels of analysis: 1. computational, analysing which problem the system solves; 2. algorithmic, questioning which representation the system uses and which computations it performs; 3. implementational, finding how such computations are performed by neurons in the brain. In this thesis we stay at the first two levels, seeking the basic representation of motor output. In this work we present a new model of motor primitives that comprises multiple interacting latent dynamical systems, and give it a full Bayesian treatment. 
Modelling within the Bayesian framework, in my opinion, must become the new standard in hypothesis testing in neuroscience. Only the Bayesian framework gives us guarantees when dealing with the inevitable plethora of hidden variables and uncertainty. The special type of coupling of dynamical systems we proposed, based on the Product of Experts, has many natural interpretations in the Bayesian framework. If the dynamical systems run in parallel, it yields Bayesian cue integration. If they are organized hierarchically due to serial coupling, we get hierarchical priors over the dynamics. If one of the dynamical systems represents sensory state, we arrive at sensory-motor primitives. The compact representation that follows from the variational treatment allows learning of a motor primitives library. With the primitives learned separately, a combined motion can be represented as a matrix of coupling values. We performed a set of experiments to compare different models of motor primitives. In a series of 2-alternative forced choice (2AFC) experiments, participants discriminated between natural and synthesised movements, thus running a graphics Turing test. When available, the Bayesian model score predicted the naturalness of the perceived movements. For simple movements, like walking, Bayesian model comparison and psychophysics tests indicate that one dynamical system is sufficient to describe the data. For more complex movements, like walking and waving, motion can be better represented as a set of coupled dynamical systems. We also experimentally confirmed that Bayesian treatment of model learning on motion data is superior to the simple point estimate of latent parameters. Experiments with non-periodic movements show that they do not benefit from more complex latent dynamics, despite having high kinematic complexity. By having fully Bayesian models, we could quantitatively disentangle the influence of motion dynamics and pose on the perception of naturalness. 
We confirmed that rich and correct dynamics is more important than the kinematic representation. There are numerous further directions of research. In the models we devised for multiple parts, even though the latent dynamics was factorized into a set of interacting systems, the kinematic parts were completely independent. Thus, interaction between the kinematic parts could be mediated only by the latent dynamics interactions. A more flexible model would allow a dense interaction on the kinematic level too. Another important problem relates to the representation of time in Markov chains. Discrete-time Markov chains form an approximation to continuous dynamics. As the time step is assumed to be fixed, we face the problem of time step selection. Time is also not an explicit parameter in Markov chains, which prohibits explicit optimization of time as a parameter and reasoning (inference) about it. For example, in optimal control, boundary conditions are usually set at exact time points, which is not an ecological scenario, where time is usually a parameter of optimization. Making time an explicit parameter in dynamics may alleviate this.
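
The link between a Product of Experts and Bayesian cue integration mentioned above can be seen in the simplest possible setting: multiplying scalar Gaussian densities gives another Gaussian whose precision is the sum of the experts' precisions and whose mean is their precision-weighted average. The sketch below shows only this textbook identity; the function name `product_of_gaussian_experts` and the example numbers are assumptions for illustration, not the coupled dynamical-systems model of the thesis.

```python
def product_of_gaussian_experts(means, variances):
    """Fuse scalar Gaussian experts by multiplying their densities.

    Returns the mean and variance of the (renormalised) product, which
    is again Gaussian. Hypothetical helper for illustration only.
    """
    # Precision (inverse variance) measures how reliable each expert is.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    # The fused mean weights each expert's mean by its precision.
    fused_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return fused_mean, 1.0 / total_precision


# Two equally reliable cues that disagree: the fused estimate lies halfway
# between them and is more certain than either cue alone.
fused_mean, fused_variance = product_of_gaussian_experts([0.0, 2.0], [1.0, 1.0])
print(fused_mean, fused_variance)
```

When one expert has a smaller variance, the fused mean moves toward that expert, which is the behaviour expected of reliability-weighted cue integration.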

    Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model

    Query optimizers in relational database management systems (RDBMSs) search for execution plans expected to be optimal for a given query. They use parameter estimates, often inaccurate, and make assumptions that may not hold in practice. Consequently, they may select execution plans that are suboptimal at runtime, when these estimates and assumptions are not valid, which may result in poor query performance. Therefore, query optimizers do not sufficiently support robust query optimization. Recent years have seen a surge of interest in using machine learning (ML) to improve the efficiency of data systems and reduce their maintenance overheads, with promising results obtained in the area of query optimization in particular. In this paper, inspired by these advancements, and based on several years of experience of IBM Db2 in this journey, we propose Robust Optimization of Queries (Roq), a holistic framework that enables robust query optimization based on a risk-aware learning approach. Roq includes a novel formalization of the notion of robustness in the context of query optimization and a principled approach for its quantification and measurement based on approximate probabilistic ML. It also includes novel strategies and algorithms for query plan evaluation and selection. Roq also includes a novel learned cost model that is designed to predict query execution cost and the associated risks and performs query optimization accordingly. We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art. Comment: 13 pages, 9 figures, submitted to SIGMOD 202

    Autonomous Exploration over Continuous Domains

    Motion planning is an essential aspect of robot autonomy, and as such it has been studied for decades, producing a wide range of planning methodologies. Path planners are generally categorised as either trajectory optimisers or sampling-based planners. The latter is the predominant planning paradigm as it can resolve a path efficiently while explicitly reasoning about path safety. Yet, with a limited budget, the resulting paths are far from optimal. In contrast, state-of-the-art trajectory optimisers explicitly trade off path safety against efficiency to produce locally optimal paths. However, these planners cannot incorporate updates from a partially observed model such as an occupancy map and fail in planning around information gaps caused by incomplete sensor coverage. Autonomous exploration adds another twist to path planning. The objective of exploration is to safely and efficiently traverse an unknown environment in order to map it. The desired output of such a process is a sequence of paths that efficiently and safely minimise the uncertainty of the map. However, optimising over the entire space of trajectories is computationally intractable. Therefore, most exploration algorithms relax the general formulation by optimising a simpler one, for example finding the single next best view, resulting in suboptimal performance. This thesis investigates methodologies for optimal and safe exploration over continuous paths. Contrary to existing exploration algorithms that break exploration into independent sub-problems of finding goal points and planning safe paths to these points, our holistic approach simultaneously optimises the coupled problems of where and how to explore, thus offering a shift in paradigm from next best view to next best path. With exploration defined as an optimisation problem over continuous paths, this thesis explores two different optimisation paradigms: Bayesian and functional.

    ProSpar-GP: scalable Gaussian process modeling with massive non-stationary datasets

    Gaussian processes (GPs) are a popular class of Bayesian nonparametric models, but their training can be computationally burdensome for massive training datasets. While there has been notable work on scaling up these models for big data, existing methods typically rely on a stationary GP assumption for approximation, and can thus perform poorly when the underlying response surface is non-stationary, i.e., it has some regions of rapid change and other regions with little change. Such non-stationarity is, however, ubiquitous in real-world problems, including our motivating application for surrogate modeling of computer experiments. We thus propose a new Product of Sparse GP (ProSpar-GP) method for scalable GP modeling with massive non-stationary data. The ProSpar-GP makes use of a carefully-constructed product-of-experts formulation of sparse GP experts, where different experts are placed within local regions of non-stationarity. These GP experts are fit via a novel variational inference approach, which capitalizes on mini-batching and GPU acceleration for efficient optimization of inducing points and length-scale parameters for each expert. We further show that the ProSpar-GP is Kolmogorov-consistent, in that its generative distribution defines a valid stochastic process over the prediction space; such a property provides essential stability for variational inference, particularly in the presence of non-stationarity. We then demonstrate the improved performance of the ProSpar-GP over the state-of-the-art, in a suite of numerical experiments and an application for surrogate modeling of a satellite drag simulator.
