Optimal control as a graphical model inference problem
We reformulate a class of non-linear stochastic optimal control problems
introduced by Todorov (2007) as a Kullback-Leibler (KL) minimization problem.
As a result, the optimal control computation reduces to an inference
computation and approximate inference methods can be applied to efficiently
compute approximate optimal controls. We show how this KL control theory
contains the path integral control method as a special case. We provide an
example of a block stacking task and a multi-agent cooperative game where we
demonstrate how approximate inference can be successfully applied to instances
that are too complex for exact computation. We discuss the relation of the KL
control approach to other inference approaches to control.
Comment: 26 pages, 12 Figures; Machine Learning Journal (2012
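As a rough illustration of how a KL control problem reduces to an inference-style recursion, the sketch below solves a small finite-horizon problem on a discrete state space. It assumes the standard linearly solvable form attributed to Todorov (2007), where each step costs a state cost R(x) plus the KL divergence between the controlled and uncontrolled transition distributions. The function name `kl_control`, the toy random-walk chain, and the cost values are all hypothetical and not taken from the paper.

```python
import numpy as np

def kl_control(q, state_cost, horizon):
    """Backward recursion for a finite-horizon KL control problem (sketch).

    q[x, y]       : uncontrolled transition probability x -> y (rows sum to 1)
    state_cost[x] : cost R(x) paid in state x at every step (assumed form)
    Returns the desirability z at the first step and one controlled
    transition matrix per time step (earliest step first).
    """
    z = np.exp(-state_cost)              # terminal desirability z_T(x) = exp(-R(x))
    policies = []
    for _ in range(horizon):
        unnorm = q * z[None, :]          # q(y|x) * z_{t+1}(y)
        norm = unnorm.sum(axis=1)        # sum_y q(y|x) * z_{t+1}(y)
        policies.append(unnorm / norm[:, None])   # optimal p*(y|x)
        z = np.exp(-state_cost) * norm   # z_t(x) = exp(-R(x)) * norm(x)
    policies.reverse()                   # loop ran backward in time
    return z, policies

# Hypothetical toy problem: lazy random walk on a 5-state chain,
# unit cost everywhere except the right-most (goal) state.
n = 5
q = np.zeros((n, n))
for x in range(n):
    for y in (x - 1, x, x + 1):
        q[x, min(max(y, 0), n - 1)] += 1.0 / 3.0
cost = np.ones(n)
cost[-1] = 0.0

z, policies = kl_control(q, cost, horizon=10)
print(policies[0][0])   # controlled transition distribution out of state 0
```

The controlled transitions are the uncontrolled ones reweighted by the desirability of the next state, which is the same form as a backward message in a chain-structured graphical model. For large or factored state spaces this exact sum is what approximate inference would replace.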
Task-Oriented Communication for Multi-Device Cooperative Edge Inference
This paper investigates task-oriented communication for multi-device
cooperative edge inference, where a group of distributed low-end edge devices
transmit the extracted features of local samples to a powerful edge server for
inference. While cooperative edge inference can overcome the limited sensing
capability of a single device, it substantially increases the communication
overhead and may incur excessive latency. To enable low-latency cooperative
inference, we propose a learning-based communication scheme that optimizes
local feature extraction and distributed feature encoding in a task-oriented
manner, i.e., to remove data redundancy and transmit information that is
essential for the downstream inference task rather than reconstructing the data
samples at the edge server. Specifically, we leverage an information bottleneck
(IB) principle to extract the task-relevant feature at each edge device and
adopt a distributed information bottleneck (DIB) framework to formalize a
single-letter characterization of the optimal rate-relevance tradeoff for
distributed feature encoding. To admit flexible control of the communication
overhead, we extend the DIB framework to a distributed deterministic
information bottleneck (DDIB) objective that explicitly incorporates the
representational costs of the encoded features. As the IB-based objectives are
computationally prohibitive for high-dimensional data, we adopt variational
approximations to make the optimization problems tractable. To compensate for the
potential performance loss due to the variational approximations, we also
develop a selective retransmission (SR) mechanism to identify the redundancy in
the encoded features of multiple edge devices to attain additional
communication overhead reduction. Extensive experiments show that the
proposed task-oriented communication scheme achieves a better rate-relevance
tradeoff than baseline methods.
Comment: This paper was accepted to IEEE Transactions on Wireless
Communicatio
Implicit feedback-based group recommender system for internet of things applications
With the prevalence of Internet of Things (IoT)-based social media applications, the distance among people has been greatly shortened. As a result, recommender systems in IoT-based social media need to be oriented toward groups of users rather than individual users. However, existing methods depend heavily on explicit preference feedback and ignore scenarios with only implicit feedback. To close this gap, this paper proposes an implicit feedback-based group recommender system using probabilistic inference and a non-cooperative game (GREPING) for IoT-based social media. In particular, unknown process variables can be estimated from observable implicit feedback via Bayesian posterior inference. In addition, globally optimal recommendation results can be calculated with the aid of the non-cooperative game. Two groups of experiments are conducted to assess GREPING from two aspects: efficiency and robustness. Experimental results show clear improvement and considerable stability of GREPING compared to baseline methods. © 2020 IEEE
A Regularized Opponent Model with Maximum Entropy Objective
In a single-agent setting, reinforcement learning (RL) tasks can be cast into
an inference problem by introducing a binary random variable o, which stands
for the "optimality". In this paper, we redefine the binary random variable o
in the multi-agent setting and formalize multi-agent reinforcement learning (MARL)
as probabilistic inference. We derive a variational lower bound of the
likelihood of achieving optimality and name it the Regularized Opponent
Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel
perspective on opponent modeling and show how it can improve the performance of
training agents theoretically and empirically in cooperative games. To optimize
ROMMEO, we first introduce a tabular Q-iteration method ROMMEO-Q with proof of
convergence. We extend the exact algorithm to complex environments by proposing
an approximate version, ROMMEO-AC. We evaluate these two algorithms on the
challenging iterated matrix game and differential game respectively and show
that they can outperform strong MARL baselines.
Comment: Accepted to International Joint Conference on Artificial Intelligence
(IJCA2019
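To make the single-agent starting point of this abstract concrete, the sketch below runs tabular soft Q-iteration, the usual consequence of introducing an optimality variable with p(o = 1 | s, a) proportional to exp(r(s, a)) under a uniform action prior. It is not ROMMEO-Q, which additionally involves an opponent model; the function names, the two-state toy environment, and the discount factor are assumptions made only for illustration.

```python
import numpy as np

def soft_value(Q):
    """Numerically stable log-sum-exp over actions: V(s) = log sum_a exp(Q(s, a))."""
    m = Q.max(axis=1)
    return m + np.log(np.exp(Q - m[:, None]).sum(axis=1))

def soft_q_iteration(P, r, gamma=0.9, iters=200):
    """Tabular soft Q-iteration (single-agent maximum-entropy RL sketch).

    P[s, a, s2] : transition probabilities
    r[s, a]     : rewards
    Returns the soft Q-table and the policy pi(a|s) = exp(Q(s, a) - V(s)).
    """
    Q = np.zeros_like(r, dtype=float)
    for _ in range(iters):
        V = soft_value(Q)
        Q = r + gamma * (P @ V)      # expected soft value of the next state
    pi = np.exp(Q - soft_value(Q)[:, None])
    return Q, pi

# Hypothetical toy MDP: action 0 stays, action 1 switches state;
# reward 1 is received whenever the agent is in state 1.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
r = np.array([[0.0, 0.0],
              [1.0, 1.0]])

Q, pi = soft_q_iteration(P, r)
print(pi)   # stochastic policy; in state 0 it favours switching to state 1
```

The policy stays stochastic because the inference view optimises reward plus entropy rather than reward alone, which is the property the maximum entropy objective in the title refers to.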
Game theory of mind
This paper introduces a model of ‘theory of mind’, namely, how we represent the intentions and goals of others to optimise our mutual interactions. We draw on ideas from optimum control and game theory to provide a ‘game theory of mind’. First, we consider the representations of goals in terms of value functions that are prescribed by utility or rewards. Critically, the joint value functions and ensuing behaviour are optimised recursively, under the assumption that I represent your value function, your representation of mine, your representation of my representation of yours, and so on ad infinitum. However, if we assume that the degree of recursion is bounded, then players need to estimate the opponent's degree of recursion (i.e., sophistication) to respond optimally. This induces a problem of inferring the opponent's sophistication, given behavioural exchanges. We show it is possible to deduce whether players make inferences about each other and quantify their sophistication on the basis of choices in sequential games. This rests on comparing generative models of choices with, and without, inference. Model comparison is demonstrated using simulated and real data from a ‘stag-hunt’. Finally, we note that exactly the same sophisticated behaviour can be achieved by optimising the utility function itself (through prosocial utility), producing unsophisticated but apparently altruistic agents. This may be relevant ethologically in hierarchal game theory and coevolution.
Linear Regression from Strategic Data Sources
Linear regression is a fundamental building block of statistical data
analysis. It amounts to estimating the parameters of a linear model that maps
input features to corresponding outputs. In the classical setting where the
precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem
in statistics states that generalized least squares (GLS) is a so-called "Best
Linear Unbiased Estimator" (BLUE). In modern data science, however, one often
faces strategic data sources, namely, individuals who incur a cost for
providing high-precision data.
In this paper, we study a setting in which features are public but
individuals choose the precision of the outputs they reveal to an analyst. We
assume that the analyst performs linear regression on this dataset, and
individuals benefit from the outcome of this estimation. We model this scenario
as a game where individuals minimize a cost comprising two components: (a) an
(agent-specific) disclosure cost for providing high-precision data; and (b) a
(global) estimation cost representing the inaccuracy in the linear model
estimate. In this game, the linear model estimate is a public good that
benefits all individuals. We establish that this game has a unique non-trivial
Nash equilibrium. We study the efficiency of this equilibrium and we prove
tight bounds on the price of stability for a large class of disclosure and
estimation costs. Finally, we study the estimator accuracy achieved at
equilibrium. We show that, in general, Aitken's theorem does not hold under
strategic data sources, though it does hold if individuals have identical
disclosure costs (up to a multiplicative factor). When individuals have
non-identical costs, we derive a bound on the improvement of the equilibrium
estimation cost that can be achieved by deviating from GLS, under mild
assumptions on the disclosure cost functions.
Comment: This version (v3) extends the results on the sub-optimality of GLS
(Section 6) and improves writing in multiple places compared to v2. Compared
to the initial version v1, it also fixes an error in Theorem 6 (now Theorem
5), and extended many of the result
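For readers unfamiliar with the estimator discussed in this abstract, the sketch below computes generalized least squares for independent noise with known per-point precisions, that is, beta = (X^T Λ X)^{-1} X^T Λ y with Λ the diagonal matrix of precisions. It covers only the classical fixed-precision setting, not the strategic game; the function name `gls`, the synthetic data, and the use of the covariance trace as a scalar summary are assumptions for illustration.

```python
import numpy as np

def gls(X, y, precision):
    """Generalized least squares with independent noise of known precision.

    X         : (n, d) public feature matrix
    y         : (n,) observed outputs
    precision : (n,) inverse noise variance of each output
    Returns the estimate and its covariance (X^T Λ X)^{-1}.
    """
    w = np.asarray(precision, dtype=float)
    XtW = X.T * w                         # X^T Λ without forming the diagonal matrix
    cov = np.linalg.inv(XtW @ X)          # covariance of the estimator
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta, cov

# Synthetic example: each data point has its own precision.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
precision = rng.uniform(0.5, 2.0, size=50)
y = X @ beta_true + rng.normal(size=50) / np.sqrt(precision)

beta_hat, cov = gls(X, y, precision)
print(beta_hat)
print(np.trace(cov))   # one possible scalar measure of estimation inaccuracy
```

Raising any individual precision shrinks the covariance for everyone who uses the estimate, which is why the abstract describes the linear model estimate as a public good.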