Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
We investigate approximating joint distributions of random processes with
causal dependence tree distributions. Such distributions are particularly
useful in providing parsimonious representation when there exists causal
dynamics among processes. By extending the results by Chow and Liu on
dependence tree approximations, we show that the best causal dependence tree
approximation is the one which maximizes the sum of directed informations on
its edges, where best is defined in terms of minimizing the KL-divergence
between the original and the approximate distribution. Moreover, we describe a
low-complexity algorithm to efficiently pick this approximate distribution.
Comment: 9 pages, 15 figure
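As a rough illustration of the optimization stated in this abstract, the sketch below searches for the directed tree that maximizes the sum of edge weights. It assumes the directed-information values are already estimated and supplied as a matrix; the function names and the `weights` layout are hypothetical. It is a brute-force search suitable only for a handful of processes, not the low-complexity algorithm the paper describes.

```python
from itertools import product


def _reaches_root(parents, root):
    """Return True if every node reaches the root by following parents (no cycles)."""
    for start in parents:
        seen, node = set(), start
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True


def best_directed_tree(weights):
    """Brute-force search for the directed tree with the largest total edge weight.

    weights[i][j] is an assumed, precomputed directed-information estimate
    for the edge i -> j (hypothetical input, not computed here).
    """
    n = len(weights)
    best_score, best_parents = float("-inf"), None
    for root in range(n):
        others = [v for v in range(n) if v != root]
        # Each non-root node picks exactly one parent.
        for choice in product(range(n), repeat=len(others)):
            parents = dict(zip(others, choice))
            if any(child == parent for child, parent in parents.items()):
                continue  # no self-loops
            if not _reaches_root(parents, root):
                continue  # the parent choices contain a cycle
            score = sum(weights[p][c] for c, p in parents.items())
            if score > best_score:
                best_score, best_parents = score, {root: None, **parents}
    return best_score, best_parents
```

The search is exponential in the number of processes; a maximum-weight directed spanning tree routine would be the natural replacement at scale.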
Capacity of Binary State Symmetric Channel with and without Feedback and Transmission Cost
We consider a unit memory channel, called Binary State Symmetric Channel
(BSSC), in which the channel state is the modulo-2 addition of the current
channel input and the previous channel output. We derive closed-form
expressions for the capacity and the corresponding channel input distribution
of this BSSC, with and without feedback and transmission cost. We also show
that the capacity of the BSSC is not increased by feedback, and that it is
achieved by a first-order symmetric Markov process.
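The state definition above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the abstract only says the state is the modulo-2 sum of the current input and the previous output, so the per-state flip probabilities `p_flip_state0` and `p_flip_state1`, the initial output `y_init`, and the function name are hypothetical choices made for illustration.

```python
import random


def simulate_bssc(inputs, p_flip_state0, p_flip_state1, y_init=0, seed=0):
    """Simulate a binary unit-memory channel with state s = x XOR y_prev.

    Assumption: in each state the channel behaves like a binary symmetric
    channel with its own flip probability (hypothetical parameters).
    """
    rng = random.Random(seed)
    y_prev = y_init
    outputs = []
    for x in inputs:
        state = x ^ y_prev  # modulo-2 addition of input and previous output
        p_flip = p_flip_state1 if state else p_flip_state0
        y = x ^ (1 if rng.random() < p_flip else 0)
        outputs.append(y)
        y_prev = y
    return outputs
```

With both flip probabilities set to zero the output reproduces the input, which is a quick sanity check on the state bookkeeping.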
The Value of Information for Populations in Varying Environments
The notion of information pervades informal descriptions of biological
systems, but formal treatments face the problem of defining a quantitative
measure of information rooted in a concept of fitness, which is itself an
elusive notion. Here, we present a model of population dynamics where this
problem is amenable to a mathematical analysis. In the limit where any
information about future environmental variations is common to the members of
the population, our model is equivalent to known models of financial
investment. In this case, the population can be interpreted as a portfolio of
financial assets and previous analyses have shown that a key quantity of
Shannon's communication theory, the mutual information, sets a fundamental
limit on the value of information. We show that this bound can be violated when
accounting for features that are irrelevant in finance but inherent to
biological systems, such as the stochasticity present at the individual level.
This leads us to generalize the measures of uncertainty and information usually
encountered in information theory.
Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation
Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because modifying one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.
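To make the maximum causal entropy idea more concrete, the sketch below runs soft value iteration on a small tabular problem. It is a model-based stand-in, not the Soft Q-learning procedure or the MARL-Ped setup from the paper: the known transition table, the reward table, and all names are assumptions for illustration. The soft backup replaces the max over actions with a log-sum-exp, and the resulting policy is stochastic rather than greedy.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def soft_value_iteration(rewards, transitions, gamma=0.9, iterations=200):
    """Soft Bellman backups on a tabular MDP (hypothetical inputs).

    rewards[s][a] is the reward for action a in state s.
    transitions[s][a][s2] is the probability of moving from s to s2 under a.
    Returns the soft state values and the policy pi(a|s) = exp(Q(s,a) - V(s)).
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    values = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iterations):
        q = [
            [
                rewards[s][a]
                + gamma * sum(transitions[s][a][s2] * values[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        # Soft maximum over actions instead of a hard max.
        values = [_logsumexp(row) for row in q]
    policy = [[math.exp(q[s][a] - values[s]) for a in range(n_actions)] for s in range(n_states)]
    return values, policy
```

In an inverse reinforcement learning loop, the reward table would be adjusted until trajectories sampled from this policy match the demonstrated ones; that outer loop is omitted here.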
Maximum Causal Entropy Specification Inference from Demonstrations
In many settings (e.g., robotics) demonstrations provide a natural way to
specify tasks; however, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the tasks, such as
rewards or policies, can be safely composed and/or do not explicitly capture
history dependencies. Motivated by this deficit, recent works have proposed
learning Boolean task specifications, a class of Boolean non-Markovian rewards
which admit well-defined composition and explicitly handle historical
dependencies. This work continues this line of research by adapting maximum
causal entropy inverse reinforcement learning to estimate the posterior
probability of a specification given a multi-set of demonstrations. The key
algorithmic insight is to leverage the extensive literature and tooling on
reduced ordered binary decision diagrams to efficiently encode a time-unrolled
Markov Decision Process. This enables transforming a naive exponential-time
algorithm into a polynomial-time algorithm.
Comment: Computer Aided Verification, 202