Learning Task Specifications from Demonstrations
Real-world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition. Comment: NIPS 201
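As a sketch of the MAP formulation described in this abstract, using notation that is assumed here and not taken from the paper (\varphi for a candidate specification, \Phi for the candidate pool, X for the demonstrations), the inference problem can be written as:

```latex
\varphi^{*} \in \arg\max_{\varphi \in \Phi} \Pr(\varphi \mid X)
  = \arg\max_{\varphi \in \Phi} \Pr(X \mid \varphi)\,\Pr(\varphi)
```

Here \Pr(X \mid \varphi) stands for the demonstration likelihood that the abstract says is derived from the maximum entropy principle, and \Pr(\varphi) for a prior over specifications. The equality is Bayes' rule with the evidence \Pr(X) dropped, since it does not depend on \varphi.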
Bayesian fairness
We consider the problem of how decision making can be fair when the
underlying probabilistic model of the world is not known with certainty. We
argue that recent notions of fairness in machine learning need to explicitly
incorporate parameter uncertainty; hence we introduce the notion of
Bayesian fairness as a suitable candidate for fair decision rules. Using
balance, a definition of fairness introduced by Kleinberg et al. (2016), we show
how a Bayesian perspective can lead to well-performing, fair decision rules
even under high uncertainty. Comment: 13 pages, 8 figures, to appear at AAAI 201
Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach
HTTP-based adaptive video streaming has become a popular choice for streaming
because of its reliable transmission and its flexibility in adapting to
varying network conditions. However, due to rate adaptation in adaptive
streaming, the quality of the videos at the client keeps varying with time
depending on the end-to-end network conditions. Further, varying network
conditions can lead to the video client running out of playback content
resulting in rebuffering events. These factors affect user satisfaction and
degrade the user's quality of experience (QoE). It is important to
quantify the perceptual QoE of streaming video users and to monitor it
continuously so that QoE degradation can be minimized. However,
the continuous evaluation of QoE is challenging as it is determined by complex
dynamic interactions among the QoE influencing factors. Towards this end, we
present LSTM-QoE, a recurrent neural network based QoE prediction model using a
Long Short-Term Memory (LSTM) network. The LSTM-QoE is a network of cascaded
LSTM blocks to capture the nonlinearities and the complex temporal dependencies
involved in the time varying QoE. Based on an evaluation over several publicly
available continuous QoE databases, we demonstrate that the LSTM-QoE has the
capability to model the QoE dynamics effectively. We compare the proposed model
with the state-of-the-art QoE prediction models and show that it provides
superior performance across these databases. Further, we discuss the state
space perspective for the LSTM-QoE and show the efficacy of the state space
modeling approaches for QoE prediction.
Generalized planning: Non-deterministic abstractions and trajectory constraints
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies mu that solve the abstract problem P have been shown to solve all problems Q that reduce to P, provided that mu terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable nondeterministic planning.
Calibrated Fairness in Bandits
We study fairness within the stochastic multi-armed bandit (MAB)
decision-making framework. We adapt the fairness framework of "treating similar
individuals similarly" to this setting. Here, an 'individual' corresponds to an
arm and two arms are 'similar' if they have a similar quality distribution.
First, we adopt a smoothness constraint that if two arms have a similar
quality distribution then the probability of selecting each arm should be
similar. In addition, we define the fairness regret, which corresponds to
the degree to which an algorithm is not calibrated, where perfect calibration
requires that the probability of selecting an arm is equal to the probability
with which the arm has the best quality realization. We show that a variation
on Thompson sampling satisfies smooth fairness for total variation distance,
and give a bound on fairness regret. This complements
prior work, which protects an on-average better arm from being less favored. We
also explain how to extend our algorithm to the dueling bandit setting. Comment: To be presented at the FAT-ML'17 worksho
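To make the calibration idea in this abstract concrete, here is a minimal sketch of plain Beta-Bernoulli Thompson sampling. It is not the variation analysed in the paper. The Bernoulli reward model, the uniform Beta(1, 1) prior, and the function names `thompson_select`, `run_bandit`, and `selection_frequencies` are all assumptions made for illustration. The sketch shows that sampling one plausible quality per arm and picking the largest selects each arm with the probability that the current belief assigns to that arm being best.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and return the argmax.

    Assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm.
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])


def run_bandit(true_means, horizon, seed=0):
    """Simulate Thompson sampling on hypothetical Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_select(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def selection_frequencies(successes, failures, draws=20000, seed=0):
    """Estimate how often each arm is selected under fixed posterior counts."""
    rng = random.Random(seed)
    counts = [0] * len(successes)
    for _ in range(draws):
        counts[thompson_select(successes, failures, rng)] += 1
    return [c / draws for c in counts]
```

For example, `selection_frequencies([5, 5], [5, 5])` describes two arms with identical evidence, so each should be selected roughly half of the time, which is the behaviour a smoothness-style constraint asks for when two arms look alike.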