
    Learning Task Specifications from Demonstrations

    Real-world applications often decompose naturally into several sub-tasks. In many settings (e.g., robotics), demonstrations provide a natural way to specify these sub-tasks. However, most methods for learning from demonstrations either provide no guarantee that the artifacts learned for the sub-tasks can be safely recombined, or they limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate specification inference as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model, and give an efficient approach to search for the most likely specification in a large pool of candidate specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise from ad hoc reward composition. Comment: NIPS 201
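To make the MAP formulation concrete, here is a toy sketch over a finite set of length-2 traces. It is not the paper's likelihood model: it assumes a hypothetical Gibbs (maximum-entropy-style) model in which a trace's probability under a specification is its base probability reweighted by exp(lam * [trace satisfies spec]), with lam fixed by hand rather than fitted, uniform base probabilities, a uniform prior, and made-up symbols (goal, lava, empty). The names log_likelihood and map_specification are illustrative only.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Trace = Tuple[str, ...]
Spec = Callable[[Trace], bool]


def log_likelihood(spec: Spec, demos: List[Trace], universe: List[Trace],
                   base_prob: Dict[Trace, float], lam: float) -> float:
    """Log-likelihood of the demos under a hypothetical Gibbs model:
    P(trace | spec) is proportional to base_prob[trace] * exp(lam * [trace satisfies spec])."""
    def weight(t: Trace) -> float:
        return base_prob[t] * math.exp(lam * spec(t))

    z = sum(weight(t) for t in universe)  # normalizing constant over all traces
    return sum(math.log(weight(d) / z) for d in demos)


def map_specification(candidates: Dict[str, Spec], prior: Dict[str, float],
                      demos: List[Trace], universe: List[Trace],
                      base_prob: Dict[Trace, float], lam: float = 2.0) -> str:
    """Return the candidate name maximizing log prior + log likelihood (MAP)."""
    scores = {
        name: math.log(prior[name]) + log_likelihood(spec, demos, universe, base_prob, lam)
        for name, spec in candidates.items()
    }
    return max(scores, key=scores.get)


symbols = ["empty", "goal", "lava"]
universe = list(itertools.product(symbols, repeat=2))
base_prob = {t: 1 / len(universe) for t in universe}  # assumed uniform dynamics


def reach_goal(t: Trace) -> bool:
    return "goal" in t


def avoid_lava(t: Trace) -> bool:
    return "lava" not in t


candidates: Dict[str, Spec] = {
    "reach goal": reach_goal,
    "avoid lava": avoid_lava,
    # Specifications compose with ordinary Boolean connectives.
    "reach goal and avoid lava": lambda t: reach_goal(t) and avoid_lava(t),
    "true": lambda t: True,
}
prior = {name: 1 / len(candidates) for name in candidates}
demos = [("empty", "goal"), ("goal", "empty")]

best = map_specification(candidates, prior, demos, universe, base_prob)
print(best)  # reach goal and avoid lava
```

In this toy model, all four candidates are satisfied by both demonstrations, but the conjunction is satisfied by the fewest traces overall, so demonstrations that satisfy it are the least likely to arise by chance and it receives the highest posterior score.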

    Bayesian fairness

    We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty; hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairness introduced by Kleinberg et al. (2016), we show how a Bayesian perspective can lead to well-performing, fair decision rules even under high uncertainty. Comment: 13 pages, 8 figures, to appear at AAAI 201
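Balance, in the sense of Kleinberg et al. (2016), asks that for each true outcome y the average decision a be the same across groups z. The following minimal sketch shows how parameter uncertainty enters, under assumptions that are not taken from the paper: a finite feature x, binary y and z, two hand-written posterior samples (theta_a, theta_b), and "Bayesian" read as averaging the balance violation over those samples. The paper's exact definition may differ, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# theta[z][(x, y)] = P(x, y | z): a hypothetical joint model for each group z.
Model = Dict[int, Dict[Tuple[int, int], float]]
# policy[(x, z)] = P(a = 1 | x, z): a randomized decision rule.
Policy = Dict[Tuple[int, int], float]


def conditional_positive_rate(theta: Model, policy: Policy, y: int, z: int) -> float:
    """E_theta[a | y, z] = sum_x P(x | y, z) * P(a = 1 | x, z)."""
    joint = theta[z]
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(p / p_y * policy[(x, z)] for (x, yy), p in joint.items() if yy == y)


def balance_violation(theta: Model, policy: Policy) -> float:
    """Sum over outcomes y of |E[a | y, z=0] - E[a | y, z=1]|; zero means balanced."""
    return sum(
        abs(conditional_positive_rate(theta, policy, y, 0)
            - conditional_positive_rate(theta, policy, y, 1))
        for y in (0, 1)
    )


def expected_balance_violation(posterior_samples: List[Model], policy: Policy) -> float:
    """Average the violation over posterior samples instead of trusting one estimate."""
    return sum(balance_violation(t, policy) for t in posterior_samples) / len(posterior_samples)


group_model = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
theta_a: Model = {0: group_model, 1: group_model}
theta_b: Model = {0: group_model, 1: {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.3}}
accept_if_x: Policy = {(x, z): float(x) for x in (0, 1) for z in (0, 1)}  # a = 1 iff x = 1

print(balance_violation(theta_a, accept_if_x))
# roughly 0.2: violation 0.0 under theta_a and 0.4 under theta_b
print(expected_balance_violation([theta_a, theta_b], accept_if_x))
```

The rule that accepts exactly when x = 1 looks perfectly balanced under theta_a alone, but once the plausible alternative theta_b is taken into account its expected violation is positive, which is the kind of gap a single point estimate hides.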

    Streaming Video QoE Modeling and Prediction: A Long Short-Term Memory Approach

    HTTP-based adaptive video streaming has become a popular choice because of its reliable transmission and the flexibility it offers to adapt to varying network conditions. However, because of rate adaptation, the quality of the video at the client varies over time with the end-to-end network conditions. Further, varying network conditions can cause the video client to run out of playback content, resulting in rebuffering events. These factors affect user satisfaction and degrade the user's quality of experience (QoE). It is important to quantify the perceptual QoE of streaming video users and to monitor it continuously so that QoE degradation can be minimized. However, continuous evaluation of QoE is challenging because it is determined by complex dynamic interactions among the QoE-influencing factors. To this end, we present LSTM-QoE, a recurrent neural network based QoE prediction model using a Long Short-Term Memory (LSTM) network. LSTM-QoE is a network of cascaded LSTM blocks that captures the nonlinearities and the complex temporal dependencies involved in time-varying QoE. Based on an evaluation over several publicly available continuous QoE databases, we demonstrate that LSTM-QoE models QoE dynamics effectively. We compare the proposed model with state-of-the-art QoE prediction models and show that it provides superior performance across these databases. Further, we discuss the state space perspective for LSTM-QoE and show the efficacy of state space modeling approaches for QoE prediction.
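The abstract describes LSTM-QoE only as a cascade of LSTM blocks that maps a time-varying session to a time-varying QoE. The following is a hypothetical, untrained forward pass that illustrates that architecture in plain Python. The per-second features (a normalized video-quality score and a rebuffering flag), hidden size, number of layers, random initialization, and linear readout are all assumptions. Training on continuous subjective QoE scores is omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class LSTMLayer:
    """One LSTM layer: input (i), forget (f), output (o) gates and candidate cell (c)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.n_hidden = n_hidden
        # For each gate: a weight matrix acting on the concatenation [x; h], plus a bias.
        self.W: Dict[str, List[Vector]] = {
            g: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)] for _ in range(n_hidden)]
            for g in "ifoc"
        }
        self.b: Dict[str, Vector] = {g: [0.0] * n_hidden for g in "ifoc"}

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            z = list(x) + h  # concatenate the current input with the previous hidden state
            pre = {
                g: [sum(w * v for w, v in zip(row, z)) + bias
                    for row, bias in zip(self.W[g], self.b[g])]
                for g in "ifoc"
            }
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]  # cell update
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]  # new hidden state
            outputs.append(h)
        return outputs


class CascadedLSTMQoE:
    """Hypothetical cascade of LSTM layers followed by a linear per-step QoE readout."""

    def __init__(self, n_features: int, n_hidden: int = 4, n_layers: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        sizes = [n_features] + [n_hidden] * n_layers
        self.layers = [LSTMLayer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.readout = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def predict(self, features: List[Vector]) -> List[float]:
        seq = features
        for layer in self.layers:  # each layer consumes the previous layer's hidden states
            seq = layer.run(seq)
        return [sum(w * v for w, v in zip(self.readout, h)) for h in seq]


# Per-second features (assumed): [normalized video quality, rebuffering flag].
session = [[0.9, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 1.0], [0.6, 0.0]]
qoe = CascadedLSTMQoE(n_features=2).predict(session)
print([round(q, 3) for q in qoe])
```

The network emits one QoE estimate per time step, and the recurrent state carries the memory of earlier rebuffering into later predictions. With random weights, the printed values are meaningless until the model is trained.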

    Generalized planning: Non-deterministic abstractions and trajectory constraints

    We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies μ that solve the abstract problem P have been shown to solve all problems Q that reduce to P, provided that μ terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints, which in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable nondeterministic planning.
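The termination condition on μ matters because an abstract policy can loop forever on integer variables that are only increased or decreased. The sketch below does not implement the paper's compilation of trajectory constraints. Instead, it shows a SIEVE-style termination test, a procedure from earlier work on qualitative numerical planning, applied to a hypothetical encoding of the policy graph in which each edge records which variables it increments ("inc") or decrements ("dec"). It assumes that variables are non-negative integers. The names reach_star and sieve_terminates are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of an abstract policy graph:
# each edge is (source_state, target_state, {variable: "inc" | "dec"}).
Edge = Tuple[str, str, Dict[str, str]]


def reach_star(start: str, edges: List[Edge]) -> Set[str]:
    """Nodes reachable from `start` in zero or more steps."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b, _ in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def sieve_terminates(edges: List[Edge]) -> bool:
    """True if every execution must be finite, assuming non-negative integer variables."""
    edges = list(edges)
    while True:
        # An edge a -> b lies on a cycle iff a is reachable from b.
        cyclic = [e for e in edges if e[0] in reach_star(e[1], edges)]
        if not cyclic:
            return True  # acyclic graph: no infinite execution
        # Group cyclic edges by the strongly connected component containing them.
        components: Dict[frozenset, List[Edge]] = {}
        for e in cyclic:
            src = e[0]
            comp = frozenset(n for n in reach_star(src, edges) if src in reach_star(n, edges))
            components.setdefault(comp, []).append(e)
        removable_ids = set()
        for comp_edges in components.values():
            dec = {v for _, _, eff in comp_edges for v, k in eff.items() if k == "dec"}
            inc = {v for _, _, eff in comp_edges for v, k in eff.items() if k == "inc"}
            # A variable decreased but never increased inside the component can only be
            # decreased finitely often, so edges that decrease it cannot recur forever.
            for e in comp_edges:
                if any(e[2].get(v) == "dec" for v in dec - inc):
                    removable_ids.add(id(e))
        if not removable_ids:
            return False  # some cycle may repeat forever
        edges = [e for e in edges if id(e) not in removable_ids]


countdown = [("s", "s", {"n": "dec"}), ("s", "goal", {})]
ping_pong = [("s", "t", {"n": "dec"}), ("t", "s", {"n": "inc"})]
nested = [("s", "s", {"x": "dec", "y": "inc"}), ("s", "s", {"y": "dec"}), ("s", "goal", {})]
print(sieve_terminates(countdown), sieve_terminates(ping_pong), sieve_terminates(nested))
# True False True
```

The nested example needs two rounds: the loop that decrements x is removed first, after which the loop that only decrements y can be removed as well.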

    Calibrated Fairness in Bandits

    We study fairness within the stochastic multi-armed bandit (MAB) decision-making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an 'individual' corresponds to an arm, and two arms are 'similar' if they have similar quality distributions. First, we adopt a smoothness constraint: if two arms have similar quality distributions, then the probabilities of selecting each arm should be similar. In addition, we define the fairness regret, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm equal the probability with which the arm has the best quality realization. We show that a variation on Thompson sampling satisfies smooth fairness for total variation distance, and we give an $\tilde{O}((kT)^{2/3})$ bound on fairness regret. This complements prior work, which protects an on-average better arm from being less favored. We also explain how to extend our algorithm to the dueling bandit setting. Comment: To be presented at the FAT-ML'17 workshop
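Calibration compares the probability of selecting each arm with the probability that the arm has the best quality realization. The sketch below assumes Bernoulli arms with Beta(1, 1) priors and uses one plausible "realization" variant of Thompson sampling: sample a mean from each posterior, then a reward realization from that mean, and pick the best realization with random tie-breaking. Whether this matches the paper's variation is an assumption. The function calibrated_target computes the exact target selection probabilities by enumerating outcomes, and total_variation measures the gap to the empirical selection frequencies. All names are hypothetical.

```python
import itertools
import random
from typing import List


def calibrated_target(means: List[float]) -> List[float]:
    """P(arm i has the best Bernoulli realization), splitting ties uniformly."""
    k = len(means)
    target = [0.0] * k
    for outcome in itertools.product((0, 1), repeat=k):
        p = 1.0
        for m, r in zip(means, outcome):
            p *= m if r else 1 - m
        best = max(outcome)
        winners = [i for i, r in enumerate(outcome) if r == best]
        for i in winners:
            target[i] += p / len(winners)
    return target


def realization_thompson(means: List[float], horizon: int, seed: int = 0) -> List[int]:
    """Assumed variant: compare sampled reward realizations rather than sampled means."""
    rng = random.Random(seed)
    k = len(means)
    alpha, beta = [1.0] * k, [1.0] * k  # Beta(1, 1) priors
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean from each Beta posterior, then a Bernoulli realization from it.
        draws = [1 if rng.random() < rng.betavariate(alpha[i], beta[i]) else 0 for i in range(k)]
        best = max(draws)
        arm = rng.choice([i for i, d in enumerate(draws) if d == best])
        reward = 1 if rng.random() < means[arm] else 0  # environment feedback
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def total_variation(p: List[float], q: List[float]) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


means = [0.8, 0.5]
horizon = 5000
pulls = realization_thompson(means, horizon)
freq = [n / horizon for n in pulls]
gap = total_variation(freq, calibrated_target(means))
print(pulls, round(gap, 3))
```

For means 0.8 and 0.5, the calibrated target is 0.65 and 0.35. Plain Thompson sampling, which compares sampled means, would instead concentrate almost all pulls on the better arm, so its selection frequencies would drift away from this target.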