8,360 research outputs found

    Reduced Complexity Filtering with Stochastic Dominance Bounds: A Convex Optimization Approach

    Full text link
    This paper uses stochastic dominance principles to construct upper and lower sample path bounds for Hidden Markov Model (HMM) filters. Given a HMM, by using convex optimization methods for nuclear norm minimization with copositive constraints, we construct low rank stochastic marices so that the optimal filters using these matrices provably lower and upper bound (with respect to a partially ordered set) the true filtered distribution at each time instant. Since these matrices are low rank (say R), the computational cost of evaluating the filtering bounds is O(XR) instead of O(X2). A Monte-Carlo importance sampling filter is presented that exploits these upper and lower bounds to estimate the optimal posterior. Finally, using the Dobrushin coefficient, explicit bounds are given on the variational norm between the true posterior and the upper and lower bounds

    Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

    Full text link
    The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, the percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions

    Data-driven satisficing measure and ranking

    Full text link
    We propose an computational framework for real-time risk assessment and prioritizing for random outcomes without prior information on probability distributions. The basic model is built based on satisficing measure (SM) which yields a single index for risk comparison. Since SM is a dual representation for a family of risk measures, we consider problems constrained by general convex risk measures and specifically by Conditional value-at-risk. Starting from offline optimization, we apply sample average approximation technique and argue the convergence rate and validation of optimal solutions. In online stochastic optimization case, we develop primal-dual stochastic approximation algorithms respectively for general risk constrained problems, and derive their regret bounds. For both offline and online cases, we illustrate the relationship between risk ranking accuracy with sample size (or iterations).Comment: 26 Pages, 6 Figure

    Probabilistic analysis of Online Bin Coloring algorithms via Stochastic Comparison

    Get PDF
    This paper proposes a new method for probabilistic analysis of online algorithms that is based on the notion of stochastic dominance. We develop the method for the Online Bin Coloring problem introduced by Krumke et al. Using methods for the stochastic comparison of Markov chains we establish the strong result that the performance of the online algorithm GreedyFit is stochastically dominated by the performance of the algorithm OneBin for any number of items processed. This result gives a more realistic picture than competitive analysis and explains the behavior observed in simulations.mathematical applications;
    corecore