    Detection thresholding using mutual information

    In this paper, we introduce a novel non-parametric thresholding method that we term Mutual-Information Thresholding. In our approach, we choose the two detection thresholds for two input signals such that the mutual information between the thresholded signals is maximised. We present two efficient algorithms implementing this idea: one uses dynamic programming to fully explore the quantised search space, and the other uses the Simplex algorithm to perform gradient ascent, which significantly speeds up the search under the assumption that the search surface is convex. We demonstrate the effectiveness of our approach in foreground detection (using multi-modal data) and as a component in a person detection system.
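    The thresholding criterion can be illustrated with a minimal sketch. It assumes two equal-length signals (for example, co-registered pixel values from two modalities) and small hypothetical lists of candidate thresholds, and it simply tries every threshold pair. This brute-force grid search stands in for, and is not, the paper's dynamic programming or Simplex algorithms; mutual information is measured in bits.

```python
import math
from itertools import product


def binary_mutual_information(a, b):
    """Mutual information, in bits, between two equal-length boolean sequences."""
    n = len(a)
    joint = {(i, j): 0 for i in (False, True) for j in (False, True)}
    for u, v in zip(a, b):
        joint[(u, v)] += 1
    # Marginal probabilities of each binary outcome.
    pa = {i: (joint[(i, False)] + joint[(i, True)]) / n for i in (False, True)}
    pb = {j: (joint[(False, j)] + joint[(True, j)]) / n for j in (False, True)}
    mi = 0.0
    for (i, j), count in joint.items():
        if count:  # empty cells contribute 0 by convention
            pij = count / n
            mi += pij * math.log2(pij / (pa[i] * pb[j]))
    return mi


def mi_thresholds(x, y, candidates_x, candidates_y):
    """Grid-search the threshold pair (tx, ty) maximising MI of (x > tx, y > ty)."""
    best = (-1.0, None, None)
    for tx, ty in product(candidates_x, candidates_y):
        mi = binary_mutual_information([v > tx for v in x], [v > ty for v in y])
        if mi > best[0]:
            best = (mi, tx, ty)
    return best


# Toy signals whose "on" samples coincide.
x = [0.1, 0.2, 0.8, 0.9]
y = [10, 12, 40, 45]
print(mi_thresholds(x, y, [0.15, 0.5, 0.85], [11, 30, 42]))  # (1.0, 0.5, 30)
```

    Here the pair (0.5, 30) splits both signals into identical balanced halves, which yields the maximum of 1 bit for two binary variables.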

    Risk and optimal policies in bandit experiments

    This paper provides a decision-theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving it directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit-of-experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further identifies the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE that can be efficiently solved using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem, such as time discounting and pure exploration motives.
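    To make concrete why direct dynamic programming becomes infeasible, the sketch below solves a tiny Bayesian two-armed bandit exactly by backward induction. It swaps in Bernoulli rewards with Beta priors; the abstract's characterization concerns normal rewards and diffusion asymptotics, which this sketch does not implement. The function name and prior parameters are illustrative assumptions.

```python
from functools import lru_cache


def bayes_optimal_value(horizon, prior_a=1, prior_b=1):
    """Expected total reward of the Bayes-optimal policy for a two-armed
    Bernoulli bandit with independent Beta(prior_a, prior_b) priors."""

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2, remaining):
        # State: successes/failures observed on each arm, plus pulls left.
        if remaining == 0:
            return 0.0
        best = float("-inf")
        for arm in (0, 1):
            s, f = (s1, f1) if arm == 0 else (s2, f2)
            p = (prior_a + s) / (prior_a + prior_b + s + f)  # posterior mean
            if arm == 0:
                win = value(s1 + 1, f1, s2, f2, remaining - 1)
                lose = value(s1, f1 + 1, s2, f2, remaining - 1)
            else:
                win = value(s1, f1, s2 + 1, f2, remaining - 1)
                lose = value(s1, f1, s2, f2 + 1, remaining - 1)
            # Bellman recursion: immediate expected reward plus continuation value.
            best = max(best, p * (1.0 + win) + (1.0 - p) * lose)
        return best

    return value(0, 0, 0, 0, horizon)


print(bayes_optimal_value(1))  # 0.5
print(bayes_optimal_value(2))  # about 1.0833, i.e. 13/12
```

    With a horizon of 2, the optimal policy pulls either arm, stays after a success (posterior mean 2/3) and switches after a failure (prior mean 1/2 on the other arm). The memoized state space grows rapidly with both the horizon and the number of arms, which is the blow-up that the PDE approximation described above sidesteps.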

    Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

    The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from risk neutral. To fill this gap, this paper devises a framework for risk-sensitive IRL that explicitly accounts for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences, from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles, from highly risk-averse to risk-neutral, in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
    Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability
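    Conditional value-at-risk (CVaR) is one standard coherent risk measure, and it shows how a single parameter can sweep from risk-neutral to worst-case evaluation of a cost distribution. This is an illustrative example only: the paper describes its model class as coherent risk measures in general, and the helper below is hypothetical, not the authors' code.

```python
def cvar(costs, probs, alpha):
    """CVaR_alpha of a discrete cost distribution: the expected cost over
    the worst alpha fraction of probability mass (alpha in (0, 1])."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    remaining, total = alpha, 0.0
    # Walk outcomes from most to least costly, taking mass until alpha is used up.
    for cost, p in sorted(zip(costs, probs), reverse=True):
        take = min(p, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


costs, probs = [0.0, 10.0], [0.9, 0.1]
for alpha in (1.0, 0.2, 0.05):
    print(alpha, cvar(costs, probs, alpha))
```

    With these costs, alpha = 1 recovers the risk-neutral expectation of 1.0, alpha = 0.2 gives 5.0, and any alpha at or below 0.1 gives the worst-case cost of 10.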

    Assortment Optimization Under Consider-then-Choose Choice Models

    Consider-then-choose models, borne out by the empirical literature in marketing and psychology, explain that customers choose among alternatives in two phases: they first screen products to decide which alternatives to consider, and then rank those alternatives. In this paper, we develop a dynamic programming framework to study the computational aspects of assortment optimization under consider-then-choose premises. Although non-parametric choice models generally lead to computationally intractable assortment optimization problems, we show that for many empirically vetted assumptions on how customers consider and choose, the resulting dynamic program is efficient. Our approach unifies and subsumes several specialized settings analyzed in previous literature. Empirically, we demonstrate the predictive power of our modeling approach on a combination of synthetic and real industry data sets, where prediction errors are significantly reduced relative to common parametric choice models. In synthetic experiments, our algorithms lead to practical computation schemes that outperform a state-of-the-art integer programming solver in running time in several parameter regimes of interest.
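    The two-phase choice process can be sketched as follows, assuming each hypothetical customer type is described by a probability weight, a consideration set and a strict preference ranking, and that a customer buys the highest-ranked product that is both offered and considered. Assortments are optimized here by brute-force enumeration of subsets, which takes exponential time; it is only a reference point, not the paper's dynamic program.

```python
from itertools import combinations


def choose(assortment, considered, ranking):
    """Phase 1: keep offered products the customer considers.
    Phase 2: buy the highest-ranked survivor, or None (no purchase)."""
    for item in ranking:  # ranking is ordered from most to least preferred
        if item in assortment and item in considered:
            return item
    return None


def expected_revenue(assortment, customer_types, prices):
    revenue = 0.0
    for weight, considered, ranking in customer_types:
        item = choose(assortment, considered, ranking)
        if item is not None:
            revenue += weight * prices[item]
    return revenue


def best_assortment(products, customer_types, prices):
    """Brute force over all non-empty subsets: exponential in len(products)."""
    best = (0.0, frozenset())
    for k in range(1, len(products) + 1):
        for subset in combinations(products, k):
            revenue = expected_revenue(set(subset), customer_types, prices)
            if revenue > best[0]:
                best = (revenue, frozenset(subset))
    return best


prices = {"A": 10.0, "B": 6.0}
customer_types = [
    (0.5, {"A", "B"}, ["B", "A"]),  # considers both, prefers the cheaper B
    (0.5, {"A"}, ["A"]),            # only ever considers A
]
print(best_assortment(["A", "B"], customer_types, prices))  # (10.0, frozenset({'A'}))
```

    In this toy instance, adding B to the assortment {A} lowers expected revenue from 10.0 to 8.0, because customers who consider both products switch to the cheaper B.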

    Strategic polymorphism requires just two combinators!

    In previous work, we introduced the notion of functional strategies: first-class generic functions that can traverse terms of any type while mixing uniform and type-specific behaviour. Functional strategies transpose the notion of term rewriting strategies (with coverage of traversal) to the functional programming paradigm. Meanwhile, a number of Haskell-based models and combinator suites were proposed to support generic programming with functional strategies. In the present paper, we provide a compact and mature reconstruction of functional strategies. We capture strategic polymorphism with just two primitive combinators, without commitment to a specific functional language. We analyse the design space for implementation models of functional strategies. For completeness, we also provide an operational reference model for implementing functional strategies (in Haskell). We demonstrate the generality of our approach by reconstructing representative fragments of the Strafunski library for functional strategies.
    Comment: A preliminary version of this paper was presented at IFL 2002, and included in the informal preproceedings of the workshop.
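    The idea of building generic traversals from a small set of primitives can be sketched in Python. This is a loose analogy with hypothetical names, not the paper's language-independent model or its Haskell reference model: `adhoc` extends a generic strategy with a type-specific case, `all_children` applies a strategy to every immediate subterm, and a full top-down traversal is derived from the two. The paper's actual pair of primitives may be defined differently.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any, Callable

Strategy = Callable[[Any], Any]


def identity(term):
    return term


def adhoc(generic: Strategy, cls: type, specific: Strategy) -> Strategy:
    """Hypothetical primitive 1: use `specific` on terms of type `cls`, else `generic`."""
    def strategy(term):
        return specific(term) if isinstance(term, cls) else generic(term)
    return strategy


def all_children(s: Strategy) -> Strategy:
    """Hypothetical primitive 2: apply `s` to every immediate subterm."""
    def strategy(term):
        if is_dataclass(term) and not isinstance(term, type):
            return replace(term, **{f.name: s(getattr(term, f.name)) for f in fields(term)})
        if isinstance(term, list):
            return [s(t) for t in term]
        return term  # leaves such as ints and strings have no subterms
    return strategy


def topdown(s: Strategy) -> Strategy:
    """Derived traversal: apply `s` at the root, then recursively below it."""
    def strategy(term):
        return all_children(strategy)(s(term))
    return strategy


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: Any
    right: Any


def rename(v: Var) -> Var:
    return Var(v.name.upper())


expr = Add(Var("x"), Add(Var("y"), 3))
result = topdown(adhoc(identity, Var, rename))(expr)
print(result)  # Add(left=Var(name='X'), right=Add(left=Var(name='Y'), right=3))
```

    The traversal itself is uniform over any dataclass-based term, while only the `Var` case carries type-specific behaviour, mirroring the mix of uniform and type-specific behaviour described above.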