Detection thresholding using mutual information
In this paper, we introduce a novel non-parametric thresholding method that we term Mutual-Information
Thresholding. In our approach, we choose the two detection thresholds for two input signals such that the
mutual information between the thresholded signals is maximised. Two efficient algorithms implementing our
idea are presented: one using dynamic programming to fully explore the quantised search space and the other
using the Simplex algorithm to perform gradient ascent, which significantly speeds up the search under the
assumption of surface convexity. We demonstrate the effectiveness of our approach in foreground detection
(using multi-modal data) and as a component in a person detection system.
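The core idea of the abstract above, choosing two thresholds so that the mutual information between the two thresholded signals is maximised, can be illustrated with a minimal sketch. It uses a plain exhaustive search over candidate threshold pairs, not the dynamic programming or Simplex algorithms the paper describes. The function names `binary_mutual_information` and `mi_thresholds` and the candidate grids are hypothetical and chosen for illustration only.

```python
import numpy as np

def binary_mutual_information(a, b):
    """Mutual information (in bits) between two boolean arrays of equal length."""
    mi = 0.0
    for va in (False, True):
        for vb in (False, True):
            p_ab = np.mean((a == va) & (b == vb))  # joint probability
            p_a = np.mean(a == va)                 # marginal of a
            p_b = np.mean(b == vb)                 # marginal of b
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def mi_thresholds(x, y, candidates_x, candidates_y):
    """Exhaustive search (an assumption, not the paper's algorithm) for the
    threshold pair that maximises MI between (x > tx) and (y > ty)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_score, best_tx, best_ty = -1.0, None, None
    for tx in candidates_x:
        bx = x > tx
        for ty in candidates_y:
            score = binary_mutual_information(bx, y > ty)
            if score > best_score:
                best_score, best_tx, best_ty = score, tx, ty
    return best_score, best_tx, best_ty

if __name__ == "__main__":
    # Toy signals that agree on which samples are "detections".
    x = [0.1, 0.2, 0.8, 0.9]
    y = [1.0, 2.0, 8.0, 9.0]
    print(mi_thresholds(x, y, [0.0, 0.5, 1.0], [0.0, 5.0, 10.0]))
```

The exhaustive search costs one mutual-information evaluation per threshold pair, which is the cost the paper's two algorithms are designed to reduce.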
Risk and optimal policies in bandit experiments
This paper provides a decision theoretic analysis of bandit experiments. The
bandit setting corresponds to a dynamic programming problem, but solving this
directly is typically infeasible. Working within the framework of diffusion
asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit
settings. For normally distributed rewards, the minimal Bayes risk can be
characterized as the solution to a nonlinear second-order partial differential
equation (PDE). Using a limit of experiments approach, we show that this PDE
characterization also holds asymptotically under both parametric and
non-parametric distributions of the rewards. The approach further describes the
state variables it is asymptotically sufficient to restrict attention to, and
therefore suggests a practical strategy for dimension reduction. The upshot is
that we can approximate the dynamic programming problem defining the bandit
setting with a PDE which can be efficiently solved using sparse matrix
routines. We derive near-optimal policies from the numerical solutions to these
equations. The proposed policies substantially dominate existing methods such
as Thompson sampling. The framework also allows for substantial generalizations to
the bandit problem such as time discounting and pure exploration motives.
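For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of Thompson sampling for normally distributed rewards. It is not the PDE-based policy the paper proposes. It assumes a known noise standard deviation and an independent N(0, prior_sd^2) prior on each arm's mean; the function name `thompson_sampling` and all parameter names are hypothetical.

```python
import numpy as np

def thompson_sampling(true_means, n_rounds, noise_sd=1.0, prior_sd=1.0, seed=0):
    """Run Gaussian Thompson sampling and return the pull count of each arm.

    Assumptions (illustrative only): rewards are N(true_mean, noise_sd^2) with
    known noise_sd, and each arm mean has a N(0, prior_sd^2) prior.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)  # number of pulls per arm
    sums = np.zeros(k)    # sum of observed rewards per arm
    for _ in range(n_rounds):
        # Conjugate normal update: posterior precision and mean per arm.
        post_prec = 1.0 / prior_sd**2 + counts / noise_sd**2
        post_mean = (sums / noise_sd**2) / post_prec
        # Draw one sample from each posterior and play the arm with the largest draw.
        draws = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        arm = int(np.argmax(draws))
        reward = rng.normal(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
    return counts

if __name__ == "__main__":
    print(thompson_sampling([0.0, 2.0], n_rounds=500))
```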
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods
The literature on Inverse Reinforcement Learning (IRL) typically assumes that
humans take actions in order to minimize the expected value of a cost function,
i.e., that humans are risk neutral. Yet, in practice, humans are often far from
being risk neutral. To fill this gap, the objective of this paper is to devise
a framework for risk-sensitive IRL in order to explicitly account for a human's
risk sensitivity. To this end, we propose a flexible class of models based on
coherent risk measures, which allow us to capture an entire spectrum of risk
preferences from risk-neutral to worst-case. We propose efficient
non-parametric algorithms based on linear programming and semi-parametric
algorithms based on maximum likelihood for inferring a human's underlying risk
measure and cost function for a rich class of static and dynamic
decision-making settings. The resulting approach is demonstrated on a simulated
driving game with ten human participants. Our method is able to infer and mimic
a wide range of qualitatively different driving styles from highly risk-averse
to risk-neutral in a data-efficient manner. Moreover, comparisons of the
Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL
framework more accurately captures observed participant behavior both
qualitatively and quantitatively, especially in scenarios where catastrophic
outcomes such as collisions can occur.
Comment: Submitted to International Journal of Robotics Research; Revision 1:
(i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to
hold under weaker assumptions; (iii) Added additional figures and expanded
discussions to improve readability.
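The abstract above describes coherent risk measures that span the range from risk-neutral to worst-case. One standard example of such a measure is Conditional Value-at-Risk (CVaR); the abstract does not say which measures the paper uses, so CVaR is an assumption made here purely for illustration. The sketch below computes an empirical CVaR over equally likely cost samples, where `alpha = 1` recovers the mean (risk-neutral) and small `alpha` approaches the maximum cost (worst-case). The function name `cvar` is hypothetical.

```python
import math
import numpy as np

def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) of n equally likely costs.

    This is exact when alpha * n is an integer and an approximation otherwise.
    alpha must lie in (0, 1]; higher costs are treated as worse outcomes.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    sorted_costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # worst first
    k = max(1, math.ceil(alpha * len(sorted_costs)))
    return float(sorted_costs[:k].mean())

if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0, 10.0]
    for alpha in (1.0, 0.5, 0.25):
        print(alpha, cvar(costs, alpha))
```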
Assortment Optimization Under Consider-then-Choose Choice Models
Consider-then-choose models, borne out by empirical literature in marketing and psychology, explain that customers choose among alternatives in two phases: they first screen products to decide which alternatives to consider, and then rank them. In this paper, we develop a dynamic programming framework to study the computational aspects of assortment optimization under consider-then-choose premises. Although non-parametric choice models generally lead to computationally intractable assortment optimization problems, we are able to show that for many empirically vetted assumptions on how customers consider and choose, our resulting dynamic program is efficient. Our approach unifies and subsumes several specialized settings analyzed in previous literature. Empirically, we demonstrate the predictive power of our modeling approach on a combination of synthetic and real industry data sets, where prediction errors are significantly reduced against common parametric choice models. In synthetic experiments, our algorithms lead to practical computation schemes that outperform a state-of-the-art integer programming solver in terms of running time, in several parameter regimes of interest.
Strategic polymorphism requires just two combinators!
In previous work, we introduced the notion of functional strategies:
first-class generic functions that can traverse terms of any type while mixing
uniform and type-specific behaviour. Functional strategies transpose the notion
of term rewriting strategies (with coverage of traversal) to the functional
programming paradigm. Meanwhile, a number of Haskell-based models and
combinator suites were proposed to support generic programming with functional
strategies.
In the present paper, we provide a compact and matured reconstruction of
functional strategies. We capture strategic polymorphism by just two primitive
combinators. This is done without commitment to a specific functional language.
We analyse the design space for implementational models of functional
strategies. For completeness, we also provide an operational reference model
for implementing functional strategies (in Haskell). We demonstrate the
generality of our approach by reconstructing representative fragments of the
Strafunski library for functional strategies.
Comment: A preliminary version of this paper was presented at IFL 2002, and
included in the informal preproceedings of the workshop.