On predicting stopping time of human sequential decision-making using discounted satisficing heuristic
“Human sequential decision-making involves two essential questions: (i) what to choose next? and (ii) when to stop? Assuming that human agents choose an alternative according to their preference order, our goal is to model and learn how human agents choose their stopping time while making sequential decisions. Contrary to traditional assumptions in the literature that humans exhibit satisficing behavior on instantaneous utilities, we assume that humans employ a discounted satisficing heuristic to compute their stopping time, i.e., the human agent stops working once the total accumulated utility exceeds a dynamic threshold that is discounted with time. In this thesis, we model the stopping time in three scenarios, where the payoff of the human worker is modeled as (i) a single-attribute utility, (ii) a multi-attribute utility with known weights, and (iii) a multi-attribute utility with unknown weights. We propose algorithms to estimate the model parameters and then predict the stopping time in all three scenarios, and we present simulation results to demonstrate the error performance. The simulation results demonstrate that the prediction error of the stopping time converges even though the model parameters converge to biased estimates. This observation is later justified with an illustrative example showing that multiple discounted satisficing models can explain the same stopping-time decision. A novel web application is also developed to emulate a crowd-sourcing platform in our lab and capture multi-attribute information about each task, in order to validate the proposed algorithms on real data”--Abstract, page iii
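The discounted satisficing stopping rule described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's code; the parameter names (`threshold`, `discount`) and the behavior when the threshold is never met are assumptions.

```python
def discounted_satisficing_stop(utilities, threshold, discount):
    """Stopping time under a discounted satisficing heuristic.

    The agent stops at the first time t where the accumulated utility
    meets or exceeds threshold * discount**t, i.e., the acceptability
    threshold decays geometrically with time.
    """
    total = 0.0
    for t, u in enumerate(utilities):
        total += u
        if total >= threshold * discount ** t:
            return t
    # Threshold never reached: the agent works through every task.
    return len(utilities) - 1
```

For example, with per-task utilities `[1, 1, 1, 1]`, a threshold of 3, and a discount of 0.5, the agent stops at t = 1, since the accumulated utility 2 exceeds the discounted threshold 3 × 0.5 = 1.5.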
Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization
Bayesian optimization (BO) is widely used for black-box optimization
problems and has been shown to perform well in various real-world tasks.
However, most of the existing BO methods aim to learn the optimal solution,
which may become infeasible when the parameter space is extremely large or the
problem is time-sensitive. In these contexts, switching to a satisficing
solution that requires less information can result in better performance. In
this work, we focus on time-sensitive black-box optimization problems and
propose satisficing Thompson sampling-based parallel Bayesian optimization
(STS-PBO) approaches, including synchronous and asynchronous versions. We shift
the target from an optimal solution to a satisficing solution that is easier to
learn. Rate-distortion theory is introduced to construct a loss function
that balances the amount of information that needs to be learned with
sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the
target solution that reaches the minimum information rate under the distortion
limit at each step. Both discounted and undiscounted Bayesian cumulative regret
bounds are theoretically derived for the proposed STS-PBO approaches. The
effectiveness of the proposed methods is demonstrated on a fast-charging design
problem for lithium-ion batteries. The results accord with the theoretical
analyses and show that our STS-PBO methods outperform both their sequential
counterparts and parallel BO with traditional Thompson sampling, in both
synchronous and asynchronous settings.
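The Blahut-Arimoto step this abstract mentions — computing the target that reaches the minimum information rate under a distortion limit — can be sketched with the textbook algorithm below. This is the standard alternating-minimization form of Blahut-Arimoto, shown only to illustrate the mechanism; it is not the authors' implementation, and the variable names are assumptions.

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=200):
    """Rate-distortion-optimal channel via Blahut-Arimoto.

    p_x:  source distribution over candidate solutions, shape (n,).
    dist: distortion matrix d[i, j] between source i and reproduction j.
    beta: Lagrange multiplier trading information rate against distortion.
    Returns the conditional q(j | i) and the reproduction marginal q(j).
    """
    n, m = dist.shape
    q_y = np.full(m, 1.0 / m)  # initial reproduction marginal
    for _ in range(n_iter):
        # Optimal channel given the current marginal.
        cond = q_y * np.exp(-beta * dist)
        cond /= cond.sum(axis=1, keepdims=True)
        # Optimal marginal given the current channel.
        q_y = p_x @ cond
    return cond, q_y
```

With a large `beta` (low tolerated distortion) the resulting channel approaches the identity, i.e., the target collapses to the exact optimum; smaller `beta` yields a lower-rate, easier-to-learn satisficing target.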
Robust Bayesian Satisficing
Distributional shifts pose a significant challenge to achieving robustness in
contemporary machine learning. To overcome this challenge, robust satisficing
(RS) seeks a robust solution to an unspecified distributional shift while
achieving a utility above a desired threshold. This paper focuses on the
problem of RS in contextual Bayesian optimization when there is a discrepancy
between the true and reference distributions of the context. We propose a novel
robust Bayesian satisficing algorithm called RoBOS for noisy black-box
optimization. Our algorithm guarantees sublinear lenient regret under certain
assumptions on the amount of distribution shift. In addition, we define a
weaker notion of regret called robust satisficing regret, in which our
algorithm achieves a sublinear upper bound independent of the amount of
distribution shift. To demonstrate the effectiveness of our method, we apply it
to various learning problems and compare it to other approaches, such as
distributionally robust optimization.
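The lenient regret this abstract refers to penalizes a round only when the achieved utility falls below the desired threshold. A minimal sketch of that notion (the function name and interface are illustrative, not from the paper):

```python
def lenient_regret(rewards, tau):
    """Cumulative lenient regret with satisficing threshold tau.

    Rounds whose reward already meets tau contribute zero;
    only shortfalls below the threshold accumulate.
    """
    return sum(max(tau - r, 0.0) for r in rewards)
```

For instance, rewards `[0.5, 1.2, 0.9]` against a threshold of 1.0 incur lenient regret 0.5 + 0 + 0.1 = 0.6, whereas standard regret would also charge the round that exceeded the threshold.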
Stochastic satisficing account of confidence in uncertain value-based decisions
Every day we make choices under uncertainty: choosing which route to take to work or which supermarket queue to join, for example. It is unclear how outcome variance, e.g. uncertainty about waiting time in a queue, affects decisions and confidence when the outcome is stochastic and continuous. How does one evaluate and choose between an option with an unreliable but high expected reward and an option with a more certain but lower expected reward? Here we used an experimental design in which two choices’ payoffs took continuous values to examine the effect of outcome variance on decision and confidence. We found that our participants’ probability of choosing the good (high expected reward) option decreased when the good or the bad option’s payoffs were more variable. Their confidence ratings were affected by outcome variability, but only when they chose the good option. Unlike in perceptual detection tasks, confidence ratings correlated only weakly with decision times, but correlated with the consistency of trial-by-trial choices. Inspired by the satisficing heuristic, we propose a “stochastic satisficing” (SSAT) model for evaluating options with continuous uncertain outcomes. In this model, options are evaluated by their probability of exceeding an acceptability threshold, and confidence reports scale with the chosen option’s thus-defined satisficing probability. Participants’ decisions were best explained by an expected reward model, while the SSAT model provided the best prediction of decision confidence. We further tested and verified the predictions of this model in a second experiment. Our model and experimental results generalize models of metacognition from perceptual detection tasks to continuous value-based decisions. Finally, we discuss how the stochastic satisficing account of decision confidence serves psychological and social purposes associated with the evaluation, communication and justification of decision-making.
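The core quantity of the SSAT model — the probability that an option's payoff exceeds the acceptability threshold — has a closed form if payoffs are Gaussian. The Gaussian assumption and the function interface below are illustrative choices for this sketch, not details taken from the paper:

```python
from math import erf, sqrt

def satisficing_probability(mean, std, threshold):
    """P(payoff > threshold) for a Gaussian payoff distribution.

    In the SSAT account, each option is valued by this probability,
    and confidence scales with it for the chosen option.
    """
    z = (threshold - mean) / (std * sqrt(2.0))
    return 0.5 * (1.0 - erf(z))
```

This makes the reported variance effect easy to see: for a fixed mean above the threshold, increasing `std` pushes the satisficing probability toward 0.5, lowering the option's value and, for the chosen option, the predicted confidence.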
Thompson Sampling with Virtual Helping Agents
We address the problem of online sequential decision making, i.e., balancing
the trade-off between exploiting the current knowledge to maximize immediate
performance and exploring the new information to gain long-term benefits using
the multi-armed bandit framework. Thompson sampling is one of the heuristics
for choosing actions that address this exploration-exploitation dilemma. We
first propose a general framework that helps heuristically tune the exploration
versus exploitation trade-off in Thompson sampling using multiple samples from
the posterior distribution. Utilizing this framework, we propose two algorithms
for the multi-armed bandit problem and provide theoretical bounds on the
cumulative regret. Next, we demonstrate the empirical improvement in the
cumulative regret performance of the proposed algorithm over Thompson Sampling.
We also show the effectiveness of the proposed algorithm on real-world
datasets. In contrast to existing methods, our framework provides a mechanism
to vary the amount of exploration/exploitation based on the task at hand.
To this end, we extend our framework to two additional problems, i.e.,
best-arm identification and time-sensitive learning in bandits, and compare our
algorithm with existing methods.
Comment: 14 pages, 8 figures
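The multiple-posterior-samples mechanism this abstract describes can be sketched for a Bernoulli bandit with Beta posteriors. This is a generic illustration of the idea, not the authors' algorithm; the max-of-k aggregation and the uniform Beta(1, 1) prior are assumptions of this sketch.

```python
import random

def multi_sample_thompson(arms, k):
    """Choose an arm using k posterior samples per arm.

    arms: list of (successes, failures) counts per arm (Bernoulli rewards).
    Each arm is scored by the maximum of k draws from its Beta posterior;
    k = 1 recovers standard Thompson sampling, while larger k makes the
    scores more optimistic and thus increases exploration.
    """
    scores = []
    for s, f in arms:
        samples = [random.betavariate(s + 1, f + 1) for _ in range(k)]
        scores.append(max(samples))
    return scores.index(max(scores))
```

Tuning `k` is the knob the framework exposes: a time-sensitive task can use a small `k` to exploit sooner, while best-arm identification can use a larger `k` to keep exploring.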
Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
The quintessential model-based reinforcement-learning agent iteratively
refines its estimates or prior beliefs about the true underlying model of the
environment. Recent empirical successes in model-based reinforcement learning
with function approximation, however, eschew the true model in favor of a
surrogate that, while ignoring various facets of the environment, still
facilitates effective planning over behaviors. Recently formalized as the value
equivalence principle, this algorithmic technique is perhaps unavoidable as
real-world reinforcement learning demands consideration of a simple,
computationally-bounded agent interacting with an overwhelmingly complex
environment, whose underlying dynamics likely exceed the agent's capacity for
representation. In this work, we consider the scenario where agent limitations
may entirely preclude identifying an exactly value-equivalent model,
immediately giving rise to a trade-off between keeping a model simple enough
to learn and incurring only bounded sub-optimality. To address
this problem, we introduce an algorithm that, using rate-distortion theory,
iteratively computes an approximately-value-equivalent, lossy compression of
the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that
holds for any finite-horizon, episodic sequential decision-making problem.
Crucially, our regret bound can be expressed in one of two possible forms,
providing a performance guarantee for finding either the simplest model that
achieves a desired sub-optimality gap or, alternatively, the best model given a
limit on agent capacity.
Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2022