Crowd Counting with Decomposed Uncertainty
Research in neural networks in the field of computer vision has achieved
remarkable accuracy for point estimation. However, the uncertainty in the
estimation is rarely addressed. Uncertainty quantification accompanied by point
estimation can lead to a more informed decision, and even improve the
prediction quality. In this work, we focus on uncertainty estimation in the
domain of crowd counting. With increasing occurrences of heavily crowded events
such as political rallies, protests, concerts, etc., automated crowd analysis
is becoming an increasingly crucial task. The stakes can be very high in many
of these real-world applications. We propose a scalable neural network
framework with quantification of decomposed uncertainty using a bootstrap
ensemble. We demonstrate that the proposed uncertainty quantification method
provides additional insight to the crowd counting problem and is simple to
implement. We also show that our proposed method achieves state-of-the-art
performance on many benchmark crowd counting datasets.
Comment: Accepted at AAAI 2020 (Main Technical Track)
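The bootstrap-ensemble approach can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: a closed-form ridge regressor stands in for the crowd-density network, and the function names are invented for illustration. Each member is fit on a resample of the data, and disagreement across members gives the epistemic (model) uncertainty; the aleatoric part would additionally come from each member's predicted noise, which this stand-in model omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression; a stand-in for the density network."""
    Xb = np.c_[X, np.ones(len(X))]            # append a bias column
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda x: np.c_[np.atleast_2d(x), np.ones(1)] @ w

def fit_bootstrap_ensemble(X, y, n_models=10, fit_fn=ridge_fit):
    """Train one model per bootstrap resample (rows drawn with replacement)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(fit_fn(X[idx], y[idx]))
    return models

def epistemic_uncertainty(models, x):
    """Mean prediction and variance across ensemble members (epistemic part)."""
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.var(axis=0)

# Toy data: y = 2x + noise; the ensemble should agree closely in-distribution.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 0.1 * rng.normal(size=200)
models = fit_bootstrap_ensemble(X, y)
mean, epistemic = epistemic_uncertainty(models, np.array([0.5]))
```

In-distribution the members agree, so the epistemic variance is small; far from the training data it grows, which is the signal the paper exploits.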
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
We study model-based reinforcement learning (RL) for episodic Markov decision
processes (MDP) whose transition probability is parametrized by an unknown
transition core with features of state and action. Despite much recent progress
in analyzing algorithms in the linear MDP setting, the understanding of more
general transition models remains limited. In this paper, we establish a
provably efficient RL algorithm for the MDP whose state transition is given by
a multinomial logistic model. To balance the exploration-exploitation
trade-off, we propose an upper confidence bound-based algorithm. We show that
our proposed algorithm achieves a regret bound of
$\tilde{\mathcal{O}}(d\sqrt{H^3 T})$, where $d$ is the dimension of the
transition core, $H$ is the horizon, and $T$ is the total number of steps. To
the best of our knowledge, this is the
first model-based RL algorithm with multinomial logistic function approximation
with provable guarantees. We also comprehensively evaluate our proposed
algorithm numerically and show that it consistently outperforms the existing
methods, hence achieving both provable efficiency and superior practical
performance.
Comment: Accepted at AAAI 2023 (Main Technical Track)
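A multinomial logistic transition model is straightforward to write down. The sketch below (names and the bonus form are assumptions, not the paper's algorithm) computes softmax transition probabilities from a transition core $\theta$ and a generic elliptical confidence-width bonus of the kind UCB-style optimism typically uses:

```python
import numpy as np

def mnl_transition_probs(theta, features):
    """p(s'|s,a) under a multinomial logistic model: a softmax of
    <features(s,a,s'), theta> over candidate next states s'.
    features: (n_next_states, d) array; theta: (d,) transition core."""
    logits = features @ theta
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def ucb_bonus(features, A_inv, beta):
    """Elliptical confidence width beta * ||phi||_{A^{-1}} per next-state
    feature: the standard optimism term (a generic sketch, not the paper's
    exact bonus)."""
    return beta * np.sqrt(np.einsum('id,de,ie->i', features, A_inv, features))

d, n_next = 3, 4
rng = np.random.default_rng(1)
theta = rng.normal(size=d)
phi = rng.normal(size=(n_next, d))
p = mnl_transition_probs(theta, phi)
A_inv = np.eye(d)                               # inverse Gram matrix (identity before any data)
bonus = ucb_bonus(phi, A_inv, beta=1.0)
```

As data accumulates, the Gram matrix grows and the bonus shrinks, which is what balances exploration against exploitation.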
Model-based Offline Reinforcement Learning with Count-based Conservatism
In this paper, we propose a model-based offline reinforcement learning method
that integrates count-based conservatism. Our method utilizes count estimates
of state-action pairs to quantify model estimation error and is, to the best
of our knowledge, the first algorithm to demonstrate the efficacy of
count-based conservatism in model-based offline deep RL. For our proposed
method, we first show that the estimation error is inversely proportional to
the frequency of state-action pairs. Secondly, we
demonstrate that the learned policy under the count-based conservative model
offers near-optimal performance guarantees. Through extensive numerical
experiments, we validate that our method with a hash-code implementation
significantly outperforms existing offline RL algorithms on the D4RL benchmark
datasets. The code is publicly available.
Comment: Accepted at ICML 2023
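The count-based penalty can be sketched in a few lines. Class and function names here are invented; the hashing step and the $c/\sqrt{n(s,a)}$ penalty shape mirror the abstract's description (estimation error shrinking with state-action frequency), not the exact published algorithm:

```python
import hashlib
from collections import Counter

def hash_code(state, action, n_buckets=2**16):
    """Discretize a continuous (state, action) pair into a hash bucket;
    a stand-in for the paper's hash-code counting."""
    key = repr((tuple(round(x, 1) for x in state), action)).encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % n_buckets

class CountConservativeModel:
    """Wraps a learned reward model with a count-based pessimism penalty:
    r_pess = r_hat - c / sqrt(count), so rarely visited pairs are
    penalized heavily and frequently visited pairs barely at all."""
    def __init__(self, reward_fn, c=1.0):
        self.reward_fn = reward_fn
        self.c = c
        self.counts = Counter()

    def observe(self, state, action):
        self.counts[hash_code(state, action)] += 1

    def pessimistic_reward(self, state, action):
        n = self.counts[hash_code(state, action)]
        penalty = self.c / (n ** 0.5) if n > 0 else self.c * 10.0  # unseen: heavy penalty
        return self.reward_fn(state, action) - penalty

model = CountConservativeModel(lambda s, a: 1.0, c=0.5)
for _ in range(25):
    model.observe([0.1, 0.2], 0)
r_seen = model.pessimistic_reward([0.1, 0.2], 0)     # 1.0 - 0.5/sqrt(25) = 0.9
r_unseen = model.pessimistic_reward([5.0, 5.0], 1)   # never observed: heavily penalized
```

A policy optimized against `pessimistic_reward` is steered toward well-covered regions of the offline dataset, which is the conservatism the abstract describes.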
Sequential Decision Making with Combinatorial Actions and High-Dimensional Contexts
In interactive sequential decision-making systems, the learning agent needs to react to new information both in the short term and in the long term, and learn to generalize through repeated interactions with the environment. Unlike in offline learning environments, the new data that arrives is typically a function of previous actions taken by the agent. One of the key challenges is to efficiently use and generalize from data that may never reappear. Furthermore, in many real-world applications, the agent only receives partial feedback on the decisions it makes. This necessitates a balanced exploration-exploitation approach, where the agent needs to both efficiently collect relevant information in order to prepare for future arrivals of feedback, and produce the desired outcome in the current period by exploiting the already collected information. In this thesis, we focus on two classes of fundamental sequential learning problems:
Contextual bandits with combinatorial actions and user choice (Chapter 2 and Chapter 3):
We investigate the dynamic assortment selection problem by combining statistical estimation of choice models and generalization using contextual information. For this problem, we design and analyze both UCB and Thompson sampling algorithms with rigorous performance guarantees and tractability.
High-dimensional contextual bandits (Chapter 4):
We investigate policies that can efficiently exploit the structure in high-dimensional data, e.g., sparsity. We design and analyze an efficient sparse contextual bandit algorithm that does not require knowledge of the sparsity of the underlying parameter -- information that essentially all existing sparse bandit algorithms to date require.
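For the sparsity-agnostic estimation in Chapter 4, a generic sparse estimator such as the Lasso, fit by iterative soft-thresholding (ISTA), illustrates the ingredient such a bandit can refit each round; here the regularization level is set from the data scale rather than from the (unknown) sparsity, which is the spirit of the chapter. This is a sketch with assumed names, not the thesis's algorithm:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding: gradient step on the squared
    loss, then shrink each coordinate toward zero. No sparsity level of the
    true parameter is ever used."""
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(4)
n, d = 100, 50
w_star = np.zeros(d)
w_star[:3] = [2.0, -1.5, 1.0]          # sparse truth; sparsity unknown to the learner
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
w_hat = lasso_ista(X, y, lam=0.1)
```

The estimate concentrates on the three true coordinates and shrinks the rest toward zero, even though the learner never told the solver that the support size is three.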
Combinatorial Neural Bandits
We consider a contextual combinatorial bandit problem where in each round a
learning agent selects a subset of arms and receives feedback on the selected
arms according to their scores. The score of an arm is an unknown function of
the arm's feature. Approximating this unknown score function with deep neural
networks, we propose two algorithms: Combinatorial Neural UCB (CN-UCB) and
Combinatorial Neural Thompson Sampling (CN-TS). We prove that CN-UCB achieves
$\tilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ or
$\tilde{\mathcal{O}}(\sqrt{\tilde{d} T K})$ regret, where $\tilde{d}$ is the
effective dimension of a neural tangent kernel matrix, $K$ is the size of a
subset of arms, and $T$ is the time horizon. For CN-TS, we adapt an
optimistic sampling technique to ensure the optimism of the sampled
combinatorial action, achieving a worst-case (frequentist) regret of
$\tilde{\mathcal{O}}(\tilde{d}\sqrt{TK})$. To the best of our knowledge, these
are the first combinatorial neural bandit algorithms with regret performance
guarantees. In particular, CN-TS is the first Thompson sampling
algorithm with the worst-case regret guarantees for the general contextual
combinatorial bandit problem. The numerical experiments demonstrate the
superior performance of our proposed algorithms.
Comment: Accepted at ICML 2023
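The two selection rules can be sketched as follows, assuming per-arm score estimates and confidence widths are already available (in the paper these come from a neural network and its gradients; here they are given arrays, and the optimistic-sampling step is a simplified illustration, not the published algorithm):

```python
import numpy as np

def cn_ucb_select(scores, widths, K):
    """UCB-style step (sketch): rank arms by optimistic score
    (estimate + confidence width) and pick the top-K subset."""
    ucb = scores + widths
    return np.argsort(ucb)[::-1][:K]

def cn_ts_select(scores, widths, K, rng, n_samples=10):
    """Optimistic-sampling Thompson step (sketch): draw several perturbed
    score vectors and keep, per arm, the largest sample before ranking --
    this mirrors ensuring optimism of the sampled combinatorial action."""
    samples = scores + widths * rng.standard_normal((n_samples, len(scores)))
    optimistic = samples.max(axis=0)
    return np.argsort(optimistic)[::-1][:K]

rng = np.random.default_rng(2)
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8])
widths = np.array([0.05, 0.30, 0.05, 0.10, 0.05])
sel_ucb = set(cn_ucb_select(scores, widths, K=3).tolist())
sel_ts = set(cn_ts_select(scores, widths, K=3, rng=rng).tolist())
```

After each round, feedback on the selected arms updates the score model and shrinks the widths, so both rules gradually concentrate on the best subset.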
Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits
We propose a linear contextual bandit algorithm with an
$O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and
$T$ is the time
horizon. Our proposed algorithm is equipped with a novel estimator in which
exploration is embedded through explicit randomization. Depending on the
randomization, our proposed estimator takes contributions either from contexts
of all arms or from selected contexts. We establish a self-normalized bound for
our estimator, which allows a novel decomposition of the cumulative regret into
\textit{additive} dimension-dependent terms instead of multiplicative terms. We
also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem
setting. Hence, the regret of our proposed algorithm matches the lower bound up
to logarithmic factors. The numerical experiments support the theoretical
guarantees and show that our proposed method outperforms the existing linear
bandit algorithms.
Comment: Accepted at Artificial Intelligence and Statistics (AISTATS) 2023
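A loose sketch of the randomization idea follows (not the paper's actual estimator or its self-normalized analysis; every name is a placeholder). Each round, a coin flip decides whether the Gram matrix absorbs the contexts of all arms or only the selected one, so exploration is embedded in the estimator through explicit randomization rather than added as a bonus term:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown unit-norm parameter

A = np.eye(d)            # regularized Gram matrix
b = np.zeros(d)          # running sum of context * reward

def play_round(contexts, p_all=0.5):
    """One round (loose sketch): estimate theta, pull the greedy arm, then
    with probability p_all fold ALL arms' contexts into the Gram matrix,
    otherwise only the selected arm's context."""
    global A, b
    theta_hat = np.linalg.solve(A, b)
    arm = int(np.argmax(contexts @ theta_hat))
    reward = contexts[arm] @ theta_star + 0.1 * rng.normal()
    if rng.random() < p_all:
        A += contexts.T @ contexts             # contributions from all arms
    else:
        A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    return arm, reward

for t in range(500):
    ctx = rng.normal(size=(5, d)) / np.sqrt(d)
    play_round(ctx)
theta_hat = np.linalg.solve(A, b)
```

Even in this crude form, the estimate's direction aligns with the true parameter after enough rounds; the paper's contribution is an estimator and self-normalized bound that make this randomized scheme provably near-optimal.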