Crowd Counting with Decomposed Uncertainty
Research in neural networks in the field of computer vision has achieved
remarkable accuracy for point estimation. However, the uncertainty in the
estimation is rarely addressed. Uncertainty quantification accompanied by point
estimation can lead to a more informed decision, and even improve the
prediction quality. In this work, we focus on uncertainty estimation in the
domain of crowd counting. With increasing occurrences of heavily crowded events
such as political rallies, protests, concerts, etc., automated crowd analysis
is becoming an increasingly crucial task. The stakes can be very high in many
of these real-world applications. We propose a scalable neural network
framework with quantification of decomposed uncertainty using a bootstrap
ensemble. We demonstrate that the proposed uncertainty quantification method
provides additional insight to the crowd counting problem and is simple to
implement. We also show that our proposed method achieves state-of-the-art
performance on many benchmark crowd counting datasets.
Comment: Accepted at AAAI 2020 (Main Technical Track)
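The bootstrap-ensemble approach can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: a closed-form ridge regressor stands in for the crowd-density network, and the function names are invented for illustration. Each member is fit on a resample of the data, and disagreement across members gives the epistemic (model) uncertainty; the aleatoric part would additionally come from each member's predicted noise, which this stand-in model omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression; a stand-in for the density network."""
    Xb = np.c_[X, np.ones(len(X))]            # append a bias column
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda x: np.c_[np.atleast_2d(x), np.ones(1)] @ w

def fit_bootstrap_ensemble(X, y, n_models=10, fit_fn=ridge_fit):
    """Train one model per bootstrap resample (rows drawn with replacement)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(fit_fn(X[idx], y[idx]))
    return models

def epistemic_uncertainty(models, x):
    """Mean prediction and variance across ensemble members (epistemic part)."""
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.var(axis=0)

# Toy data: y = 2x + noise; the ensemble should agree closely in-distribution.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 0.1 * rng.normal(size=200)
models = fit_bootstrap_ensemble(X, y)
mean, epistemic = epistemic_uncertainty(models, np.array([0.5]))
```

In-distribution the members agree, so the epistemic variance is small; far from the training data it grows, which is the signal the paper exploits.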
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
We study model-based reinforcement learning (RL) for episodic Markov decision
processes (MDP) whose transition probability is parametrized by an unknown
transition core with features of state and action. Despite much recent progress
in analyzing algorithms in the linear MDP setting, the understanding of more
general transition models remains limited. In this paper, we establish a
provably efficient RL algorithm for the MDP whose state transition is given by
a multinomial logistic model. To balance the exploration-exploitation
trade-off, we propose an upper confidence bound-based algorithm. We show that
our proposed algorithm achieves a regret bound of
$\tilde{\mathcal{O}}(d\sqrt{H^3 T})$, where $d$ is the dimension of the
transition core, $H$ is the horizon, and $T$ is the total number of steps. To
the best of our knowledge, this is the
first model-based RL algorithm with multinomial logistic function approximation
with provable guarantees. We also comprehensively evaluate our proposed
algorithm numerically and show that it consistently outperforms the existing
methods, hence achieving both provable efficiency and superior practical
performance.
Comment: Accepted at AAAI 2023 (Main Technical Track)
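A multinomial logistic transition model is straightforward to write down. The sketch below (names and the bonus form are assumptions, not the paper's algorithm) computes softmax transition probabilities from a transition core $\theta$ and a generic elliptical confidence-width bonus of the kind UCB-style optimism typically uses:

```python
import numpy as np

def mnl_transition_probs(theta, features):
    """p(s'|s,a) under a multinomial logistic model: a softmax of
    <features(s,a,s'), theta> over candidate next states s'.
    features: (n_next_states, d) array; theta: (d,) transition core."""
    logits = features @ theta
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def ucb_bonus(features, A_inv, beta):
    """Elliptical confidence width beta * ||phi||_{A^{-1}} per next-state
    feature: the standard optimism term (a generic sketch, not the paper's
    exact bonus)."""
    return beta * np.sqrt(np.einsum('id,de,ie->i', features, A_inv, features))

d, n_next = 3, 4
rng = np.random.default_rng(1)
theta = rng.normal(size=d)
phi = rng.normal(size=(n_next, d))
p = mnl_transition_probs(theta, phi)
A_inv = np.eye(d)                               # inverse Gram matrix (identity before any data)
bonus = ucb_bonus(phi, A_inv, beta=1.0)
```

As data accumulates, the Gram matrix grows and the bonus shrinks, which is what balances exploration against exploitation.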
Model-based Offline Reinforcement Learning with Count-based Conservatism
In this paper, we propose a model-based offline reinforcement learning method
that integrates count-based conservatism. Our method utilizes count estimates
of state-action pairs to quantify model estimation error and is, to the best
of our knowledge, the first algorithm to demonstrate the efficacy of
count-based conservatism in model-based offline deep RL. For our proposed
method, we first show that the estimation error is inversely proportional to
the frequency of state-action pairs. Secondly, we
demonstrate that the learned policy under the count-based conservative model
offers near-optimal performance guarantees. Through extensive numerical
experiments, we validate that our method with a hash-code implementation
significantly outperforms existing offline RL algorithms on the D4RL benchmark
datasets. The code is publicly available.
Comment: Accepted at ICML 2023
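The count-based penalty can be sketched in a few lines. Class and function names here are invented; the hashing step and the $c/\sqrt{n(s,a)}$ penalty shape mirror the abstract's description (estimation error shrinking with state-action frequency), not the exact published algorithm:

```python
import hashlib
from collections import Counter

def hash_code(state, action, n_buckets=2**16):
    """Discretize a continuous (state, action) pair into a hash bucket;
    a stand-in for the paper's hash-code counting."""
    key = repr((tuple(round(x, 1) for x in state), action)).encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % n_buckets

class CountConservativeModel:
    """Wraps a learned reward model with a count-based pessimism penalty:
    r_pess = r_hat - c / sqrt(count), so rarely visited pairs are
    penalized heavily and frequently visited pairs barely at all."""
    def __init__(self, reward_fn, c=1.0):
        self.reward_fn = reward_fn
        self.c = c
        self.counts = Counter()

    def observe(self, state, action):
        self.counts[hash_code(state, action)] += 1

    def pessimistic_reward(self, state, action):
        n = self.counts[hash_code(state, action)]
        penalty = self.c / (n ** 0.5) if n > 0 else self.c * 10.0  # unseen: heavy penalty
        return self.reward_fn(state, action) - penalty

model = CountConservativeModel(lambda s, a: 1.0, c=0.5)
for _ in range(25):
    model.observe([0.1, 0.2], 0)
r_seen = model.pessimistic_reward([0.1, 0.2], 0)     # 1.0 - 0.5/sqrt(25) = 0.9
r_unseen = model.pessimistic_reward([5.0, 5.0], 1)   # never observed: heavily penalized
```

A policy optimized against `pessimistic_reward` is steered toward well-covered regions of the offline dataset, which is the conservatism the abstract describes.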
Sequential Decision Making with Combinatorial Actions and High-Dimensional Contexts
In interactive sequential decision-making systems, the learning agent needs to react to new information both in the short term and in the long term, and learn to generalize through repeated interactions with the environment. Unlike in offline learning environments, the new data that arrives is typically a function of previous actions taken by the agent. One of the key challenges is to efficiently use and generalize from data that may never reappear. Furthermore, in many real-world applications, the agent only receives partial feedback on the decisions it makes. This necessitates a balanced exploration-exploitation approach, where the agent needs to both efficiently collect relevant information in order to prepare for future arrivals of feedback, and produce the desired outcome in the current period by exploiting the already collected information. In this thesis, we focus on two classes of fundamental sequential learning problems:
Contextual bandits with combinatorial actions and user choice (Chapter 2 and Chapter 3):
We investigate the dynamic assortment selection problem by combining statistical estimation of choice models and generalization using contextual information. For this problem, we design and analyze both UCB and Thompson sampling algorithms with rigorous performance guarantees and tractability.
High-dimensional contextual bandits (Chapter 4):
We investigate policies that can efficiently exploit the structure in high-dimensional data, e.g., sparsity. We design and analyze an efficient sparse contextual bandit algorithm that does not require knowledge of the sparsity of the underlying parameter -- information that essentially all existing sparse bandit algorithms to date require.
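For the sparsity-agnostic estimation in Chapter 4, a generic sparse estimator such as the Lasso, fit by iterative soft-thresholding (ISTA), illustrates the ingredient such a bandit can refit each round; here the regularization level is set from the data scale rather than from the (unknown) sparsity, which is the spirit of the chapter. This is a sketch with assumed names, not the thesis's algorithm:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding: gradient step on the squared
    loss, then shrink each coordinate toward zero. No sparsity level of the
    true parameter is ever used."""
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(4)
n, d = 100, 50
w_star = np.zeros(d)
w_star[:3] = [2.0, -1.5, 1.0]          # sparse truth; sparsity unknown to the learner
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
w_hat = lasso_ista(X, y, lam=0.1)
```

The estimate concentrates on the three true coordinates and shrinks the rest toward zero, even though the learner never told the solver that the support size is three.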
Combinatorial Neural Bandits
We consider a contextual combinatorial bandit problem where in each round a
learning agent selects a subset of arms and receives feedback on the selected
arms according to their scores. The score of an arm is an unknown function of
the arm's feature. Approximating this unknown score function with deep neural
networks, we propose two algorithms: Combinatorial Neural UCB (CN-UCB) and
Combinatorial Neural Thompson Sampling (CN-TS). We prove that CN-UCB achieves
$\tilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ or
$\tilde{\mathcal{O}}(\sqrt{\tilde{d} T K})$ regret, where $\tilde{d}$ is the
effective dimension of a neural tangent kernel matrix, $K$ is the size of a
subset of arms, and $T$ is the time horizon. For CN-TS, we adapt an
optimistic sampling technique to ensure the optimism of the sampled
combinatorial action, achieving a worst-case (frequentist) regret of
$\tilde{\mathcal{O}}(\tilde{d}\sqrt{TK})$. To the best of our knowledge, these
are the first combinatorial neural bandit algorithms with regret performance
guarantees. In particular, CN-TS is the first Thompson sampling
algorithm with the worst-case regret guarantees for the general contextual
combinatorial bandit problem. The numerical experiments demonstrate the
superior performance of our proposed algorithms.
Comment: Accepted at ICML 2023
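The two selection rules can be sketched as follows, assuming per-arm score estimates and confidence widths are already available (in the paper these come from a neural network and its gradients; here they are given arrays, and the optimistic-sampling step is a simplified illustration, not the published algorithm):

```python
import numpy as np

def cn_ucb_select(scores, widths, K):
    """UCB-style step (sketch): rank arms by optimistic score
    (estimate + confidence width) and pick the top-K subset."""
    ucb = scores + widths
    return np.argsort(ucb)[::-1][:K]

def cn_ts_select(scores, widths, K, rng, n_samples=10):
    """Optimistic-sampling Thompson step (sketch): draw several perturbed
    score vectors and keep, per arm, the largest sample before ranking --
    this mirrors ensuring optimism of the sampled combinatorial action."""
    samples = scores + widths * rng.standard_normal((n_samples, len(scores)))
    optimistic = samples.max(axis=0)
    return np.argsort(optimistic)[::-1][:K]

rng = np.random.default_rng(2)
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8])
widths = np.array([0.05, 0.30, 0.05, 0.10, 0.05])
sel_ucb = set(cn_ucb_select(scores, widths, K=3).tolist())
sel_ts = set(cn_ts_select(scores, widths, K=3, rng=rng).tolist())
```

After each round, feedback on the selected arms updates the score model and shrinks the widths, so both rules gradually concentrate on the best subset.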
Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits
We propose a linear contextual bandit algorithm with an
$O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and
$T$ is the time
horizon. Our proposed algorithm is equipped with a novel estimator in which
exploration is embedded through explicit randomization. Depending on the
randomization, our proposed estimator takes contributions either from contexts
of all arms or from selected contexts. We establish a self-normalized bound for
our estimator, which allows a novel decomposition of the cumulative regret into
\textit{additive} dimension-dependent terms instead of multiplicative terms. We
also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem
setting. Hence, the regret of our proposed algorithm matches the lower bound up
to logarithmic factors. The numerical experiments support the theoretical
guarantees and show that our proposed method outperforms the existing linear
bandit algorithms.
Comment: Accepted at Artificial Intelligence and Statistics (AISTATS) 2023
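A loose sketch of the randomization idea follows (not the paper's actual estimator or its self-normalized analysis; every name is a placeholder). Each round, a coin flip decides whether the Gram matrix absorbs the contexts of all arms or only the selected one, so exploration is embedded in the estimator through explicit randomization rather than added as a bonus term:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown unit-norm parameter

A = np.eye(d)            # regularized Gram matrix
b = np.zeros(d)          # running sum of context * reward

def play_round(contexts, p_all=0.5):
    """One round (loose sketch): estimate theta, pull the greedy arm, then
    with probability p_all fold ALL arms' contexts into the Gram matrix,
    otherwise only the selected arm's context."""
    global A, b
    theta_hat = np.linalg.solve(A, b)
    arm = int(np.argmax(contexts @ theta_hat))
    reward = contexts[arm] @ theta_star + 0.1 * rng.normal()
    if rng.random() < p_all:
        A += contexts.T @ contexts             # contributions from all arms
    else:
        A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    return arm, reward

for t in range(500):
    ctx = rng.normal(size=(5, d)) / np.sqrt(d)
    play_round(ctx)
theta_hat = np.linalg.solve(A, b)
```

Even in this crude form, the estimate's direction aligns with the true parameter after enough rounds; the paper's contribution is an estimator and self-normalized bound that make this randomized scheme provably near-optimal.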