
    Crowd Counting with Decomposed Uncertainty

    Research on neural networks in computer vision has achieved remarkable accuracy in point estimation. However, the uncertainty in these estimates is rarely addressed. Uncertainty quantification accompanying point estimation can lead to more informed decisions and can even improve prediction quality. In this work, we focus on uncertainty estimation in the domain of crowd counting. With the increasing occurrence of heavily crowded events such as political rallies, protests, and concerts, automated crowd analysis is becoming an increasingly crucial task, and the stakes can be very high in many of these real-world applications. We propose a scalable neural network framework that quantifies decomposed uncertainty using a bootstrap ensemble. We demonstrate that the proposed uncertainty quantification method provides additional insight into the crowd counting problem and is simple to implement. We also show that our proposed method achieves state-of-the-art performance on many benchmark crowd counting datasets. Comment: Accepted in AAAI 2020 (Main Technical Track).
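
    The decomposition described above can be illustrated with a small sketch: given per-pixel density predictions and per-model variance estimates from K bootstrap-trained counters, the disagreement across ensemble members approximates epistemic uncertainty, while the averaged predicted variance approximates aleatoric uncertainty. The array shapes, random inputs, and the assumption that each member also predicts a variance map are illustrative only, not the paper's exact architecture.

```python
import numpy as np

def decompose_uncertainty(member_means, member_vars):
    """Decompose predictive uncertainty from a bootstrap ensemble.

    member_means: (K, H, W) predicted density maps from K bootstrap models
    member_vars:  (K, H, W) per-model predicted (aleatoric) variance maps
    Returns the ensemble point estimate plus epistemic / aleatoric variance maps.
    """
    mean = member_means.mean(axis=0)       # ensemble point estimate
    epistemic = member_means.var(axis=0)   # disagreement across members
    aleatoric = member_vars.mean(axis=0)   # average predicted data noise
    return mean, epistemic, aleatoric

# Hypothetical usage with random stand-ins for K = 5 bootstrap-trained counters.
K, H, W = 5, 96, 128
rng = np.random.default_rng(0)
member_means = rng.random((K, H, W))
member_vars = 0.1 * rng.random((K, H, W))
mean_map, epi_map, ale_map = decompose_uncertainty(member_means, member_vars)
print("count estimate:", mean_map.sum(),
      "| epistemic var (summed):", epi_map.sum(),
      "| aleatoric var (summed):", ale_map.sum())
```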

    Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

    We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is still limited. In this paper, we establish a provably efficient RL algorithm for MDPs whose state transitions are given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves an $\tilde{\mathcal{O}}(d\sqrt{H^3 T})$ regret bound, where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation and provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms existing methods, achieving both provable efficiency and superior practical performance. Comment: Accepted in AAAI 2023 (Main Technical Track).
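
    A minimal sketch of the two ingredients the abstract names: a softmax (multinomial logistic) transition model over candidate next states, and a generic UCB-style exploration bonus based on the feature norm under an inverse Gram matrix. The bonus form, dimensions, and random data are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def mnl_transition_probs(phi_next, theta):
    """Multinomial logistic (softmax) transition model.

    phi_next: (n_next, d) features phi(s, a, s') for each candidate next state
    theta:    (d,) estimated transition core
    """
    logits = phi_next @ theta
    logits = logits - logits.max()   # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def ucb_bonus(phi, gram_inv, beta):
    """Generic optimistic bonus beta * ||phi||_{A^{-1}} (illustrative form only)."""
    return beta * np.sqrt(phi @ gram_inv @ phi)

# Hypothetical dimensions and data.
rng = np.random.default_rng(0)
d, n_next = 4, 3
theta_hat = rng.normal(size=d)              # estimated transition core
phi_next = rng.normal(size=(n_next, d))     # features of candidate next states
print("transition probs:", mnl_transition_probs(phi_next, theta_hat))

A = np.eye(d)                               # regularized Gram matrix of visited features
for _ in range(20):
    phi = rng.normal(size=d)
    A += np.outer(phi, phi)
print("exploration bonus:", ucb_bonus(phi_next[0], np.linalg.inv(A), beta=1.0))
```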

    Model-based Offline Reinforcement Learning with Count-based Conservatism

    In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named Count-MORL. Our method utilizes count estimates of state-action pairs to quantify the model estimation error, making it, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. For our proposed method, we first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model enjoys near-optimal performance guarantees. Through extensive numerical experiments, we validate that Count-MORL with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at https://github.com/oh-lab/Count-MORL. Comment: Accepted in ICML 2023.
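
    The core idea, counting state-action visits via hash codes and penalizing the model in rarely visited regions, can be sketched as below. The SimHash-style random-projection counter, the 1/sqrt(n) penalty form, and all names are assumptions made for illustration; they are not the paper's exact implementation.

```python
import numpy as np
from collections import Counter

class SimHashCounter:
    """Count state-action visits via random-projection (SimHash-style) codes."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(n_bits, dim))   # fixed random projection
        self.counts = Counter()

    def code(self, sa):
        return tuple((self.proj @ sa > 0).astype(int))  # discrete hash code

    def update(self, sa):
        self.counts[self.code(sa)] += 1

    def count(self, sa):
        return self.counts[self.code(sa)]

def conservative_reward(r_model, n, alpha=1.0):
    """Penalize the model reward more heavily where the visit count n is small."""
    return r_model - alpha / np.sqrt(max(n, 1))

# Hypothetical usage on a toy offline dataset of concatenated (s, a) vectors.
counter = SimHashCounter(dim=6)
dataset = np.random.default_rng(1).normal(size=(1000, 6))
for sa in dataset:
    counter.update(sa)
sa_query = dataset[0]
print("conservative reward:",
      conservative_reward(r_model=1.0, n=counter.count(sa_query)))
```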

    Combinatorial Neural Bandits

    We consider a contextual combinatorial bandit problem where, in each round, a learning agent selects a subset of arms and receives feedback on the selected arms according to their scores. The score of an arm is an unknown function of the arm's feature. Approximating this unknown score function with deep neural networks, we propose two algorithms: Combinatorial Neural UCB (CN-UCB) and Combinatorial Neural Thompson Sampling (CN-TS). We prove that CN-UCB achieves $\tilde{\mathcal{O}}(\tilde{d}\sqrt{T})$ or $\tilde{\mathcal{O}}(\sqrt{\tilde{d} T K})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel matrix, $K$ is the size of the selected subset of arms, and $T$ is the time horizon. For CN-TS, we adapt an optimistic sampling technique to ensure the optimism of the sampled combinatorial action, achieving a worst-case (frequentist) regret of $\tilde{\mathcal{O}}(\tilde{d}\sqrt{TK})$. To the best of our knowledge, these are the first combinatorial neural bandit algorithms with regret performance guarantees. In particular, CN-TS is the first Thompson sampling algorithm with worst-case regret guarantees for the general contextual combinatorial bandit problem. Numerical experiments demonstrate the superior performance of our proposed algorithms. Comment: Accepted in ICML 2023.
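
    A minimal sketch of UCB-style combinatorial selection in this spirit: score each arm by a network output plus a confidence bonus computed from its gradient features, then pick the top-K arms. Here a linear model stands in for the neural network (so its gradient feature is just the context), and the bonus form and all names are illustrative rather than the paper's exact CN-UCB.

```python
import numpy as np

def score_and_select(features, f, grad_f, Z_inv, beta, K):
    """Select the K arms with the largest optimistic scores f(x) + bonus(x)."""
    ucb = []
    for x in features:
        g = grad_f(x)                           # gradient feature of the arm
        bonus = beta * np.sqrt(g @ Z_inv @ g)   # width of the confidence set
        ucb.append(f(x) + bonus)
    return np.argsort(-np.asarray(ucb))[:K]     # indices of the K selected arms

# Hypothetical tiny setup: a fixed random linear map stands in for the network.
rng = np.random.default_rng(0)
d, n_arms = 8, 20
w = rng.normal(size=d)
f = lambda x: float(w @ x)
grad_f = lambda x: x                            # for a linear model, grad = context
Z_inv = np.linalg.inv(np.eye(d))                # inverse regularized design matrix
arms = rng.normal(size=(n_arms, d))
print("selected arms:", score_and_select(arms, f, grad_f, Z_inv, beta=1.0, K=3))
```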

    Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

    We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, our proposed estimator takes contributions either from the contexts of all arms or from the selected contexts only. We establish a self-normalized bound for our estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative terms. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of our proposed algorithm matches the lower bound up to logarithmic factors. Numerical experiments support the theoretical guarantees and show that our proposed method outperforms existing linear bandit algorithms. Comment: Accepted in Artificial Intelligence and Statistics (AISTATS) 2023.
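
    A schematic loop illustrating the flavor of the idea: a ridge-style estimator whose update is randomized between using the contexts of all arms and using only the selected arm's context, so exploration enters through the randomization rather than through an added bonus term. This is a hedged toy sketch under a synthetic linear reward model; the mixing probability, update rule, and environment are assumptions, not the authors' exact estimator or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T, lam, p_all = 5, 10, 500, 1.0, 0.5
theta_star = rng.normal(size=d)        # unknown reward parameter (toy environment)

A = lam * np.eye(d)                    # regularized Gram matrix
b = np.zeros(d)                        # reward-weighted context sum

for t in range(T):
    X = rng.normal(size=(n_arms, d))           # contexts observed this round
    theta_hat = np.linalg.solve(A, b)          # current estimate
    arm = int(np.argmax(X @ theta_hat))        # greedy choice w.r.t. the estimate
    r = X[arm] @ theta_star + 0.1 * rng.normal()
    # Randomized update: with probability p_all, fold in every arm's context;
    # otherwise only the selected one. Exploration comes from this randomization.
    if rng.random() < p_all:
        A += X.T @ X
    else:
        A += np.outer(X[arm], X[arm])
    b += r * X[arm]

print("parameter estimation error:",
      np.linalg.norm(np.linalg.solve(A, b) - theta_star))
```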