Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
It is well known that quantifying uncertainty in the action-value estimates
is crucial for efficient exploration in reinforcement learning. Ensemble
sampling offers a relatively computationally tractable way of doing this using
randomized value functions. However, it still requires a huge amount of
computational resources for complex problems. In this paper, we present an
alternative, computationally efficient way to induce exploration using index
sampling. We use an indexed value function to represent uncertainty in our
action-value estimates. We first present an algorithm to learn a parameterized
indexed value function through a distributional version of temporal difference
learning in a tabular setting and prove its regret bound. Then, from a computational
point of view, we propose a dual-network architecture, Parameterized Indexed Networks
(PINs), comprising one mean network and one uncertainty network to learn the
indexed value function. Finally, we show the efficacy of PINs through
computational experiments.

Comment: 17 pages, 4 figures, Proceedings of the 34th AAAI Conference on Artificial Intelligence
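As a rough illustration of the index-sampling idea, the sketch below (a tabular toy of our own construction, not the paper's PINs architecture; `mean_q` and `uncertainty_q` are hypothetical stand-ins for the mean and uncertainty networks) builds an indexed value function as a mean estimate plus an index-scaled uncertainty estimate, drawing a single random index per episode:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Hypothetical tabular stand-ins for the paper's two networks:
# a mean estimate and an uncertainty (scale) estimate per (s, a).
mean_q = rng.normal(size=(n_states, n_actions))
uncertainty_q = np.full((n_states, n_actions), 0.5)

def indexed_q(state, index):
    """Indexed value function: mean plus index-scaled uncertainty."""
    return mean_q[state] + index * uncertainty_q[state]

# Index sampling: draw ONE random index per episode and act greedily
# with respect to it, instead of maintaining a whole ensemble.
index = rng.standard_normal()
action = int(np.argmax(indexed_q(state=0, index=index)))
```

Sampling a scalar index is cheaper than training an ensemble of value functions, which is the computational point the abstract makes.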
Sample-Efficient Multi-Agent RL: An Optimization Perspective
We study multi-agent reinforcement learning (MARL) for general-sum Markov
Games (MGs) under general function approximation. In order to find the
minimum assumption for sample-efficient learning, we introduce a novel
complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for
general-sum MGs. Using this measure, we propose the first unified algorithmic
framework that ensures sample efficiency in learning Nash Equilibrium, Coarse
Correlated Equilibrium, and Correlated Equilibrium for both model-based and
model-free MARL problems with low MADC. We also show that our algorithm
achieves sublinear regret comparable to that of existing works. Moreover, our
algorithm combines an equilibrium-solving oracle with a single objective
optimization subprocedure that solves for the regularized payoff of each
deterministic joint policy, which avoids solving constrained optimization
problems within data-dependent constraints (Jin et al. 2020; Wang et al. 2023)
or executing sampling procedures with complex multi-objective optimization
problems (Foster et al. 2023), thus being more amenable to empirical
implementation.
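To make the "equilibrium-solving oracle" component concrete, here is a minimal sketch (our illustration, not the paper's algorithm): regret matching for both players of a toy general-sum matrix game, whose averaged play approximates a coarse correlated equilibrium:

```python
import numpy as np

# Toy two-player general-sum matrix game (payoff matrices for each player).
payoff0 = np.array([[3.0, 0.0], [5.0, 1.0]])
payoff1 = np.array([[3.0, 5.0], [0.0, 1.0]])

def cce_regret_matching(p0, p1, iters=2000):
    """Hypothetical equilibrium-solving oracle: the time-averaged play of
    two regret-matching learners approximates a coarse correlated
    equilibrium of the matrix game (p0, p1)."""
    n, m = p0.shape
    reg0, reg1 = np.zeros(n), np.zeros(m)
    joint = np.zeros((n, m))
    for _ in range(iters):
        s0, s1 = np.maximum(reg0, 0), np.maximum(reg1, 0)
        x = s0 / s0.sum() if s0.sum() > 0 else np.ones(n) / n
        y = s1 / s1.sum() if s1.sum() > 0 else np.ones(m) / m
        joint += np.outer(x, y)
        u0 = p0 @ y                 # expected payoff of each row action
        u1 = x @ p1                 # expected payoff of each column action
        reg0 += u0 - x @ u0         # accumulate external regrets
        reg1 += u1 - y @ u1
    return joint / iters

dist = cce_regret_matching(payoff0, payoff1)
```

In this dominance-solvable example the averaged joint distribution concentrates on the dominant action pair, as expected for a CCE.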
Learning in Congestion Games with Bandit Feedback
In this paper, we investigate Nash-regret minimization in congestion games, a
class of games with benign theoretical structure and broad real-world
applications. We first propose a centralized algorithm based on the optimism in
the face of uncertainty principle for congestion games with (semi-)bandit
feedback, and obtain finite-sample guarantees. Then we propose a decentralized
algorithm via a novel combination of the Frank-Wolfe method and G-optimal
design. By exploiting the structure of the congestion game, we show the sample
complexity of both algorithms depends only polynomially on the number of
players and the number of facilities, but not the size of the action set, which
can be exponentially large in terms of the number of facilities. We further
define a new problem class, Markov congestion games, which allows us to model
the non-stationarity in congestion games. We propose a centralized algorithm
for Markov congestion games, whose sample complexity again has only polynomial
dependence on all relevant problem parameters, but not the size of the action
set.

Comment: 34 pages, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
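A toy model of the setting (illustrative only; the action set and the linear cost function are our assumptions): each player's action is a set of facilities, a facility's cost grows with its load, and semi-bandit feedback reveals a noisy cost for each facility a player actually used:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy congestion game: each player picks a set of facilities, and a
# facility's cost grows (here linearly) with the number of players using it.
n_players, n_facilities = 4, 3
actions = [frozenset({0}), frozenset({1}), frozenset({0, 2})]  # shared action set

def play(joint_action):
    """Semi-bandit feedback: each player observes a noisy cost for every
    facility it actually used, not just the total cost of its action."""
    assert len(joint_action) == n_players
    load = np.zeros(n_facilities)
    for a in joint_action:
        for f in a:
            load[f] += 1
    return [{f: load[f] + 0.1 * rng.standard_normal() for f in a}
            for a in joint_action]

fb = play([actions[0], actions[0], actions[1], actions[2]])
```

Because feedback and costs decompose over facilities, learning can scale with the number of facilities rather than with the (possibly exponentially large) action set, which is the structure the abstract exploits.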
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
We investigate the fixed-budget best-arm identification (BAI) problem for
linear bandits in a potentially non-stationary environment. Given a finite arm
set, a fixed budget T, and an unpredictable sequence of parameters, an
algorithm aims to correctly identify the best arm with probability as high as
possible. Prior work has addressed the stationary setting, where the parameter
vector is the same at every time step, and demonstrated that the error
probability decreases exponentially in T at a problem-dependent rate. But in
many real-world multivariate testing scenarios that motivate our work, the
environment is non-stationary, and an algorithm expecting a stationary setting
can easily fail. For robust identification, it is well known that if arms are
chosen randomly and non-adaptively from a G-optimal design over the arm set at
each time, then the error probability also decreases exponentially in T, though
typically at a slower problem-dependent rate. As there exist environments where
the gap between these two rates is large, we are motivated to propose a novel
algorithm that aims to obtain the best of both worlds: robustness to
non-stationarity and fast rates of identification in benign settings. We
characterize the error probability of our algorithm and demonstrate empirically
that it never performs worse than G-optimal design while comparing favorably to
the best algorithms in the stationary setting.

Comment: 25 pages, 6 figures
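Since G-optimal design is central to the robust baseline, here is a minimal Frank-Wolfe-style sketch (our illustration, on a randomly generated arm set) that computes an approximate G-optimal design; by the Kiefer-Wolfowitz theorem the worst-case value max over arms of xᵀA(λ)⁻¹x approaches the dimension d at the optimum:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 4))          # 20 arms in R^4 (illustrative)
n, d = X.shape
lam = np.ones(n) / n                      # start from the uniform design

for t in range(1, 2001):
    A = X.T @ (lam[:, None] * X)          # design matrix A(lam)
    g = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)  # x^T A^{-1} x per arm
    i = int(np.argmax(g))                 # most under-covered arm
    step = 2.0 / (t + 2)
    lam *= 1 - step                       # Frank-Wolfe step toward that vertex
    lam[i] += step

A = X.T @ (lam[:, None] * X)
worst_case = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X).max()
# worst_case approaches d = 4 as the design converges
```

Sampling arms i.i.d. from the resulting weights `lam` is the non-adaptive allocation the abstract refers to.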
One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
In online reinforcement learning (online RL), balancing exploration and
exploitation is crucial for finding an optimal policy in a sample-efficient
way. To achieve this, existing sample-efficient online RL algorithms typically
consist of three components: estimation, planning, and exploration. However, in
order to cope with general function approximators, most of them involve
impractical algorithmic components to incentivize exploration, such as
optimization within data-dependent level-sets or complicated sampling
procedures. To address this challenge, we propose an easy-to-implement RL
framework called \textit{Maximize to Explore} (\texttt{MEX}), which only needs
to optimize \emph{unconstrainedly} a single objective that integrates the
estimation and planning components while balancing exploration and exploitation
automatically. Theoretically, we prove that \texttt{MEX} achieves a sublinear
regret with general function approximations for Markov decision processes (MDP)
and is further extendable to two-player zero-sum Markov games (MG). Meanwhile,
we adapt deep RL baselines to design practical versions of \texttt{MEX}, in
both model-free and model-based manners, which can outperform baselines by a
stable margin in various MuJoCo environments with sparse rewards. Compared with
existing sample-efficient online RL algorithms with general function
approximations, \texttt{MEX} achieves similar sample efficiency while enjoying
a lower computational cost and is more compatible with modern deep RL methods.
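A stylized version of such a single unconstrained objective (our toy construction, not the paper's exact formulation: a finite hypothesis class of mean-reward vectors in a two-armed bandit, squared error in place of the estimation objective, and a hypothetical trade-off weight `eta`) that fuses the planning value and the estimation loss into one maximization:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothesis class: candidate mean-reward vectors for a 2-armed bandit.
hypotheses = [np.array(m) for m in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]]
eta = 0.5
data = []   # observed (arm, reward) pairs

def mex_choose():
    """Maximize (planning value) - eta * (estimation loss), unconstrained,
    then act greedily under the maximizing hypothesis."""
    def objective(f):
        value = f.max()                                  # planning term
        loss = sum((r - f[a]) ** 2 for a, r in data)     # estimation term
        return value - eta * loss
    f_star = max(hypotheses, key=objective)
    return int(np.argmax(f_star))

true_means = np.array([0.2, 1.0])
for _ in range(200):
    a = mex_choose()
    data.append((a, true_means[a] + 0.1 * rng.standard_normal()))

pulls = np.bincount([a for a, _ in data], minlength=2)
```

Hypotheses that promise high value but fit the data poorly are penalized automatically, so exploration and exploitation are balanced without constrained optimization over level sets.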
Substantial transition to clean household energy mix in rural China
The household energy mix has significant impacts on human health and climate, as it contributes greatly to many health- and climate-relevant air pollutants. Compared to the well-established urban energy statistical system, the rural household energy statistical system is incomplete and is often associated with high biases. Via a nationwide investigation, this study revealed high contributions to energy supply from coal and biomass fuels in the rural household energy sector, while electricity comprised ∼20%. Stacking (the use of multiple sources of energy) is significant, and the average number of energy types was 2.8 per household. Compared to 2012, the consumption of biomass and coal in 2017 decreased by 45% and 12%, respectively, while gas consumption increased by 204%. Increased gas and decreased coal consumption occurred mainly in cooking, while the decrease in biomass came from both cooking (41%) and heating (59%). The time-sharing fraction of electricity and gases (E&G) for daily cooking grew, reaching 69% in 2017, but for space heating, traditional solid fuels were still dominant, with the national average shared fraction of E&G being only 20%. The non-uniform spatial distribution and the non-linear increase in the fraction of E&G indicated challenges to achieving universal access to modern cooking energy by 2030, particularly in less-developed rural and mountainous areas. In some non-typical heating zones, the increased share of E&G for heating was significant and largely driven by income growth, but in typical heating zones, the time-sharing fraction was <5% and did not increase significantly, except in areas with policy intervention. The intervention policy not only led to dramatic increases in the clean energy fraction for heating but also accelerated the clean cooking transition. Higher income, higher education, younger age, less energy/stove stacking and smaller family size positively impacted the clean energy transition.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
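A minimal sketch of the recommended calibration idea (the numbers below are illustrative, not from the study): fit a through-origin linear model of blank-subtracted OD against known microsphere counts from a serial dilution, then invert it to convert OD readings into estimated particle (cell) counts:

```python
import numpy as np

# Hypothetical serial dilution of silica microspheres: known particle
# counts paired with blank-subtracted OD readings in the linear range.
counts = np.array([5e8, 2.5e8, 1.25e8, 6.25e7, 3.125e7])   # particles/mL
od = np.array([1.60, 0.80, 0.40, 0.20, 0.10])               # blank-subtracted OD

# Least-squares slope through the origin: OD ≈ slope * count.
slope = np.sum(counts * od) / np.sum(counts ** 2)

def od_to_count(reading):
    """Convert a blank-subtracted OD reading to an estimated particle count."""
    return reading / slope

est = od_to_count(0.40)
```

Deviations from the fitted line at high ODs would indicate the reading has left the instrument's effective linear range, one of the quality checks the protocol enables.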
No Vaccine for Me: Investigating How Misinformation Targeting COVID-19 Vaccines Went Viral on Twitter
A major action to end the global COVID-19 pandemic is the vaccine rollout program. However, with pervasive online misinformation targeting the vaccine, it has been difficult to reach the vaccination level required to form herd immunity. The wide spread of such misinformation can find its roots in the homophily and polarization of social media platforms, the long-lasting anti-vaccine sentiments, and the uniquely politicized narrative on the pandemic in the U.S. Using an established dataset with tweets annotated with stances towards misinformation, the current research compared the features of post narratives and of their authors using natural language processing and network analysis techniques. The main findings included: (1) people who adopted vaccine misinformation used a less coherent but more provoking narrative, and attached more negative sentiment and concepts to the vaccines; (2) people who adopted vaccine misinformation showed generally lower influence on social media; they relied more on fake news websites and followed more people holding hostile attitudes toward the vaccine, which formed newsfeeds with significantly more unreliable, anti-vaccine information that impacted their attitudes and behaviors. This research fostered the understanding of online discussion concerning vaccine misinformation, and may help platforms and public health experts to perform interventions at the right time and on the right targets when addressing vaccine hesitancy.
Using Wasserstein GAN to generate high quality adversarial examples
Although Deep Neural Networks (DNNs) have state-of-the-art performance
in various machine learning tasks, in recent years they have been found to be
vulnerable to so-called adversarial examples. Specifically, take x ∈ D on
which a neural network has very high classification accuracy. It is possible to
find some small perturbation Δx so that even though the difference between
x and x + Δx = x′ is almost imperceptible to humans, the given neural
network is very likely to incorrectly classify x + Δx.
Several gradient and optimization based methods have been proposed to
create such adversarial examples x′, but many of them cannot achieve high
speed and high quality x′ simultaneously. In this thesis, we propose a new
algorithm to generate adversarial examples based on Generative Adversarial
Networks (GANs), specifically, a modification to the training algorithm of
the Improved Wasserstein GAN. The trained generator is able to create x′
very similar to the original x while keeping the classification accuracy of the
target model as low as the state-of-the-art attack. Furthermore, although
training a GAN might be slow, after it is trained, it can generate adversarial
examples much faster than previous optimization-based methods. Our goal
is for this work to be used for further research on robust neural networks.
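For contrast with the GAN approach, here is a minimal sketch of the kind of gradient-based attack the thesis compares against (an FGSM-style step on a hypothetical two-feature logistic classifier of our own construction, not the thesis's WGAN generator):

```python
import numpy as np

# Toy "trained" logistic classifier: p(class 1 | x) = sigmoid(w.x + b).
w, b = np.array([2.0, -1.0]), 0.0

def predict(x):
    return 1 / (1 + np.exp(-(w @ x + b)))

x = np.array([1.0, 0.5])               # correctly classified as class 1
y = 1.0
# Gradient of the cross-entropy loss with respect to the INPUT is (p - y) * w.
grad_x = (predict(x) - y) * w
eps = 0.9                              # perturbation budget (illustrative)
x_adv = x + eps * np.sign(grad_x)      # one signed-gradient step up the loss

p_before, p_after = predict(x), predict(x_adv)
```

Such attacks require a fresh gradient computation per input; the thesis's point is that a trained generator amortizes this cost and produces x′ much faster at generation time.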
Near-Optimal Randomized Exploration for Tabular MDP
We study exploration using randomized value functions in Thompson Sampling
(TS)-like algorithms in reinforcement learning. This class of algorithms enjoys
appealing empirical performance. We show that when we use 1) a single random
seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a
near-optimal worst-case regret bound for episodic
time-inhomogeneous Markov decision processes, where S is the size of the state
space, A is the size of the action space, H is the planning horizon, and T is
the number of interactions. This bound polynomially improves all existing
bounds for TS-like algorithms based on randomized value functions and, for the
first time, matches the lower bound up to
logarithmic factors. Our result highlights that randomized exploration can be
near-optimal, which was previously only achieved by optimistic algorithms. To
achieve the desired result, we develop 1) a new clipping operation to ensure
both the probability being optimistic and the probability being pessimistic are
lower bounded by a constant, and 2) a new recursion formula for the absolute
value of estimation errors to analyze the regret.

Comment: 42 pages
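A toy analogue of the two ingredients just described (one shared random seed per episode, noise whose perturbed values are clipped to a plausible range; the arrays and scales are our stand-ins, and the paper's actual Bernstein-type magnitudes and clipping operation are more refined):

```python
import numpy as np

rng = np.random.default_rng(5)
S, A, H = 4, 2, 3

q_hat = rng.uniform(0.0, 1.0, size=(H, S, A))    # point estimates of Q
sigma = np.full((H, S, A), 0.2)                  # toy stand-in for a
                                                 # Bernstein-type noise scale

def episode_q(seed):
    """Perturb all estimates with ONE random draw per episode, then clip
    so perturbed values stay in [0, H] -- a toy analogue of the paper's
    clipping, which keeps both the optimistic and the pessimistic events
    at constant probability."""
    z = np.random.default_rng(seed).standard_normal()   # single seed, single draw
    q = q_hat + z * sigma
    return np.clip(q, 0.0, float(H))

q = episode_q(seed=7)
greedy = q.argmax(axis=-1)      # greedy policy used for this episode
```

Using one seed per episode correlates the perturbations across states and steps, which is what lets a single optimistic (or pessimistic) draw drive coherent exploration through the whole episode.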