Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Off-policy learning is more unstable compared to on-policy learning in
reinforcement learning (RL). One reason for the instability of off-policy
learning is a discrepancy between the target (π) and behavior (b) policy
distributions. The discrepancy between the π and b distributions can be
alleviated by employing a smooth variant of importance sampling (IS), such
as relative importance sampling (RIS). RIS has a parameter
that controls smoothness. To cope with this instability, we present the first
relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free
algorithms in RL. In our method, the network yields a target policy (the
actor) and a value function (the critic) that assesses the current policy (π)
using samples drawn from the behavior policy. To train our algorithm, we use
action values generated from the behavior policy, rather than from the target
policy, in the reward function. We also use deep neural networks to train both the actor and
critic. We evaluate our algorithm on a number of OpenAI Gym benchmark
problems and demonstrate better or comparable performance to several
state-of-the-art RL baselines.
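To make the smoothing concrete, the sketch below computes a relative importance weight in the form used by relative density-ratio estimation, π / (βπ + (1 − β)b). Treating this as the exact weight used by RIS-Off-PAC, and naming the smoothness parameter β, are assumptions; the helper ris_weighted_td_errors is hypothetical and only illustrates how such weights could scale a critic's one-step TD errors.

```python
def ris_weight(pi_prob: float, b_prob: float, beta: float) -> float:
    """Relative importance weight pi / (beta * pi + (1 - beta) * b).

    beta = 0 recovers the ordinary importance ratio pi / b, beta = 1 gives 1,
    and for any beta > 0 the weight is bounded above by 1 / beta.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    denom = beta * pi_prob + (1.0 - beta) * b_prob
    if denom <= 0.0:
        raise ValueError("mixture probability must be positive")
    return pi_prob / denom


def ris_weighted_td_errors(batch, beta, gamma=0.99):
    """Hypothetical helper: scale one-step TD errors by RIS weights.

    batch: iterable of (pi_prob, b_prob, reward, v_s, v_next) tuples, where
    v_s and v_next are the critic's value estimates for s and s'.
    """
    return [
        ris_weight(pi, b, beta) * (r + gamma * v_next - v_s)
        for pi, b, r, v_s, v_next in batch
    ]
```

For an action the target policy favors (π = 0.9) but the behavior policy rarely takes (b = 0.01), ordinary IS gives a weight of 90, while ris_weight(0.9, 0.01, 0.5) gives about 1.98, which is the variance reduction that motivates smoothing.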
Statistical Hardware Design With Multi-model Active Learning
With the rising complexity of numerous novel applications that serve our
modern society comes the strong need to design efficient computing platforms.
Designing efficient hardware is, however, a complex multi-objective problem
that deals with multiple parameters and their interactions. Given that there
are a large number of parameters and objectives involved in hardware design,
synthesizing all possible combinations is not a feasible method to find the
optimal solution. One promising approach to tackle this problem is statistical
modeling of a desired hardware performance. Here, we propose a model-based
active learning approach to solve this problem. Our proposed method uses
Bayesian models to characterize various aspects of hardware performance. We
also use transfer learning and Gaussian regression bootstrapping techniques in
conjunction with active learning to create more accurate models. Our proposed
statistical modeling method provides hardware models that are sufficiently
accurate to perform design space exploration as well as performance prediction
simultaneously. We use our proposed method to perform design space exploration
and performance prediction for various hardware setups, such as
micro-architecture design and OpenCL kernels for FPGA targets. Our experiments
show that the number of samples required to create performance models is
significantly reduced while maintaining the predictive power of our proposed
statistical models. For instance, in our performance prediction setting, the
proposed method needs 65% fewer samples to create the model, and in the design
space exploration setting, our proposed method can find the best parameter
settings by exploring fewer than 50 samples.
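The loop below is a toy, one-dimensional sketch of uncertainty-driven active learning. A bootstrap ensemble of least-squares line fits stands in for the authors' Bayesian models (a deliberate simplification, not their method), and the next configuration to "synthesize" is the one where the ensemble's predictions disagree most. The names active_learn and measure are hypothetical; measure plays the role of an expensive synthesis or simulation run.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: all x equal, fall back to a constant
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit one line per bootstrap resample of the labeled data."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def most_uncertain(pool, models):
    """Pick the candidate whose ensemble predictions have the largest variance."""
    def spread(x):
        return statistics.pvariance([a + b * x for a, b in models])
    return max(pool, key=spread)


def active_learn(pool, measure, n_init, budget, n_models=20, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled, unlabeled = pool[:n_init], pool[n_init:]
    ys = [measure(x) for x in labeled]
    while len(labeled) < budget and unlabeled:
        models = bootstrap_ensemble(labeled, ys, n_models, rng)
        x = most_uncertain(unlabeled, models)
        unlabeled.remove(x)
        labeled.append(x)
        ys.append(measure(x))  # the only "expensive" call per iteration
    return labeled, ys, fit_line(labeled, ys)
```

The sample budget, not the pool size, bounds how many times measure is called, which is the quantity the abstract reports savings on.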
Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease
Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative
disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration.
There are no effective treatments for late-stage Alzheimer's disease, which
makes early intervention urgent. However, existing statistical inference
approaches for AD subtype identification ignore pathological domain
knowledge, which could lead to ill-posed results that are sometimes
inconsistent with the essential neurological principles. Integrating systems
biology modeling with machine learning, we propose a novel pathology steered
stratification network (PSSN) that incorporates established domain knowledge in
AD pathology through a reaction-diffusion model, where we consider non-linear
interactions between major biomarkers and diffusion along the brain structural
network. Trained on longitudinal multimodal neuroimaging data, the biological
model predicts long-term trajectories that capture individual progression
patterns, filling in the gaps in the sparse imaging data available. A deep
predictive neural network is then built to exploit spatiotemporal dynamics,
link neurological examinations with clinical profiles, and generate subtype
assignment probability on an individual basis. We further identify an
evolutionary disease graph to quantify subtype transition probabilities through
extensive simulations. Our stratification achieves superior performance in both
inter-cluster heterogeneity and intra-cluster homogeneity of various clinical
scores. Applying our approach to enriched samples of aging populations, we
identify six subtypes spanning the AD spectrum, where each subtype exhibits a
distinctive biomarker pattern that is consistent with its clinical outcome.
PSSN provides insights into pre-symptomatic diagnosis and practical guidance on
clinical treatments, which may be further generalized to other
neurodegenerative diseases.
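As an illustration of a reaction-diffusion model on a brain network, the sketch below integrates dx/dt = −k·L·x + r·x·(1 − x) with forward Euler, where L is the graph Laplacian of a structural adjacency matrix and the logistic term is a single-biomarker stand-in for local production. This Fisher-KPP-style form, and the parameters k, r, dt, are assumptions; PSSN's actual model couples several biomarkers through non-linear interactions that are not reproduced here.

```python
def simulate_network_reaction_diffusion(adj, x0, k, r, dt, steps):
    """Forward-Euler integration of dx/dt = -k * L x + r * x * (1 - x).

    adj: symmetric adjacency matrix (list of lists) of the structural network.
    x0: initial regional concentration of one biomarker, values in [0, 1].
    Returns the trajectory as a list of per-step state vectors.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    x = [float(v) for v in x0]
    trajectory = [list(x)]
    for _ in range(steps):
        new_x = []
        for i in range(n):
            # (L x)_i = deg_i * x_i - sum_j A_ij * x_j spreads mass to neighbors
            laplacian_term = degree[i] * x[i] - sum(adj[i][j] * x[j] for j in range(n))
            # logistic reaction: local growth that saturates at 1
            reaction_term = r * x[i] * (1.0 - x[i])
            new_x.append(x[i] + dt * (-k * laplacian_term + reaction_term))
        x = new_x
        trajectory.append(list(x))
    return trajectory
```

With r = 0 the symmetric Laplacian conserves total mass and the state equilibrates across connected regions; with r > 0 an initial seed grows and spreads until every region saturates. The step size must stay small relative to k times the largest Laplacian eigenvalue for Euler integration to remain stable.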
Adaptive TTL-Based Caching for Content Delivery
Content Delivery Networks (CDNs) deliver a majority of the user-requested
content on the Internet, including web pages, videos, and software downloads. A
CDN server caches and serves the content requested by users. Designing caching
algorithms that automatically adapt to the heterogeneity, burstiness, and
non-stationary nature of real-world content requests is a major challenge and
is the focus of our work. While there is much work on caching algorithms for
stationary request traffic, the work on non-stationary request traffic is very
limited. Consequently, most prior models are inaccurate for production CDN
traffic that is non-stationary.
We propose two TTL-based caching algorithms and provide provable guarantees
for content request traffic that is bursty and non-stationary. The first
algorithm, called d-TTL, dynamically adapts a TTL parameter using a stochastic
approximation approach. Given a feasible target hit rate, we show that the hit
rate of d-TTL converges to its target value for a general class of bursty
traffic that allows Markov dependence over time and non-stationary arrivals.
The second algorithm, called f-TTL, uses two caches, each with its own TTL. The
first-level cache adaptively filters out non-stationary traffic, while the
second-level cache stores frequently-accessed stationary traffic. Given
feasible targets for both the hit rate and the expected cache size, f-TTL
asymptotically achieves both targets. We implement d-TTL and f-TTL and evaluate
both algorithms using an extensive nine-day trace consisting of 500 million
requests from a production CDN server. We show that both d-TTL and f-TTL
converge to their hit rate targets with an error of about 1.3%. However, f-TTL
requires a significantly smaller cache size than d-TTL to achieve the same hit
rate, since it effectively filters out the non-stationary traffic for
rarely-accessed objects.
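A stochastic-approximation TTL update in the spirit of d-TTL can be sketched as θ ← Π(θ + η(h* − 1[hit])): every miss lengthens the TTL and every hit shortens it, so the long-run hit fraction is pushed toward the target h*. The step size η, the projection interval [0, theta_max], and resetting an object's timer on every request are assumptions of this sketch rather than details taken from the abstract; simulate_dttl is a hypothetical name.

```python
def simulate_dttl(requests, target_hit_rate, step_size, theta_max=float("inf"), theta0=0.0):
    """Simulate a single TTL cache whose TTL adapts by stochastic approximation.

    requests: iterable of (time, object_id) pairs in nondecreasing time order.
    Returns (empirical hit rate, final TTL).
    """
    theta = theta0
    expiry = {}  # object -> time its timer runs out; entries past expiry count as evicted
    hits = 0
    n = 0
    for t, obj in requests:
        hit = obj in expiry and t <= expiry[obj]
        hits += int(hit)
        n += 1
        # theta <- proj_[0, theta_max](theta + eta * (h* - 1{hit}))
        theta += step_size * (target_hit_rate - (1.0 if hit else 0.0))
        theta = min(max(theta, 0.0), theta_max)
        # (re)start the object's timer with the updated TTL
        expiry[obj] = t + theta
    return (hits / n if n else 0.0), theta
```

For example, a single object requested once per time unit with h* = 0.7 and η = 0.1 settles into a TTL oscillating around 1, and over 10,000 requests the hit rate lands within about 0.001 of the target.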
Dynamic treatment regimes: Technical challenges and applications
Dynamic treatment regimes are of growing interest across the clinical sciences because these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. Formally, a dynamic treatment regime is a sequence of decision rules, one per stage of clinical intervention. Each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review a critical inferential challenge that results from nonregularity, which often arises in this area. In particular, nonregularity arises in inference for parameters in the optimal dynamic treatment regime: the asymptotic (limiting) distributions of the estimators are sensitive to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Pharmacological and Behavioral Treatments for Children with ADHD Trial as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
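One widely used way to construct the decision rules is Q-learning by backward induction; the abstract does not say which approaches it reviews in detail, so the choice here is an assumption. The sketch below handles a two-stage problem with discrete states and actions, estimating each Q-function as a cell mean. The maximization that forms the stage-1 pseudo-outcome is the non-smooth step behind the nonregularity described above: when two stage-2 treatments have nearly equal Q-values, the max is not differentiable in the underlying parameters. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def q_learning_two_stage(data, actions=(0, 1)):
    """Tabular two-stage Q-learning by backward induction.

    data: list of (s1, a1, s2, a2, y) tuples with discrete states/actions
    and final outcome y (larger is better). Returns (d1, d2, q1, q2).
    """
    # Stage 2: Q2(h2, a2) as the mean outcome in each cell, history h2 = (s1, a1, s2)
    cells2 = defaultdict(list)
    for s1, a1, s2, a2, y in data:
        cells2[(s1, a1, s2, a2)].append(y)
    q2 = {key: fmean(ys) for key, ys in cells2.items()}

    def d2(s1, a1, s2):
        observed = [a for a in actions if (s1, a1, s2, a) in q2]
        return max(observed, key=lambda a: q2[(s1, a1, s2, a)])

    # Stage 1: pseudo-outcome = value of acting optimally at stage 2 (the max step)
    cells1 = defaultdict(list)
    for s1, a1, s2, _a2, _y in data:
        cells1[(s1, a1)].append(q2[(s1, a1, s2, d2(s1, a1, s2))])
    q1 = {key: fmean(v) for key, v in cells1.items()}

    def d1(s1):
        observed = [a for a in actions if (s1, a) in q1]
        return max(observed, key=lambda a: q1[(s1, a)])

    return d1, d2, q1, q2
```

The returned d1 and d2 are the estimated decision rules: each maps the patient information available at that stage to a recommended treatment.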
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
In recent years, various powerful policy gradient algorithms have been
proposed in deep reinforcement learning. While all these algorithms build on
the Policy Gradient Theorem, the specific design choices differ significantly
across algorithms. We provide a holistic overview of on-policy policy gradient
algorithms to facilitate the understanding of both their theoretical
foundations and their practical implementations. In this overview, we include a
detailed proof of the continuous version of the Policy Gradient Theorem,
convergence results and a comprehensive discussion of practical algorithms. We
compare the most prominent algorithms on continuous control environments and
provide insights on the benefits of regularization. All code is available at
https://github.com/Matt00n/PolicyGradientsJax
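As a minimal illustration of the Policy Gradient Theorem in its score-function form, ∇J(θ) = E[∇θ log πθ(a) R], the sketch below runs REINFORCE without a baseline on a stateless two-armed bandit with a softmax policy. It is a toy, pure-Python example and not code from the linked repository; reinforce_bandit and its parameters are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a stateless bandit; returns the final action probabilities.

    rewards: deterministic reward per arm (a toy stand-in for sampled returns).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample a ~ pi_theta
        r = rewards[a]
        # grad of log softmax: d/d theta_k log pi(a) = 1[k == a] - pi_k
        for k in range(len(theta)):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)
```

Calling reinforce_bandit([0.0, 1.0]) concentrates almost all probability on the second arm. Practical algorithms reduce the variance of this estimator with baselines or advantage estimates and add regularization, which is where the design choices compared in the overview come in.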