    Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

    Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). One reason for this instability is the discrepancy between the target (π) and behavior (b) policy distributions. This discrepancy can be alleviated by employing a smooth variant of importance sampling (IS), such as relative importance sampling (RIS). RIS has a parameter β ∈ [0, 1] that controls the smoothness. To cope with the instability, we present the first relative importance sampling off-policy actor-critic (RIS-Off-PAC) model-free algorithms in RL. In our method, the network yields a target policy (the actor) and a value function (the critic) that assesses the current policy (π) using samples drawn from the behavior policy. We use the action value generated from the behavior policy, rather than from the target policy, in the reward function to train our algorithm. We also use deep neural networks to train both the actor and the critic. We evaluate our algorithm on a number of OpenAI Gym benchmark problems and demonstrate performance that is better than or comparable to several state-of-the-art RL baselines.
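
    The abstract does not spell out the RIS formula, but the standard relative density-ratio form from the importance sampling literature gives the intuition: the weight interpolates between ordinary IS (β = 0) and no correction at all (β = 1). A minimal NumPy sketch under that assumption follows; the paper's exact parameterization may differ.

```python
import numpy as np

def ris_weights(pi_probs, b_probs, beta):
    """Relative importance sampling weights.

    Sketch assuming the relative density-ratio form
        r_beta = pi / (beta * pi + (1 - beta) * b),
    so beta = 0 recovers ordinary importance sampling and
    beta = 1 gives a constant weight of 1.
    """
    pi_probs = np.asarray(pi_probs, dtype=float)
    b_probs = np.asarray(b_probs, dtype=float)
    return pi_probs / (beta * pi_probs + (1.0 - beta) * b_probs)

# Example: actions were sampled from b; weights correct toward pi.
pi = np.array([0.7, 0.2, 0.1])   # target policy probabilities
b = np.array([0.4, 0.4, 0.2])    # behavior policy probabilities
for beta in (0.0, 0.5, 1.0):
    print(beta, ris_weights(pi, b, beta))
```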

    Statistical Hardware Design With Multi-model Active Learning

    With the rising complexity of the numerous novel applications that serve our modern society comes a strong need to design efficient computing platforms. Designing efficient hardware is, however, a complex multi-objective problem that deals with multiple parameters and their interactions. Given the large number of parameters and objectives involved in hardware design, synthesizing all possible combinations is not a feasible way to find the optimal solution. One promising approach to this problem is statistical modeling of the desired hardware performance. Here, we propose a model-based active learning approach to solve this problem. Our proposed method uses Bayesian models to characterize various aspects of hardware performance. We also use transfer learning and Gaussian regression bootstrapping techniques in conjunction with active learning to create more accurate models. Our statistical modeling method provides hardware models that are sufficiently accurate to perform design space exploration and performance prediction simultaneously. We use the proposed method to perform design space exploration and performance prediction for various hardware setups, such as micro-architecture designs and OpenCL kernels for FPGA targets. Our experiments show that the number of samples required to create the performance models is significantly reduced while the predictive power of our statistical models is maintained. For instance, in our performance prediction setting, the proposed method needs 65% fewer samples to create the model, and in the design space exploration setting, it can find the best parameter settings by exploring fewer than 50 samples.
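
    As an illustration of the model-based active learning loop described above, the sketch below pairs a Gaussian process surrogate with a simple maximum-uncertainty acquisition rule. The `synthesize` function is a hypothetical stand-in for an expensive hardware evaluation, and the paper's transfer learning and Gaussian regression bootstrapping components are omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def synthesize(x):
    """Hypothetical stand-in for synthesizing one parameter setting
    and measuring a performance metric (e.g., latency)."""
    return float(np.sin(3 * x[0]) + 0.5 * x[1] ** 2)

# Candidate design space: a small grid of parameter settings.
grid = np.array([[a, c] for a in np.linspace(0, 1, 20)
                        for c in np.linspace(0, 1, 20)])

# Seed with a few random samples, then query where the model is least certain.
X = grid[rng.choice(len(grid), size=5, replace=False)]
y = np.array([synthesize(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                  # active-learning budget
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    nxt = int(np.argmax(sigma))      # maximum-uncertainty acquisition
    X = np.vstack([X, grid[nxt]])
    y = np.append(y, synthesize(grid[nxt]))

print("model trained on", len(X), "of", len(grid), "possible syntheses")
```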

    Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease

    Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration. There are no effective treatments for AD at a late stage, which makes early intervention urgent. However, existing statistical inference approaches to AD subtype identification ignore pathological domain knowledge, which can lead to ill-posed results that are sometimes inconsistent with essential neurological principles. Integrating systems biology modeling with machine learning, we propose a novel pathology steered stratification network (PSSN) that incorporates established domain knowledge of AD pathology through a reaction-diffusion model, in which we consider non-linear interactions between major biomarkers and diffusion along the brain's structural network. Trained on longitudinal multimodal neuroimaging data, the biological model predicts long-term trajectories that capture individual progression patterns, filling in the gaps in the sparse imaging data available. A deep predictive neural network is then built to exploit spatiotemporal dynamics, link neurological examinations with clinical profiles, and generate subtype assignment probabilities on an individual basis. We further identify an evolutionary disease graph to quantify subtype transition probabilities through extensive simulations. Our stratification achieves superior performance in both inter-cluster heterogeneity and intra-cluster homogeneity across various clinical scores. Applying our approach to enriched samples of aging populations, we identify six subtypes spanning the AD spectrum, each of which exhibits a distinctive biomarker pattern that is consistent with its clinical outcome. PSSN provides insights into pre-symptomatic diagnosis and practical guidance for clinical treatment, and it may generalize to other neurodegenerative diseases.
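
    To make the reaction-diffusion idea concrete, here is a hypothetical sketch of a common form of network spreading model: biomarker load diffuses along the structural connectome via the graph Laplacian while growing locally through a logistic reaction term. The equation, parameter values, and toy connectome are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

def simulate(W, u0, rho=0.1, alpha=0.5, dt=0.01, steps=1000):
    """Hypothetical network reaction-diffusion model: regional biomarker
    load u diffuses along structural connections (graph Laplacian term)
    and grows locally via a logistic reaction term:
        du/dt = -rho * L @ u + alpha * u * (1 - u)."""
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian of connectome W
    u = u0.copy()
    traj = [u.copy()]
    for _ in range(steps):              # explicit Euler integration
        u = u + dt * (-rho * L @ u + alpha * u * (1.0 - u))
        traj.append(u.copy())
    return np.array(traj)

# Toy 4-region "connectome" with pathology seeded in region 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
traj = simulate(W, u0=np.array([0.2, 0.0, 0.0, 0.0]))
print(traj[-1])   # long-term regional biomarker levels
```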

    Adaptive TTL-Based Caching for Content Delivery

    Content Delivery Networks (CDNs) deliver a majority of the user-requested content on the Internet, including web pages, videos, and software downloads. A CDN server caches and serves the content requested by users. Designing caching algorithms that automatically adapt to the heterogeneity, burstiness, and non-stationary nature of real-world content requests is a major challenge and is the focus of our work. While there is much work on caching algorithms for stationary request traffic, work on non-stationary request traffic is very limited. Consequently, most prior models are inaccurate for production CDN traffic, which is non-stationary. We propose two TTL-based caching algorithms and provide provable guarantees for content request traffic that is bursty and non-stationary. The first algorithm, called d-TTL, dynamically adapts a TTL parameter using a stochastic approximation approach. Given a feasible target hit rate, we show that the hit rate of d-TTL converges to its target value for a general class of bursty traffic that allows Markov dependence over time and non-stationary arrivals. The second algorithm, called f-TTL, uses two caches, each with its own TTL. The first-level cache adaptively filters out non-stationary traffic, while the second-level cache stores frequently-accessed stationary traffic. Given feasible targets for both the hit rate and the expected cache size, f-TTL asymptotically achieves both targets. We implement d-TTL and f-TTL and evaluate both algorithms using an extensive nine-day trace consisting of 500 million requests from a production CDN server. We show that both d-TTL and f-TTL converge to their hit rate targets with an error of about 1.3%. However, f-TTL requires a significantly smaller cache size than d-TTL to achieve the same hit rate, since it effectively filters out the non-stationary traffic for rarely-accessed objects.
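
    The stochastic approximation idea behind d-TTL can be sketched in a few lines: on each request, nudge the TTL up after a miss and down after a hit, so the empirical hit rate drifts toward the target. The update rule, step size, and synthetic trace below are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def d_ttl_simulation(requests, target_hit_rate, eta=0.05, ttl_max=1e6):
    """Sketch of a d-TTL-style cache: a single TTL parameter is adapted
    by stochastic approximation so the hit rate tracks the target."""
    ttl, expiry, hits = 1.0, {}, 0
    for t, obj in enumerate(requests):
        hit = expiry.get(obj, -1.0) > t      # still cached at time t?
        hits += hit
        # Misses push the TTL up, hits push it down, so the hit rate
        # drifts toward the (feasible) target.
        ttl = min(max(ttl + eta * (target_hit_rate - hit), 0.0), ttl_max)
        # Refresh the expiry on every access; expired entries are left
        # in the dict for simplicity.
        expiry[obj] = t + ttl
    return hits / len(requests), ttl

# Heavy-tailed synthetic trace as a stand-in for production CDN requests.
random.seed(0)
trace = [int(random.paretovariate(1.2)) for _ in range(200_000)]
print(d_ttl_simulation(trace, target_hit_rate=0.6))
```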

    Dynamic treatment regimes: Technical challenges and applications

    Dynamic treatment regimes are of growing interest across the clinical sciences because they provide one way to operationalize, and thus inform, sequential personalized clinical decision making. Formally, a dynamic treatment regime is a sequence of decision rules, one per stage of clinical intervention. Each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review a critical inferential challenge that results from nonregularity, which often arises in this area. In particular, nonregularity arises in inference for parameters of the optimal dynamic treatment regime: the asymptotic, limiting distribution of the estimators is sensitive to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Pharmacological and Behavioral Treatments for Children with ADHD Trial as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
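
    For readers unfamiliar with how such decision rules are estimated, the following sketch runs backward-induction Q-learning on synthetic two-stage data with linear Q-functions. The absolute-value term in the stage-1 pseudo-outcome is the hard max that produces the nonregularity discussed above; the data-generating model is purely illustrative, and the ACI itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic two-stage trial: baseline state S1, treatment A1 in {-1, +1},
# intermediate state S2, treatment A2, and final outcome Y (illustrative).
S1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
S2 = 0.5 * S1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
Y = S1 + S2 + A2 * (0.7 * S2) + rng.normal(size=n)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Stage 2: linear Q-function in (S1, S2, A2, A2*S2).
X2 = np.column_stack([np.ones(n), S1, S2, A2, A2 * S2])
b2 = ols(X2, Y)

# Pseudo-outcome: value under the optimal stage-2 action. The max over
# A2 in {-1, +1} becomes an absolute value -- this hard max is the
# source of the nonregularity discussed in the abstract.
pseudo = b2[0] + b2[1] * S1 + b2[2] * S2 + np.abs(b2[3] + b2[4] * S2)

# Stage 1: regress the pseudo-outcome on (S1, A1, A1*S1); the estimated
# rule treats with sign(b1[2] + b1[3] * S1).
X1 = np.column_stack([np.ones(n), S1, A1, A1 * S1])
b1 = ols(X1, pseudo)
print("stage-1 decision rule coefficients:", b1[2], b1[3])
```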

    The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations

    In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all of these algorithms build on the Policy Gradient Theorem, their specific design choices differ significantly. We provide a holistic overview of on-policy policy gradient algorithms to facilitate understanding of both their theoretical foundations and their practical implementations. In this overview, we include a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results, and a comprehensive discussion of practical algorithms. We compare the most prominent algorithms on continuous control environments and provide insights into the benefits of regularization. All code is available at https://github.com/Matt00n/PolicyGradientsJax.
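
    As a reminder of what the Policy Gradient Theorem reduces to in the simplest setting, here is a bare REINFORCE-style sketch on a toy bandit, where the gradient is E[∇ log π(a; θ) · R]. This is an illustrative example, not code from the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 3-armed bandit with noisy rewards.
true_means = np.array([1.0, 2.0, 0.5])
theta = np.zeros(3)          # softmax policy parameters
lr = 0.1

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    r = true_means[a] + rng.normal(scale=0.1)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0    # gradient of log softmax at the sampled arm
    theta += lr * r * grad_log_pi   # REINFORCE update (no baseline;
                                    # practical algorithms subtract one
                                    # to reduce variance)

print("learned policy:", softmax(theta))   # should concentrate on arm 1
```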