39 research outputs found

    A robust test for the stationarity assumption in sequential decision making

    Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy to maximize the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications like robotics control, health care and digital marketing, resulting in suboptimal policies learned under the stationarity assumption. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecifications and can effectively control type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.

    A reinforcement learning framework for dynamic mediation analysis

    Mediation analysis learns the causal effect transmitted via mediator variables between treatments and outcomes, and receives increasing attention in various scientific domains to elucidate causal relations. Most existing works focus on point-exposure studies where each subject only receives one treatment at a single time point. However, there are a number of applications (e.g., mobile health) where the treatments are sequentially assigned over time and the dynamic mediation effects are of primary interest. Proposing a reinforcement learning (RL) framework, we are the first to evaluate dynamic mediation effects in settings with infinite horizons. We decompose the average treatment effect into an immediate direct effect, an immediate mediation effect, a delayed direct effect, and a delayed mediation effect. Upon the identification of each effect component, we further develop robust and semi-parametrically efficient estimators under the RL framework to infer these causal effects. The superior performance of the proposed method is demonstrated through extensive numerical studies, theoretical results, and an analysis of a mobile health dataset. A Python implementation of the proposed procedure is available at https://github.com/linlinlin97/MediationRL
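
    The four-way decomposition described in this abstract can be written compactly. The abbreviations below (ATE, IDE, IME, DDE, DME) are shorthand assumed here for readability, and the additive form is inferred from the word "decompose"; the abstract itself does not give the formula.

```latex
% Assumed shorthand: ATE = average treatment effect,
% IDE/IME = immediate direct/mediation effect,
% DDE/DME = delayed direct/mediation effect.
\[
  \mathrm{ATE} \;=\; \mathrm{IDE} + \mathrm{IME} + \mathrm{DDE} + \mathrm{DME}
\]
```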

    Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework

    Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treatment to this mediator itself at the current time point, but also all possible paths pointing to this mediator from its upstream mediators, as well as the carryover effects from all previous time points. We propose a novel multivariate dynamic mediation analysis approach. Drawing inspiration from the Markov decision process model that is frequently employed in reinforcement learning, we introduce a Markov mediation process paired with a system of time-varying linear structural equation models to formulate the problem. We then formally define the individual mediation effect, built upon the idea of simultaneous interventions and intervention calculus. We next derive the closed-form expression and propose an iterative estimation procedure under the Markov mediation process model. We study both the asymptotic property and the empirical performance of the proposed estimator, and further illustrate our method with a mobile health application.

    DNet: distributional network for distributional individualized treatment effects

    There is a growing interest in developing methods to estimate individualized treatment effects (ITEs) for various real-world applications, such as e-commerce and public health. This paper presents a novel architecture, called DNet, to infer distributional ITEs. DNet can learn the entire outcome distribution for each treatment, whereas most existing methods primarily focus on the conditional average treatment effect and ignore the conditional variance around its expectation. Additionally, our method excels in settings with heavy-tailed outcomes and outperforms state-of-the-art methods in extensive experiments on benchmark and real-world datasets. DNet has also been successfully deployed in a widely used mobile app with millions of daily active users.

    Water entry of slender segmented projectile connected by spring

    An object that enters the water experiences a large impact acceleration at the initial stage of water entry, which can cause structural damage to objects that are dropped or launched into the water. To reduce the peak impact acceleration, a spring-connected segmented projectile with a compressible nose was designed. Using an inertial measurement unit and a high-speed camera, the influence of the nose compressibility on the initial impact acceleration was qualitatively investigated. The experimental results demonstrate that introducing a spring between the nose and the main body of the projectile can significantly suppress the peak acceleration during the early stage of impact (0–50 ms). Furthermore, the maximum impact acceleration experienced by the main body is related only to the maximum compression of the nose, independent of the spring stiffness. In addition, using the spring has a slight effect on the non-dimensional pinch-off times of the cavity but increases the initial velocity required for cavity pinch-off events to occur on the side of the main body.

    Dynamics and hydrodynamic efficiency of diving beetle while swimming

    The diving beetle, an excellent biological prototype for bionic underwater vehicles, can achieve forward swimming, backward swimming, and flexible cornering by swinging its two powerful hind legs. An in-depth study of their propulsion performance will contribute to the development of micro underwater vehicles. In this paper, the kinematic and dynamic parameters and the hydrodynamic efficiency of the diving beetle are studied by analysis of swimming videos using motion capture technology, combined with CFD simulations. The results show that the hind legs of the diving beetle achieve high propulsion force and low return resistance during one propulsion cycle in both forward and backward swimming modes. The propulsion efficiencies of forward and backward swimming are 0.47 and 0.30, respectively. Although the efficiency of backward swimming is lower, the diving beetle can reach a higher speed in a short time in this mode, which can help it avoid natural enemies. In backward swimming mode, there is a long period of passive swing of the hind legs, and larger drag arises at higher speed during the recovery stroke, which reduces the propulsion efficiency to a certain extent. Reasonable planning of the swing speed of the hind legs during the power stroke and the recovery stroke can maximize the propulsion efficiency of this propulsion method. This work will be useful for the development of bionic propulsion systems for micro underwater vehicles.

    Effects of eigen and actual frequencies of soft elastic surfaces on droplet rebound from stationary flexible feather vanes

    The aim of this paper is to investigate the effect of the eigenfrequency and the actual frequency of an elastic surface on droplet rebound. The elastic surfaces used in this study are stationary flexible feather vanes. A fluid-structure interaction (FSI) numerical model is proposed to predict the phenomenon and is then validated by experiments in which droplets impact the stationary flexible feather vanes. The effect of the mass and stiffness of the surface is analysed. First, a suitable combination of mass and stiffness of the surface enhances drop rebound. Second, a small-mass system with a higher eigenfrequency decreases the minimum contact time. Finally, the actual frequencies of the elastic surface, approximately 75 Hz, accelerate drop rebound in all cases.
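
    The link between a small mass and a higher eigenfrequency can be illustrated with the textbook undamped mass-spring relation f = sqrt(k/m) / (2π). This is only a sketch under that assumption: the abstract does not say that the FSI model reduces to this form, and the function name and the stiffness and mass values below are hypothetical, chosen purely for illustration.

```python
import math


def eigenfrequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Natural frequency of an undamped mass-spring system, f = sqrt(k/m) / (2*pi)."""
    if stiffness_n_per_m <= 0 or mass_kg <= 0:
        raise ValueError("stiffness and mass must be positive")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


# Hypothetical values, not taken from the paper: same stiffness, two masses.
STIFFNESS = 10.0  # N/m
light = eigenfrequency_hz(STIFFNESS, 1e-4)  # lighter surface
heavy = eigenfrequency_hz(STIFFNESS, 4e-4)  # four times heavier surface

# Quadrupling the mass halves the eigenfrequency.
print(f"light: {light:.1f} Hz, heavy: {heavy:.1f} Hz, ratio: {light / heavy:.1f}")
```

    Because the frequency scales with the square root of k/m, a lighter surface of the same stiffness oscillates faster, which is consistent with the abstract's pairing of "small mass" and "higher eigenfrequency".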

    Two-Photon Rabi Splitting in a Coupled System of a Nanocavity and Exciton Complexes

    Two-photon Rabi splitting in a cavity-dot system provides a basis for multi-qubit coherent control in a quantum photonic network. Here we report on two-photon Rabi splitting in a strongly coupled cavity-dot system. The quantum dot was grown intentionally large in size for large oscillation strength and small biexciton binding energy. Both exciton and biexciton transitions couple to a high quality factor photonic crystal cavity with large coupling strengths over 130 μeV. Furthermore, the small binding energy enables the cavity to simultaneously couple with two exciton states. Thereby two-photon Rabi splitting between biexciton and cavity is achieved, which can be well reproduced by theoretical calculations with quantum master equations. Comment: 12 pages, 4 figures

    Titanium Nitride Film on Sapphire Substrate with Low Dielectric Loss for Superconducting Qubits

    Dielectric loss is one of the major decoherence sources of superconducting qubits. Contemporary high-coherence superconducting qubits are formed by material systems mostly consisting of superconducting films on substrates with low dielectric loss, where the loss mainly originates from the surfaces and interfaces. Among the multiple candidates for material systems, a combination of titanium nitride (TiN) film and sapphire substrate has good potential because of its chemical stability against oxidization and its high quality at interfaces. In this work, we report a TiN film deposited onto a sapphire substrate achieving low dielectric loss at the material interface. Through systematic characterization of a series of transmon qubits fabricated with identical batches of TiN base layers, but different geometries of qubit shunting capacitors with various participation ratios of the material interface, we quantitatively extract a loss tangent at the substrate-metal interface smaller than $8.9 \times 10^{-4}$ in a 1-nm disordered layer. By optimizing the interface participation ratio of the transmon qubit, we reproducibly achieve qubit lifetimes of up to 300 μs and quality factors approaching 8 million. We demonstrate that TiN film on sapphire substrate is an ideal material system for high-coherence superconducting qubits. Our analyses further suggest that the interface dielectric loss around the Josephson junction part of the circuit could be the dominant limitation of lifetimes for state-of-the-art transmon qubits.
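
    The quoted lifetime and quality factor can be related through the commonly used definition Q = 2π f T1. The sketch below assumes that definition and treats "up to 300 μs" and "approaching 8 million" as round numbers; the qubit frequency is not stated in the abstract, so the value computed here is only what those two figures would jointly imply. Function names are hypothetical.

```python
import math


def quality_factor(frequency_hz: float, t1_s: float) -> float:
    """Quality factor from the assumed relation Q = 2*pi*f*T1."""
    return 2.0 * math.pi * frequency_hz * t1_s


def implied_frequency_hz(q: float, t1_s: float) -> float:
    """Invert Q = 2*pi*f*T1 to get the frequency implied by Q and T1."""
    return q / (2.0 * math.pi * t1_s)


T1 = 300e-6  # s, "up to 300 μs" from the abstract
Q = 8e6      # "approaching 8 million" from the abstract

f_implied = implied_frequency_hz(Q, T1)
print(f"implied qubit frequency: {f_implied / 1e9:.2f} GHz")
```

    Under these assumptions the two figures are mutually consistent with a qubit frequency of roughly 4.2 GHz, which is an inference from the arithmetic rather than a value reported in the abstract.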