Sure Screening for Gaussian Graphical Models
We propose graphical sure screening, or GRASS, a very simple and
computationally efficient screening procedure for recovering the structure of a
Gaussian graphical model in the high-dimensional setting. The GRASS estimate of
the conditional dependence graph is obtained by thresholding the elements of
the sample covariance matrix. The proposed approach possesses the sure
screening property: with very high probability, the GRASS estimated edge set
contains the true edge set. Furthermore, with high probability, the size of the
estimated edge set is controlled. We provide a choice of threshold for GRASS
that can control the expected false positive rate. We illustrate the
performance of GRASS in a simulation study and on a gene expression data set,
and show that in practice it performs quite competitively with more complex and
computationally demanding techniques for graph estimation.
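The thresholding step described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: it standardizes to the sample correlation matrix so that a single threshold in (0, 1) applies, and the paper's data-driven threshold choice for controlling the false positive rate is not shown.

```python
import numpy as np

def grass_edges(X, threshold):
    """Estimate the edge set of a Gaussian graphical model by
    thresholding off-diagonal entries of the sample (co)variance;
    here we use the sample correlation for a scale-free threshold."""
    p = X.shape[1]
    S = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    keep = np.abs(S) > threshold       # screen entries above the threshold
    np.fill_diagonal(keep, False)      # no self-loops
    return [(i, j) for i in range(p) for j in range(i + 1, p) if keep[i, j]]
```

By the sure screening property, an appropriately chosen threshold makes the returned edge set contain the true edge set with high probability, so a sketch like this can serve as a fast pre-screening step before a more refined graph estimator.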
Conformal off-policy prediction
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide only a point estimator. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy’s return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms are applicable to prediction interval construction. Our methods are justified by theory, synthetic data, and real data from short-video platforms.
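Once the pseudo policy makes subsamples behave as if drawn from the target policy, the remaining step is standard split-conformal interval construction. The sketch below shows only that generic step; the function name and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_hat, alpha=0.1):
    """Split-conformal interval: take the ceil((n+1)(1-alpha))-th smallest
    absolute residual on a held-out calibration set as the half-width
    around a point prediction, giving marginal coverage >= 1 - alpha."""
    r = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = len(r)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)  # conformal rank
    q = float(r[k - 1])                              # calibrated quantile
    return (y_hat - q, y_hat + q)
```

The coverage guarantee is distribution-free but relies on exchangeability between calibration and test points, which is exactly what the paper's pseudo-policy construction is designed to restore for off-policy data.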
Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
This paper endeavors to augment the robustness of offline reinforcement
learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent
circumstance in real-world applications. We propose two algorithmic frameworks,
ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy
optimization (OPO), respectively. Central to our frameworks is the strategic
incorporation of the median-of-means method with offline RL, enabling
straightforward uncertainty estimation for the value function estimator. This
not only adheres to the principle of pessimism in OPO but also adeptly manages
heavy-tailed rewards. Theoretical results and extensive experiments demonstrate
that our two frameworks outperform existing methods when the logged dataset
exhibits heavy-tailed reward distributions.
Robust offline reinforcement learning with heavy-tailed rewards
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
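The median-of-means idea at the core of ROAM and ROOM is simple to state for a scalar mean; the sketch below is that scalar version (in the paper it is applied to value-function estimators, which is not reproduced here).

```python
import numpy as np

def median_of_means(x, k, seed=0):
    """Split the sample into k blocks at random, average within each
    block, and return the median of the block means; a single huge
    reward can corrupt at most one block, unlike the plain mean."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(shuffled, k)
    return float(np.median([b.mean() for b in blocks]))
```

The spread of the block means also gives a straightforward uncertainty estimate around the point estimate, which is the ingredient that feeds the pessimism principle in offline policy optimization.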
Pattern transfer learning for reinforcement learning in order dispatching
Order dispatch is one of the central problems for ridesharing platforms. Recently, value-based reinforcement learning algorithms have shown promising performance on this task. However, in real-world applications, the demand-supply system is typically nonstationary over time, posing challenges to reusing data generated in different time periods to learn the value function. In this work, motivated by the fact that the relative relationship between the values of some states is largely stable across various environments, we propose a pattern transfer learning framework for value-based reinforcement learning in the order dispatch problem. Our method efficiently captures the value patterns by incorporating a concordance penalty. The superior performance of the proposed method is supported by experiments.
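The abstract does not give the exact form of the concordance penalty; one natural pairwise-ranking version, shown purely as an illustration and not as the paper's definition, penalizes any state pair whose value ordering in the current environment disagrees with the ordering observed in the source environment.

```python
import numpy as np

def concordance_penalty(v, v_src, margin=0.0):
    """Hypothetical pairwise penalty: for each state pair ordered as
    v_src[i] > v_src[j] in the source environment, add the hinged
    amount by which the learned values v violate that ordering."""
    v = np.asarray(v, dtype=float)
    v_src = np.asarray(v_src, dtype=float)
    pen = 0.0
    for i in range(len(v)):
        for j in range(len(v)):
            if v_src[i] > v_src[j]:                   # source says state i beats state j
                pen += max(0.0, margin - (v[i] - v[j]))  # hinge on the violation
    return pen
```

A term like this added to the usual value-learning loss transfers only the stable ordering pattern across time periods, not the nonstationary value levels themselves.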
A Reinforcement Learning Framework for Time-Dependent Causal Effects Evaluation in A/B Testing
A/B testing, or online experimentation, is a standard business strategy to compare
a new product with an old one in pharmaceutical, technological, and traditional
industries. Major challenges arise in online experiments where there is only
one unit that receives a sequence of treatments over time. In those
experiments, the treatment at a given time impacts current outcome as well as
future outcomes. The aim of this paper is to introduce a reinforcement learning
framework for carrying out A/B testing, while characterizing the long-term
treatment effects. Our proposed testing procedure allows for sequential
monitoring and online updating, so it is generally applicable to a variety of
treatment designs in different industries. In addition, we systematically
investigate the theoretical properties (e.g., asymptotic distribution and
power) of our testing procedure. Finally, we apply our framework to both
synthetic datasets and a real-world data example obtained from a ride-sharing
company to illustrate its usefulness.
DNet: distributional network for distributional individualized treatment effects
There is a growing interest in developing methods to estimate individualized treatment effects (ITEs) for various real-world applications, such as e-commerce and public health. This paper presents a novel architecture, called DNet, to infer distributional ITEs. DNet can learn the entire outcome distribution for each treatment, whereas most existing methods primarily focus on the conditional average treatment effect and ignore the conditional variance around its expectation. Additionally, our method excels in settings with heavy-tailed outcomes and outperforms state-of-the-art methods in extensive experiments on benchmark and real-world datasets. DNet has also been successfully deployed in a widely used mobile app with millions of daily active users.
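The abstract does not specify DNet's training objective; one standard way to learn an entire outcome distribution rather than just its mean is to fit a grid of quantiles with the pinball loss, sketched here as an assumption about the general approach, not the paper's actual loss.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss: minimized when q_pred is the tau-th
    quantile of y, so fitting predictions for a grid of tau values
    traces out the whole outcome distribution, including its tails."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    # tau * diff when under-predicting, (tau - 1) * diff when over-predicting
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))
```

Learning the tails rather than only the conditional mean is what makes a distributional objective like this attractive for the heavy-tailed outcomes emphasized in the abstract.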
Dynamic causal effects evaluation in A/B testing with a reinforcement learning framework
A/B testing, or online experimentation, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments on two-sided marketplace platforms (e.g., Uber) where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts the current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying out A/B testing in these experiments, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL. Supplementary materials for this article are available online.
Compositionally Complex Perovskite Oxides as a New Class of Li-Ion Solid Electrolytes
Compositionally complex ceramics (CCCs), including high-entropy ceramics
(HECs) as a subclass, offer new opportunities for materials discovery beyond the
traditional methodology of searching for new stoichiometric compounds. Herein, we
establish new strategies of tailoring CCCs via a seamless combination of (1)
non-equimolar compositional designs and (2) controlling microstructures and
interfaces. Using oxide solid electrolytes for all-solid-state batteries as an
exemplar, we validate these new strategies via discovering a new class of
compositionally complex perovskite oxides (CCPOs) to show the possibility of
improving ionic conductivities beyond the limit of conventional doping. As an
example (amongst the 28 CCPOs examined), we demonstrate that the ionic
conductivity can be improved by >60% in
(Li0.375Sr0.4375)(Ta0.375Nb0.375Zr0.125Hf0.125)O3-δ, in comparison with
the state-of-the-art (Li0.375Sr0.4375)(Ta0.75Zr0.25)O3-δ (LSTZ) baseline,
while maintaining comparable electrochemical stability. Furthermore, the ionic
conductivity can be improved by another >70% via grain boundary (GB)
engineering, achieving >270% of the LSTZ baseline. This work suggests
transformative new strategies for designing and tailoring HECs and CCCs,
thereby opening a new window for discovering materials for energy storage and
many other applications.
Genetic analysis and population structure of wild and cultivated wishbone flower (Torenia fournieri Lind.) lines related to specific floral color
Background The wishbone flower, Torenia fournieri Lind., an annual from tropical Indochina and southern China, is a popular ornamental plant, and many interspecific (T. fournieri × T. concolor) hybrid lines have been bred for the international market. The cultivated lines show a pattern of genetic similarity that correlates with floral color, which informs future breeding strategies. This study aimed to analyze the genetic diversity and population structure of cultivated hybrid lines in comparison with closely related wild T. concolor populations. Methods We applied the retrotransposon-based iPBS marker system to genotype a total of 136 accessions from 17 lines/populations of Torenia. These included 15 cultivated lines of three series: Duchess (A, B, C); Kauai (D, E, F, G, H, I, J); Little Kiss (K, L, M, N, P); and two wild T. concolor populations (Q and R). PCR products from each individual were used to estimate the genetic diversity and differentiation between lines/populations. Results Genotyping showed a pattern of genetic variation differentiating the 17 lines/populations, characterized by their specific floral colors. The PCoA analysis, phylogenetic tree construction, and Bayesian population structure bar plot all showed a clear subdivision of the lines/populations analysed. The 15 cultivated hybrid lines and the wild population Q, which was collected from a small area, showed the lowest genetic variability, while the other wild population R, which was sampled from a larger area, had the highest genetic variability. Discussion The extremely low genetic variability of the 15 cultivated lines indicated that each individual line underwent a similar reduction in diversity/heterozygosity from a bottleneck event, and each retained a similar (but different from each other) portion of the wild genetic diversity. The genetic variance between the two wild T. concolor populations could be due to our varied sampling methods.
The two wild populations (Q, R) and the cultivated hybrid lines (I, K, M, N, P) are genetically more closely related, while strong positive correlations were present in cultivated lines A, C, E, M, and N. These results could be used to guide future Torenia breeding. Conclusions The genetic variation and population structure found in our study showed that the cultivated hybrid lines underwent a similar reduction in diversity/heterozygosity from a bottleneck event, and each line retained a similar (but different from each other) portion of the wild genetic diversity, especially where strong phenotypic selection for a specific floral color overlaps. Generally, environmental factors could induce transposon activation and generate genetic variability, accelerating the evolutionary process of wild Torenia species. Our study revealed that wild Torenia populations sampled from a broad geographic region retain outstanding genetic diversity, whereas selective breeding targeting a specific floral color decreased such genetic variability.