312 research outputs found

    Reliable Off-policy Evaluation for Reinforcement Learning

    In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated by a different behavior policy, without executing the target policy. Reinforcement learning in high-stakes environments, such as healthcare and education, is often limited to off-policy settings because of safety or ethical concerns, or because exploration is not possible. Hence it is imperative to quantify the uncertainty of the off-policy estimate before deploying the target policy. In this paper, we propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged trajectories. Leveraging methodologies from distributionally robust optimization, we show that, with proper selection of the size of the distributional uncertainty set, these estimates serve as confidence bounds with non-asymptotic and asymptotic guarantees under stochastic or adversarial environments. Our results are also generalized to batch reinforcement learning and are supported by empirical analysis. Comment: 39 pages, 4 figures.

    Construction of Barrier in a Fishing Game With Point Capture

    This paper addresses a particular pursuit-evasion game, called the “fishing game,” in which a faster evader attempts to pass through the gap between two pursuers. We are concerned with the conditions under which the evader or the pursuers can win the game. This is a game of kind in which an essential element, the barrier, separates the state space into disjoint parts associated with each player's winning region. We present a method of explicit policy to construct the barrier. This method divides the fishing game into two subgames related to the included angle and the relative distances between the evader and the pursuers, respectively, and then analyzes the possibility of capture or escape for each subgame to ascertain the analytical forms of the barrier. Furthermore, we fuse the games of kind and degree by solving for each player's minimum-time optimal control strategy when the initial state lies in that player's winning region. Along with the optimal strategies, the trajectories of the players are delineated and the upper bounds of their winning times are derived.

    Enhancement by Your Aesthetic: An Intelligible Unsupervised Personalized Enhancer for Low-Light Images

    Low-light image enhancement is an inherently subjective process whose targets vary with the user's aesthetic. Motivated by this, several personalized enhancement methods have been investigated. However, the enhancement process based on user preferences in these techniques is invisible, i.e., a "black box". In this work, we propose an intelligible unsupervised personalized enhancer (iUP-Enhancer) for low-light images, which establishes the correlations between the low-light and the unpaired reference images with regard to three user-friendly attributions (brightness, chromaticity, and noise). The proposed iUP-Enhancer is trained with the guidance of these correlations and the corresponding unsupervised loss functions. Rather than a "black box" process, our iUP-Enhancer presents an intelligible enhancement process with the above attributions. Extensive experiments demonstrate that the proposed algorithm produces competitive qualitative and quantitative results while maintaining excellent flexibility and scalability. This can be validated by personalization with single/multiple references, cross-attribution references, or merely adjusting parameters. Comment: Accepted to ACM MM 202

    Multi-player pursuit–evasion games with one superior evader

    Inspired by the hunting and foraging behaviors of group predators, this paper addresses a class of multi-player pursuit–evasion games with one superior evader, who moves faster than the pursuers. We are concerned with the conditions under which the pursuers can capture the evader, including the minimum number of pursuers, their required initial spatial distribution, and their cooperative strategies. We present some necessary or sufficient conditions to regularize the encirclement that the pursuers form around the evader. Then we provide a cooperative scheme for the pursuers to maintain and shrink the encirclement until the evader is captured. Finally, we give some examples to illustrate the theoretical results.

    High-Temperature Polyimide Dielectric Materials for Energy Storage

    The availability of high-temperature dielectrics is key to developing advanced electronics and power systems that operate under extreme environmental conditions. In the past few years, many improvements have been made and many exciting developments have taken place. However, currently available candidate materials and methods still do not meet the applicable standards. Polyimide (PI) was found to be the preferred choice for developing high-temperature dielectric films because of its thermal stability, dielectric properties, and flexibility. However, it has disadvantages such as a relatively low dielectric permittivity. This chapter presents an overview of recent progress on PI dielectric materials for high-temperature capacitive energy storage applications. In this context, a new molecular design of the skeleton structure of PI should be performed to balance size and thermal stability and to optimize energy storage properties for high-temperature applications. Improved performance can be achieved by incorporating inorganic units into polymers to form organic-inorganic hybrid and composite structures.

    9a-Hydroxy-3,8a-dimethyl-5-methylene-4,4a,5,6,9,9a-hexahydronaphtho[2,3-b]furan-2(8aH)-one

    The title compound, C15H18O3, was isolated from Lactarius piperatus (Fr.) S. F. Gary collected from the Kunming area in Yunnan province, China. The central cyclohexyl ring adopts a chair conformation, while the furanone ring is close to planar (r.m.s. deviation = 0.0174 Å). The remaining methylene cyclohexene ring has a flattened chair conformation. In the crystal, molecules are linked via intermolecular O—H⋯O and C—H⋯O hydrogen bonds into zigzag chains along the a axis.

    Control of Intestinal Inflammation, Colitis-Associated Tumorigenesis, and Macrophage Polarization by Fibrinogen-Like Protein 2

    Fibrinogen-like protein 2 (Fgl2) is critical for immune regulation in the inflammatory state. Elevated Fgl2 levels are observed in patients with inflammatory bowel disease (IBD), but little is known about their functional significance. In this study, we sought to investigate the role of Fgl2 in the development of intestinal inflammation and colitis-associated colorectal cancer (CAC). Here, we report that Fgl2 deficiency increased susceptibility to dextran sodium sulfate-induced colitis and CAC in a mouse model. During colitis development, expression of the membrane-bound and secreted forms of Fgl2 (mFgl2 and sFgl2, respectively) increased in the colon, and both forms were predominantly expressed by colonic macrophages. In addition, using bone marrow chimeric mice, we determined that Fgl2 function in colitis is strictly related to its expression in hematopoietic cells. Loss of Fgl2 induced M1 polarization but suppressed M2 polarization, both in vivo and in vitro, independent of intestinal inflammation. Thus, Fgl2 suppresses intestinal inflammation and CAC development through its role in macrophage polarization and may serve as a therapeutic target in inflammatory diseases, including IBD.

    Macrocyclic colibactin induces DNA double-strand breaks via copper-mediated oxidative cleavage.

    Colibactin is an assumed human gut bacterial genotoxin whose biosynthesis is linked to the clb genomic island, which is widely distributed in pathogenic and commensal human enterobacteria. Colibactin-producing gut microbes promote colon tumour formation and enhance the progression of colorectal cancer via cellular senescence and death induced by DNA double-strand breaks (DSBs); however, the chemical basis that contributes to the pathogenesis at the molecular level has not been fully characterized. Here, we report the discovery of colibactin-645, a macrocyclic colibactin metabolite that recapitulates the previously assumed genotoxicity and cytotoxicity. Colibactin-645 shows strong DNA DSB activity in vitro and in human cell cultures via a unique copper-mediated oxidative mechanism. We also delineate a complete biosynthetic model for colibactin-645, which highlights a unique fate of the aminomalonate-building monomer in forming the C-terminal 5-hydroxy-4-oxazolecarboxylic acid moiety through the activities of both the polyketide synthase ClbO and the amidase ClbL. This work thus provides a molecular basis for colibactin's DNA DSB activity and facilitates further mechanistic study of colibactin-related colorectal cancer incidence and prevention.