
    On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

    We study the fundamental question of the sample complexity of learning a good policy in finite Markov decision processes (MDPs) when the data available for learning is obtained by following a logging policy that must be chosen without knowledge of the underlying MDP. Our main results show that the sample complexity, the minimum number of transitions necessary and sufficient to obtain a good policy, is an exponential function of the relevant quantities when the planning horizon $H$ is finite. In particular, we prove that the sample complexity of obtaining $\epsilon$-optimal policies is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H+1)})$ for $\gamma$-discounted problems, where $\mathrm{S}$ is the number of states, $\mathrm{A}$ is the number of actions, and $H$ is the effective horizon defined as $H=\lfloor \tfrac{\ln(1/\epsilon)}{\ln(1/\gamma)} \rfloor$; and it is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H)}/\epsilon^2)$ for finite-horizon problems, where $H$ is the planning horizon of the problem. This lower bound is essentially matched by an upper bound. For the average-reward setting, we show that no algorithm can find $\epsilon$-optimal policies with a finite amount of data.
    Comment: 26 pages, 2 figures
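
    To make the discounted and finite-horizon lower bounds concrete, the sketch below evaluates the effective horizon $H$ and the exponential term inside each $\Omega(\cdot)$. It is a minimal illustration, not code from the paper: the function names are hypothetical, and because $\Omega(\cdot)$ hides constants, the numbers show how the bound grows with $\mathrm{S}$, $\mathrm{A}$, $\epsilon$ and $\gamma$, not actual sample counts.

```python
import math


def effective_horizon(epsilon: float, gamma: float) -> int:
    """Effective horizon H = floor(ln(1/epsilon) / ln(1/gamma)), as defined in the abstract."""
    if not (0 < epsilon < 1 and 0 < gamma < 1):
        raise ValueError("expected 0 < epsilon < 1 and 0 < gamma < 1")
    return math.floor(math.log(1 / epsilon) / math.log(1 / gamma))


def discounted_growth_term(num_states: int, num_actions: int, epsilon: float, gamma: float) -> int:
    """A^min(S-1, H+1): the term inside Omega(.) for gamma-discounted problems (constants dropped)."""
    h = effective_horizon(epsilon, gamma)
    return num_actions ** min(num_states - 1, h + 1)


def finite_horizon_growth_term(num_states: int, num_actions: int, horizon: int, epsilon: float) -> float:
    """A^min(S-1, H) / epsilon^2: the term inside Omega(.) for finite-horizon problems (constants dropped)."""
    return num_actions ** min(num_states - 1, horizon) / epsilon ** 2


if __name__ == "__main__":
    eps, gamma = 0.1, 0.9
    print("H =", effective_horizon(eps, gamma))
    # Hypothetical state-space sizes with A = 2 actions.
    for s in (5, 10, 100):
        print(f"S={s}, A=2: A^min(S-1, H+1) = {discounted_growth_term(s, 2, eps, gamma)}")
```

    With $\epsilon = 0.1$ and $\gamma = 0.9$ the effective horizon is $H = 21$, so for small state spaces the exponent is set by $\mathrm{S}-1$, while for $\mathrm{S} = 100$ it is capped at $H+1 = 22$.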

    Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

    In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows partial observability to be accounted for in learning, exploration and planning, but present significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and a tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis justifying the statistical efficiency of the proposed algorithm, and empirically demonstrate that it can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.
    Comment: The first two authors contributed equally.

    Genre Analysis and Comparison of Chinese and American Presidents’ New Year Messages

    Reform on English Speech Course in China with the Guidance of Educational Psychology

    Effect of early goal-directed activity on gastrointestinal function recovery after pancreatic surgery

    Objective: To investigate the safety and feasibility of early goal-directed mobilization for the recovery of gastrointestinal function after pancreaticoduodenectomy.
    Methods: A non-contemporaneous controlled study was conducted in patients who underwent pancreaticoduodenectomy. Forty patients treated from September 2022 to May 2023 formed the control group, and forty patients treated from June 2023 to February 2024 formed the experimental group. General clinical data were collected for both groups. The control group received routine nursing care after pancreaticoduodenectomy, with no specific requirements for the timing or goals of early activity. The experimental group had daily activity goals set for early mobilization, which patients carried out together with their families; all other care was identical to that of the control group. The primary outcomes were the time to first flatus and the time to first defecation. Secondary outcomes included the time to first getting out of bed, the time to oral drinking, the time to gastric tube removal, and K+, Na+, and Cl- levels on postoperative day 3. Safety outcomes included chyle leak, postoperative pancreatic fistula, biliary leak, delayed gastric emptying, postoperative hemorrhage, unplanned reoperation, unplanned extubation, falls, and death.
    Results: There was no statistically significant difference in general clinical data between the two groups. After early goal-directed mobilization was implemented, the time to first flatus shortened from (3.95±1.68) d to (2.88±0.91) d (t=-3.560, P=0.001), and the time to first defecation shortened from (4.90±1.61) d to (3.80±1.30) d (t=-3.352, P=0.001). The time to first getting out of bed shortened from (5.18±1.77) d to (2.30±0.88) d (t=-9.205, P<0.001), and the time to oral drinking shortened from (4.10±1.89) d to (2.73±1.20) d (t=-3.883, P<0.001). Na+ (t=-2.745, P=0.008) and Cl- (t=-2.033, P=0.045) levels on postoperative day 3 also differed significantly between the groups.
    Conclusion: Early goal-directed activity programs are safe and effective in promoting the recovery of gastrointestinal function after pancreaticoduodenectomy.
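
    The reported t values can be checked against the group means and standard deviations. The sketch below assumes an independent two-sample Student's t-test with pooled variance and 40 patients per group; the abstract gives the group sizes but does not name the test, so the test choice is an assumption, and the function name `pooled_t` is hypothetical. With equal group sizes the pooled and Welch standard errors coincide, so only the degrees of freedom would differ between the two tests.

```python
import math


def pooled_t(mean_ctrl: float, sd_ctrl: float, mean_exp: float, sd_exp: float,
             n_ctrl: int = 40, n_exp: int = 40) -> float:
    """Independent two-sample Student's t with pooled variance (experimental minus control)."""
    df = n_ctrl + n_exp - 2
    pooled_var = ((n_ctrl - 1) * sd_ctrl ** 2 + (n_exp - 1) * sd_exp ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_ctrl + 1 / n_exp))
    return (mean_exp - mean_ctrl) / se


# (control mean, control SD, experimental mean, experimental SD, reported t), times in days
OUTCOMES = {
    "first flatus": (3.95, 1.68, 2.88, 0.91, -3.560),
    "first defecation": (4.90, 1.61, 3.80, 1.30, -3.352),
    "first getting out of bed": (5.18, 1.77, 2.30, 0.88, -9.205),
    "oral drinking": (4.10, 1.89, 2.73, 1.20, -3.883),
}

if __name__ == "__main__":
    for name, (m_c, s_c, m_e, s_e, reported) in OUTCOMES.items():
        t = pooled_t(m_c, s_c, m_e, s_e)
        print(f"{name}: recomputed t = {t:.3f}, reported t = {reported}")
```

    Because the means and standard deviations are rounded to two decimals, the recomputed values should match the reported ones only to roughly the first decimal place.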

    Ore characteristics of the sandstone-type Daying uranium deposit in the Ordos Basin, northwestern China

    The Ordos Basin is one of the top oil-, gas-, and coal-producing basins in China and is increasingly recognized as an important uranium mineralization province. Uranium deposits occur near the margin of the basin and are mainly hosted in the sandstones of the Jurassic Zhiluo Formation. The Daying uranium deposit is one of the most important large sandstone-type uranium deposits in China. Using thin-section analysis, electron microprobe measurements, and chemical analyses, we studied the characteristics of the Daying uranium deposit, including the ore type, structure, particle size, material composition, and chemical composition, as well as the form and valence state of the uranium. The uranium mainly exists in three forms: an absorbed form, independent minerals, and uranium-bearing minerals. Most of the uranium in the ore is U4+; the proportion of U6+ ranges from 18% to 55%, with an average of 33%, and is relatively high in cores containing low-grade ore. This study provides a reference for determining the best smelting technology with which to further develop this deposit.