24 research outputs found
On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
We study the fundamental question of the sample complexity of learning a good
policy in finite Markov decision processes (MDPs) when the data available for
learning is obtained by following a logging policy that must be chosen without
knowledge of the underlying MDP. Our main results show that the sample
complexity, the minimum number of transitions necessary and sufficient to
obtain a good policy, is an exponential function of the relevant quantities
when the planning horizon $H$ is finite. In particular, we prove that the
sample complexity of obtaining $\epsilon$-optimal policies is at least
$\Omega(A^{\min(S-1, H+1)})$ for $\gamma$-discounted
problems, where $S$ is the number of states, $A$ is the
number of actions, and $H$ is the effective horizon defined as $H = \lfloor \ln(1/\epsilon) / \ln(1/\gamma) \rfloor$; and it is at least
$\Omega(A^{\min(S-1, H)} / \epsilon^2)$ for finite horizon
problems, where $H$ is the planning horizon of the problem. This lower bound is
essentially matched by an upper bound. For the average-reward setting we show
that there is no algorithm finding $\epsilon$-optimal policies with a finite
amount of data. Comment: 26 pages, 2 figures
Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning
In most real-world reinforcement learning applications, state information is
only partially observable, which breaks the Markov decision process assumption
and leads to inferior performance for algorithms that conflate observations
with state. Partially Observable Markov Decision Processes (POMDPs), on the
other hand, provide a general framework that allows for partial observability
to be accounted for in learning, exploration and planning, but present
significant computational and statistical challenges. To address these
difficulties, we develop a representation-based perspective that leads to a
coherent framework and tractable algorithmic approach for practical
reinforcement learning from partial observations. We provide a theoretical
analysis for justifying the statistical efficiency of the proposed algorithm,
and also empirically demonstrate the proposed algorithm can surpass
state-of-the-art performance with partial observations across various
benchmarks, advancing reliable reinforcement learning towards more practical
applications. Comment: The first two authors contribute equally
The Effectiveness of Explicit Instruction of Certain Decoding Skills in Improving Chinese EFL Listeners’ General Comprehension Performance
Synthesis and photo-property of 2-cyano boron-dipyrromethene and the application for detecting fluoride ion
Effect of early goal-directed activity on gastrointestinal function recovery after pancreatic surgery
Objective: To investigate the safety and feasibility of early goal-directed mobilization in the recovery of gastrointestinal function after pancreaticoduodenectomy.
Methods: A non-contemporaneous controlled study was conducted in patients who underwent pancreaticoduodenectomy. Forty patients treated from September 2022 to May 2023 formed the control group, and forty patients treated from June 2023 to February 2024 formed the experimental group. The general clinical data of the two groups were collected. The control group received routine nursing care after pancreaticoduodenectomy, with no specific requirements for the timing or goals of early activity. The experimental group had daily activity goals established for early mobilization, which were carried out by the patients and their families, while the rest of their care was identical to that of the control group. The primary effectiveness outcomes were the time to first flatus and the time to first defecation. The secondary outcomes were the time to first getting out of bed, the time to oral drinking, the time to gastric tube removal, and the K+, Na+, and Cl- levels on postoperative day 3. Safety evaluations included chyle leak, postoperative pancreatic fistula, biliary leak and delayed gastric emptying, postoperative hemorrhage, unplanned reoperation, unplanned extubation, falls and death.
Results: There was no statistically significant difference in the general clinical data between the two groups. After the implementation of early goal-directed mobilization, the time to first flatus was shortened from (3.95±1.68) d to (2.88±0.91) d (t=-3.560, P=0.001), and the time to first defecation was shortened from (4.90±1.61) d to (3.80±1.30) d (t=-3.352, P=0.001). The time to first getting out of bed was shortened from (5.18±1.77) d to (2.30±0.88) d (t=-9.205, P<0.001), and the time to oral drinking was shortened from (4.10±1.89) d to (2.73±1.20) d (t=-3.883, P<0.001). Significant differences were also observed in postoperative day 3 Na+ (t=-2.745, P=0.008) and Cl- (t=-2.033, P=0.045) levels.
Conclusion: Early goal-directed activity programs are safe and effective in promoting the recovery of gastrointestinal function after pancreaticoduodenectomy.
New insights into the role of system sealing capacity in shale evolution under conditions analogous to geology: Implications for nanopore evolution
New insights into the role of system sealing capacity in shale evolution under conditions analogous to geology: Implications for organic matter evolution and petroleum generation
Ore characteristics of the sandstone-type Daying uranium deposit in the Ordos Basin, northwestern China
The Ordos Basin is one of the top oil-, gas-, and coal-producing basins in China and is increasingly recognized as an important uranium mineralization province. Uranium deposits occur near the margin of the basin and are mainly hosted in the sandstones of the Jurassic Zhiluo Formation. The Daying uranium deposit in the Ordos Basin is one of the most important large sandstone-type uranium deposits in China. Based on thin section analysis and electron microprobe measurements, we used analytical chemical data to study the characteristics of the Daying uranium deposit, including the type, structure, particle size, material composition, chemical composition, form, and valence state of the uranium. The uranium mainly exists in three forms: an absorbed form, independent minerals, and uranium-bearing minerals. Most of the uranium in the ore is U4+, and the proportion of U6+ ranges from 18% to 55%, with an average of 33%. The proportion of U6+ is relatively high in the cores containing low-grade ore. This study provides a reference for determining the best smelting technology with which to further develop this deposit.
