24 research outputs found

On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

Authors: Dai Bo, Lee Ilbin, Schuurmans Dale, Szepesvari Csaba, Xiao Chenjun
Publication date: 18/06/2021

We study the fundamental question of the sample complexity of learning a good policy in finite Markov decision processes (MDPs) when the data available for learning is obtained by following a logging policy that must be chosen without knowledge of the underlying MDP. Our main results show that the sample complexity, the minimum number of transitions necessary and sufficient to obtain a good policy, is an exponential function of the relevant quantities when the planning horizon $H$ is finite. In particular, we prove that the sample complexity of obtaining $\epsilon$-optimal policies is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H+1)})$ for $\gamma$-discounted problems, where $\mathrm{S}$ is the number of states, $\mathrm{A}$ is the number of actions, and $H$ is the effective horizon defined as $H=\lfloor \tfrac{\ln(1/\epsilon)}{\ln(1/\gamma)} \rfloor$; and it is at least $\Omega(\mathrm{A}^{\min(\mathrm{S}-1, H)}/\epsilon^2)$ for finite horizon problems, where $H$ is the planning horizon of the problem. This lower bound is essentially matched by an upper bound. For the average-reward setting we show that there is no algorithm finding $\epsilon$-optimal policies with a finite amount of data.

Comment: 26 pages, 2 figures

arXiv.org e-Print Archive
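
The effective horizon and the growth terms in the lower bounds quoted in the abstract above can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and because the $\Omega(\cdot)$ notation hides constant factors, the returned values only illustrate how the bounds scale and are not actual sample counts.

```python
import math


def effective_horizon(epsilon: float, gamma: float) -> int:
    """H = floor(ln(1/epsilon) / ln(1/gamma)), as defined in the abstract."""
    if not (0.0 < epsilon < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("epsilon and gamma must both lie strictly between 0 and 1")
    return math.floor(math.log(1.0 / epsilon) / math.log(1.0 / gamma))


def discounted_growth_term(num_states: int, num_actions: int,
                           epsilon: float, gamma: float) -> int:
    """A^min(S-1, H+1): the term inside Omega(.) for gamma-discounted problems."""
    horizon = effective_horizon(epsilon, gamma)
    return num_actions ** min(num_states - 1, horizon + 1)


def finite_horizon_growth_term(num_states: int, num_actions: int,
                               horizon: int, epsilon: float) -> float:
    """A^min(S-1, H) / epsilon^2: the term inside Omega(.) for finite-horizon problems."""
    return num_actions ** min(num_states - 1, horizon) / epsilon ** 2


if __name__ == "__main__":
    # Illustrative inputs only; these values are assumptions, not from the paper.
    print(effective_horizon(0.1, 0.9))
    print(discounted_growth_term(num_states=5, num_actions=3, epsilon=0.1, gamma=0.9))
    print(finite_horizon_growth_term(num_states=3, num_actions=2, horizon=10, epsilon=0.5))
```

With few states the exponent is capped by $\mathrm{S}-1$, while with many states it is capped by the horizon, which is why the `min` appears in both terms.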

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

Authors: Dai Bo, Ren Tongzheng, Schuurmans Dale, Xiao Chenjun, Zhang Hongming
Publication date: 10/06/2024

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in learning, exploration and planning, but present significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis justifying the statistical efficiency of the proposed algorithm, and also empirically demonstrate that the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.

Comment: The first two authors contribute equally

arXiv.org e-Print Archive

The Effectiveness of Explicit Instruction of Certain Decoding Skills in Improving Chinese EFL Listeners’ General Comprehension Performance

Authors: Chenjun Dai, Li Liu
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2012

Crossref

Genre Analysis and Comparison of Chinese and American Presidents’ New Year Messages

Authors: Chenjun Dai, Yu Fan
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2012

Crossref

Reform on English Speech Course in China with the Guidance of Educational Psychology

Authors: Chenjun Dai, Li Liu
Publication venue: St. Plum-Blossom Press, Pty, Ltd.
Publication date: 01/03/2011

Crossref

Synthesis and photo-property of 2-cyano boron-dipyrromethene and the application for detecting fluoride ion

Authors: Chenjun Wu, Fengyuan Dai, Jianbo Wang, Jinjin Shen, Qianshou Zong, Qingqing Wu
Publication venue: Elsevier BV
Publication date: 01/12/2015

Crossref

Effect of early goal-directed activity on gastrointestinal function recovery after pancreatic surgery

Authors: DAI Chenjun, DUAN Xiaolei, GAO Wenqing, YANG Fu, YAO Hui, YAO Wenjie, ZHANG Yun
Publication venue: Editorial Office of Journal of Shanghai Jiao Tong University (Medical Science)
Publication date: 01/10/2024

Objective: To investigate the safety and feasibility of early goal-directed mobilization in the recovery of gastrointestinal function after pancreaticoduodenectomy. Methods: A non-contemporaneous controlled study was conducted in patients who underwent pancreaticoduodenectomy. Forty patients from September 2022 to May 2023 were selected as the control group, and forty patients from June 2023 to February 2024 were selected as the experimental group. The general clinical data of the two groups were collected. The control group received routine nursing care after pancreaticoduodenectomy, with no specific requirements for the timing or goals of early activity. The experimental group had daily activity goals established for early mobilization, which were carried out by the patients and their families, while the rest of their care was identical to that of the control group. The primary effectiveness outcomes were the times to first flatus and first defecation; the secondary outcomes included the time to first getting out of bed, the time to oral drinking, the time to gastric tube removal, and the levels of K+, Na+, and Cl- on postoperative day 3. Safety evaluations included chyle leak, postoperative pancreatic fistula, biliary leak, delayed gastric emptying, postoperative hemorrhage, unplanned reoperation, unplanned extubation, falls, and death. Results: There was no statistically significant difference in the general clinical data between the two groups. After the implementation of early goal-directed mobilization, the time to first flatus was shortened from (3.95±1.68) d to (2.88±0.91) d (t=-3.560, P=0.001), and the time to first defecation was shortened from (4.90±1.61) d to (3.80±1.30) d (t=-3.352, P=0.001). The time to first getting out of bed was shortened from (5.18±1.77) d to (2.30±0.88) d (t=-9.205, P<0.001), and the time to oral drinking was shortened from (4.10±1.89) d to (2.73±1.20) d (t=-3.883, P<0.001). Significant differences were also observed in postoperative day 3 Na+ (t=-2.745, P=0.008) and Cl- (t=-2.033, P=0.045) levels. Conclusion: Early goal-directed activity programs are safe and effective in promoting the recovery of gastrointestinal function after pancreaticoduodenectomy.

Directory of Open Access Journals

New insights into the role of system sealing capacity in shale evolution under conditions analogous to geology: Implications for nanopore evolution

Authors: Chenjun Wu, Dongjun Song, Jincai Tuo, Lina Sun, Long Su, Mingfeng Zhang, Shuang Dai
Publication venue: Elsevier BV
Publication date: 01/09/2022

Crossref

New insights into the role of system sealing capacity in shale evolution under conditions analogous to geology: Implications for organic matter evolution and petroleum generation

Authors: Chenjun Wu, Dongjun Song, Jincai Tuo, Lina Sun, Long Su, Mingfeng Zhang, Shuang Dai
Publication venue: Elsevier BV
Publication date: 01/06/2022

Crossref

Ore characteristics of the sandstone-type Daying uranium deposit in the Ordos Basin, northwestern China

Authors: Aisheng Miao, Chengyong Zhang, Chenjun Wu, Lu Liu, Mingjian Dai, Shuang Chen, Yangquan Jiao, Yunbiao Peng, Zilong Zhang
Publication venue: Canadian Science Publishing
Publication date: 01/08/2017

The Ordos Basin is one of the top oil-, gas-, and coal-producing basins in China and is increasingly recognized as an important uranium mineralization province. Uranium deposits occur near the margin of the basin and are mainly hosted in the sandstones of the Jurassic Zhiluo Formation. The Daying uranium deposit in the Ordos Basin is one of the most important large sandstone-type uranium deposits in China. Based on thin section analysis and electron microprobe measurements, we used analytical chemical data to study the characteristics of the Daying uranium deposit, including the type, structure, particle size, material composition, chemical composition, form, and valence state of the uranium. The uranium mainly exists in three forms: an absorbed form, independent minerals, and uranium-bearing minerals. Most of the uranium in the ore is U4+, and the proportion of U6+ ranges from 18% to 55%, with an average of 33%. The proportion of U6+ is relatively high in the cores containing low-grade ore. This study provides a reference for determining the best smelting technology with which to further develop this deposit.

Crossref