How to Query Human Feedback Efficiently in RL?
Reinforcement Learning with Human Feedback (RLHF) is a paradigm in which an
RL agent learns to optimize a task using pair-wise preference-based feedback
over trajectories, rather than explicit reward signals. While RLHF has
demonstrated practical success in fine-tuning language models, existing
empirical work does not address the challenge of how to efficiently sample
trajectory pairs for querying human feedback. In this study, we propose an
efficient sampling approach to acquiring exploratory trajectories that enable
accurate learning of hidden reward functions before collecting any human
feedback. Theoretical analysis demonstrates that our algorithm requires less
human feedback for learning the optimal policy under preference-based models
with linear parameterization and unknown transitions, compared to the existing
literature. Specifically, our framework can incorporate linear and low-rank
MDPs. Additionally, we investigate RLHF with action-based comparison feedback
and introduce an efficient querying algorithm tailored to this scenario.
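The pair-wise preference feedback described above is commonly modeled with a Bradley-Terry link over trajectory rewards. A minimal sketch under that standard assumption (the function and feature names are illustrative, not from the paper):

```python
import numpy as np

def preference_prob(phi_a, phi_b, theta):
    """Bradley-Terry probability that trajectory A is preferred over B,
    under a linearly parameterized reward r(tau) = <phi(tau), theta>.

    phi_a, phi_b: feature vectors of the two trajectories (hypothetical).
    theta: reward parameter vector.
    """
    r_a = phi_a @ theta
    r_b = phi_b @ theta
    # Logistic link on the reward difference: equal rewards give 0.5.
    return 1.0 / (1.0 + np.exp(-(r_a - r_b)))
```

Under this model, querying is most informative on trajectory pairs whose predicted preference is close to 0.5, which is one intuition behind acquiring exploratory trajectories before collecting feedback.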
Provable Offline Reinforcement Learning with Human Feedback
In this paper, we investigate the problem of offline reinforcement learning
with human feedback, where feedback is available in the form of preferences
between trajectory pairs rather than explicit rewards. Our proposed algorithm
consists of two main steps: (1) estimate the implicit reward using Maximum
Likelihood Estimation (MLE) with general function approximation from offline
data and (2) solve a distributionally robust planning problem over a confidence
set around the MLE. We consider the general reward setting where the reward can
be defined over the whole trajectory and provide a novel guarantee that allows
us to learn any target policy with a polynomial number of samples, as long as
the target policy is covered by the offline data. This guarantee is the first
of its kind with general function approximation. To measure the coverage of the
target policy, we introduce a new single-policy concentrability coefficient,
which can be upper bounded by the per-trajectory concentrability coefficient.
We also establish lower bounds that highlight the necessity of such
concentrability and the difference from standard RL, where state-action-wise
rewards are directly observed. We further extend and analyze our algorithm when
the feedback is given over action pairs.
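Step (1) above, estimating the implicit reward by MLE from preference data, can be sketched for the special case of a linear reward class under a Bradley-Terry preference model (the plain gradient ascent and all names are illustrative; the paper's general function approximation is broader than this):

```python
import numpy as np

def mle_reward(diffs, prefs, lr=0.1, steps=500):
    """MLE of a linear reward parameter theta from trajectory preferences.

    diffs: (n, d) array of feature differences phi(tau_1) - phi(tau_0)
           per queried pair (hypothetical trajectory features).
    prefs: (n,) array, 1.0 if tau_1 was preferred, else 0.0.

    Maximizes the Bradley-Terry log-likelihood by gradient ascent.
    """
    n, d = diffs.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diffs @ theta))  # P(tau_1 preferred)
        grad = diffs.T @ (prefs - p) / n          # log-likelihood gradient
        theta += lr * grad
    return theta
```

Step (2) would then plan pessimistically over a confidence set around this estimate rather than trusting it pointwise.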
Genetic dissection of rice grain shape using a recombinant inbred line population derived from two contrasting parents and fine mapping a pleiotropic quantitative trait locus qGL7
Background: The three-dimensional shape of grain, measured as grain length, width, and thickness (GL, GW, and GT), is one of the most important components of grain appearance in rice. Determining the genetic basis of variation in grain shape could facilitate efficient improvements in grain appearance. In this study, an F7:8 recombinant inbred line (RIL) population derived from a cross between indica and japonica cultivars (Nanyangzhan and Chuan7) contrasting in grain size was used for quantitative trait locus (QTL) mapping. A genetic linkage map was constructed with 164 simple sequence repeat (SSR) markers. The major aim of this study was to detect QTLs for grain shape and to fine map a minor QTL, qGL7.

Results: Four QTLs for GL were detected on chromosomes 3 and 7, and 10 QTLs for GW and 9 QTLs for GT were identified on chromosomes 2, 3, 5, 7, 9, and 10, respectively. A total of 28 QTLs were identified, several of which are reported here for the first time; four major QTLs and six minor QTLs for grain shape were also commonly detected in both years. The minor QTL qGL7 exhibited pleiotropic effects on GL, GW, GT, 1000-grain weight (TGW), and spikelets per panicle (SPP) and was further validated in a near-isogenic F2 population (NIL-F2). Finally, qGL7 was narrowed down to an interval between InDel marker RID711 and SSR marker RM6389, covering a 258-kb region in the Nipponbare genome, and cosegregated with InDel markers RID710 and RID76.

Conclusion: Materials with very different phenotypes were used to develop mapping populations to detect QTLs because of their complex genetic background. Progeny tests proved that the minor QTL qGL7 displays a single Mendelian characteristic. Therefore, we suggest that minor QTLs for traits with high heritability can be isolated using a map-based cloning strategy in a large NIL-F2 population. In addition, combinations of different QTLs produced diverse grain shapes, which provides the ability to breed more varieties of rice to satisfy consumer preferences.
Provably Efficient CVaR RL in Low-rank MDPs
We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize
the Conditional Value at Risk (CVaR) with a fixed risk tolerance τ. Prior
theoretical work studying risk-sensitive RL focuses on the tabular Markov
Decision Processes (MDPs) setting. To extend CVaR RL to settings where the
state space is large, function approximation must be deployed. We study CVaR RL in
low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the
underlying transition kernel admits a low-rank decomposition, but unlike prior
linear models, low-rank MDPs do not assume the feature or state-action
representation is known. We propose a novel Upper Confidence Bound (UCB)
bonus-driven algorithm to carefully balance the interplay between exploration,
exploitation, and representation learning in CVaR RL. We prove that our
algorithm achieves a polynomial sample complexity to yield an ε-optimal CVaR,
where H is the length of each episode, A is the capacity of the action space,
and d is the dimension of representations. Computation-wise, we design a novel
discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR
objective as the planning oracle and show that we can find the near-optimal
policy in a polynomial running time with a Maximum Likelihood Estimation
oracle. To our knowledge, this is the first provably efficient CVaR RL
algorithm in low-rank MDPs.

Comment: The first three authors contributed equally and are ordered randomly.
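The CVaR objective optimized here has a simple empirical counterpart: the mean of the worst τ-fraction of episode returns. A minimal sketch of that standard estimator (the function name is illustrative and unrelated to the paper's planning oracle):

```python
import numpy as np

def empirical_cvar(returns, tau):
    """Empirical CVaR at risk tolerance tau: the average of the worst
    tau-fraction of returns (the lower tail), a common discretized
    estimator of the CVaR objective.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    var = np.quantile(returns, tau)   # Value at Risk at level tau
    tail = returns[returns <= var]    # worst tau-fraction of outcomes
    return tail.mean()
```

As τ → 1 this recovers the ordinary expected return, while small τ focuses the objective entirely on worst-case episodes, which is what makes exploration harder in the risk-sensitive setting.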
Analysis of corrections to the eikonal approximation
Various corrections to the eikonal approximations are studied for two- and
three-body nuclear collisions with the goal to extend the range of validity of
this approximation to beam energies of 10 MeV/nucleon. Wallace's correction
does little to improve the elastic-scattering cross sections obtained with the
usual eikonal approximation. By contrast, a semiclassical approximation
that substitutes the impact parameter by a complex distance of closest approach
computed with the projectile-target optical potential efficiently corrects the
eikonal approximation. This opens the possibility of analyzing data measured
down to 10 MeV/nucleon within eikonal-like reaction models.

Comment: 10 pages, 8 figures.
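For context, and as standard textbook material rather than a result of this paper: the usual eikonal approximation accumulates the scattering phase along a straight-line trajectory at impact parameter b,

```latex
\chi(b) \;=\; -\frac{1}{\hbar v} \int_{-\infty}^{\infty}
V\!\left(\sqrt{b^{2}+z^{2}}\right)\, dz ,
```

where V is the projectile-target interaction and v the beam velocity. The semiclassical correction described above amounts to evaluating this phase at the complex distance of closest approach computed from the optical potential, rather than at b itself.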
InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems
In the field of artificial intelligence for science, the limited amount of
labeled data available for real-world problems is a persistent and essential
challenge. The prevailing approach is to pretrain a powerful task-agnostic
model on a large unlabeled corpus, but such models may struggle to transfer
knowledge to downstream tasks. In this study, we propose InstructBio, a
semi-supervised learning algorithm, to take better advantage of unlabeled examples. It
introduces an instructor model to provide the confidence ratios as the
measurement of pseudo-labels' reliability. These confidence scores then guide
the target model to pay distinct attention to different data points, avoiding
the over-reliance on labeled data and the negative influence of incorrect
pseudo-annotations. Comprehensive experiments show that InstructBio
substantially improves the generalization ability of molecular models, not
only in molecular property prediction but also in activity cliff estimation,
demonstrating the superiority of the proposed method. Furthermore, our evidence
indicates that InstructBio can be equipped with cutting-edge pretraining
methods and used to establish large-scale and task-specific pseudo-labeled
molecular datasets, which reduces the predictive errors and shortens the
training process. Our work provides strong evidence that semi-supervised
learning can be a promising tool to overcome the data scarcity limitation and
advance molecular representation learning.
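The instructor-guided weighting described above, where confidence ratios scale how much each pseudo-labeled example influences the target model, can be sketched as a confidence-weighted loss (all names are illustrative and not the paper's API):

```python
import numpy as np

def weighted_pseudo_label_loss(preds, pseudo_labels, confidences):
    """Confidence-weighted squared loss on pseudo-labeled data.

    preds:         target model predictions on unlabeled examples.
    pseudo_labels: labels assigned by a teacher / earlier model.
    confidences:   instructor-model confidence ratios in [0, 1];
                   low-confidence pseudo-annotations contribute less,
                   limiting the damage from incorrect pseudo-labels.
    """
    preds = np.asarray(preds, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return np.sum(w * (preds - pseudo_labels) ** 2) / np.sum(w)
```

Setting a confidence to zero removes that example from the objective entirely, which is the mechanism that avoids over-reliance on noisy pseudo-annotations.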
Neuroendoscopy surgery for hypertensive intracerebral hemorrhage with concurrent brain herniation: a retrospective study of comparison with craniotomy
Background: Hypertensive intracerebral hemorrhage combined with cerebral hernia (HIH-CH) is a serious condition. Neuroendoscopy can effectively remove intracranial hematomas, but there is no relevant research supporting its utility in patients with HIH-CH. The purpose of this study is to investigate the efficacy and safety of neuroendoscopy in patients with HIH-CH.

Methods: Patients with HIH-CH who received craniotomy or neuroendoscopy treatment were included. The patients were divided into a craniotomy (CHE) group and a neuroendoscopy (NEHE) group. Clinical data and follow-up outcomes of the two groups were collected. The primary outcome was hematoma clearance.

Results: The hematoma clearance rate (%) of patients in the NEHE group was 97.65 (92.75, 100.00), and that of patients in the CHE group was 95.00 (90.00, 100.00), p > 0.05. The operation time and intraoperative bleeding volume of patients in the NEHE group were significantly less than those in the CHE group (p < 0.05). There was no significant difference in the volume of residual hematoma or the incidence of rebleeding between the two groups (p > 0.05). The length of stay in the ICU in the NEHE group was significantly shorter than that in the CHE group (p < 0.05).

Conclusion: Neuroendoscopy can safely and effectively remove intracranial hematomas in patients with hypertensive intracerebral hemorrhage and cerebral hernia, significantly shortening the operation time, reducing intraoperative hemorrhage, and shortening the ICU stay.