122 research outputs found

    How to Query Human Feedback Efficiently in RL?

    Reinforcement Learning with Human Feedback (RLHF) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While RLHF has demonstrated practical success in fine-tuning language models, existing empirical work does not address the challenge of how to efficiently sample trajectory pairs for querying human feedback. In this study, we propose an efficient sampling approach to acquiring exploratory trajectories that enable accurate learning of hidden reward functions before collecting any human feedback. Theoretical analysis demonstrates that our algorithm requires less human feedback for learning the optimal policy under preference-based models with linear parameterization and unknown transitions, compared to the existing literature. Specifically, our framework can incorporate linear and low-rank MDPs. Additionally, we investigate RLHF with action-based comparison feedback and introduce an efficient querying algorithm tailored to this scenario.

    Provable Offline Reinforcement Learning with Human Feedback

    In this paper, we investigate the problem of offline reinforcement learning with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE. We consider the general reward setting where the reward can be defined over the whole trajectory and provide a novel guarantee that allows us to learn any target policy with a polynomial number of samples, as long as the target policy is covered by the offline data. This guarantee is the first of its kind with general function approximation. To measure the coverage of the target policy, we introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability coefficient. We also establish lower bounds that highlight the necessity of such concentrability and the difference from standard RL, where state-action-wise rewards are directly observed. We further extend and analyze our algorithm when the feedback is given over action pairs.

    Genetic dissection of rice grain shape using a recombinant inbred line population derived from two contrasting parents and fine mapping a pleiotropic quantitative trait locus qGL7

    Background: The three-dimensional shape of grain, measured as grain length, width, and thickness (GL, GW, and GT), is one of the most important components of grain appearance in rice. Determining the genetic basis of variations in grain shape could facilitate efficient improvements in grain appearance. In this study, an F7:8 recombinant inbred line population (RIL) derived from a cross between indica and japonica cultivars (Nanyangzhan and Chuan7) contrasting in grain size was used for quantitative trait locus (QTL) mapping. A genetic linkage map was constructed with 164 simple sequence repeat (SSR) markers. The major aim of this study was to detect a QTL for grain shape and to fine map a minor QTL, qGL7. Results: Four QTLs for GL were detected on chromosomes 3 and 7, and 10 QTLs for GW and 9 QTLs for GT were identified on chromosomes 2, 3, 5, 7, 9 and 10, respectively. A total of 28 QTLs were identified, of which several are reported for the first time; four major QTLs and six minor QTLs for grain shape were also commonly detected in both years. The minor QTL, qGL7, exhibited pleiotropic effects on GL, GW, GT, 1000-grain weight (TGW), and spikelets per panicle (SPP) and was further validated in a near isogenic F2 population (NIL-F2). Finally, qGL7 was narrowed down to an interval between InDel marker RID711 and SSR marker RM6389, covering a 258-kb region in the Nipponbare genome, and cosegregated with InDel markers RID710 and RID76. Conclusion: Materials with very different phenotypes were used to develop mapping populations to detect QTLs because of their complex genetic background. Progeny tests proved that the minor QTL, qGL7, could display a single Mendelian characteristic. Therefore, we suggested that minor QTLs for traits with high heritability could be isolated using a map-based cloning strategy in a large NIL-F2 population. In addition, combinations of different QTLs produced diverse grain shapes, which provide the ability to breed more varieties of rice to satisfy consumer preferences.

    Provably Efficient CVaR RL in Low-rank MDPs

    We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the underlying transition kernel admits a low-rank decomposition, but unlike prior linear models, low-rank MDPs do not assume the feature or state-action representation is known. We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to carefully balance the interplay between exploration, exploitation, and representation learning in CVaR RL. We prove that our algorithm achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of action space, and $d$ is the dimension of representations. Computational-wise, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs. Comment: The first three authors contribute equally and are ordered randomly.

    Analysis of corrections to the eikonal approximation

    Various corrections to the eikonal approximations are studied for two- and three-body nuclear collisions with the goal to extend the range of validity of this approximation to beam energies of 10 MeV/nucleon. Wallace's correction does not improve much the elastic-scattering cross sections obtained at the usual eikonal approximation. On the contrary, a semiclassical approximation that substitutes the impact parameter by a complex distance of closest approach computed with the projectile-target optical potential efficiently corrects the eikonal approximation. This opens the possibility to analyze data measured down to 10 MeV/nucleon within eikonal-like reaction models. Comment: 10 pages, 8 figures.

    InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems

    In the field of artificial intelligence for science, a limited amount of labeled data for real-world problems is a persistent and essential challenge. The prevailing approach is to pretrain a powerful task-agnostic model on a large unlabeled corpus, but such a model may struggle to transfer knowledge to downstream tasks. In this study, we propose InstructBio, a semi-supervised learning algorithm, to take better advantage of unlabeled examples. It introduces an instructor model to provide the confidence ratios as the measurement of pseudo-labels' reliability. These confidence scores then guide the target model to pay distinct attention to different data points, avoiding the over-reliance on labeled data and the negative influence of incorrect pseudo-annotations. Comprehensive experiments show that InstructBio substantially improves the generalization ability of molecular models, in not only molecular property predictions but also activity cliff estimations, demonstrating the superiority of the proposed method. Furthermore, our evidence indicates that InstructBio can be equipped with cutting-edge pretraining methods and used to establish large-scale and task-specific pseudo-labeled molecular datasets, which reduces the predictive errors and shortens the training process. Our work provides strong evidence that semi-supervised learning can be a promising tool to overcome the data scarcity limitation and advance molecular representation learning.

    Neuroendoscopy surgery for hypertensive intracerebral hemorrhage with concurrent brain herniation: a retrospective study of comparison with craniotomy

    Background: Hypertensive intracerebral hemorrhage combined with cerebral hernia (HIH-CH) is a serious condition. Neuroendoscopy can effectively remove intracranial hematoma, but there is no relevant research support for its utility in patients with HIH-CH. The purpose of this study is to investigate the efficacy and safety of neuroendoscopy in patients with HIH-CH. Methods: Patients with HIH-CH who received craniotomy or neuroendoscopy treatment were included. The patients were divided into a craniotomy (CHE) group and a neuroendoscopy (NEHE) group. Clinical data and follow-up outcomes of the two groups were collected. The primary outcome was hematoma clearance. Results: The hematoma clearance rate (%) of patients in the NEHE group was 97.65 (92.75, 100.00), and that of patients in the CHE group was 95.00 (90.00, 100.00), p > 0.05. The operation time and intraoperative bleeding volume of patients in the NEHE group were significantly less than those in the CHE group (p < 0.05). There was no significant difference in the volume of residual hematoma and the incidence of rebleeding between the two groups (p > 0.05). The length of stay in ICU in the NEHE group was significantly shorter than that in the CHE group (p < 0.05). Conclusion: Neuroendoscopy can safely and effectively remove the intracranial hematoma in patients with hypertensive intracerebral hemorrhage and cerebral hernia, significantly shorten the operation time, reduce the amount of intraoperative hemorrhage, and shorten the ICU stay.