9 research outputs found

    Preference-grounded Token-level Guidance for Language Model Fine-tuning

    Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. For guidance learning, we design a framework that extends the pairwise-preference learning in imitation learning to both variable-length LM generation and utilizing the preference among multiple generations. For LM training, based on the amount of supervised data, we present two minimalist learning objectives that utilize the learned guidance. In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization.
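As a rough illustration of the granularity mismatch this abstract describes, the sketch below aggregates per-token rewards into a sequence-level score and applies a Bradley-Terry style pairwise preference loss. The mean aggregation, the function names, and the toy reward values are all assumptions made for illustration; the abstract does not specify the paper's actual objective.

```python
import math

def sequence_score(token_rewards):
    """Aggregate per-token rewards into one sequence-level score.

    Mean aggregation is an assumption; it keeps scores comparable
    across sequences of different lengths.
    """
    return sum(token_rewards) / len(token_rewards)

def pairwise_preference_loss(preferred_rewards, rejected_rewards):
    """Bradley-Terry negative log-likelihood that the preferred
    sequence outscores the rejected one."""
    margin = sequence_score(preferred_rewards) - sequence_score(rejected_rewards)
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Hypothetical token-level rewards for two generations of different length.
preferred = [0.9, 0.8, 0.7]
rejected = [0.1, 0.2]

loss_correct = pairwise_preference_loss(preferred, rejected)
loss_flipped = pairwise_preference_loss(rejected, preferred)
```

The loss is smaller when the token-level rewards already rank the preferred sequence higher, so minimizing it pushes sequence-level preference information down into the per-token rewards.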

    Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

    Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy. To avoid the detrimental impact of distribution mismatch, we regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process. Further, we train a dynamics model to both implement this regularization and better estimate the stationary distribution of the current policy, reducing the error induced by distribution mismatch. On a wide range of continuous-control offline RL datasets, our method shows competitive performance, which validates our algorithm. The code is publicly available. Comment: International Conference on Machine Learning (ICML) 202

    A Regularized Implicit Policy for Offline Reinforcement Learning

    Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching methods for regularizing with respect to the dual form of the Jensen-Shannon divergence and the integral probability metrics. We theoretically show the correctness of the policy-matching approach, and the correctness and a good finite-sample property of our modification. An effective instantiation of our framework through the GAN structure is provided, together with techniques to explicitly smooth the state-action mapping for robust generalization beyond the static dataset. Extensive experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
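For readers unfamiliar with the two regularizers this abstract names, their standard textbook forms are sketched below. The notation is assumed here and is not taken from the paper: $P$ and $Q$ stand for two distributions (for example, the dataset's and the policy's state-action distributions), $D$ is a discriminator, and $\mathcal{F}$ is a function class.

```latex
% Variational (dual) form of the Jensen-Shannon divergence,
% as used in the GAN objective; the supremum is attained at D^* = p / (p + q).
\mathrm{JSD}(P \,\|\, Q)
  = \log 2 + \frac{1}{2} \sup_{D:\,\mathcal{X} \to (0,1)}
    \Big\{ \mathbb{E}_{x \sim P}\big[\log D(x)\big]
         + \mathbb{E}_{x \sim Q}\big[\log\big(1 - D(x)\big)\big] \Big\}

% Integral probability metric over a function class F.
d_{\mathcal{F}}(P, Q)
  = \sup_{f \in \mathcal{F}}
    \Big| \mathbb{E}_{x \sim P}\big[f(x)\big] - \mathbb{E}_{x \sim Q}\big[f(x)\big] \Big|
```

Both quantities are defined as a supremum over a critic function, which is why a GAN-style discriminator is a natural way to estimate them from samples.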

    Visual analysis of global research on immunotherapy for gastric cancer: A literature mining from 2012 to 2022

    Gastric cancer (GC) is one of the most common malignancies. Immunotherapy has become an indispensable part of GC treatment. This study conducts a bibliometric analysis of immunotherapy for GC to clarify the research status and identify potential new research directions. VOSviewer and CiteSpace visualization software were used to demonstrate collaborations and correlations. A total of 1141 English publications from 2012 to 2022 were included. The number of publications increased year by year. The publications were mainly from China (n = 579, 50.70%), followed by the United States. Fudan University published the most publications (n = 48, 4.21%). Frontiers in Oncology and Journal of Clinical Oncology ranked first in cited and co-cited journals, respectively. Kim Kyoung-Mee published the most publications on immunotherapy for GC (n = 14). The clustering of the timeline view and the co-cited references show how the hotspots in immunotherapy for GC have shifted. Initially, the hot topics were “cytokine-induced killer cells” and “myeloid-derived suppressor cells.” In recent years, the focus has turned to “targeted therapy.” “CAR-T” has become the hottest topic, and GC treatment has entered the precision therapy phase. Screening patients who can benefit from immunotherapy is key to improving prognosis. The combination of immunotherapy with other treatment options, such as chemotherapy and targeted therapy, is currently the focus of research. Chimeric antigen receptor T cells will be studied further in the future.

    Efficacy and safety of totally laparoscopic gastrectomy with uncut Roux-en-Y for gastric cancer: a dual-center retrospective study

    Background. Uncut Roux-en-Y (URY) effectively alleviates the common complications associated with Roux-en-Y (RY), such as Roux-en-Y stasis syndrome (RSS). Nevertheless, for gastric cancer (GC) patients, it is still controversial whether URY has an impact on long-term prognosis and whether it results in less afferent loop recanalization. Therefore, we compared whether URY and RY differ in the prognosis and long-term complications of GC patients undergoing totally laparoscopic gastrectomy (TLG). Methods. We analyzed the data of patients who underwent TLG combined with digestive tract reconstruction at two centers between 2016 and 2022. Only patients undergoing URY and RY were selected for analysis. Relapse-free survival (RFS) and overall survival (OS) were estimated. Bias between the groups was reduced by propensity score matching (PSM). The Cox proportional hazard regression model was used to further analyze the influence of URY on prognosis. Results. Two hundred forty-two GC patients were enrolled. The URY group had significantly shorter operation time, liquid food intake time, and in-hospital stays than the RY group (P < 0.001). The URY group had fewer long-term and short-term postoperative complications than the RY group, especially with regard to RSS, reflux esophagitis, and reflux gastritis. The 3-year and 5-year OS of the URY group and the RY group before PSM were 87.5% vs. 65.6% (P < 0.001) and 81.4% vs. 61.7% (P = 0.001), respectively. PSM and Cox multivariate analysis confirmed that, compared to RY, URY can improve the short-term and long-term prognosis of GC patients. Conclusion. TLG combined with URY for GC, especially for advanced, older, and poorly differentiated patients, may promote postoperative recovery and improve long-term prognosis.

    The Efficacy and Safety of the Chinese Herbal Formula, JTTZ, for the Treatment of Type 2 Diabetes with Obesity and Hyperlipidemia: A Multicenter Randomized, Positive-Controlled, Open-Label Clinical Trial

    Background and Aim. Studies have shown an increasing number of type 2 diabetes (T2D) patients with concomitant obesity and hyperlipidemia syndromes, resulting from relevant metabolic disorders. However, there are few medications and therapies that can thoroughly address these issues. Therefore, the current study evaluated the efficacy and safety of using JTTZ, a Chinese herbal formula, to treat T2D with obesity and hyperlipidemia. Methods. A total of 450 participants with T2D (HbA1c ≥ 7.0%; waist circumference ≥ 90 cm and 80 cm in males and females, respectively; and triglycerides (TG) ≥ 1.7 mmol/L) were randomly assigned, in equal proportions, to two groups in this multicenter randomized, positive-controlled, open-label trial. One group received JTTZ formula, and the other received metformin (MET) for 12 consecutive weeks. The primary efficacy outcomes were changes in HbA1c, TG, weight, and waist circumference. Adverse reactions and hypoglycemia were monitored. Results. HbA1c decreased by 0.75 ± 1.32% and 0.71 ± 1.2% in the JTTZ and MET groups, respectively, after 12 weeks of treatment. TG levels in the JTTZ and MET groups were reduced by 0.64 ± 2.37 mmol/L and 0.37 ± 2.18 mmol/L, respectively. Weight was decreased by 2.47 ± 2.71 kg in the JTTZ group and by 2.03 ± 2.36 kg in the MET group. JTTZ also appeared to alleviate insulin resistance and increase HOMA-β. In addition, symptoms were significantly relieved in participants in the JTTZ group compared to those in the MET group. One case of hypoglycemia was reported in the MET group. No severe adverse events were reported in either group. Conclusions. The JTTZ formula led to safe and significant improvements in the blood glucose, blood lipids, and weight levels; relieved symptoms; and enhanced β cell function for T2D patients with obesity and hyperlipidemia. The JTTZ formula could potentially be developed as an alternative medicine for patients with T2D, particularly those who cannot tolerate metformin or other hypoglycemic drugs. This trial was registered with ClinicalTrials.gov (NCT01471275).

    Highly Hydroxide-Conductive Nanostructured Solid Electrolyte via Predesigned Ionic Nanoaggregates

    The creation of interconnected ionic nanoaggregates within solid electrolytes is a crucial yet challenging task for fabricating high-performance alkaline fuel cells. Herein, we present a facile and generic approach to embedding ionic nanoaggregates via predesigned hybrid core–shell nanoarchitecture within nonionic polymer membranes as follows: (i) synthesizing core–shell nanoparticles composed of SiO₂/densely quaternary ammonium-functionalized polystyrene. Because of the spatial confinement effect of the SiO₂ “core”, the abundant hydroxide-conducting groups are locally aggregated in the functionalized polystyrene “shell”, forming ionic nanoaggregates bearing intrinsic continuous ion channels; (ii) embedding these ionic nanoaggregates (20–70 wt %) into the polysulfone matrix to construct interconnected hydroxide-conducting channels. The chemical composition, physical morphology, amount, and distribution of the ionic nanoaggregates are facilely regulated, leading to highly connected ion channels with high effective ion mobility comparable to that of the state-of-the-art Nafion. The resulting membranes display strikingly high hydroxide conductivity (188.1 mS cm⁻¹ at 80 °C), which is one of the highest results to date. The membranes also exhibit good mechanical properties. The independent manipulation of the conduction function and nonconduction function by the ionic nanoaggregates and nonionic polymer matrix, respectively, opens a new avenue, free of microphase separation, for designing high-performance solid electrolytes for diverse application realms.