15 research outputs found

    Secrets of RLHF in Large Language Models Part I: PPO

    Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints being the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO code

    Visible-Light Photocatalytic Reduction of Aryl Halides as a Source of Aryl Radicals

    Aryl- and heteroaryl units are present in a wide variety of natural products, pharmaceuticals, and functional materials. The method for reduction of aryl halides with ubiquitous distribution is highly sought after for late-stage construction of various aromatic compounds. The visible-light-driven reduction of aryl halides to aryl radicals by electron transfer provides an efficient, simple, and environmentally friendly method for the construction of aromatic compounds. This review summarizes the recent progress in the generation of aryl radicals by visible-light-driven reduction of aryl halides with metal complexes, organic compounds, semiconductors as catalysts, and alkali-assisted reaction system. The ability and mechanism of reduction of aromatic halides in various visible light induced systems are summarized, intending to illustrate a comprehensive introduction of this research topic to the readers

    Physics-Based Electrothermal Stress Evaluation Approach of IGBT Modules Combined With Artificial Neural Network Model

    Due to the disparate timescale behavior in the electrical and thermal aspects, achieving a balance between simulation efficiency and accuracy in electrothermal analysis of insulated gate bipolar transistor (IGBT) modules has been a challenging task. A physical-based electrothermal stress evaluation approach combining with artificial neural network (ANN) model is proposed in this article, which significantly improves performance in circuit simulation. The training data for ANN models are derived from the Hefner physical model, a well-established model integrated in Saber. By re-expressing the Hefner model using MATLAB scripts, high-precision data can be efficiently obtained. Double-pulse experiments show that the switching transient characterized by the Hefner model have high precision, with an error within 5% compared to the experimental data. Additionally, the transient behavior of IGBT devices is further described by a two-layer feed-forward ANN, trained using datasets obtained by varying parasitic or operating parameters in the re-expressed Hefner model. Combining the physical model with the ANN models, the proposed approach can simulate not only transient electrical behavior but also long-term thermal behavior with accurate switching energy. This approach has been implemented in MATLAB/Simulink and verified with Saber for system-level circuit simulation. The electrothermal stress evaluation results show that the simulation efficiency is significantly improved (180 times faster than Saber under the simulation settings in this article), while maintaining high precision, and the error is within 2.5%. Experimental results also validate the accuracy of proposed model in predicting the voltage and current stress, with a maximum error of 1.5%

    pH-Sensitive Poly(β-amino ester)s Nanocarriers Facilitate the Inhibition of Drug Resistance in Breast Cancer Cells

    Multidrug resistance (MDR) remains an unmet challenge in chemotherapy. Stimuli-responsive nanocarriers emerge as a promising tool to overcome MDR. Herein, pH-sensitive poly(β-amino ester)s polymers (PHP)-based micellar nanoparticles were synthesized for enhanced doxorubicin (DOX) delivery in drug resistant breast cancer MCF-7/ADR cells. DOX-loaded PHP micelles showed rapid cell-internalization and lysosomal escape in MCF-7/ADR cells. The cytotoxicity assays showed relatively higher cell inhibition of DOX-loaded PHP micelles than that of free DOX against MCF-7/ADR cells. Further mechanistic studies showed that PHP micelles were able to inhibit P-glycoprotein (P-gp) activity by lowering mitochondrial membrane potentials and ATP levels. These results suggested that the enhanced antitumor effect might be attributed to PHP-mediated lysosomal escape and drug efflux inhibition. Therefore, PHP would be a promising pH-responsive nanocarrier for enhanced intracellular drug delivery and overcoming MDR in cancer cells

    Side-Gate BN-MoS2 Transistor for Reconfigurable Multifunctional Electronics

    Developing 2D reconfigurable multifunctional devices is of great potential in further miniaturizing the chip area and simplifying circuit design. 2D van der Waals (vdW) heterostructures offer a novel approach to realizing reconfigurable multifunctional devices. Despite the numerous previous reports that have integrated various functions in a single 2D heterostructures device, most of those devices are based on a complex multilayer heterostructure or an air-unstable channel material, limiting their ability to be applied in integrated circuits. There is an urgent need to develop 2D reconfigurable multifunctional devices that have a simple structure and stable electrical properties. In this work, a side-gate reconfigurable device is illustrated based on simple BN-MoS2 vdW heterostructures. Three different functions in a single device have been achieved, including a diode, double-side-gate reconfigurable logic transistor, and top floating gate memory. A lateral n+-n homojunction is created along the MoS2 channel and the rectification ratio is above 10^5. Reconfigurable logic operations (OR, AND) can be achieved in a single double-side-gate device and the current on/off ratio is ≈ 10^4. Moreover, the device can act as a floating gate memory under back gate operation. Those results pave the way for integrating the same reconfigurable multifunctional devices to realize complex electronic systems

    PKM2 is involved in neuropathic pain by regulating ERK and STAT3 activation in rat spinal cord

    Background: Pyruvate kinase isozymes M2 (PKM2), as a member of pyruvate kinase family, plays a role of glycolytic enzyme in glucose metabolism. It also functions as protein kinase in cell proliferation, signaling, immunity, and gene transcription. In this study, the role of PKM2 in neuropathic pain induced by chronic constriction injury (CCI) was investigated. Methods: Rats were randomly grouped to establish CCI models. PKM2, extracellular regulated protein kinases (ERK), p-ERK, signal transducers and activators of transcription (STAT3), p-STAT3, phosphoinositide 3-kinase/protein kinase B (PI3K/AKT) and p-PI3K/AKT proteins expression in spinal cord was examined by Western blot analysis. Cellular location of PKM2 was examined by immunofluorescence. Knockdown of PKM2 was achieved by intrathecal injection of specific small interfering RNA (siRNA). Von Frey filaments and radiant heat tests were performed to determine mechanical allodynia and thermal hyperalgesia respectively. Lactate and adenosine triphosphate (ATP) contents were measured by specific kits. Tumor necrosis factor alpha (TNF-α) and interleukin-1 beta (IL-1β) levels were detected by ELISA kits. Results: CCI markedly increased PKM2 level in rat spinal cord. Double immunofluorescent staining showed that PKM2 co-localized with neuron, astrocyte, and microglia. Intrathecal injection of PKM2 siRNA not only attenuated CCI-induced ERK and STAT3 activation, but also attenuated mechanical allodynia and thermal hyperalgesia induced by CCI. However, PKM2 siRNA failed to inhibit the activation of AKT. In addition, PKM2 siRNA significantly suppressed the production of lactate and pro-inflammatory mediators. Conclusion: Our findings demonstrate that inhibiting PKM2 expression effectively attenuates CCI-induced neuropathic pain and inflammatory responses in rats, possibly through regulating ERK and STAT3 signaling pathway