51 research outputs found

    Secrets of RLHF in Large Language Models Part II: Reward Modeling

    Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they face the following challenges in practical applications: (1) Incorrect and ambiguous preference pairs in the dataset may hinder the reward model from accurately capturing human intent. (2) Reward models trained on data from a specific distribution often struggle to generalize to examples outside that distribution and are not suitable for iterative RLHF training. In this report, we attempt to address these two issues. (1) From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. Experimental results confirm that data with varying preference strengths have different impacts on reward model performance. We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset and fully leverage high-quality preference data. (2) From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization. Furthermore, we employ meta-learning to enable the reward model to maintain the ability to differentiate subtle differences in out-of-distribution samples, and this approach can be utilized for iterative RLHF optimization
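The voting idea in this abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading rather than the authors' confirmed procedure, that each of several reward models scores both responses of a pair and that preference strength is summarized by the mean and standard deviation of the per-model score gaps. The function names and the `low`/`high` thresholds are hypothetical.

```python
from statistics import mean, pstdev


def preference_strength(chosen_scores, rejected_scores):
    """Return (mean, std) of per-model reward gaps for one preference pair.

    chosen_scores[i] and rejected_scores[i] are the scores that the i-th
    reward model assigns to the chosen and rejected response (assumed inputs).
    """
    if not chosen_scores or len(chosen_scores) != len(rejected_scores):
        raise ValueError("need one score per model for both responses")
    gaps = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    return mean(gaps), pstdev(gaps)


def label_pair(mean_gap, low=0.0, high=1.0):
    """Bucket a pair by its mean gap; the thresholds are illustrative only."""
    if mean_gap < low:
        return "likely incorrect"
    if mean_gap < high:
        return "ambiguous"
    return "strong"


if __name__ == "__main__":
    m, s = preference_strength([2.0, 1.0, 3.0], [1.0, 1.0, 1.0])
    print(m, s, label_pair(m))
```

A pair whose mean gap is negative is one the ensemble votes against, which is the kind of example the abstract describes as incorrect; a gap near zero would correspond to an ambiguous pair.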

    Periodontal Tissue Regeneration Using Fibroblast Growth Factor -2: Randomized Controlled Phase II Clinical Trial

    Background: The options for medical use of signaling molecules as stimulators of tissue regeneration are currently limited. Preclinical evidence suggests that fibroblast growth factor (FGF)-2 can promote periodontal regeneration. This study aimed to clarify the activity of FGF-2 in stimulating regeneration of periodontal tissue lost by periodontitis and to evaluate the safety of such stimulation. Methodology/Principal Findings: We used recombinant human FGF-2 with 3% hydroxypropylcellulose (HPC) as vehicle and conducted a randomized double-blinded controlled trial involving 13 facilities. Subjects comprised 74 patients displaying a 2- or 3-walled vertical bone defect as measured ≥3 mm apical to the bone crest. Patients were randomly assigned to 4 groups: Group P, given HPC with no FGF-2; Group L, given HPC containing 0.03% FGF-2; Group M, given HPC containing 0.1% FGF-2; and Group H, given HPC containing 0.3% FGF-2. Each patient underwent flap operation during which we administered 200 μL of the appropriate investigational drug to the bone defect. Before and for 36 weeks following administration, patients underwent periodontal tissue inspections and standardized radiography of the region under investigation. As a result, a significant difference (p = 0.021) in rate of increase in alveolar bone height was identified between Group P (23.92%) and Group H (58.62%) at 36 weeks. The linear increase in alveolar bone height at 36 weeks in Group P and H was 0.95 mm and 1.85 mm, respectively (p = 0.132). No serious adverse events attributable to the investigational drug were identified.
Conclusions: Although no statistically significant differences were noted for gains in clinical attachment level and alveolar bone gain for FGF-2 groups versus Group P, the significant difference in rate of increase in alveolar bone height (p = 0.021) between Groups P and H at 36 weeks suggests that some efficacy could be expected from FGF-2 in stimulating regeneration of periodontal tissue in patients with periodontitis

    Research on self-learning control method for aircraft engine above idle state

    The iterative learning control for aircraft engine above idle state is studied. An approach combining the proportional integral iterative learning with the traditional proportional integral derivative controller is proposed and then this hybrid iterative learning controller is constructed to control the speed of three typical engine models. In the simulation study, the proposed method is applied to the nonlinear component level engine model, state variable engine model, and linear parameter-varying engine model; the results show that the performance of the proposed hybrid iterative learning controller is much better than the traditional proportional integral derivative controller

    Numerical model of A.C. glow discharge plasma anemometer via the coupling of gas flow and plasma model

    A new approach to building a numerical model of an AC (alternating current) plasma anemometer is proposed. Firstly, the plasma model and gas flow model utilized in the proposed method are introduced. The plasma model (xpdp2) is built by the PIC/MCC modeling method, while the gas flow field model is a fluid model. The proposed anemometer model is obtained by combining the flow field model and the plasma model. Then the effects of flow velocity on the ion density distribution, electron density distribution and electric potential distribution are studied from a micro perspective, and the results show that charged particles move in the direction of the flow velocity. It can also be observed that the movement of electrons is not obvious and that flow velocity has no effect on the electric potential. Finally, the effects of supply voltage, discharge frequency and electrode spacing on the discharge characteristics are investigated from a macro perspective, and the results show that there is a nearly linear relationship between flow velocity and gap voltage, which indicates that the plasma anemometer could be applied for flow velocity measurement. The simulation results show that the linear relationships are good when the frequencies are 2 MHz and 3.65 MHz. In addition, the results also show that, within our chosen distance, small spacing is more suitable for a high-frequency plasma anemometer

    Research on the Plasma Anemometer Based on AC Glow Discharge

    A new plasma anemometer based on AC glow discharge is designed in this article. Firstly, theoretical analysis of plasma anemometer working principle is introduced to prove the feasibility of the experimental measurement method. Then the experiments are carried out to study the effects of different parameters on the static discharge characteristics of the plasma anemometer system, by which the system optimization methods are obtained. Finally, several groups of appropriate parameters are selected to build the plasma anemometer system based on resistance capacitance coupling negative feedback AC glow discharge, and different airflow speeds are applied to obtain the achievable velocity measurement range. The results show that there is a linear relationship between airflow velocity and discharge current in an allowable error range, which can be applied for airflow velocity measurement. Negative feedback coupling module, which is composed of the coupling resistance and the coupling capacitance, has good effects on improving the system stability. The measurement range of the airflow velocity is significantly increased when the electrode gap is 3 mm, coupling resistance is 470 Ω, and coupling capacitance is 220 pF

    Fiber-Optic Fabry–Perot Sensor for Simultaneous Measurement of Tilt Angle and Vibration Acceleration


    Predicting Critical Nodes in Temporal Networks by Dynamic Graph Convolutional Networks

    Many real-world systems can be expressed in temporal networks with nodes playing different roles in structure and function, and edges representing the relationships between nodes. Identifying critical nodes can help us control the spread of public opinions or epidemics, predict leading figures in academia, conduct advertisements for various commodities and so on. However, it is rather difficult to identify critical nodes, because the network structure changes over time in temporal networks. In this paper, considering the sequence topological information of temporal networks, a novel and effective learning framework based on the combination of special graph convolutional and long short-term memory network (LSTM) is proposed to identify nodes with the best spreading ability. The special graph convolutional network can embed nodes in each sequential weighted snapshot and LSTM is used to predict the future importance of timing-embedded features. The effectiveness of the approach is evaluated by a weighted Susceptible-Infected-Recovered model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall τ coefficient and top k hit rate
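The two evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes the tie-free tau-a variant of the Kendall coefficient and defines the top-k hit rate as the overlap between the k highest-ranked nodes under predicted and reference scores. The function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1      # pair ordered the same way in both rankings
        elif prod < 0:
            score -= 1      # pair ordered oppositely
    return score / (n * (n - 1) / 2)


def top_k_hit_rate(predicted, reference, k):
    """Fraction of the reference top-k nodes recovered in the predicted top-k."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])
    return len(top(predicted) & top(reference)) / k


if __name__ == "__main__":
    pred = [0.9, 0.1, 0.8, 0.2]   # hypothetical predicted node importance
    ref = [0.7, 0.3, 0.1, 0.9]    # hypothetical spreading ability from simulation
    print(kendall_tau(pred, ref), top_k_hit_rate(pred, ref, 2))
```

Here `ref` stands in for the node spreading ability that the abstract says is obtained from the weighted Susceptible-Infected-Recovered model.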

    Effects of pin fins and vortex generators on thermal performance in a microchannel with Al2O3 nanofluids

    This paper performs a comparative analysis to obtain the optimal cross-section shape and parameters of both pin fins and vortex generators. A novel combined structure with pin fins and vortex generators is proposed to enhance thermal performance of an integrated microchannel heat sink. Effects of nanoparticle diameter and volume fraction are investigated using Al2O3 nanofluid and DI-water as working fluid. Pin fins and vortex generators cause enhancements of flow disturbance and heat transfer on microchannel heat sinks. Results indicate that oval pin fins have better improvements of thermal/hydraulic performance compared to round and diamond pin fins. The oval pin fin with 0.4 mm spacing and 0.1 mm height presents the highest overall performance factor in the Reynolds number range of 340–640. Presence of vortices intensifies the mixing of the hot fluid near bottom surface and cold fluid near top surface. The optimal vortex generator with length of 0.08 mm and height of 0.06 mm provides a 30% increase in overall performance factor compared to the rectangular microchannel at Reynolds number of 340. Mechanism of heat transfer enhancement is analyzed by investigating flow velocity, temperature distribution and field synergy angle distribution in microchannels. Based on the field synergy principle, it is found that a small and uniformly distributed synergy angle is achieved in the integrated microchannel. According to comparisons of the overall performance factor and total thermal resistance, the optimal nanoparticle diameter and Al2O3 volume fraction of nanofluids are 20 nm and 4%, respectively

    A Microgrid Energy Management Strategy Considering Carbon Quota Guided Demand Response

    In order to reduce the forecast output error caused by the randomness and volatility of renewable energy in microgrid operation, a microgrid energy management strategy considering carbon quota guided demand response is proposed. A two-layer model predictive control (MPC) energy management model is constructed. The upper layer guides electric vehicles to participate in the demand response of microgrid by constructing a carbon emission quota mechanism to realize the economic operation of microgrid and reduce carbon emissions. The lower layer uses the model predictive control rolling optimization and the power fluctuation caused by the prediction error of renewable energy is suppressed by the short time scale model predictive control. The results of calculation analysis show that the proposed energy management strategy can effectively guide electric vehicles or other controllable loads to participate in demand response and realize low-carbon economic dispatch and stable operation of microgrid