90 research outputs found
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
The increasing capabilities of large language models (LLMs) raise
opportunities for artificial general intelligence but concurrently amplify
safety concerns, such as potential misuse of AI systems, necessitating
effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has
emerged as a promising pathway towards AI alignment but brings forth challenges
due to its complexity and dependence on a separate reward model. Direct
Preference Optimization (DPO) has been proposed as an alternative, and it
remains equivalent to RLHF under the reverse KL regularization constraint. This
paper presents f-DPO, a generalized approach to DPO by incorporating diverse
divergence constraints. We show that under certain f-divergences, including
Jensen-Shannon divergence, forward KL divergences and α-divergences, the
complex relationship between the reward and optimal policy can also be
simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the
need for estimating the normalizing constant in the Bradley-Terry model and
enables a tractable mapping between the reward function and the optimal policy.
Our approach optimizes LLMs to align with human preferences in a more efficient
and supervised manner under a broad set of divergence constraints. Empirically,
adopting these divergences ensures a balance between alignment performance and
generation diversity. Importantly, f-DPO outperforms PPO-based methods in
divergence efficiency, and divergence constraints directly influence expected
calibration error (ECE). Comment: Preprint
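The mapping between reward and policy described in this abstract can be illustrated with a small per-pair loss. This is a minimal sketch, not the authors' implementation: it assumes the generalized objective replaces the log-ratio of DPO with f'(u), where u is the policy-to-reference probability ratio and f' is the derivative of the divergence generator. The function names, the default beta, and the example log-probabilities are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used by the Bradley-Terry preference model."""
    return 1.0 / (1.0 + math.exp(-z))


# Derivatives f'(u) of three divergence generators f, with u = pi_theta / pi_ref.
# Assumed generators: reverse KL f(u) = u log u, forward KL f(u) = -log u,
# Jensen-Shannon f(u) = u log(2u / (1 + u)) + log(2 / (1 + u)).
F_PRIME = {
    "reverse_kl": lambda u: math.log(u) + 1.0,
    "forward_kl": lambda u: -1.0 / u,
    "jsd": lambda u: math.log(2.0 * u / (1.0 + u)),
}


def f_dpo_pair_loss(logp_w, logp_ref_w, logp_l, logp_ref_l,
                    beta=0.1, divergence="reverse_kl"):
    """Loss for one (preferred, dispreferred) response pair.

    logp_* are sequence log-probabilities under the trained policy,
    logp_ref_* the same quantities under the frozen reference policy.
    """
    f_prime = F_PRIME[divergence]
    u_w = math.exp(logp_w - logp_ref_w)  # ratio for the preferred response
    u_l = math.exp(logp_l - logp_ref_l)  # ratio for the dispreferred response
    margin = beta * (f_prime(u_w) - f_prime(u_l))
    return -math.log(sigmoid(margin))


if __name__ == "__main__":
    for name in F_PRIME:
        print(name, f_dpo_pair_loss(-1.0, -1.5, -2.0, -1.8, divergence=name))
```

With the reverse KL generator, the constant in f'(u) = log u + 1 cancels between the two responses, so the sketch reduces to the usual DPO loss. No normalizing constant appears in any of the three cases.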
MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding
Online real-time bidding (RTB) is known as a complex auction game where ad
platforms seek to consider various influential key performance indicators
(KPIs), like revenue and return on investment (ROI). The trade-off among these
competing goals needs to be balanced on a massive scale. To address the
problem, we propose a multi-objective reinforcement learning algorithm, named
MoTiAC, for the problem of bidding optimization with various goals.
Specifically, in MoTiAC, instead of using a fixed and linear combination of
multiple objectives, we compute adaptive weights over time on the basis of how
well the current state agrees with the agent's prior. In addition, we provide
interesting properties of model updating and further prove that Pareto
optimality could be guaranteed. We demonstrate the effectiveness of our method
on a real-world commercial dataset. Experiments show that the model outperforms
all state-of-the-art baselines. Comment: 8 Pages, Extensive Experiments
A Survey on Knowledge-Enhanced Pre-trained Language Models
Natural Language Processing (NLP) has been revolutionized by the use of
Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in
nearly every NLP task, PLMs still face a number of challenges including poor
interpretability, weak reasoning capability, and the need for a lot of
expensive annotated data when applied to downstream tasks. By integrating
external knowledge into PLMs,
Knowledge-Enhanced Pre-trained
Language Models (KEPLMs) have the potential to
overcome the above-mentioned limitations. In this paper, we examine KEPLMs
systematically through a series of studies. Specifically, we outline the common
types and different formats of knowledge to be integrated into KEPLMs, detail
the existing methods for building and evaluating KEPLMs, present the
applications of KEPLMs in downstream tasks, and discuss the future research
directions. Researchers will benefit from this survey by gaining a quick and
comprehensive overview of the latest developments in this field. Comment: 19 pages, 12 figures, 192 references
Active Policy Improvement from Multiple Black-box Oracles
Reinforcement learning (RL) has made significant strides in various complex
domains. However, identifying an effective policy via RL often necessitates
extensive exploration. Imitation learning aims to mitigate this issue by using
expert demonstrations to guide exploration. In real-world scenarios, one often
has access to multiple suboptimal black-box experts, rather than a single
optimal oracle. These experts do not universally outperform each other across
all states, presenting a challenge in actively deciding which oracle to use and
in which state. We introduce MAPS and MAPS-SE, a class of policy improvement
algorithms that perform imitation learning from multiple suboptimal oracles. In
particular, MAPS actively selects which of the oracles to imitate and improves
their value function estimates, and MAPS-SE additionally leverages an active
state exploration criterion to determine which states one should explore. We
provide a comprehensive theoretical analysis and demonstrate that MAPS and
MAPS-SE enjoy a sample efficiency advantage over the state-of-the-art policy
improvement algorithms. Empirical results show that MAPS-SE significantly
accelerates policy optimization via state-wise imitation learning from multiple
oracles across a broad spectrum of control tasks in the DeepMind Control Suite.
Our code is publicly available at: https://github.com/ripl/maps
Chromium phosphide CrP as highly active and stable electrocatalysts for oxygen electroreduction in alkaline media
Catalysts for the oxygen reduction reaction (ORR) are key components in emerging energy technologies such as fuel cells and metal-air batteries. Developing low-cost, high-performance and stable electrocatalysts is critical for the extensive implementation of these technologies. Herein, we present a procedure to prepare colloidal chromium phosphide CrP nanocrystals and we test their performance as ORR electrocatalysts. CrP-based catalysts exhibited remarkable activities with a limiting current density of 4.94 mA cm⁻² at 0.2 V, a half-potential of 0.65 V and an onset potential of 0.8 V at 1600 rpm, which are comparable to commercial Pt/C. Advantageously, CrP-based catalysts displayed much higher stabilities and higher tolerances to methanol in alkaline solution. Using density functional theory calculations, we demonstrate CrP to provide a very strong chemisorption of O₂ that facilitates its reduction and explains the excellent ORR performance experimentally demonstrated. Postprint (author's final draft)
SnS2/g-C3N4/graphite nanocomposites as durable lithium-ion battery anode with high pseudocapacitance contribution
Other funding: the CERCA Programme / Generalitat de Catalunya. Part of the present work has been performed in the framework of the Universitat Autònoma de Barcelona Materials Science PhD program. Tin disulfide is a promising anode material for Li-ion batteries (LIBs) owing to its high theoretical capacity and the abundance of its constituent elements. However, bare SnS₂ suffers from low electrical conductivity and large volume expansion, which result in poor rate performance and cycling stability. Herein, we present a solution-based strategy to grow SnS₂ nanostructures within a matrix of porous g-C₃N₄ (CN) and high-electrical-conductivity graphite plates (GPs). We test the resulting nanocomposite as an anode in LIBs. First, SnS₂ nanostructures with different geometries are tested, finding that thin SnS₂ nanoplates (SnS₂-NPLs) provide the highest performance. Such SnS₂-NPLs, incorporated into hierarchical SnS₂/CN/GP nanocomposites, display excellent rate capabilities (536.5 mA h g⁻¹ at 2.0 A g⁻¹) and outstanding stability (∼99.7% retention after 400 cycles), which are partially associated with a high pseudocapacitance contribution (88.8% at 1.0 mV s⁻¹). The excellent electrochemical properties of these nanocomposites are ascribed to the synergy created between the three nanocomposite components: i) thin SnS₂-NPLs provide a large surface for rapid Li-ion intercalation and a suitable geometry to withstand volume expansions during lithiation/delithiation cycles; ii) porous CN prevents SnS₂-NPL aggregation, enables efficient channels for Li-ion diffusion and buffers stresses associated with SnS₂ volume changes; and iii) conductive GPs allow efficient charge transport.
Compositionally tuned NixSn alloys as anode materials for lithium-ion and sodium-ion batteries with a high pseudocapacitive contribution
Nickel tin alloy nanoparticles (NPs) with tuned composition NixSn (0.6 ≤ x ≤ 1.9) were synthesized by a solution-based procedure and used as anode materials for Li-ion batteries (LIBs) and Na-ion batteries (SIBs). Among the compositions tested, Ni₀.₉Sn-based electrodes exhibited the best performance in both LIBs and SIBs. As LIB anodes, Ni₀.₉Sn-based electrodes delivered charge-discharge capacities of 980 mAh g⁻¹ after 340 cycles at a rate of 0.2 A g⁻¹, which surpassed their maximum theoretical capacity considering that only Sn is lithiated. A kinetic characterization of the charge-discharge process demonstrated the electrode performance to be aided by a significant pseudocapacitive contribution that compensated for the loss of energy storage capacity associated with the solid-electrolyte interphase formation. This significant pseudocapacitive contribution, which translated not only into higher capacities but also into longer durability, was associated with the small size of the crystal domains and the proper electrode composition. The performance of NixSn-based electrodes toward Na-ion storage was also characterized, reaching significant capacities above 200 mAh g⁻¹ at 0.1 A g⁻¹ but with a relatively fast fade over 120 continuous cycles. A relatively larger pseudocapacitive contribution was obtained in NixSn-based electrodes for SIBs when compared with LIBs, consistent with the lower contribution of Na-ion diffusion associated with its larger size.
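The statement that 980 mAh g⁻¹ exceeds the theoretical capacity when only Sn is lithiated can be checked with a short Faraday's-law estimate. This is a rough sketch under assumptions not stated in the abstract: Sn is taken to alloy up to Li₄.₄Sn (4.4 Li per Sn), Ni is treated as inactive mass, and the capacity is normalized to the mass of the NixSn alloy. The function name and constants are illustrative.

```python
FARADAY = 96485.33  # C per mol of electrons
M_NI = 58.693       # g/mol, nickel
M_SN = 118.710      # g/mol, tin


def theoretical_capacity_mah_per_g(x_ni, li_per_sn=4.4):
    """Gravimetric capacity of NixSn if only Sn stores Li.

    Q = n * F / (3.6 * M), with n electrons per formula unit,
    M the molar mass of NixSn, and 3.6 converting C/g to mAh/g.
    """
    molar_mass = x_ni * M_NI + M_SN
    return li_per_sn * FARADAY / (3.6 * molar_mass)


if __name__ == "__main__":
    for x in (0.6, 0.9, 1.9):  # composition range quoted in the abstract
        print(f"x = {x}: {theoretical_capacity_mah_per_g(x):.0f} mAh/g")
```

Under these assumptions the estimate for x = 0.9 comes out well below 980 mAh g⁻¹, which agrees with the abstract attributing the excess to a pseudocapacitive contribution.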