    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as the potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and its dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents f-DPO, a generalized approach to DPO that incorporates diverse divergence constraints. We show that under certain f-divergences, including Jensen-Shannon divergence, forward KL divergences and α-divergences, the complex relationship between the reward and the optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, f-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE). Comment: Preprint
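    As a sketch of the reward-policy mapping the abstract alludes to (our notation, not quoted from the paper): for the f-divergence-regularized objective, the KKT stationarity condition gives, up to a per-prompt constant Z(x),

    \[
    \max_{\pi}\ \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[r(x, y)\big] - \beta\, D_f\big(\pi \,\|\, \pi_{\mathrm{ref}}\big)
    \quad\Longrightarrow\quad
    r(x, y) = \beta\, f'\!\left(\frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\right) + Z(x).
    \]

    With reverse KL, f(u) = u log u and f'(u) = 1 + log u, recovering the familiar DPO parameterization r(x, y) = β log(π*(y|x)/π_ref(y|x)) + const; for Jensen-Shannon, forward KL, and α-divergences, f' is likewise invertible, which is what keeps the reward-policy mapping tractable.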

    MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

    Online real-time bidding (RTB) is a complex auction game in which ad platforms must consider various influential key performance indicators (KPIs), such as revenue and return on investment (ROI). The trade-off among these competing goals needs to be balanced at massive scale. To address this problem, we propose a multi-objective reinforcement learning algorithm, named MoTiAC, for bidding optimization with multiple goals. Specifically, instead of using a fixed, linear combination of the objectives, MoTiAC computes adaptive weights over time based on how well the current state agrees with the agent's prior. In addition, we characterize the properties of the model update and further prove that Pareto optimality is guaranteed. We demonstrate the effectiveness of our method on a real-world commercial dataset, where experiments show that the model outperforms all state-of-the-art baselines. Comment: 8 pages, extensive experiments
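    A minimal sketch of the adaptive-weighting idea, assuming a softmax over how far each objective's return falls short of the agent's prior; the names and the deficit-based rule are illustrative, not MoTiAC's exact formulation:

        import numpy as np

        def adaptive_weights(returns, prior_means, temperature=1.0):
            # Illustrative rule (not the paper's exact one): objectives whose
            # observed returns lag the agent's prior get more weight.
            deficits = np.asarray(prior_means) - np.asarray(returns)
            scores = np.exp(deficits / temperature)
            return scores / scores.sum()  # weights on the probability simplex

        def scalarized_objective(returns, weights):
            # Combine per-objective returns under the current adaptive weights.
            return float(np.dot(weights, returns))

        # Example: revenue meets its prior, ROI lags, so ROI is upweighted.
        w = adaptive_weights(returns=[1.0, 0.4], prior_means=[1.0, 0.8])
        print(w, scalarized_objective([1.0, 0.4], w))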

    A Survey on Knowledge-Enhanced Pre-trained Language Models

    Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges, including poor interpretability, weak reasoning capability, and the need for large amounts of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, Knowledge-Enhanced Pre-trained Language Models (KEPLMs) have the potential to overcome these limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field. Comment: 19 pages, 12 figures, 192 references

    Active Policy Improvement from Multiple Black-box Oracles

    Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improves their value function estimates, and MAPS-SE additionally leverages an active state-exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy a sample-efficiency advantage over state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps
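    A minimal sketch of the per-state oracle selection described above, with a toy value estimator; the class and function names are illustrative, and the disagreement-based exploration test only gestures at MAPS-SE's actual criterion:

        import numpy as np

        class LinearValue:
            # Toy stand-in for a learned per-oracle value estimator.
            def __init__(self, w):
                self.w = np.asarray(w, dtype=float)
            def predict(self, state):
                return float(self.w @ state)

        def select_oracle(state, value_estimators):
            # Imitate the oracle whose estimated value is highest at this state.
            estimates = np.array([v.predict(state) for v in value_estimators])
            return int(np.argmax(estimates)), estimates

        def should_explore(estimates, threshold=0.2):
            # Explore a state when the per-oracle value estimates disagree strongly.
            return float(estimates.max() - estimates.min()) > threshold

        state = np.array([0.3, -1.0])
        oracles = [LinearValue([1.0, 0.1]), LinearValue([0.2, -0.5])]
        best, est = select_oracle(state, oracles)
        print(best, should_explore(est))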

    Chromium phosphide CrP as highly active and stable electrocatalysts for oxygen electroreduction in alkaline media

    Catalysts for the oxygen reduction reaction (ORR) are key components in emerging energy technologies such as fuel cells and metal-air batteries. Developing low-cost, high-performance and stable electrocatalysts is critical for the widespread implementation of these technologies. Herein, we present a procedure to prepare colloidal chromium phosphide (CrP) nanocrystals and test their performance as ORR electrocatalysts. CrP-based catalysts exhibited remarkable activities, with a limiting current density of 4.94 mA cm-2 at 0.2 V, a half-wave potential of 0.65 V and an onset potential of 0.8 V at 1600 rpm, which are comparable to commercial Pt/C. Advantageously, CrP-based catalysts displayed much higher stability and higher tolerance to methanol in alkaline solution. Using density functional theory calculations, we show that CrP provides very strong chemisorption of O2, which facilitates its reduction and explains the excellent ORR performance demonstrated experimentally. Postprint (author's final draft)

    SnS2/g-C3N4/graphite nanocomposites as durable lithium-ion battery anode with high pseudocapacitance contribution

    Other grants: the CERCA Programme / Generalitat de Catalunya. Part of the present work was performed in the framework of the Universitat Autònoma de Barcelona Materials Science PhD program. Tin disulfide is a promising anode material for Li-ion batteries (LIBs) owing to its high theoretical capacity and the abundance of its constituent elements. However, bare SnS2 suffers from low electrical conductivity and large volume expansion, which result in poor rate performance and cycling stability. Herein, we present a solution-based strategy to grow SnS2 nanostructures within a matrix of porous g-C3N4 (CN) and highly conductive graphite plates (GPs), and we test the resulting nanocomposite as an anode in LIBs. First, SnS2 nanostructures with different geometries are tested, finding that thin SnS2 nanoplates (SnS2-NPLs) provide the highest performance. Such SnS2-NPLs, incorporated into hierarchical SnS2/CN/GP nanocomposites, display excellent rate capabilities (536.5 mA h g-1 at 2.0 A g-1) and outstanding stability (∼99.7% retention after 400 cycles), which are partially associated with a high pseudocapacitance contribution (88.8% at 1.0 mV s-1). The excellent electrochemical properties of these nanocomposites are ascribed to the synergy between the three components: i) thin SnS2-NPLs provide a large surface for rapid Li-ion intercalation and a suitable geometry to withstand volume expansion during lithiation/delithiation cycles; ii) porous CN prevents SnS2-NPL aggregation, provides efficient channels for Li-ion diffusion and buffers the stresses associated with SnS2 volume changes; and iii) conductive GPs allow efficient charge transport.

    If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

    The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs' training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their application to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of the code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback is crucial to their success on downstream tasks. Finally, we present several key challenges and future directions for empowering LLMs with code.
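    A generic sketch of the execution-as-feedback pattern the survey describes in point (iii); this is not an API from the paper, and a real agent would add sandboxing and resource limits:

        import subprocess, sys, tempfile

        def run_generated_code(code: str, timeout: float = 5.0):
            # Execute model-generated Python in a subprocess and return
            # (ok, feedback): stdout on success, the traceback on failure.
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            proc = subprocess.run([sys.executable, path],
                                  capture_output=True, text=True, timeout=timeout)
            ok = proc.returncode == 0
            return ok, proc.stdout if ok else proc.stderr

        # The returned feedback can be appended to the prompt so the model
        # can refine its next attempt.
        ok, feedback = run_generated_code("print(6 * 7)")
        print(ok, feedback)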