Delay-Energy lower bound on Two-Way Relay Wireless Network Coding
Network coding is a novel solution that significantly improves the throughput
and energy consumption of wireless networks by mixing traffic flows through
algebraic operations. In conventional network coding schemes, a packet has to
wait for packets from other sources to be coded before transmission. This
wait-and-code scheme naturally incurs a non-zero packet-loss rate when the
buffer is finite. We propose Enhanced Network Coding (ENC), an extension of ONC
to the continuous-time domain.
In ENC, the relay transmits both coded and uncoded packets to reduce delay; in
exchange, more energy is consumed in transmitting the uncoded packets. ENC is a
practical algorithm that achieves minimal average delay and zero packet-loss
rate under a given energy constraint. We present the system model for ENC on a
general renewal-process queue. In particular, we show that there exists a
fundamental trade-off between average delay and energy, and we derive an
analytic lower bound for this trade-off curve, which is achieved by ENC.
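As a rough illustration of this delay-energy trade-off, the sketch below simulates a simple two-way relay policy in Python: the relay codes a pair of opposing packets whenever both queues are backlogged, and otherwise sends a packet uncoded once it has waited longer than a threshold. The policy, parameter names, and traffic model are hypothetical stand-ins, not the paper's ENC scheme or its renewal-process model.

```python
# Illustrative simulation of a delay-energy trade-off at a two-way relay.
# Hypothetical policy: code two opposing packets into one transmission when
# both queues are backlogged; fall back to an uncoded transmission once a
# head-of-line packet has waited longer than `wait_limit` slots.
import random

def simulate(wait_limit, rate=0.4, n_slots=200_000, seed=0):
    rng = random.Random(seed)
    queues = [[], []]            # arrival times of packets from each source
    delays, energy = [], 0
    for t in range(n_slots):
        for q in queues:         # Bernoulli arrivals approximate Poisson traffic
            if rng.random() < rate:
                q.append(t)
        a, b = queues
        if a and b:              # code one packet from each direction: 1 tx serves 2 packets
            delays += [t - a.pop(0), t - b.pop(0)]
            energy += 1
        else:                    # send uncoded if the oldest packet has waited too long
            for q in (a, b):
                if q and t - q[0] >= wait_limit:
                    delays.append(t - q.pop(0))
                    energy += 1
                    break        # at most one transmission per slot
    return sum(delays) / len(delays), energy / len(delays)

for w in (0, 2, 8, 32):
    d, e = simulate(w)
    print(f"wait_limit={w:>2}  avg_delay={d:6.2f} slots  energy/packet={e:.3f}")
```

Increasing the waiting threshold creates more coding opportunities (lower energy per packet) at the cost of higher average delay, which is the qualitative trade-off the abstract bounds analytically.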
How well do Large Language Models perform in Arithmetic tasks?
Large language models have exhibited emergent abilities, including
chain-of-thought reasoning, for answering math word problems step by step.
Solving math word problems requires not only the ability to decompose problems
via chain-of-thought but also the ability to calculate arithmetic expressions
correctly at each step. To the best of our knowledge, no prior work focuses on
evaluating the arithmetic ability of large language models. In this work, we
propose an arithmetic dataset, MATH 401, to test the latest large language
models, including GPT-4, ChatGPT, InstructGPT, Galactica, and LLaMA, on various
arithmetic expressions, and we provide a detailed analysis of their arithmetic
ability. MATH 401 and the evaluation code are released at
\url{https://github.com/GanjinZero/math401-llm}.
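A minimal sketch of how such arithmetic outputs might be scored is given below: extract the last number in the model's reply and compare it to the exact value of the expression within a relative tolerance. The helper names and tolerance are illustrative assumptions; the released evaluation code may differ.

```python
# Sketch of scoring a model's arithmetic answer against the exact value.
import math
import re

def last_number(text: str):
    """Return the last decimal number appearing in the model's reply, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def is_correct(expression: str, reply: str, rel_tol: float = 1e-3) -> bool:
    # Expressions are trusted benchmark items, so eval is acceptable here.
    target = eval(expression, {"__builtins__": {}}, {})
    predicted = last_number(reply)
    return predicted is not None and math.isclose(predicted, target, rel_tol=rel_tol)

# Hypothetical test item and model replies:
print(is_correct("123 * 456 + 789", "The answer is 56,877."))  # True
print(is_correct("123 * 456 + 789", "I think it's 56000."))    # False
```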
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Aligning large language models (LLMs) with human values is a vital task for
LLM practitioners. Current alignment techniques have several limitations: (1)
requiring a large amount of annotated data; (2) demanding heavy human
involvement; (3) lacking a systematic mechanism to continuously improve. In
this work, we study aligning LLMs to a new domain with limited samples (e.g. <
100). We propose an algorithm that can self-align LLMs iteratively without
active human involvement. Unlike existing works, our algorithm relies on
neither human-crafted instructions nor labeled rewards, significantly reducing
human involvement. In addition, our algorithm can self-improve the alignment
continuously. The key idea is to first retrieve high-quality samples related to
the target domain and use them as In-context Learning examples to generate more
samples. Then we use the self-generated samples to finetune the LLM
iteratively. We show that our method can unlock the LLMs' self-generalization
ability to perform alignment with near-zero human supervision. We test our
algorithm on three benchmarks covering safety, truthfulness, and
instruction-following, and show good performance in alignment, domain
adaptability, and scalability.
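The iterative loop described above can be sketched as follows. The callables `retrieve`, `generate_with_icl`, `filter_quality`, and `finetune` are hypothetical placeholders for the retriever, the in-context generation step, the quality filter, and the fine-tuning step; the paper's concrete components may differ.

```python
# High-level sketch of an iterative self-alignment loop with no human labels.
from typing import Callable, List

def self_align(model,
               seed_pool: List[str],
               retrieve: Callable[[List[str], int], List[str]],
               generate_with_icl: Callable[[object, List[str], int], List[str]],
               filter_quality: Callable[[List[str]], List[str]],
               finetune: Callable[[object, List[str]], object],
               rounds: int = 3,
               n_retrieved: int = 50,
               n_generated: int = 200):
    """Iteratively grow an in-domain sample set and fine-tune on it."""
    for _ in range(rounds):
        # 1. Retrieve high-quality samples related to the target domain.
        examples = retrieve(seed_pool, n_retrieved)
        # 2. Use them as in-context examples to self-generate more samples.
        candidates = generate_with_icl(model, examples, n_generated)
        # 3. Keep only the samples that pass the quality filter.
        new_samples = filter_quality(candidates)
        # 4. Fine-tune the model on the self-generated data and repeat.
        model = finetune(model, new_samples)
        seed_pool = seed_pool + new_samples
    return model
```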
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm
used to align large language models (LLMs) with human preferences. Yet existing
RLHF heavily relies on accurate and informative reward models, which are
vulnerable and sensitive to noise from various sources, e.g. human labeling
errors, making the pipeline fragile. In this work, we improve the effectiveness
of the reward model by introducing a penalty term on the reward, named
\textit{contrastive rewards}. Our approach involves two steps: (1) an offline
sampling step to obtain responses to prompts that serve as a baseline, and (2)
a contrastive reward computed from the baseline responses and used in the
Proximal Policy Optimization (PPO) step. We show that contrastive rewards
enable the LLM to penalize reward uncertainty, improve robustness, encourage
improvement over the baselines, calibrate according to task difficulty, and
reduce variance in PPO. We show empirically that contrastive rewards can
improve RLHF substantially, as evaluated by both GPTs and humans, and that our
method consistently outperforms strong baselines.
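The two-step recipe can be sketched as follows, assuming the per-prompt baseline is the mean reward of its offline-sampled responses; the exact penalty used in the paper may differ.

```python
# Sketch of contrastive rewards: offline baselines, then a penalized PPO reward.
from statistics import mean
from typing import Callable, Dict, List

def build_baselines(prompts: List[str],
                    sample: Callable[[str, int], List[str]],
                    reward_model: Callable[[str, str], float],
                    k: int = 4) -> Dict[str, float]:
    """Offline step: mean reward over k sampled responses per prompt."""
    return {p: mean(reward_model(p, y) for y in sample(p, k)) for p in prompts}

def contrastive_reward(prompt: str,
                       response: str,
                       reward_model: Callable[[str, str], float],
                       baselines: Dict[str, float]) -> float:
    """Reward used in the PPO step: raw reward minus the prompt's baseline."""
    return reward_model(prompt, response) - baselines[prompt]
```

Subtracting a per-prompt baseline rewards only improvement over typical responses, which is one simple way to calibrate for task difficulty and reduce variance.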
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment
of large language models with human preferences, significantly enhancing the
quality of interactions between humans and these models. InstructGPT implements
RLHF through several stages, including Supervised Fine-Tuning (SFT), reward
model training, and Proximal Policy Optimization (PPO). PPO, however, is
sensitive to hyperparameters and requires a minimum of four models in its
standard implementation, which makes it hard to train. In contrast, we propose
a novel learning paradigm called RRHF, which scores responses generated by
different sampling policies and learns to align them with human preferences
through ranking loss. RRHF can efficiently align language model output
probabilities with human preferences as robustly as fine-tuning, and it only
needs 1 to 2 models during tuning. In addition, RRHF can be considered an
extension of SFT and reward models while being simpler than PPO in terms of
coding, model counts, and hyperparameters. The entire alignment process can be
accomplished within a single RRHF training session. We evaluate RRHF using
LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance
comparable to PPO.
Comment: Codes available at https://github.com/GanjinZero/RRH
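A simplified sketch of an RRHF-style objective is shown below: responses are scored by length-normalized log-likelihood under the policy, a pairwise ranking loss pushes higher-reward responses above lower-reward ones, and a cross-entropy term on the best response plays the role of the SFT component. The tensor shapes and dummy inputs are illustrative, not the repository's implementation.

```python
# Sketch of a ranking-based alignment loss in PyTorch.
import torch

def rrhf_style_loss(token_logprobs, rewards: torch.Tensor) -> torch.Tensor:
    """token_logprobs[i]: per-token log-probs of response i; rewards[i]: its reward."""
    # Length-normalized sequence scores under the policy.
    scores = torch.stack([lp.mean() for lp in token_logprobs])
    # Pairwise ranking loss: penalize score_j > score_i whenever reward_i > reward_j.
    rank_loss = torch.tensor(0.0)
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] > rewards[j]:
                rank_loss = rank_loss + torch.relu(scores[j] - scores[i])
    # SFT-style term: maximize likelihood of the highest-reward response.
    best = int(torch.argmax(rewards))
    sft_loss = -token_logprobs[best].sum()
    return rank_loss + sft_loss

# Dummy per-token log-probs for 3 candidate responses of different lengths:
lps = [-torch.rand(12), -torch.rand(8), -torch.rand(10)]
loss = rrhf_style_loss(lps, torch.tensor([0.7, 0.2, 0.5]))
```

Because the loss only needs scores from the policy itself plus fixed reward labels, training requires far fewer live models than a standard PPO pipeline.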
Competitiveness of the Hong Kong economy
Our assessment of the competitiveness of the Hong Kong economy from various perspectives indicates that the overall competitiveness of the Hong Kong economy has been improving during the past several years. However, from a longer-term historical perspective, there are still a number of areas in which Hong Kong’s competitiveness
has been eroded relative to her main competitors in East Asia, especially in the export sector. At the aggregate level, although Hong Kong’s Total Factor Productivity (TFP)
growth rate has been among the best in East Asia in recent years, it has been adversely affected by the continuing relocation of Hong Kong’s manufacturing
production to mainland China. At the sectoral level, although Hong Kong’s unit labour costs have declined since 2000 relative to those of her Asian competitors, the
unit-labour-cost-based real effective exchange rate continues to appreciate against her major trading partners. Furthermore, Hong Kong’s competitiveness has deteriorated in several important categories of goods and services exports. Overall, our study shows that the Hong Kong economy still maintains its resilience to outside shocks; nevertheless, it needs to explore new areas to fuel its future growth.
M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images
The advancement of Spatial Transcriptomics (ST) has facilitated the
spatially-aware profiling of gene expressions based on histopathology images.
Although ST data offers valuable insights into the micro-environment of tumors,
its acquisition cost remains high. Therefore, directly predicting the ST
expressions from digital pathology images is desired. Current methods usually
adopt existing regression backbones for this task, which ignore the inherent
multi-scale hierarchical data structure of digital pathology images. To address
this limitation, we propose M2ORT, a many-to-one regression Transformer that can
accommodate the hierarchical structure of the pathology images through a
decoupled multi-scale feature extractor. Different from traditional models that
are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology
images of different magnifications at a time to jointly predict the gene
expressions at their corresponding common ST spot, aiming at learning a
many-to-one relationship through training. We have tested M2ORT on three public
ST datasets and the experimental results show that M2ORT can achieve
state-of-the-art performance with fewer parameters and floating-point
operations (FLOPs). The code is available at:
https://github.com/Dootmaan/M2ORT/
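The many-to-one idea can be illustrated with a toy PyTorch module: each magnification level gets its own (deliberately simplistic) encoder, the per-level features are fused, and one head regresses the gene-expression vector of the shared ST spot. The layer sizes and gene count below are made-up placeholders, not the actual M2ORT architecture.

```python
# Toy many-to-one regression: several magnification levels, one expression target.
import torch
import torch.nn as nn

class ManyToOneRegressor(nn.Module):
    def __init__(self, n_levels: int = 3, in_dim: int = 3 * 224 * 224,
                 feat_dim: int = 256, n_genes: int = 250):
        super().__init__()
        # One encoder per magnification level (decoupled feature extractors).
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
            for _ in range(n_levels)
        ])
        self.head = nn.Linear(n_levels * feat_dim, n_genes)

    def forward(self, patches):
        # patches[i]: batch of patches at magnification level i, all describing
        # the same ST spots; output: one expression vector per spot.
        feats = [enc(p) for enc, p in zip(self.encoders, patches)]
        return self.head(torch.cat(feats, dim=-1))

model = ManyToOneRegressor()
patches = [torch.randn(2, 3, 224, 224) for _ in range(3)]  # 3 magnifications, 2 spots
pred = model(patches)  # shape: (2, 250) predicted gene expressions per spot
```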
The $s^\pm$-Wave Superconductivity in the Pressurized La4Ni3O10
Recently, evidence of superconductivity (SC) has been reported in pressurized
La4Ni3O10. Here we study the possible pairing mechanism and pairing symmetry in
this material. Through fitting the density-functional-theory band structure, we
provide a six-orbital tight-binding model. In comparison with the band
structure of La3Ni2O7, the additional non-bonding band is important to the
pairing mechanism here. When the multi-orbital Hubbard interactions are
included, our random-phase-approximation based study yields an $s^\pm$-wave
pairing. The dominant Fermi-surface nesting is between the pocket contributed
by the bonding band top and the pocket contributed by the non-bonding band
bottom, leading to the strongest pairing gap amplitude and opposite gap signs
within the two regimes. The dominant real-space pairing is the interlayer
pairing between the $d_{z^2}$ orbitals. We have also studied the doping
dependence of the pairing symmetry and $T_c$.
Comment: 5 pages, 5 figures