Delay-Energy lower bound on Two-Way Relay Wireless Network Coding
Network coding is a novel solution that significantly improves the throughput
and energy consumption of wireless networks by mixing traffic flows through
algebraic operations. In conventional network coding schemes, a packet has to
wait for packets from other sources to be coded before transmission. This
wait-and-code scheme naturally incurs a non-zero packet-loss rate when the
buffer is finite. We propose Enhanced Network Coding (ENC), an extension of ONC
to the continuous-time domain.
In ENC, the relay transmits both coded and uncoded packets to reduce delay; in
exchange, more energy is consumed in transmitting the uncoded packets. ENC is a
practical algorithm that achieves minimal average delay and zero packet-loss
rate under a given energy constraint. We present the system model for ENC on a
general renewal-process queue. In particular, we show that there exists a
fundamental trade-off between average delay and energy, and we derive an
analytic lower bound for this trade-off curve, which is achieved by ENC.
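As a rough illustration of this delay-energy trade-off, the sketch below simulates a simple two-way relay policy in Python: the relay codes a pair of opposing packets whenever both queues are backlogged, and otherwise sends a packet uncoded once it has waited longer than a threshold. The policy, parameter names, and traffic model are hypothetical stand-ins, not the paper's ENC scheme or its renewal-process model.

```python
# Illustrative simulation of a delay-energy trade-off at a two-way relay.
# Hypothetical policy: code two opposing packets into one transmission when
# both queues are backlogged; fall back to an uncoded transmission once a
# head-of-line packet has waited longer than `wait_limit` slots.
import random

def simulate(wait_limit, rate=0.4, n_slots=200_000, seed=0):
    rng = random.Random(seed)
    queues = [[], []]            # arrival times of packets from each source
    delays, energy = [], 0
    for t in range(n_slots):
        for q in queues:         # Bernoulli arrivals approximate Poisson traffic
            if rng.random() < rate:
                q.append(t)
        a, b = queues
        if a and b:              # code one packet from each direction: 1 tx serves 2 packets
            delays += [t - a.pop(0), t - b.pop(0)]
            energy += 1
        else:                    # send uncoded if the oldest packet has waited too long
            for q in (a, b):
                if q and t - q[0] >= wait_limit:
                    delays.append(t - q.pop(0))
                    energy += 1
                    break        # at most one transmission per slot
    return sum(delays) / len(delays), energy / len(delays)

for w in (0, 2, 8, 32):
    d, e = simulate(w)
    print(f"wait_limit={w:>2}  avg_delay={d:6.2f} slots  energy/packet={e:.3f}")
```

Increasing the waiting threshold creates more coding opportunities (lower energy per packet) at the cost of higher average delay, which is the qualitative trade-off the abstract bounds analytically.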
How well do Large Language Models perform in Arithmetic tasks?
Large language models have exhibited emergent abilities, including
chain-of-thought reasoning, for answering math word problems step by step.
Solving math word problems requires not only the ability to decompose problems
via chain-of-thought but also the ability to calculate arithmetic expressions
correctly at each step. To the best of our knowledge, no prior work focuses on
evaluating the arithmetic ability of large language models. In this work, we
propose an arithmetic dataset, MATH 401, to test the latest large language
models, including GPT-4, ChatGPT, InstructGPT, Galactica, and LLaMA, on various
arithmetic expressions, and we provide a detailed analysis of their arithmetic
ability. MATH 401 and the evaluation code are released at
\url{https://github.com/GanjinZero/math401-llm}.
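A minimal sketch of how such arithmetic outputs might be scored is given below: extract the last number in the model's reply and compare it to the exact value of the expression within a relative tolerance. The helper names and tolerance are illustrative assumptions; the released evaluation code may differ.

```python
# Sketch of scoring a model's arithmetic answer against the exact value.
import math
import re

def last_number(text: str):
    """Return the last decimal number appearing in the model's reply, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def is_correct(expression: str, reply: str, rel_tol: float = 1e-3) -> bool:
    # Expressions are trusted benchmark items, so eval is acceptable here.
    target = eval(expression, {"__builtins__": {}}, {})
    predicted = last_number(reply)
    return predicted is not None and math.isclose(predicted, target, rel_tol=rel_tol)

# Hypothetical test item and model replies:
print(is_correct("123 * 456 + 789", "The answer is 56,877."))  # True
print(is_correct("123 * 456 + 789", "I think it's 56000."))    # False
```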
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Aligning large language models (LLMs) with human values is a vital task for
LLM practitioners. Current alignment techniques have several limitations: (1)
requiring a large amount of annotated data; (2) demanding heavy human
involvement; (3) lacking a systematic mechanism to continuously improve. In
this work, we study aligning LLMs to a new domain with limited samples (e.g. <
100). We propose an algorithm that can self-align LLMs iteratively without
active human involvement. Unlike existing works, our algorithm relies on
neither human-crafted instructions nor labeled rewards, significantly reducing
human involvement. In addition, our algorithm can self-improve the alignment
continuously. The key idea is to first retrieve high-quality samples related to
the target domain and use them as In-context Learning examples to generate more
samples. Then we use the self-generated samples to finetune the LLM
iteratively. We show that our method can unlock the LLMs' self-generalization
ability to perform alignment with near-zero human supervision. We test our
algorithm on three benchmarks covering safety, truthfulness, and
instruction-following, and show good performance in alignment, domain
adaptability, and scalability.
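The iterative loop described above can be sketched as follows. The callables `retrieve`, `generate_with_icl`, `filter_quality`, and `finetune` are hypothetical placeholders for the retriever, the in-context generation step, the quality filter, and the fine-tuning step; the paper's concrete components may differ.

```python
# High-level sketch of an iterative self-alignment loop with no human labels.
from typing import Callable, List

def self_align(model,
               seed_pool: List[str],
               retrieve: Callable[[List[str], int], List[str]],
               generate_with_icl: Callable[[object, List[str], int], List[str]],
               filter_quality: Callable[[List[str]], List[str]],
               finetune: Callable[[object, List[str]], object],
               rounds: int = 3,
               n_retrieved: int = 50,
               n_generated: int = 200):
    """Iteratively grow an in-domain sample set and fine-tune on it."""
    for _ in range(rounds):
        # 1. Retrieve high-quality samples related to the target domain.
        examples = retrieve(seed_pool, n_retrieved)
        # 2. Use them as in-context examples to self-generate more samples.
        candidates = generate_with_icl(model, examples, n_generated)
        # 3. Keep only the samples that pass the quality filter.
        new_samples = filter_quality(candidates)
        # 4. Fine-tune the model on the self-generated data and repeat.
        model = finetune(model, new_samples)
        seed_pool = seed_pool + new_samples
    return model
```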
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm
used to align large language models (LLMs) with human preferences. Yet existing
RLHF heavily relies on accurate and informative reward models, which are
vulnerable and sensitive to noise from various sources, e.g. human labeling
errors, making the pipeline fragile. In this work, we improve the effectiveness
of the reward model by introducing a penalty term on the reward, named
\textit{contrastive rewards}. Our approach involves two steps: (1) an offline
sampling step to obtain responses to prompts that serve as a baseline, and (2)
a contrastive reward computed from the baseline responses and used in the
Proximal Policy Optimization (PPO) step. We show that contrastive rewards
enable the LLM to penalize reward uncertainty, improve robustness, encourage
improvement over the baselines, calibrate according to task difficulty, and
reduce variance in PPO. We show empirically that contrastive rewards can
improve RLHF substantially, as evaluated by both GPTs and humans, and that our
method consistently outperforms strong baselines.
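The two-step recipe can be sketched as follows, assuming the per-prompt baseline is the mean reward of its offline-sampled responses; the exact penalty used in the paper may differ.

```python
# Sketch of contrastive rewards: offline baselines, then a penalized PPO reward.
from statistics import mean
from typing import Callable, Dict, List

def build_baselines(prompts: List[str],
                    sample: Callable[[str, int], List[str]],
                    reward_model: Callable[[str, str], float],
                    k: int = 4) -> Dict[str, float]:
    """Offline step: mean reward over k sampled responses per prompt."""
    return {p: mean(reward_model(p, y) for y in sample(p, k)) for p in prompts}

def contrastive_reward(prompt: str,
                       response: str,
                       reward_model: Callable[[str, str], float],
                       baselines: Dict[str, float]) -> float:
    """Reward used in the PPO step: raw reward minus the prompt's baseline."""
    return reward_model(prompt, response) - baselines[prompt]
```

Subtracting a per-prompt baseline rewards only improvement over typical responses, which is one simple way to calibrate for task difficulty and reduce variance.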
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment
of large language models with human preferences, significantly enhancing the
quality of interactions between humans and these models. InstructGPT implements
RLHF through several stages, including Supervised Fine-Tuning (SFT), reward
model training, and Proximal Policy Optimization (PPO). PPO, however, is
sensitive to hyperparameters and requires a minimum of four models in its
standard implementation, which makes it hard to train. In contrast, we propose
a novel learning paradigm called RRHF, which scores responses generated by
different sampling policies and learns to align them with human preferences
through ranking loss. RRHF can efficiently align language model output
probabilities with human preferences as robustly as fine-tuning, and it only
needs 1 to 2 models during tuning. In addition, RRHF can be considered an
extension of SFT and reward models while being simpler than PPO in terms of
coding, model counts, and hyperparameters. The entire alignment process can be
accomplished within a single RRHF training session. We evaluate RRHF using
LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance
comparable to PPO.
Comment: Codes available at https://github.com/GanjinZero/RRH
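A simplified sketch of an RRHF-style objective is shown below: responses are scored by length-normalized log-likelihood under the policy, a pairwise ranking loss pushes higher-reward responses above lower-reward ones, and a cross-entropy term on the best response plays the role of the SFT component. The tensor shapes and dummy inputs are illustrative, not the repository's implementation.

```python
# Sketch of a ranking-based alignment loss in PyTorch.
import torch

def rrhf_style_loss(token_logprobs, rewards: torch.Tensor) -> torch.Tensor:
    """token_logprobs[i]: per-token log-probs of response i; rewards[i]: its reward."""
    # Length-normalized sequence scores under the policy.
    scores = torch.stack([lp.mean() for lp in token_logprobs])
    # Pairwise ranking loss: penalize score_j > score_i whenever reward_i > reward_j.
    rank_loss = torch.tensor(0.0)
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] > rewards[j]:
                rank_loss = rank_loss + torch.relu(scores[j] - scores[i])
    # SFT-style term: maximize likelihood of the highest-reward response.
    best = int(torch.argmax(rewards))
    sft_loss = -token_logprobs[best].sum()
    return rank_loss + sft_loss

# Dummy per-token log-probs for 3 candidate responses of different lengths:
lps = [-torch.rand(12), -torch.rand(8), -torch.rand(10)]
loss = rrhf_style_loss(lps, torch.tensor([0.7, 0.2, 0.5]))
```

Because the loss only needs scores from the policy itself plus fixed reward labels, training requires far fewer live models than a standard PPO pipeline.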
Competitiveness of the Hong Kong economy
Our assessment of the competitiveness of the Hong Kong economy from various perspectives indicates that the overall competitiveness of the Hong Kong economy has been improving during the past several years. However, from a longer-term historical perspective, there are still a number of areas in which Hong Kong’s competitiveness
has been eroded relative to her main competitors in East Asia, especially in the export sector. At the aggregate level, although Hong Kong’s Total Factor Productivity (TFP)
growth rate has been among the best in East Asia in recent years, it has been adversely affected by the continuing relocation of Hong Kong’s manufacturing
production to mainland China. At the sectoral level, although Hong Kong’s unit labour costs have declined since 2000 relative to those of her Asian competitors, the
unit-labour-cost-based real effective exchange rate continues to appreciate against her major trading partners. Furthermore, Hong Kong’s competitiveness has deteriorated in several important categories of goods and services exports. Overall, our study shows that the Hong Kong economy still maintains its resilience to outside shocks; nevertheless, it needs to explore new areas to fuel its future growth.
M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images
The advancement of Spatial Transcriptomics (ST) has facilitated the
spatially-aware profiling of gene expressions based on histopathology images.
Although ST data offers valuable insights into the micro-environment of tumors,
its acquisition cost remains high. Therefore, directly predicting the ST
expressions from digital pathology images is desired. Current methods usually
adopt existing regression backbones for this task, which ignore the inherent
multi-scale hierarchical data structure of digital pathology images. To address
this limitation, we propose M2ORT, a many-to-one regression Transformer that can
accommodate the hierarchical structure of the pathology images through a
decoupled multi-scale feature extractor. Different from traditional models that
are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology
images of different magnifications at a time to jointly predict the gene
expressions at their corresponding common ST spot, aiming at learning a
many-to-one relationship through training. We have tested M2ORT on three public
ST datasets and the experimental results show that M2ORT can achieve
state-of-the-art performance with fewer parameters and floating-point
operations (FLOPs). The code is available at:
https://github.com/Dootmaan/M2ORT/
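The many-to-one idea can be illustrated with a toy PyTorch module: each magnification level gets its own (deliberately simplistic) encoder, the per-level features are fused, and one head regresses the gene-expression vector of the shared ST spot. The layer sizes and gene count below are made-up placeholders, not the actual M2ORT architecture.

```python
# Toy many-to-one regression: several magnification levels, one expression target.
import torch
import torch.nn as nn

class ManyToOneRegressor(nn.Module):
    def __init__(self, n_levels: int = 3, in_dim: int = 3 * 224 * 224,
                 feat_dim: int = 256, n_genes: int = 250):
        super().__init__()
        # One encoder per magnification level (decoupled feature extractors).
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
            for _ in range(n_levels)
        ])
        self.head = nn.Linear(n_levels * feat_dim, n_genes)

    def forward(self, patches):
        # patches[i]: batch of patches at magnification level i, all describing
        # the same ST spots; output: one expression vector per spot.
        feats = [enc(p) for enc, p in zip(self.encoders, patches)]
        return self.head(torch.cat(feats, dim=-1))

model = ManyToOneRegressor()
patches = [torch.randn(2, 3, 224, 224) for _ in range(3)]  # 3 magnifications, 2 spots
pred = model(patches)  # shape: (2, 250) predicted gene expressions per spot
```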
The $s^\pm$-Wave Superconductivity in the Pressurized La4Ni3O10
Recently, evidence of superconductivity (SC) has been reported in pressurized
La4Ni3O10. Here we study the possible pairing mechanism and pairing symmetry in
this material. Through fitting the density-functional-theory band structure, we
provide a six-orbital tight-binding model. In comparison with the band
structure of La3Ni2O7, the additional non-bonding band is important to the
pairing mechanism here. When the multi-orbital Hubbard interactions are
included, our random-phase-approximation based study yields an $s^\pm$-wave
pairing. The dominant Fermi-surface nesting is between the pocket contributed
by the bonding band top and the pocket contributed by the non-bonding band
bottom, leading to the strongest pairing gap amplitude and opposite gap signs
within the two regimes. The dominant real-space pairing is the interlayer
pairing between the $d_{z^2}$ orbitals. We have also studied the doping
dependence of the pairing symmetry and $T_c$.
Comment: 5 pages, 5 figures