The Public Distribution Systems of Foodgrains and Implications for Food Security: A Comparison of the Experiences of India and China
public distribution system, food security, poverty, food subsidy, India, China
Income Inequality in Rural China: Regression-based Decomposition Using Household Data
inequality decomposition, regression, income generating function, China
Turn Waste into Worth: Rectifying Top-k Router of MoE
Sparse Mixture of Experts (MoE) models are popular for training large
language models due to their computational efficiency. However, the commonly
used top-k routing mechanism suffers from redundant computation and memory
costs caused by unbalanced routing. Some experts overflow, and their excess
tokens are dropped, while other experts are left vacant and padded with zeros,
both of which negatively impact model performance. To address the dropped
tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU
tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU
Rectification and the Fill-in Rectification. The Intra-GPU Rectification
handles dropped tokens, efficiently routing them to experts within the GPU
where they are located to avoid inter-GPU communication. The Fill-in
Rectification addresses padding by replacing padding tokens with the tokens
that have high routing scores. Our experimental results demonstrate that the
Intra-GPU Rectification and the Fill-in Rectification effectively handle
dropped tokens and padding, respectively. Furthermore, combining them
achieves superior performance, surpassing the accuracy of the vanilla top-1
router by 4.7%.
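The routing and rectification steps are easy to illustrate in code. Below is a minimal, framework-agnostic NumPy sketch of capacity-limited top-1 routing with the two rectification ideas; the array shapes, the capacity value, and the assumption that all experts are local to one GPU are illustrative choices, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, capacity = 8, 4, 2   # toy sizes (assumed)

# Router scores: softmax over experts for each token.
logits = rng.normal(size=(num_tokens, num_experts))
scores = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Vanilla top-1 routing: each token goes to its highest-scoring expert.
choice = scores.argmax(axis=-1)

# Capacity check: each expert keeps at most `capacity` tokens; the excess
# tokens are dropped, and under-filled experts would be zero-padded.
kept = {e: [] for e in range(num_experts)}
dropped = []
for t, e in enumerate(choice):
    if len(kept[int(e)]) < capacity:
        kept[int(e)].append(t)
    else:
        dropped.append(t)

# Intra-GPU Rectification (sketch): re-route each dropped token to its
# best-scoring expert that still has room, restricted to experts on the same
# GPU as the token so no inter-GPU communication is needed (here, all experts
# are assumed local).
for t in dropped:
    for e in np.argsort(-scores[t]):
        if len(kept[int(e)]) < capacity:
            kept[int(e)].append(t)
            break

# Fill-in Rectification (sketch): instead of zero-padding vacant slots, fill
# them with tokens that have high routing scores for that expert.
for e in range(num_experts):
    while len(kept[e]) < capacity:
        kept[e].append(int(scores[:, e].argmax()))

print(kept)   # expert -> list of token indices actually processed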
Secrets of RLHF in Large Language Models Part I: PPO
Large language models (LLMs) have formulated a blueprint for the advancement
of artificial general intelligence. Their primary objective is to function as a
human-centric (helpful, honest, and harmless) assistant. Alignment with humans
assumes paramount significance, and reinforcement learning with human feedback
(RLHF) emerges as the pivotal technological paradigm underpinning this pursuit.
Current technical routes usually include reward models to measure
human preferences, Proximal Policy Optimization (PPO) to optimize
policy model outputs, and process supervision to improve step-by-step
reasoning capabilities. However, the challenges of reward design,
environment interaction, and agent training, coupled with the huge
trial-and-error cost of large language models, pose a significant barrier for
AI researchers working toward technical alignment and the safe deployment of
LLMs. Stable RLHF training remains a puzzle. In the first
report, we dissect the framework of RLHF, re-evaluate the inner workings of
PPO, and explore how the parts comprising PPO algorithms impact policy agent
training. We identify policy constraints as the key factor in the effective
implementation of the PPO algorithm. Therefore, we explore PPO-max, an
advanced version of the PPO algorithm, to efficiently improve the training
stability of the policy model. Based on our main results, we perform a
comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT.
The absence of open-source implementations has posed significant challenges to
the investigation of LLM alignment. Therefore, we are eager to release
technical reports, reward models, and PPO code.
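As a concrete illustration of the "policy constraint" highlighted above, the sketch below shows a PPO clipped surrogate objective combined with a simple KL-style penalty against a frozen reference (SFT) model. The function name, hyperparameter values, and the way the penalty enters the loss are illustrative assumptions, not PPO-max itself.

import torch

def ppo_policy_loss(logp_new, logp_old, logp_ref, advantages,
                    clip_eps=0.2, kl_coef=0.05):
    # All arguments are per-token tensors of the same shape.
    # Importance ratio between the current policy and the rollout policy.
    ratio = torch.exp(logp_new - logp_old)

    # Standard PPO clipped surrogate objective.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    surrogate = torch.min(unclipped, clipped)

    # Policy constraint: penalize drifting away from the reference (SFT) model.
    # The mean of (logp_new - logp_ref) under the policy estimates the KL divergence.
    kl_penalty = kl_coef * (logp_new - logp_ref)

    # Maximize (surrogate - penalty), so minimize the negative mean.
    return -(surrogate - kl_penalty).mean()

# Toy usage with random per-token quantities.
T = 16
logp_new = torch.randn(T, requires_grad=True)  # from the policy being trained
logp_old = torch.randn(T)                      # logged at rollout time
logp_ref = torch.randn(T)                      # from the frozen SFT model
advantages = torch.randn(T)                    # e.g. from a value model / GAE
loss = ppo_policy_loss(logp_new, logp_old, logp_ref, advantages)
loss.backward()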
Expert Judgement on the Effects of the Grain Marketing System on Grain Production in China: A Survey
Expert Judgement on the Effects of the Grain Marketing System on Grain Production in India: A Survey
Achieving food security in China: past three decades and beyond
Purpose – The paper aims to review and assess China's food security practice over the past three decades with a view to drawing implications for further improving its food security in the future.
Design/methodology/approach – A normative food security framework is used to assess China's food security achievements and examine any remaining and emerging issues in its pursuit of food security.
Findings – China has done well in achieving grain security in the past three decades. However, it cannot be concluded that China has achieved food security according to the normative food security framework, because there are serious problems in the areas of food safety and quality, environmental sustainability, and social stability. To achieve long-term food security, China has to tackle the widespread issues of unsafe foods and foods of dubious quality, environmental pollution and degradation, and the establishment of a social security system.
Originality/value – Examining China's food security practice over the past three decades can generate experiences and lessons valuable not only for China but also for other developing countries in their efforts to achieve national food security. Issues are identified to which the Chinese government needs to pay attention in order to improve China's food security in the future.
China, Contamination, Environmental health and safety, Food products, Food safety, Social welfare
China's feedgrain demand in global perspective
China's feedgrain use has increased remarkably in the past two decades. Its demand for feedgrain is expected to grow further, and by 2010, China's demand for feedgrain is expected to exceed that of foodgrain. This paper places China's feedgrain demand in a global perspective and discusses the likely impact of China's rising demand for feedgrains on the world grain market and, in turn, on China's own domestic grain market and livestock industries. The paper concludes with recommendations on policy options available to China to deal with its fast-growing demand for feedgrains.