
    EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

    Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We extend EfficientZero's strong performance to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of proposed improvements, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin on diverse tasks under the limited-data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across diverse benchmarks, such as Atari 100k, Proprio Control, and Vision Control.
    Comment: 21 pages, 10 figures
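    The planning component behind EfficientZero is a MuZero-style Monte Carlo Tree Search over a learned model. The sketch below is a toy illustration of that search loop only, assuming random placeholder networks (`model_predict`, `model_dynamics`), a PUCT selection rule, and a simplified backup that ignores intermediate rewards; it is not the paper's implementation.

```python
# Illustrative MuZero-style Monte Carlo Tree Search loop (toy stand-in for the
# learned model-based planning that EfficientZero builds on). All model calls
# are random placeholders, not the paper's networks.
import math
import random

NUM_ACTIONS = 4
C_PUCT = 1.25

class Node:
    def __init__(self, prior):
        self.prior = prior          # policy prior from the "prediction network"
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}          # action -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def model_predict(_hidden_state):
    """Placeholder for the learned prediction network: policy prior + value."""
    priors = [random.random() for _ in range(NUM_ACTIONS)]
    s = sum(priors)
    return [p / s for p in priors], random.uniform(-1, 1)

def model_dynamics(hidden_state, action):
    """Placeholder for the learned dynamics network: next hidden state."""
    return (hidden_state, action)

def select_child(node):
    # PUCT rule: balance exploitation (Q) and exploration (prior / visit count).
    def score(_action, child):
        u = C_PUCT * child.prior * math.sqrt(node.visit_count) / (1 + child.visit_count)
        return child.value() + u
    return max(node.children.items(), key=lambda kv: score(*kv))

def run_mcts(root_state, num_simulations=50):
    root = Node(prior=1.0)
    priors, _ = model_predict(root_state)
    root.children = {a: Node(priors[a]) for a in range(NUM_ACTIONS)}

    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # 1) Selection: walk down the tree with PUCT until reaching a leaf.
        while node.children:
            action, node = select_child(node)
            state = model_dynamics(state, action)
            path.append(node)
        # 2) Expansion + evaluation with the (placeholder) learned model.
        priors, value = model_predict(state)
        node.children = {a: Node(priors[a]) for a in range(NUM_ACTIONS)}
        # 3) Backup: propagate the leaf value estimate along the visited path
        #    (intermediate rewards and discounting are omitted in this toy).
        for n in path:
            n.visit_count += 1
            n.value_sum += value

    # Act according to root visit counts, as in MuZero-style agents.
    visits = {a: c.visit_count for a, c in root.children.items()}
    return max(visits, key=visits.get), visits

if __name__ == "__main__":
    action, visits = run_mcts(root_state="s0")
    print("chosen action:", action, "visit counts:", visits)
```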

    A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games

    Many recent practical and theoretical breakthroughs focus on adversarial team multi-player games (ATMGs) in ex ante correlation scenarios. In this setting, team members are allowed to coordinate their strategies only before the game starts. Although algorithms exist for solving extensive-form ATMGs, the size of the game tree generated by previous algorithms grows exponentially with the number of players. Therefore, handling large-scale zero-sum extensive-form ATMG problems close to real-world scale remains a significant challenge. In this paper, we propose a generic multi-player transformation algorithm that can transform any multi-player game tree satisfying the definition of ATMGs into a 2-player game tree, so that finding a team-maxmin equilibrium with correlation (TMECor) in large-scale ATMGs can be reduced to finding a Nash equilibrium (NE) in a 2-player game. To achieve this goal, we first introduce a new structure named the private information pre-branch, which consists of a temporary chance node and coordinator nodes and makes decisions for all potential private information on behalf of the team members. We also show theoretically that an NE in the transformed 2-player game is equivalent to a TMECor in the original multi-player game. This work significantly reduces the growth of the action space and number of nodes from exponential to constant level. This enables our work to outperform all previous state-of-the-art algorithms in finding a TMECor, with significant improvements of 182.89, 168.47, 694.44, and 233.98 in different Kuhn Poker and Leduc Poker cases (21K3, 21K4, 21K6, and 21L33). In addition, this work is the first to practically solve ATMGs in a 5-player case, which existing algorithms cannot handle. A rough illustrative sketch of the exponential-versus-constant branching claim follows below.
    Comment: 9 pages, 5 figures, NIPS 202
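    As a back-of-the-envelope illustration only, the toy script below contrasts branching over every joint assignment of private information to team members with a single coordinator choice of a plan that maps each possible private observation to an action. The functions and parameters (`naive_branching`, `coordinator_plan_branching`, the card and action counts) are hypothetical and do not reproduce the paper's pre-branch construction.

```python
# Toy comparison of game-tree branching, loosely inspired by the abstract's
# exponential-vs-constant claim. NOT the paper's construction: it only contrasts
# (a) explicit branching over every joint private-information assignment with
# (b) one "coordinator" choice of a plan mapping private observations to actions.

def naive_branching(num_members, num_private_cards, num_actions):
    """Branch over every joint deal, then every joint team action."""
    joint_deals = num_private_cards ** num_members      # exponential in team size
    joint_actions = num_actions ** num_members
    return joint_deals * joint_actions

def coordinator_plan_branching(num_private_cards, num_actions):
    """One coordinator decision: an action for each possible private observation."""
    return num_actions ** num_private_cards              # independent of team size

if __name__ == "__main__":
    for members in (2, 3, 5):
        naive = naive_branching(members, num_private_cards=3, num_actions=2)
        plan = coordinator_plan_branching(num_private_cards=3, num_actions=2)
        print(f"{members} team members: naive joint branching = {naive}, "
              f"coordinator-plan branching = {plan}")
```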

    Real-time scheduling of renewable power systems through planning-based reinforcement learning

    The growth of renewable energy sources has posed significant challenges to traditional power scheduling. It is difficult for operators to obtain accurate day-ahead forecasts of renewable generation, so future scheduling systems must make real-time scheduling decisions that align with ultra-short-term forecasts. Restricted by computation speed, traditional optimization-based methods cannot solve this problem. Recent developments in reinforcement learning (RL) have demonstrated the potential to address this challenge. However, existing RL methods are inadequate in terms of constraint complexity, algorithm performance, and environment fidelity. We are the first to propose a systematic solution based on a state-of-the-art reinforcement learning algorithm and a real power grid environment. The proposed approach enables planning and finer-time-resolution adjustments of power generators, including unit commitment and economic dispatch, thus increasing the grid's ability to admit more renewable energy. The well-trained scheduling agent significantly reduces renewable curtailment and load shedding, issues that arise from traditional scheduling's reliance on inaccurate day-ahead forecasts. High-frequency control decisions exploit the existing units' flexibility, reducing the power grid's dependence on hardware transformations and saving investment and operating costs, as demonstrated in experimental results. This research exhibits the potential of reinforcement learning in promoting low-carbon and intelligent power systems and represents a solid step toward sustainable electricity generation.
    Comment: 12 pages, 7 figures
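    To make the setting concrete, here is a minimal gym-style environment sketch for real-time dispatch of a single toy grid, assuming one thermal unit, one renewable source, and penalty-weighted rewards for load shedding, curtailment, and fuel cost. All names, dynamics, and weights (`ToyDispatchEnv`, ramp limits, penalties) are placeholders rather than the paper's environment.

```python
# Minimal, illustrative gym-style environment sketch for real-time dispatch,
# assuming a single-bus toy grid with one thermal unit and one renewable source.
# Names, dynamics, and penalty weights are placeholders, not the paper's setup.
import random

class ToyDispatchEnv:
    def __init__(self, horizon=24):
        self.horizon = horizon
        self.max_thermal = 100.0       # MW
        self.ramp_limit = 20.0         # MW per step
        self.reset()

    def reset(self):
        self.t = 0
        self.thermal = 50.0
        return self._obs()

    def _obs(self):
        # Ultra-short-term forecasts the agent conditions on (random placeholders).
        self.load = 80.0 + 20.0 * random.random()
        self.renewable = 60.0 * random.random()
        return (self.t, self.thermal, self.load, self.renewable)

    def step(self, delta_thermal):
        # Action: ramp the thermal unit within its limits (toy economic dispatch).
        delta = max(-self.ramp_limit, min(self.ramp_limit, delta_thermal))
        self.thermal = max(0.0, min(self.max_thermal, self.thermal + delta))

        supply = self.thermal + self.renewable
        shedding = max(0.0, self.load - supply)     # unmet demand
        curtailment = max(0.0, supply - self.load)  # excess generation (toy simplification)
        cost = 0.1 * self.thermal                   # toy fuel cost
        reward = -(10.0 * shedding + 1.0 * curtailment + cost)

        self.t += 1
        done = self.t >= self.horizon
        return self._obs(), reward, done, {}

if __name__ == "__main__":
    env = ToyDispatchEnv()
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done, _ = env.step(random.uniform(-20, 20))  # random policy
        total += r
    print("episode return:", round(total, 2))
```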

    Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

    Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has produced numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive, as they require considerable computing resources and memory; hence, many efficient approaches have been developed for improving system pipelines as well as operators. However, runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we benchmark performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmarks and findings help them better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses uncover potential opportunities for future work to further optimize the runtime performance of LLMs.
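    In the spirit of the module-wise runtime analysis, the snippet below shows one common way to micro-benchmark a single operator: time a matmul with warm-up iterations and device synchronization, then report an effective throughput. Sizes and iteration counts are arbitrary placeholders, and the script assumes PyTorch with an optional CUDA device; it is not the benchmark harness used in the paper.

```python
# Minimal operator micro-benchmark sketch: time a single matmul with warm-up
# runs and device synchronization, then report throughput. Sizes and iteration
# counts are arbitrary placeholders.
import time
import torch

def bench_matmul(m=2048, k=2048, n=2048, warmup=5, iters=20):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    a = torch.randn(m, k, device=device, dtype=dtype)
    b = torch.randn(k, n, device=device, dtype=dtype)

    for _ in range(warmup):                 # warm-up: exclude lazy init / autotuning
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()            # make sure queued kernels have finished

    start = time.perf_counter()
    for _ in range(iters):
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    flops = 2 * m * k * n                   # multiply-adds for one matmul
    print(f"{device}: {elapsed * 1e3:.2f} ms/iter, {flops / elapsed / 1e12:.2f} TFLOP/s")

if __name__ == "__main__":
    bench_matmul()
```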

    Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans

    The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to detect COVID-19 infection accurately and quickly. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 from chest CT scans. However, most existing work focuses on 2D datasets, which may result in low-quality models, since real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with relatively unfair comparisons. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3_18) to establish the baseline performance on three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for 3D chest CT scan classification, and we use the Gumbel Softmax technique to improve the search efficiency. We further apply the Class Activation Mapping (CAM) technique to our models to provide interpretability of the results. The experimental results show that our searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets, with model sizes tens of times smaller and higher accuracy. Furthermore, the results also verify that CAM can be applied to CovidNet3D on COVID-19 datasets to provide interpretability for medical diagnosis. Code: https://github.com/HKBU-HPML/CovidNet3D
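    As an illustration of the Gumbel Softmax technique used in differentiable architecture search, the sketch below relaxes a discrete choice among candidate 3D operations into a differentiable one via `torch.nn.functional.gumbel_softmax`, assuming a toy mixed-operation module (`MixedOp3D`); the candidate ops and tensor sizes are placeholders, not the CovidNet3D search space.

```python
# Minimal sketch of selecting among candidate operations with the Gumbel-Softmax
# relaxation, as used in differentiable NAS. Candidate ops and sizes are toy
# placeholders, not the CovidNet3D search space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp3D(nn.Module):
    """A single searchable edge: a gated mix over candidate 3D ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.Conv3d(channels, channels, 5, padding=2),
            nn.MaxPool3d(3, stride=1, padding=1),
        ])
        # Architecture parameters (logits) learned jointly with the op weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x, tau=1.0):
        # hard=True samples a one-hot gate in the forward pass while keeping a
        # differentiable (straight-through) gradient path to the logits.
        gate = F.gumbel_softmax(self.alpha, tau=tau, hard=True)
        return sum(g * op(x) for g, op in zip(gate, self.ops))

if __name__ == "__main__":
    x = torch.randn(2, 8, 16, 16, 16)       # (batch, channels, D, H, W) toy CT volume
    op = MixedOp3D(channels=8)
    out = op(x)
    out.mean().backward()                   # gradients reach both op weights and alpha
    print(out.shape, op.alpha.grad)
```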

    Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters

    The ever-growing model sizes and scale of compute have attracted increasing interest in training deep learning models over multiple nodes. However, training on cloud clusters, especially across remote clusters, poses huge challenges. In this work, we introduce a general framework, Nebula-I, for collaboratively training deep learning models over remote heterogeneous clusters connected by low-bandwidth wide area networks (WANs). We take natural language processing (NLP) as an example to show how Nebula-I works in different training phases, including: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models, which covers the most popular paradigm of recent deep learning. To balance accuracy and communication efficiency, Nebula-I jointly applies parameter-efficient training strategies, hybrid parallel computing methods, and adaptive communication acceleration techniques. Meanwhile, security strategies are employed to guarantee the safety, reliability, and privacy of intra-cluster computation and inter-cluster communication. Nebula-I is implemented with the PaddlePaddle deep learning framework, which can support collaborative training over heterogeneous hardware, e.g., GPU and NPU. Experiments demonstrate that the proposed framework substantially improves training efficiency while preserving satisfactory NLP performance. With Nebula-I, users can run large-scale training tasks over cloud clusters with minimal development effort, and the utility of existing large pre-trained models can be further promoted. We also present new state-of-the-art results on cross-lingual natural language inference tasks, generated based on a novel learning framework and Nebula-I.
    Comment: 20 pages, 10 figures, technical report
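    As a generic example of the kind of communication/accuracy trade-off targeted when training over low-bandwidth WANs, the sketch below implements top-k gradient sparsification with NumPy; it is one common compression technique and is not claimed to be Nebula-I's actual mechanism.

```python
# Illustrative top-k gradient sparsification, one common way to cut cross-cluster
# traffic on low-bandwidth WAN links. A generic example of a communication/accuracy
# trade-off, not Nebula-I's actual method.
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude entries; return (indices, values, shape)."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape

def topk_decompress(idx, values, shape):
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal((1024, 1024)).astype(np.float32)

    idx, vals, shape = topk_compress(grad, ratio=0.01)
    restored = topk_decompress(idx, vals, shape)

    sent_bytes = idx.nbytes + vals.nbytes       # what would cross the WAN link
    full_bytes = grad.nbytes
    err = np.linalg.norm(grad - restored) / np.linalg.norm(grad)
    print(f"traffic: {sent_bytes / full_bytes:.1%} of dense, relative error: {err:.3f}")
```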