8 research outputs found

FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping

Author: Chen Dong
Chen Ruichuan
Li Zhuohao
Luo Xiaonan
Nie Dapeng
Wang Ao
Wang Wei
Yang Haoran
Yu Haoxuan
Yu Minchen
Publication date: 06/06/2023

The dynamic request patterns of machine learning (ML) inference workloads have driven an increasing trend towards exploiting serverless computing for scalable ML model serving. However, today's serverless platforms lack efficient support for GPUs: provisioning functions on GPUs incurs extremely high overhead, forcing platforms to keep functions running for long periods, even when idle, to reduce cold starts. This wastes significant resources on ML inference and hinders pay-per-use billing for GPUs. In this paper, we present FaaSwap, a serverless platform enabling fine-grained, request-level GPU sharing for resource-efficient ML inference. FaaSwap leverages model swapping to support fast inference execution at low resource cost. It keeps models on a host with a large amount of cheap memory and quickly swaps them to GPUs when requested, reducing per-function keep-alive cost and enabling efficient GPU sharing across many more functions. FaaSwap also supports swapping models between GPUs for load balancing and improved inference performance. In FaaSwap, we design sophisticated request scheduling and memory management algorithms that efficiently exploit model swapping to reduce GPU cost and meet latency service-level objectives (SLOs) for all inference functions. We have implemented and integrated FaaSwap into Alibaba Cloud Function Compute (FC), one of the world's largest commercial serverless platforms. Evaluation results show that FaaSwap can achieve low-latency model swapping, efficiently share a GPU across hundreds of functions, and satisfy per-function latency SLOs at scale.

arXiv.org e-Print Archive

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Author: Chen Hao
Jiang Chaoya
Wang Cunxiang
Wang Jindong
Wang Yidong
Xie Rui
Xie Xing
Yang Linyi
Ye Wei
Yu Zhuohao
Zeng Zhengran
Zhang Shikun
Zhang Yue
Publication date: 08/06/2023

Instruction tuning large language models (LLMs) remains a challenging task, owing to the complexity of hyperparameter selection and the difficulty involved in evaluating the tuned models. To determine the optimal hyperparameters, an automatic, robust, and reliable evaluation benchmark is essential. However, establishing such a benchmark is not a trivial task due to the challenges associated with evaluation accuracy and privacy protection. In response to these challenges, we introduce a judge large language model, named PandaLM, which is trained to distinguish the superior model given several LLMs. PandaLM's focus extends beyond just the objective correctness of responses, which is the main focus of traditional evaluation datasets. It addresses vital subjective factors such as relative conciseness, clarity, adherence to instructions, comprehensiveness, and formality. To ensure the reliability of PandaLM, we collect a diverse human-annotated test dataset, where all contexts are generated by humans and labels are aligned with human preferences. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM enables fairer LLM evaluation at lower cost, as evidenced by the significant improvements achieved by models tuned through PandaLM compared to their counterparts trained with Alpaca's default hyperparameters. In addition, PandaLM does not depend on API-based evaluations, thus avoiding potential data leakage. All resources of PandaLM are released at https://github.com/WeOpenML/PandaLM.

arXiv.org e-Print Archive

TextBox 2.0: A Text Generation Library with Pre-trained Language Models

Author: Chen Zhipeng
Cheng Xiaoxue
Dai Wenxun
Dong Zican
Hu Yiwen
Li Junyi
Nie Jian-Yun
Tang Tianyi
Wang Yuhao
Wen Ji-Rong
Yu Zhuohao
Zhao Wayne Xin
Publication date: 25/12/2022

To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.

Comment: Accepted by EMNLP 202

arXiv.org e-Print Archive