FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
The dynamic request patterns of machine learning (ML) inference workloads
have driven an increasing trend towards exploiting serverless computing for
scalable ML model serving. However, today's serverless platforms lack efficient
support for GPUs -- provisioning functions on GPUs incurs extremely high
overhead, forcing providers to keep GPU functions long-running, even when
idle, to reduce cold starts. This wastes substantial resources on ML
inference and precludes pay-per-use billing for GPUs.
In this paper, we present FaaSwap, a serverless platform enabling
fine-grained, request-level GPU sharing for resource-efficient ML inference.
FaaSwap leverages model swapping to support fast inference execution at low
resource cost. It keeps models in host memory, which is plentiful and cheap,
and quickly swaps them onto GPUs when requested, reducing per-function
keep-alive cost and enabling efficient GPU sharing across many more functions.
FaaSwap also supports swapping models between GPUs for load balancing and
improved inference performance. In FaaSwap, we design sophisticated request
scheduling and memory management algorithms that efficiently exploit model
swapping to reduce GPU cost and meet latency service-level objectives (SLOs)
for all inference functions. We have implemented and integrated FaaSwap into
Alibaba Cloud Function Compute (FC), one of the world's largest commercial
serverless platforms. Evaluation results show that FaaSwap achieves
low-latency model swapping, efficiently shares a GPU across hundreds of
functions, and satisfies per-function latency SLOs at scale.
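To make the model-swapping idea concrete, the following is a minimal PyTorch sketch of request-level swapping between host memory and a GPU. The ModelPool class, its method names, and the swap-per-request policy are illustrative assumptions for this summary, not FaaSwap's actual scheduler or memory manager.

    import torch
    import torchvision.models as models

    class ModelPool:
        # Hypothetical pool: models stay resident in cheap host memory,
        # and the GPU is shared across functions one request at a time.
        def __init__(self):
            self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            self.host_models = {}

        def register(self, name, model):
            self.host_models[name] = model.eval()  # kept on CPU until requested

        def serve(self, name, inputs):
            model = self.host_models[name]
            model.to(self.device)          # swap the model onto the GPU on demand
            with torch.no_grad():
                outputs = model(inputs.to(self.device))
            model.to("cpu")                # swap back, freeing GPU memory for others
            return outputs.cpu()

    pool = ModelPool()
    pool.register("resnet18", models.resnet18(weights=None))
    result = pool.serve("resnet18", torch.randn(1, 3, 224, 224))

In a real system, swap-in latency can be hidden with pinned host memory and asynchronous copies; optimizations of that kind are what the paper's scheduling and memory-management algorithms target.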
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Instruction tuning large language models (LLMs) remains a challenging task,
owing to the complexity of hyperparameter selection and the difficulty involved
in evaluating the tuned models. To determine the optimal hyperparameters, an
automatic, robust, and reliable evaluation benchmark is essential. However,
establishing such a benchmark is not a trivial task due to the challenges
associated with evaluation accuracy and privacy protection. In response to
these challenges, we introduce PandaLM, a judge large language model trained
to identify the superior model among the responses of several candidate LLMs.
PandaLM's focus extends beyond just the objective correctness of responses,
which is the main focus of traditional evaluation datasets. It addresses vital
subjective factors such as relative conciseness, clarity, adherence to
instructions, comprehensiveness, and formality. To ensure the reliability of
PandaLM, we collect a diverse human-annotated test dataset, where all contexts
are generated by humans and labels are aligned with human preferences. Our
results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation
ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM
enables LLM evaluation that is fairer and less costly, as evidenced by the
significant improvements achieved by models tuned through PandaLM over
counterparts trained with Alpaca's default hyperparameters. In addition,
PandaLM does not depend on API-based evaluations, thus avoiding potential data
leakage. All resources of PandaLM are released at
https://github.com/WeOpenML/PandaLM
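The judge paradigm PandaLM implements can be sketched with standard Hugging Face tooling: given one instruction and two candidate responses, the judge generates a verdict. The checkpoint id and prompt template below are assumptions for illustration; the official pipeline lives in the repository linked above.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed checkpoint id; consult the PandaLM repository for released weights.
    MODEL_ID = "WeOpenML/PandaLM-7B-v1"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )

    def judge(instruction, response_1, response_2):
        # Illustrative prompt template, not PandaLM's official one.
        prompt = (
            "Below are two responses to the same instruction. "
            "Decide which response is better.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Response 1:\n{response_1}\n\n"
            f"### Response 2:\n{response_2}\n\n"
            "### Evaluation:\n"
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=64)
        return tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)

Because the judge runs locally, no instruction data is sent to an external API, which is the privacy property the abstract highlights.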
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
To facilitate research on text generation, this paper presents a
comprehensive and unified library, TextBox 2.0, focusing on the use of
pre-trained language models (PLMs). To be comprehensive, our library covers
the common text generation tasks and their corresponding datasets, and further
incorporates a broad set of PLMs spanning general, translation, Chinese,
dialogue, controllable, distilled, prompting, and lightweight models. We also
implement efficient training strategies and provide generation
objectives for pre-training new PLMs from scratch. To be unified, we design the
interfaces to support the entire research pipeline (from data loading to
training and evaluation), ensuring that each step can be fulfilled in a unified
way. Despite this rich functionality, the library remains easy to use, through
either the friendly Python API or the command line. To validate the effectiveness
of our library, we conduct extensive experiments and exemplify four types of
research scenarios. The project is released at
https://github.com/RUCAIBox/TextBox
Comment: Accepted by EMNLP 2022
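As a usage illustration, the repository's quickstart fine-tunes a PLM on a dataset with one call; the entry point and argument names below follow that quickstart and should be treated as assumptions if the API has since changed.

    from textbox import run_textbox

    # Fine-tune BART on the SAMSum summarization dataset and evaluate it.
    # Equivalent CLI form (per the quickstart):
    #   python run_textbox.py --model=BART --dataset=samsum \
    #       --model_path=facebook/bart-base
    run_textbox(config_dict={
        "model": "BART",
        "dataset": "samsum",
        "model_path": "facebook/bart-base",
    })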
Materials development and potential applications of transparent ceramics: A review