Burst-Aware Weighted Fair Queueing for Serverless Inference: Mitigating Noisy Neighbor Effects in Multi-Tenant Systems

Pandey, Rajesh Kumar; Soni, Jubin Abhishek; Anand, Amit



oai:publisher.uthm.edu.my:article/23553

Authors: Rajesh Kumar Pandey, Jubin Abhishek Soni, Amit Anand
Publication date: 29 December 2025
Publisher: Penerbit UTHM

Abstract

Multi-tenant serverless inference often devolves into noisy-neighbor scenarios in which a single tenant’s bursty LLM batch floods the fleet and pushes interactive calls beyond their latency budgets. We propose Burst-Aware Weighted Fair Queueing (BWFQ), a scheduler that requires only two counters per tenant (tokens earned, tokens spent) and a constant-time heap pop to pick the next invocation. BWFQ uses the classic token-bucket shaper: tokens accumulate at a tenant-specific base rate and are deducted on each dispatch. When a tenant exhausts its tokens, its requests are queued, giving quieter tenants a chance to run. Unlike techniques such as Dominant-Resource Fairness, BWFQ requires neither per-invocation resource profiling nor multi-dimensional share accounting, making it easy to integrate into existing Lambda-style dispatchers. To evaluate the algorithm, we built a prototype on AWS Lambda and observed that BWFQ reduces the P99 latency gap between interactive and batch tenants from 8.5 s to 2.1 s, a 4.0x improvement, while preserving 94% of the throughput achieved by First-Come-First-Serve. The algorithm adds only 35 µs of scheduling overhead per decision and fits in approximately 150 lines of Go code. These results demonstrate that simple token-bucket fair queueing is a practical, immediately usable step towards fairness in production serverless inference.
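
The mechanism summarised in the abstract can be illustrated with a small Go sketch. This is not the authors' implementation: the type and function names (`tenant`, `Scheduler`, `Advance`, `Next`), the burst cap, the unit cost of one token per request, and the choice to order the heap by unspent tokens are all assumptions made here for illustration. Only the two per-tenant counters, the base refill rate, and the use of a heap come from the abstract.

```go
package main

import (
	"container/heap"
	"fmt"
)

// tenant keeps the two counters from the abstract plus a FIFO of waiting requests.
// rate and burst are hypothetical parameters of this sketch.
type tenant struct {
	name    string
	rate    float64 // tokens earned per second (base rate)
	burst   float64 // assumed cap on unspent tokens
	earned  float64
	spent   float64
	pending []string
}

func (t *tenant) balance() float64 { return t.earned - t.spent }

// byBalance is a max-heap on unspent tokens (an assumed ordering).
type byBalance []*tenant

func (h byBalance) Len() int           { return len(h) }
func (h byBalance) Less(i, j int) bool { return h[i].balance() > h[j].balance() }
func (h byBalance) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *byBalance) Push(x any)        { *h = append(*h, x.(*tenant)) }
func (h *byBalance) Pop() any {
	old := *h
	t := old[len(old)-1]
	*h = old[:len(old)-1]
	return t
}

// Scheduler tracks all tenants and a heap of those with pending requests.
type Scheduler struct {
	tenants map[string]*tenant
	ready   byBalance
}

func NewScheduler() *Scheduler { return &Scheduler{tenants: map[string]*tenant{}} }

// AddTenant registers a tenant that starts with a full bucket.
func (s *Scheduler) AddTenant(name string, rate, burst float64) {
	s.tenants[name] = &tenant{name: name, rate: rate, burst: burst, earned: burst}
}

// Submit queues a request; it reports false for an unknown tenant.
func (s *Scheduler) Submit(name, req string) bool {
	t, ok := s.tenants[name]
	if !ok {
		return false
	}
	t.pending = append(t.pending, req)
	if len(t.pending) == 1 {
		heap.Push(&s.ready, t)
	}
	return true
}

// Advance credits dt seconds of tokens to every tenant, capped at burst,
// then restores heap order because balances have changed.
func (s *Scheduler) Advance(dt float64) {
	for _, t := range s.tenants {
		t.earned += t.rate * dt
		if t.balance() > t.burst {
			t.earned = t.spent + t.burst
		}
	}
	heap.Init(&s.ready)
}

// Next dispatches one request from the tenant with the most unspent tokens.
// It reports false when nothing is queued or every queued tenant is out of tokens.
func (s *Scheduler) Next() (name, req string, ok bool) {
	if s.ready.Len() == 0 || s.ready[0].balance() < 1 {
		return "", "", false
	}
	t := s.ready[0]
	req, t.pending = t.pending[0], t.pending[1:]
	t.spent++ // assumed cost: one token per dispatch
	if len(t.pending) == 0 {
		heap.Pop(&s.ready)
	} else {
		heap.Fix(&s.ready, 0)
	}
	return t.name, req, true
}

func main() {
	s := NewScheduler()
	s.AddTenant("batch", 1, 2)
	s.AddTenant("interactive", 5, 5)
	for i := 0; i < 4; i++ {
		s.Submit("batch", fmt.Sprintf("b%d", i))
	}
	s.Submit("interactive", "i0")
	for name, req, ok := s.Next(); ok; name, req, ok = s.Next() {
		fmt.Println(name, req)
	}
}
```

In this sketch the batch tenant stalls once its balance reaches zero and resumes only after `Advance` credits more tokens, while the interactive request is served first because its tenant holds more unspent tokens. Reading the heap top is constant-time; with Go's `container/heap`, restoring order after a dispatch is logarithmic in the number of ready tenants.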

Last time updated on 11/02/2026

This paper was published in Journals of Universiti Tun Hussein Onn Malaysia (UTHM).

Licence: https://creativecommons.org/licenses/by-nc-sa/4.0