The primary objective of speech enhancement is to reduce background noise
while preserving the target's speech. A common problem arises when a speaker is
confined to a noisy environment and receives a call with high background and
transmission noise. To address this problem, the Deep Noise Suppression (DNS)
Challenge focuses on removing background noise with next-generation
deep learning models to enhance the target's speech; however, researchers fail
to consider Voice over IP (VoIP) applications and their transmission noise.
Focusing on Google Meet and its cellular application, our work achieves
state-of-the-art performance on the Google Meet To Phone Track of the VoIP DNS
Challenge. This paper demonstrates how to surpass industrial performance and
achieve 1.92 PESQ and 0.88 STOI, as well as superior acoustic fidelity,
perceptual quality, and intelligibility across various metrics.