We propose SE-Bridge, a novel method for speech enhancement (SE). Following
the recent application of diffusion models to SE, speech enhancement can be
achieved by solving a stochastic differential equation (SDE). Each
SDE corresponds to a probability flow ordinary differential equation
(PF-ODE), and the trajectory of the PF-ODE solution consists of the speech
states at different moments. Our approach is based on a consistency model,
which ensures that all speech states on the same PF-ODE trajectory correspond
to the same initial state. By integrating the Brownian bridge process, the
model is able to
generate high-intelligibility speech samples without adversarial training. This
is the first attempt to apply consistency models to the SE task, achieving
state-of-the-art results on several metrics while reducing the sampling time by
a factor of 15 relative to the diffusion-based baseline. Our experiments on
multiple datasets demonstrate the effectiveness of SE-Bridge. Furthermore,
extensive experiments on downstream tasks, including automatic speech
recognition (ASR) and speaker verification (SV), show that SE-Bridge can
effectively support such tasks.
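
To illustrate the Brownian bridge process mentioned above, the following is a
minimal sketch of sampling an intermediate state from a generic Brownian bridge
pinned at a clean signal at t = 0 and a noisy signal at t = T. It is not the
SE-Bridge implementation: the function name brownian_bridge_sample, the
parameters T and sigma, the time direction, and the use of plain Python lists
in place of spectrogram tensors are all assumptions made for illustration.

```python
import math
import random


def brownian_bridge_sample(x0, y, t, T=1.0, sigma=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at x0 (t = 0) and y (t = T).

    Hypothetical helper for illustration only. x0 and y are equal-length
    lists of floats standing in for clean and noisy speech features.
    """
    if not 0.0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    if len(x0) != len(y):
        raise ValueError("x0 and y must have the same length")
    rng = rng or random.Random()

    w = t / T                                  # interpolation weight in [0, 1]
    std = sigma * math.sqrt(t * (T - t) / T)   # bridge std, zero at both ends

    # Mean interpolates linearly between the endpoints; noise vanishes at
    # t = 0 and t = T, so both endpoints are reproduced exactly.
    return [(1.0 - w) * a + w * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, y)]


clean = [0.0, 0.5, -0.5]
noisy = [0.1, 0.7, -0.2]
midpoint = brownian_bridge_sample(clean, noisy, t=0.5, rng=random.Random(0))
```

Because the variance term t (T - t) / T is zero at both endpoints, the sample
equals the clean signal at t = 0 and the noisy signal at t = T, which is the
pinning property that distinguishes a bridge from an unconstrained diffusion.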