Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR
Most approaches to multi-talker overlapped speech separation and recognition
assume that the number of simultaneously active speakers is given, but in
realistic situations, it is typically unknown. To cope with this, we extend an
iterative speech extraction system with mechanisms to count the number of
sources and combine it with a single-talker speech recognizer to form the first
end-to-end multi-talker automatic speech recognition system for an unknown
number of active speakers. Our experiments show very promising performance in
counting accuracy, source separation and speech recognition on simulated clean
mixtures from WSJ0-2mix and WSJ0-3mix. Among other results, we set a new
state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our
system generalizes well to a larger number of speakers than it ever saw during
training, as shown in experiments with the WSJ0-4mix database.
Comment: 5 pages, INTERSPEECH 202