Target Speaker Extraction (TSE) has yielded outstanding
performance in certain application scenarios for speech enhancement and source
separation. However, obtaining auxiliary speaker-related information is still
challenging in noisy environments with significant reverberation. Inspired by
the recently proposed distance-based sound separation, we propose the near
sound (NS) extractor, which leverages distance information for TSE to reliably
extract speaker information without requiring prior speaker enrollment, a
process we call speaker embedding self-enrollment (SESE). Full- and sub-band modeling is
introduced to enhance our NS-Extractor's adaptability to environments with
significant reverberation. Experimental results from several cross-dataset evaluations
demonstrate the effectiveness of our improvements and the excellent performance
of our proposed NS-Extractor in different application scenarios.

Comment: Accepted by InterSpeech202