AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion
This paper proposes a simple and robust zero-shot voice conversion system
with a cycle structure and mel-spectrogram pre-processing. Previous works
suffer from information loss and poor synthesis quality due to their reliance
on a carefully designed bottleneck structure. Moreover, models relying solely
on a self-reconstruction loss struggle to reproduce the voices of different
speakers. To address these issues, we propose a cycle-consistency loss that
considers conversion back and forth between the target and source speakers.
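The abstract does not give the exact form of this loss. One generic way to write a two-directional cycle-consistency loss is shown below, assuming a converter G(X, e) that renders the content of mel-spectrogram X in the voice described by speaker embedding e, a speaker encoder E, and source/target mel-spectrograms X_s and X_t. These symbols and the choice of the L1 norm are illustrative assumptions, not the paper's formulation:

$$
\mathcal{L}_{\mathrm{cyc}}
= \big\lVert X_s - G\big(G(X_s, E(X_t)),\, E(X_s)\big) \big\rVert_1
+ \big\lVert X_t - G\big(G(X_t, E(X_s)),\, E(X_t)\big) \big\rVert_1
$$

Each term converts an utterance to the other speaker and back, then penalizes any difference from the original. This supervises conversion to a different speaker, not only self-reconstruction.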
Additionally, stacked, randomly shuffled mel-spectrograms and a label-smoothing
method are used during speaker encoder training to extract a
time-independent global speaker representation from speech, which is the key to
zero-shot conversion. Our model outperforms the existing state of the art
in both subjective and objective evaluations. Furthermore, it facilitates
cross-lingual voice conversion and enhances the quality of synthesized speech.
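The abstract does not spell out this pre-processing, so the following is only a sketch of one plausible reading. The mel-spectrogram is cut into short segments that are permuted along the time axis. Several independently shuffled copies are then stacked (concatenated) in time, and speaker-ID targets are label-smoothed. The function names (shuffle_segments, stack_shuffled, smoothed_targets), the segment length, and eps are hypothetical. Shuffling destroys temporal order while leaving each frame intact. That is the property that pushes the speaker encoder toward a time-independent, global representation.

```python
import random
from typing import List

Frame = List[float]      # one mel frame: n_mels values (hypothetical layout)
MelSpec = List[Frame]    # time-major: [T][n_mels]


def shuffle_segments(mel: MelSpec, seg_len: int, rng: random.Random) -> MelSpec:
    """Cut `mel` into consecutive segments of `seg_len` frames and permute them.

    Frame contents are untouched; only their temporal order changes, so
    time-averaged statistics of the utterance are preserved.
    """
    segments = [mel[i:i + seg_len] for i in range(0, len(mel), seg_len)]
    rng.shuffle(segments)
    return [frame for seg in segments for frame in seg]


def stack_shuffled(mel: MelSpec, copies: int, seg_len: int, seed: int = 0) -> MelSpec:
    """Concatenate `copies` independently shuffled versions along the time axis."""
    rng = random.Random(seed)
    out: MelSpec = []
    for _ in range(copies):
        out.extend(shuffle_segments(mel, seg_len, rng))
    return out


def smoothed_targets(label: int, num_speakers: int, eps: float = 0.1) -> List[float]:
    """Label-smoothed one-hot target for speaker classification.

    The true speaker gets 1 - eps + eps / K; every other speaker gets eps / K.
    """
    off = eps / num_speakers
    target = [off] * num_speakers
    target[label] += 1.0 - eps
    return target
```

For example, a 6-frame input with seg_len=2 and copies=2 yields 12 frames in which every segment keeps its internal frame order. Likewise, smoothed_targets(1, 4, 0.1) assigns 0.925 to the true speaker and 0.025 to each of the other three.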
Recycling Sampling Timing Offset of Wi-Fi for Estimating Multiple ToFs of Superimposed Signal
Many Wi-Fi-based device-free localization (DFL) methods have been proposed for indoor location-based services. Unfortunately, the received signal is a superposition of the line-of-sight (LoS) signal and its reflections, so multi-target DFL is only possible by estimating the time of flight (ToF) of each signal. To estimate multiple ToFs, we utilize the sampling timing offset (STO) that inherently arises from asynchronous sampling timing between the transmitter (TX) and receiver (RX). By exploiting STO, we can generate signals that mimic oversampled signals, which we then feed into our correlation-based ToF estimation algorithm. We achieved median errors of 0.75 to 4.65 ns when 2 to 5 signals are superimposed.
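A toy sketch of the idea follows, under loudly simplified assumptions. A Gaussian pulse stands in for the known Wi-Fi waveform. Each packet's STO is assumed to be known exactly, and the STOs of K packets are assumed to be spread evenly over one sample period. Interleaving the packets by sample time yields a denser sampling grid. ToFs are then picked greedily from correlation peaks, cancelling each detected path before the next search. The greedy cancellation step is a generic stand-in chosen for illustration, not necessarily the paper's correlation-based algorithm. All names are hypothetical, and delays are in units of the sample period rather than ns.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pulse = Callable[[float], float]
Sample = Tuple[float, float]  # (sample time, value), time in units of ts


def gaussian_pulse(sigma: float) -> Pulse:
    """Hypothetical stand-in for the known transmitted waveform."""
    return lambda t: math.exp(-t * t / (2.0 * sigma * sigma))


def capture(pulse: Pulse, paths: Sequence[Tuple[float, float]],
            n_samples: int, ts: float, sto: float) -> List[Sample]:
    """Sample sum_i a_i * p(t - d_i) at the instants n * ts + sto.

    `paths` holds (delay, amplitude) pairs of the superimposed signals;
    `sto` is this packet's sampling timing offset, assumed known here.
    """
    samples = []
    for n in range(n_samples):
        t = n * ts + sto
        samples.append((t, sum(a * pulse(t - d) for d, a in paths)))
    return samples


def merge_captures(captures: Sequence[List[Sample]]) -> List[Sample]:
    """Interleave several packets by sample time to mimic one oversampled capture."""
    return sorted(s for cap in captures for s in cap)


def estimate_tofs(samples: List[Sample], pulse: Pulse, n_paths: int) -> List[float]:
    """Pick n_paths delays from correlation peaks, cancelling each found path.

    Candidate delays are restricted to the available sample instants, so a
    denser (merged) grid gives a finer delay resolution.
    """
    times = [t for t, _ in samples]
    resid = [v for _, v in samples]
    found = []
    for _ in range(n_paths):
        # Correlate the residual with the pulse shifted to every candidate delay.
        best_d, best_c = times[0], -math.inf
        for d in times:
            c = sum(r * pulse(t - d) for t, r in zip(times, resid))
            if c > best_c:
                best_d, best_c = d, c
        # Least-squares amplitude at the peak, then subtract that path.
        energy = sum(pulse(t - best_d) ** 2 for t in times)
        amp = best_c / energy
        resid = [r - amp * pulse(t - best_d) for t, r in zip(times, resid)]
        found.append(best_d)
    return sorted(found)


if __name__ == "__main__":
    p = gaussian_pulse(0.5)
    paths = [(3.25, 1.0), (6.75, 0.6)]  # true (delay, amplitude), in sample periods
    K = 4                               # packets with STOs of 0, 1/4, 2/4, 3/4 of ts
    packets = [capture(p, paths, 16, 1.0, k / K) for k in range(K)]
    print(estimate_tofs(merge_captures(packets), p, n_paths=2))  # [3.25, 6.75]
    single = [capture(p, paths, 16, 1.0, 0.0)]
    print(estimate_tofs(merge_captures(single), p, n_paths=2))   # integer delays only
```

With a single packet, the candidate delays fall only on whole sample periods, so both estimates are off by at least a quarter period. Merging four packets with staggered STOs places the true delays on the sampling grid, and both are recovered.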