137 research outputs found
Coding, Decoding, and Recovery of Clock Synchronization in Digital Multiplexing System
High-speed broadband digital communication networks rely on digital multiplexing technology, in which clock synchronization, including processing, transmission, and recovery of the clock, is the critical technique. This paper interprets the process of clock synchronization in multiplexing systems as quantizing and coding the clock synchronization information, interprets clock justification as timing sigma-delta modulation (TΔ-ΣM), and interprets justification jitter as quantization error. As a result, decreasing the quantization error is equivalent to decreasing the justification jitter. Using this theory, the paper examines existing jitter-reducing techniques in transmitters and receivers, presents techniques that decrease the quantization error (justification jitter) in digital multiplexing systems, and presents a new method of clock recovery.
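The interpretation above can be sketched in a few lines: a first-order sigma-delta modulator whose input is the fractional stuff ratio, whose 1-bit output is the per-frame justification decision, and whose bounded accumulator residual is the quantization error the paper identifies with justification jitter. This is an illustrative model only, not the paper's implementation; the function name and the constant ratio are assumptions.

```python
# Hypothetical sketch: clock justification modelled as first-order
# timing sigma-delta modulation (TΔ-ΣM). `rho` is the stuff ratio
# (fraction of frames requiring a justification bit).

def sigma_delta_justify(rho, n_frames):
    """Return per-frame 0/1 stuff decisions for a constant stuff ratio."""
    acc = 0.0            # integrator: accumulated quantization error
    decisions = []
    for _ in range(n_frames):
        acc += rho
        if acc >= 1.0:   # 1-bit quantizer: emit a stuff bit
            decisions.append(1)
            acc -= 1.0   # feedback subtracts the quantized output
        else:
            decisions.append(0)
    return decisions

decisions = sigma_delta_justify(rho=0.25, n_frames=1000)
# The long-run stuff rate converges to rho; the residual left in `acc`
# (always below one frame) is the bounded quantization error that
# appears as justification jitter.
print(sum(decisions) / len(decisions))  # → 0.25
```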
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines
Nowadays, as more and more systems achieve good performance in traditional
voice conversion (VC) tasks, people's attention gradually turns to VC tasks
under extreme conditions. In this paper, we propose a novel method for
zero-shot voice conversion. We aim to obtain intermediate representations for
speaker-content disentanglement of speech to better remove speaker information
and get pure content information. Accordingly, our proposed framework contains
a module that removes the speaker information from the acoustic feature of the
source speaker. Moreover, speaker information control is added to our system to
maintain the voice cloning performance. The proposed system is evaluated by
subjective and objective metrics. Results show that our proposed system
significantly reduces the trade-off problem in zero-shot voice conversion
while also achieving high spoofing power against the speaker verification
system.
An All-Digital Clock-Smoothing Technique—Counting-Prognostication
This article presents a novel universal all-digital clock-smoothing technique - counting-prognostication. Operation principles, performance analysis, and comparisons are given. Analysis and measurement results show that this technique can efficiently smooth jitter and wander over a wide pull-in range and pull-out range, with small jitter accumulation. A cycle-varying counting-prognostication method, which decreases pull-in time, is also suggested.
The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge
This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker
Verification (SASV) Challenge. First, we split the joint task into speaker
verification (SV) and spoofing countermeasure (CM), and optimize the two
tasks separately. For the SV systems, four state-of-the-art methods are
employed. For the CM systems, we propose two methods on top of the challenge
baseline to further improve performance, namely Embedding Random Sampling
Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we also
explore whether SV embeddings could help improve CM system performance. We
observe a dramatic performance degradation of existing CM systems on the
domain-mismatched VoxCeleb2 dataset. Third, we compare different fusion
strategies, including parallel score fusion and sequential cascaded systems.
Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains
a 0.21% SASV-EER on the official challenge evaluation set. Comment: Accepted by Interspeech202
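The two fusion strategies compared above can be illustrated schematically. This is a minimal sketch, not the DKU-OPPO implementation: the function names, weights, and thresholds are assumptions, and both scores are assumed normalized to [0, 1].

```python
# Illustrative sketch of parallel score fusion vs. a sequential cascade.
# `sv_score`: speaker verification score; `cm_score`: countermeasure
# (bona fide vs. spoof) score. Both hypothetical, assumed in [0, 1].

def parallel_fusion(sv_score, cm_score, w=0.5):
    """Weighted sum of SV and CM scores; a single threshold decides."""
    return w * sv_score + (1.0 - w) * cm_score

def cascaded_fusion(sv_score, cm_score, cm_threshold=0.5):
    """CM gates first: trials flagged as spoofed are rejected outright,
    otherwise the SV score alone decides."""
    if cm_score < cm_threshold:
        return 0.0  # hard reject: spoof detected
    return sv_score

# A spoofed trial with a convincing voice clone: high SV, low CM score.
print(parallel_fusion(0.9, 0.1))  # averaged score may still pass a loose threshold
print(cascaded_fusion(0.9, 0.1))  # → 0.0, rejected by the CM gate
```

The cascade makes the CM decision binding, which matches the intuition for why a cascaded system can score well on a spoofing-aware metric such as SASV-EER.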
The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario
FFSVC2022 is the second far-field speaker verification challenge.
FFSVC2022 provides fully supervised far-field speaker verification to
further explore the far-field scenario and introduces semi-supervised
far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focuses
on the single-channel scenario. In addition, a supplementary set for the
FFSVC2020 dataset is released this year. The supplementary set covers more
recording devices and has the same data distribution as the FFSVC2022
evaluation set. This paper summarizes FFSVC 2022, including the task
descriptions, trial design details, a baseline system, and a summary of the
challenge results. The challenge results indicate substantial progress in
the field but also show that the far-field scenario remains difficult.
Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments
The success of automatic speaker verification shows that discriminative
speaker representations can be extracted from neutral speech. However, as a
kind of non-verbal voice, laughter should intuitively also carry speaker
information. Thus, this paper focuses on speaker verification over
utterances containing non-verbal laughter segments. We collect a set of
clips with laughter components by running a laughter detection script on
VoxCeleb and part of the CN-Celeb dataset. To further filter untrusted
clips, probability scores are calculated by our binary laughter detection
classifier, which is pre-trained on pure laughter and neutral speech. After
that, based on the clips whose scores exceed the threshold, we construct
trials under two evaluation scenarios: Laughter-Laughter (LL) and
Speech-Laughter (SL). We then propose a novel method called
Laughter-Splicing based Network (LSN), which significantly boosts
performance in both scenarios while maintaining performance on neutral
speech, such as the VoxCeleb1 test set. Specifically, our system achieves
relative improvements of 20% and 22% on the Laughter-Laughter and
Speech-Laughter trials, respectively. The metadata and sample clips have
been released at https://github.com/nevermoreLin/Laugh_LSN. Comment: Submitted to ICASSP202
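The filtering step described above reduces to thresholding classifier probabilities. A minimal sketch with made-up clip IDs and scores; the threshold value is an assumption, as the abstract does not publish it.

```python
# Sketch of filtering untrusted clips by laughter-classifier probability.
# Clip IDs and scores below are hypothetical example data.
clips = [
    ("id001_laugh", 0.92),
    ("id002_speech", 0.15),
    ("id003_laugh", 0.78),
    ("id004_noisy", 0.41),
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the paper

# Keep only clips whose laughter probability clears the threshold.
trusted = [clip_id for clip_id, prob in clips if prob >= THRESHOLD]
print(trusted)  # → ['id001_laugh', 'id003_laugh']
```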
The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023
This paper is the system description of the DKU-MSXF system for Tracks 1, 2,
and 3 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). For
Track 1, we utilize a ResNet-based network structure for training. By
constructing a cross-age QMF training set, we achieve a substantial
improvement in system performance. For Track 2, we inherit the pre-trained
model from Track 1 and conduct mixed training by incorporating the
VoxBlink-clean dataset. Compared to Track 1, the models incorporating
VoxBlink-clean data exhibit a relative performance improvement of more than
10%. For Track 3, the semi-supervised domain adaptation task, a novel
pseudo-labeling method based on triple thresholds and sub-center
purification is adopted for domain adaptation. The final submission achieves
an mDCF of 0.1243 in Track 1, an mDCF of 0.1165 in Track 2, and an EER of
4.952% in Track 3. Comment: arXiv admin note: text overlap with arXiv:2210.0509
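The abstract names a triple-threshold pseudo-labeling scheme without defining its thresholds. Purely as an illustration of threshold-based pseudo-labeling in general (not the paper's method), one hypothetical criterion is cosine similarity of a speaker embedding to its nearest cluster centroid: below the threshold, the utterance stays unlabeled. All names and values here are assumptions.

```python
# Hedged sketch of confidence-thresholded pseudo-labeling for
# semi-supervised domain adaptation. Hypothetical, not the DKU-MSXF scheme.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pseudo_label(embedding, centroids, threshold=0.8):
    """Assign the nearest centroid's label, or None if confidence is low."""
    best_label, best_sim = None, -1.0
    for label, centroid in centroids.items():
        sim = cosine(embedding, centroid)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else None

centroids = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
print(pseudo_label([0.9, 0.1], centroids))  # → spk_a (confident)
print(pseudo_label([0.6, 0.6], centroids))  # → None (ambiguous, discarded)
```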
The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
This paper describes the DKU-DukeECE submission to the 4th track of the
VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains
a fused voice activity detection model, a clustering-based diarization
model, and a target-speaker voice activity detection-based overlap detection
model. Overall, the submitted system is similar to our previous year's
system for VoxSRC-21. The difference is that we use a much better speaker
embedding and a fused voice activity detection, which significantly improves
performance. Finally, we fuse 4 different systems using DOVER-lap and
achieve a diarization error rate of 4.75, ranking 1st place in Track
4. Comment: arXiv admin note: substantial text overlap with arXiv:2109.0200