137 research outputs found
Coding, Decoding, and Recovery of Clock Synchronization in Digital Multiplexing System
High-speed broadband digital communication networks rely on digital multiplexing technology, in which clock synchronization, including processing, transmission, and recovery of the clock, is the critical technique. This paper interprets the process of clock synchronization in multiplexing systems as quantizing and coding the clock synchronization information, interprets clock justification as timing sigma-delta modulation (TΔ-ΣM), and interprets justification jitter as quantization error. As a result, decreasing the quantization error is equivalent to decreasing the justification jitter. Using this theory, the paper examines existing jitter-reducing techniques in transmitters and receivers, presents techniques that decrease the quantization error (justification jitter) in digital multiplexing systems, and presents a new method of clock recovery.
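The interpretation above can be sketched in a few lines: a first-order sigma-delta modulator whose input is the fractional stuff ratio, whose 1-bit output is the per-frame justification decision, and whose bounded accumulator residual is the quantization error the paper identifies with justification jitter. This is an illustrative model only, not the paper's implementation; the function name and the constant ratio are assumptions.

```python
# Hypothetical sketch: clock justification modelled as first-order
# timing sigma-delta modulation (TΔ-ΣM). `rho` is the stuff ratio
# (fraction of frames requiring a justification bit).

def sigma_delta_justify(rho, n_frames):
    """Return per-frame 0/1 stuff decisions for a constant stuff ratio."""
    acc = 0.0            # integrator: accumulated quantization error
    decisions = []
    for _ in range(n_frames):
        acc += rho
        if acc >= 1.0:   # 1-bit quantizer: emit a stuff bit
            decisions.append(1)
            acc -= 1.0   # feedback subtracts the quantized output
        else:
            decisions.append(0)
    return decisions

decisions = sigma_delta_justify(rho=0.25, n_frames=1000)
# The long-run stuff rate converges to rho; the residual left in `acc`
# (always below one frame) is the bounded quantization error that
# appears as justification jitter.
print(sum(decisions) / len(decisions))  # → 0.25
```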
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines
Nowadays, as more and more systems achieve good performance in traditional
voice conversion (VC) tasks, people's attention gradually turns to VC tasks
under extreme conditions. In this paper, we propose a novel method for
zero-shot voice conversion. We aim to obtain intermediate representations for
speaker-content disentanglement of speech to better remove speaker information
and get pure content information. Accordingly, our proposed framework contains
a module that removes the speaker information from the acoustic feature of the
source speaker. Moreover, speaker information control is added to our system to
maintain the voice cloning performance. The proposed system is evaluated by
subjective and objective metrics. Results show that our proposed system
significantly reduces the trade-off problem in zero-shot voice conversion
while also achieving high spoofing power against the speaker verification
system.
An All-Digital Clock-Smoothing Technique—Counting-Prognostication
This article presents a novel universal all-digital clock-smoothing technique - counting-prognostication. Operation principles, performance analysis, and comparisons are given. Analysis and measurement results show that this technique can efficiently smooth jitter and wander over a wide pull-in range and pull-out range, with small jitter accumulation. A cycle-varying counting-prognostication method, which decreases pull-in time, is also suggested.
The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge
This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker
Verification (SASV) Challenge. First, we split the joint task into speaker
verification (SV) and spoofing countermeasure (CM), and optimize the two
tasks separately. For the SV systems, four state-of-the-art methods are
employed. For the CM systems, we propose two methods on top of the challenge
baseline to further improve performance, namely Embedding Random Sampling
Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we also
explore whether SV embeddings could help improve CM system performance. We
observe a dramatic performance degradation of existing CM systems on the
domain-mismatched VoxCeleb2 dataset. Third, we compare different fusion
strategies, including parallel score fusion and sequential cascaded systems.
Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains
a 0.21% SASV-EER on the official challenge evaluation set. Comment: Accepted by Interspeech202
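The two fusion strategies compared above can be illustrated schematically. This is a minimal sketch, not the DKU-OPPO implementation: the function names, weights, and thresholds are assumptions, and both scores are assumed normalized to [0, 1].

```python
# Illustrative sketch of parallel score fusion vs. a sequential cascade.
# `sv_score`: speaker verification score; `cm_score`: countermeasure
# (bona fide vs. spoof) score. Both hypothetical, assumed in [0, 1].

def parallel_fusion(sv_score, cm_score, w=0.5):
    """Weighted sum of SV and CM scores; a single threshold decides."""
    return w * sv_score + (1.0 - w) * cm_score

def cascaded_fusion(sv_score, cm_score, cm_threshold=0.5):
    """CM gates first: trials flagged as spoofed are rejected outright,
    otherwise the SV score alone decides."""
    if cm_score < cm_threshold:
        return 0.0  # hard reject: spoof detected
    return sv_score

# A spoofed trial with a convincing voice clone: high SV, low CM score.
print(parallel_fusion(0.9, 0.1))  # averaged score may still pass a loose threshold
print(cascaded_fusion(0.9, 0.1))  # → 0.0, rejected by the CM gate
```

The cascade makes the CM decision binding, which matches the intuition for why a cascaded system can score well on a spoofing-aware metric such as SASV-EER.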
The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario
FFSVC2022 is the second far-field speaker verification challenge.
FFSVC2022 provides fully supervised far-field speaker verification to
further explore the far-field scenario and introduces semi-supervised
far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focuses
on the single-channel scenario. In addition, a supplementary set for the
FFSVC2020 dataset is released this year. The supplementary set covers more
recording devices and has the same data distribution as the FFSVC2022
evaluation set. This paper summarizes FFSVC 2022, including the task
descriptions, trial design details, a baseline system, and a summary of the
challenge results. The challenge results indicate substantial progress in
the field but also show that the far-field scenario remains difficult.
Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments
The success of automatic speaker verification shows that discriminative
speaker representations can be extracted from neutral speech. However, as a
kind of non-verbal voice, laughter should intuitively also carry speaker
information. Thus, this paper focuses on speaker verification over
utterances containing non-verbal laughter segments. We collect a set of
clips with laughter components by running a laughter detection script on
VoxCeleb and part of the CN-Celeb dataset. To further filter untrusted
clips, probability scores are calculated by our binary laughter detection
classifier, which is pre-trained on pure laughter and neutral speech. After
that, based on the clips whose scores exceed the threshold, we construct
trials under two evaluation scenarios: Laughter-Laughter (LL) and
Speech-Laughter (SL). We then propose a novel method called
Laughter-Splicing based Network (LSN), which significantly boosts
performance in both scenarios while maintaining performance on neutral
speech, such as the VoxCeleb1 test set. Specifically, our system achieves
relative improvements of 20% and 22% on the Laughter-Laughter and
Speech-Laughter trials, respectively. The metadata and sample clips have
been released at https://github.com/nevermoreLin/Laugh_LSN. Comment: Submitted to ICASSP202
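The filtering step described above reduces to thresholding classifier probabilities. A minimal sketch with made-up clip IDs and scores; the threshold value is an assumption, as the abstract does not publish it.

```python
# Sketch of filtering untrusted clips by laughter-classifier probability.
# Clip IDs and scores below are hypothetical example data.
clips = [
    ("id001_laugh", 0.92),
    ("id002_speech", 0.15),
    ("id003_laugh", 0.78),
    ("id004_noisy", 0.41),
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the paper

# Keep only clips whose laughter probability clears the threshold.
trusted = [clip_id for clip_id, prob in clips if prob >= THRESHOLD]
print(trusted)  # → ['id001_laugh', 'id003_laugh']
```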
The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023
This paper is the system description of the DKU-MSXF system for Tracks 1, 2,
and 3 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). For
Track 1, we utilize a ResNet-based network structure for training. By
constructing a cross-age QMF training set, we achieve a substantial
improvement in system performance. For Track 2, we inherit the pre-trained
model from Track 1 and conduct mixed training by incorporating the
VoxBlink-clean dataset. Compared to Track 1, the models incorporating
VoxBlink-clean data exhibit a relative performance improvement of more than
10%. For Track 3, the semi-supervised domain adaptation task, a novel
pseudo-labeling method based on triple thresholds and sub-center
purification is adopted for domain adaptation. The final submission achieves
an mDCF of 0.1243 in Track 1, an mDCF of 0.1165 in Track 2, and an EER of
4.952% in Track 3. Comment: arXiv admin note: text overlap with arXiv:2210.0509
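The abstract names a triple-threshold pseudo-labeling scheme without defining its thresholds. Purely as an illustration of threshold-based pseudo-labeling in general (not the paper's method), one hypothetical criterion is cosine similarity of a speaker embedding to its nearest cluster centroid: below the threshold, the utterance stays unlabeled. All names and values here are assumptions.

```python
# Hedged sketch of confidence-thresholded pseudo-labeling for
# semi-supervised domain adaptation. Hypothetical, not the DKU-MSXF scheme.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pseudo_label(embedding, centroids, threshold=0.8):
    """Assign the nearest centroid's label, or None if confidence is low."""
    best_label, best_sim = None, -1.0
    for label, centroid in centroids.items():
        sim = cosine(embedding, centroid)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else None

centroids = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
print(pseudo_label([0.9, 0.1], centroids))  # → spk_a (confident)
print(pseudo_label([0.6, 0.6], centroids))  # → None (ambiguous, discarded)
```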
The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
This paper describes the DKU-DukeECE submission to the 4th track of the
VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains
a fused voice activity detection model, a clustering-based diarization
model, and a target-speaker voice activity detection-based overlap detection
model. Overall, the submitted system is similar to our previous year's
system for VoxSRC-21. The difference is that we use a much better speaker
embedding and a fused voice activity detection, which significantly improves
performance. Finally, we fuse 4 different systems using DOVER-lap and
achieve a diarization error rate of 4.75, ranking 1st place in Track
4. Comment: arXiv admin note: substantial text overlap with arXiv:2109.0200