
    Coding, Decoding, and Recovery of Clock Synchronization in Digital Multiplexing System

    High-speed broadband digital communication networks rely on digital multiplexing technology, in which clock synchronization, including processing, transmission, and recovery of the clock, is the critical technique. This paper interprets the process of clock synchronization in multiplexing systems as quantizing and coding the clock synchronization information, interprets clock justification as timing sigma-delta modulation (TΔ-ΣM), and interprets justification jitter as quantization error. As a result, decreasing the quantization error is equivalent to decreasing the justification jitter. Using this theory, the paper studies the existing jitter-reducing techniques in transmitters and receivers, presents techniques that can decrease the quantization error (justification jitter) in digital multiplexing systems, and presents a new method of clock recovery.
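    The abstract's interpretation of clock justification as timing sigma-delta modulation can be illustrated with a minimal first-order sigma-delta sketch. This is an illustration of the general interpretation only, not the paper's notation or algorithm; the function name `justify` and the 0/1 decision stream are assumptions. The fractional rate ratio is quantized into binary stuffing decisions, and the accumulator residue plays the role of the quantization error that the paper identifies with justification jitter.

```python
def justify(r, n):
    """First-order sigma-delta view of justification: emit n binary
    stuffing decisions for a fractional rate ratio r (0 <= r < 1).
    The running average of the decisions approaches r; the accumulator
    holds the instantaneous quantization error (the 'jitter')."""
    acc = 0.0
    decisions = []
    for _ in range(n):
        acc += r
        if acc >= 1.0:
            decisions.append(1)   # insert a justification (stuff) opportunity
            acc -= 1.0
        else:
            decisions.append(0)
    return decisions, acc          # acc is the residual quantization error

decisions, err = justify(0.3, 1000)
```

In this toy model, shaping or reducing `err` corresponds directly to reducing justification jitter, which is the equivalence the paper exploits.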

    SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

    Nowadays, as more and more systems achieve good performance on traditional voice conversion (VC) tasks, attention is gradually turning to VC tasks under extreme conditions. In this paper, we propose a novel method for zero-shot voice conversion. We aim to obtain intermediate representations for speaker-content disentanglement of speech, to better remove speaker information and obtain pure content information. Accordingly, our proposed framework contains a module that removes the speaker information from the acoustic feature of the source speaker. Moreover, speaker information control is added to our system to maintain voice cloning performance. The proposed system is evaluated by subjective and objective metrics. Results show that our proposed system significantly reduces the trade-off problem in zero-shot voice conversion, while also maintaining high spoofing power against speaker verification systems.

    An All-Digital Clock-Smoothing Technique—Counting-Prognostication

    This article presents a novel universal all-digital clock-smoothing technique: counting-prognostication. Operation principles, performance analysis, and comparisons are given. Analysis and measurement results show that this technique can efficiently smooth jitter and wander over a wide pull-in and pull-out range, with small jitter accumulation. A cycle-varying counting-prognostication method, which decreases pull-in time, is also suggested.
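    The counting idea behind all-digital smoothers of this kind can be sketched generically. The following rolling-mean-of-counts filter is a much-simplified stand-in, not the paper's counting-prognostication algorithm; the window size, function name, and count representation are all assumptions. Each input clock period is measured as a count of fast local reference cycles, and the smoothed output period is a rolling average of recent counts, attenuating jitter while still tracking slow frequency (wander) changes.

```python
from collections import deque

def smooth_periods(raw_counts, window=8):
    """Generic count-and-average smoother: raw_counts[i] is the number of
    fast-reference cycles measured in input period i; the output is the
    rolling mean over the last `window` counts."""
    hist = deque(maxlen=window)
    out = []
    for c in raw_counts:
        hist.append(c)
        out.append(sum(hist) / len(hist))
    return out
```

A larger window smooths more jitter but lengthens pull-in time, which is the trade-off a cycle-varying scheme would adapt around.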

    The DKU-OPPO System for the 2022 Spoofing-Aware Speaker Verification Challenge

    This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker Verification (SASV) Challenge. First, we split the joint task into speaker verification (SV) and spoofing countermeasure (CM) subtasks, which are optimized separately. For the ASV systems, four state-of-the-art methods are employed. For the CM systems, we propose two methods on top of the challenge baseline to further improve performance, namely Embedding Random Sampling Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we also explore whether SV embeddings could help improve CM system performance, and observe a dramatic performance degradation of existing CM systems on the domain-mismatched VoxCeleb2 dataset. Third, we compare different fusion strategies, including parallel score fusion and sequential cascaded systems. Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains a 0.21% SASV-EER on the official challenge evaluation set. Comment: Accepted by Interspeech202
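    The two fusion strategies the abstract compares can be sketched as follows. The weights, thresholds, and score conventions here are illustrative assumptions, not the challenge's calibration: higher scores are taken to mean "more likely target speaker" (SV) and "more likely bona fide" (CM).

```python
def parallel_fusion(sv_score, cm_score, w=0.5):
    """Parallel score fusion: a weighted sum of the SV and CM scores,
    assuming both are calibrated to a comparable range."""
    return w * sv_score + (1 - w) * cm_score

def cascaded_decision(sv_score, cm_score, sv_thresh, cm_thresh):
    """Sequential cascade: the CM gate runs first, rejecting spoofed
    trials outright; only bona fide trials reach the SV decision."""
    if cm_score < cm_thresh:
        return False              # rejected as a spoof, SV never consulted
    return sv_score >= sv_thresh  # accepted only if the speaker also matches
```

The cascade makes a hard decision at the CM stage, while parallel fusion lets a strong score on one axis partially compensate for the other.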

    The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario

    FFSVC2022 is the second far-field speaker verification challenge. It provides fully-supervised far-field speaker verification to further explore the far-field scenario, and introduces semi-supervised far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focuses on the single-channel scenario. In addition, a supplementary set for the FFSVC2020 dataset is released this year; it covers more recording devices and has the same data distribution as the FFSVC2022 evaluation set. This paper summarizes FFSVC 2022, including task descriptions, trial design details, a baseline system, and a summary of challenge results. The challenge results indicate substantial progress made in the field, but also show that the far-field scenario still presents difficulties.

    Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments

    The success of automatic speaker verification shows that discriminative speaker representations can be extracted from neutral speech. However, as a kind of non-verbal voice, laughter should intuitively also carry speaker information. Thus, this paper focuses on speaker verification for utterances containing non-verbal laughter segments. We collect a set of clips with laughter components by running a laughter-detection script on VoxCeleb and part of the CN-Celeb dataset. To further filter untrusted clips, probability scores are calculated by our binary laughter detection classifier, which is pre-trained on pure laughter and neutral speech. After that, based on the clips whose scores are over the threshold, we construct trials under two different evaluation scenarios: Laughter-Laughter (LL) and Speech-Laughter (SL). Then a novel method called Laughter-Splicing based Network (LSN) is proposed, which can significantly boost performance in both scenarios while maintaining performance on neutral speech, such as the VoxCeleb1 test set. Specifically, our system achieves relative improvements of 20% and 22% on the Laughter-Laughter and Speech-Laughter trials, respectively. The meta-data and sample clips have been released at https://github.com/nevermoreLin/Laugh_LSN. Comment: Submitted to ICASSP202
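    The score-thresholding step used to discard untrusted clips can be sketched as below; the function name, the 0.5 default, and the list representation are assumptions for illustration, and the classifier producing the scores is outside this sketch.

```python
def filter_trusted(clips, scores, threshold=0.5):
    """Keep only clips whose laughter-detector probability exceeds the
    threshold, discarding untrusted detections before trial construction."""
    return [c for c, s in zip(clips, scores) if s > threshold]
```

Trials for the LL and SL scenarios would then be built only from the surviving clips.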

    The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023

    This paper is the system description of the DKU-MSXF system for Tracks 1, 2, and 3 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). For Track 1, we utilize a ResNet-based network structure for training. By constructing a cross-age QMF training set, we achieve a substantial improvement in system performance. For Track 2, we inherit the pre-trained model from Track 1 and conduct mixed training by incorporating the VoxBlink-clean dataset. In comparison to Track 1, the models incorporating VoxBlink-clean data exhibit a relative performance improvement of more than 10%. For Track 3, the semi-supervised domain adaptation task, a novel pseudo-labeling method based on triple thresholds and sub-center purification is adopted for domain adaptation. The final submission achieves an mDCF of 0.1243 in Track 1, an mDCF of 0.1165 in Track 2, and an EER of 4.952% in Track 3. Comment: arXiv admin note: text overlap with arXiv:2210.0509
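    Confidence-threshold pseudo-labeling of the kind the abstract mentions can be sketched in a simplified two-threshold form. This is not the paper's triple-threshold, sub-center-purification method; the threshold values, function name, and use of `None` for discarded samples are all assumptions. Confident positives and negatives receive pseudo-labels, and ambiguous samples are excluded from adaptation training.

```python
def pseudo_label(scores, hi=0.8, lo=0.2):
    """Simplified two-threshold pseudo-labeling: scores above `hi` become
    positives, scores below `lo` become negatives, and anything in between
    is marked None and dropped from the training set."""
    labels = []
    for s in scores:
        if s >= hi:
            labels.append(1)
        elif s <= lo:
            labels.append(0)
        else:
            labels.append(None)   # not confident enough: excluded
    return labels
```

Tightening the thresholds trades label coverage for label purity, which is the knob such semi-supervised adaptation schemes tune.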

    The DKU-DukeECE Diarization System for the VoxCeleb Speaker Recognition Challenge 2022

    This paper describes the DKU-DukeECE submission to Track 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice activity detection model, a clustering-based diarization model, and a target-speaker voice activity detection-based overlap detection model. Overall, the submitted system is similar to our previous year's system for VoxSRC-21. The difference is that we use a much better speaker embedding and a fused voice activity detection, which significantly improves performance. Finally, we fuse 4 different systems using DOVER-Lap and achieve a diarization error rate of 4.75, ranking 1st in Track 4. Comment: arXiv admin note: substantial text overlap with arXiv:2109.0200
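    DOVER-Lap itself performs rank-weighted label voting over overlapping diarization segments; a much-simplified frame-level majority vote conveys the core idea of fusing several systems' outputs. This sketch is not the DOVER-Lap algorithm, and the frame-aligned label-stream representation is an assumption.

```python
from collections import Counter

def majority_vote(label_streams):
    """Fuse per-system speaker-label streams (one list of frame labels per
    system, all the same length) by taking the most common label per frame."""
    fused = []
    for frame_labels in zip(*label_streams):
        fused.append(Counter(frame_labels).most_common(1)[0][0])
    return fused
```

Real DOVER-Lap additionally weights systems by rank and handles overlapping speakers, which plain majority voting cannot.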