
    2D HILIC-ELSD/UPLC-Q-TOF-MS Method for Acquiring Phospholipid Profiles and the Application in Caenorhabditis elegans

    Phospholipids are the main constituents of cellular membranes and have recently been identified as diagnostic biomarkers for many diseases. Accordingly, much emphasis is now placed on developing optimal analytical techniques for profiling the phospholipids of various biological samples. In the present study, different classes of phospholipids are first separated by optimized hydrophilic interaction chromatography with evaporative light scattering detection (HILIC-ELSD). The phospholipids in each class are then identified by ultraperformance liquid chromatography-quadrupole time-of-flight mass spectrometry (UPLC-Q-TOF-MS). Validation results confirm that this approach meets the requirements of quantitative analysis. Finally, the approach is applied to the phospholipid profiles of Caenorhabditis elegans. A total of 111 phospholipid species are identified from their mass fragments. The major fatty acyl chains in these phospholipids are found to be formed by oleic acid (C18:1), arachidonic acid (C20:4), and eicosapentaenoic acid (C20:5). Overall, this study improves current knowledge of analytical techniques for the phospholipid composition of C. elegans and provides a basis for future lipidomics research. Practical applications: Phospholipids reportedly play a crucial role in the development of many diseases. Until now, only a small portion of the phospholipids in Caenorhabditis elegans had been reported, using one-dimensional analysis strategies. The offline 2D liquid chromatography method developed in this study identifies 111 phospholipid species in Caenorhabditis elegans. The obtained phospholipid profiles complement the lipid database of Caenorhabditis elegans. The study also provides a basis for the future development of a 2D online approach.

    Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

    We study a novel neural architecture and training strategies for a speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance of variable length. Contrastive learning is a typical self-supervised learning technique. However, the quality of the speaker encoder depends heavily on the sampling strategy for positive and negative pairs. It is common to sample a positive pair of segments from the same utterance. Unfortunately, such poor-man's positive pairs (PPP) lack the diversity necessary for training a robust encoder. In this work, we propose a multi-modal contrastive learning technique with novel sampling strategies. By cross-referencing between speech and face data, we study a method that finds diverse positive pairs (DPP) for contrastive learning, thus improving the robustness of the speaker encoder. We train the speaker encoder on the VoxCeleb2 dataset without any speaker labels, and achieve equal error rates (EER) of 2.89%, 3.17% and 6.27% under the proposed progressive clustering strategy, and EERs of 1.44%, 1.77% and 3.27% under the two-stage learning strategy with pseudo labels, on the three test sets of VoxCeleb1. This novel solution outperforms state-of-the-art self-supervised learning methods by a large margin and, at the same time, achieves results comparable with the supervised learning counterpart. We also evaluate our self-supervised learning technique on the LRS2 and LRW datasets, where the speaker information is unknown. All experiments suggest that the proposed neural architecture and sampling strategies are robust across datasets. Comment: 13 pages
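    As a concrete reference point for the sampling issue this abstract raises, here is a minimal PyTorch sketch contrasting same-utterance positive pairs with a standard InfoNCE-style contrastive loss. The function names, segment-cutting logic, and temperature are illustrative assumptions, not the authors' code; in the paper's DPP setting, the two views would instead come from different utterances that cross-modal (speech and face) clustering assigns to the same speaker.

```python
import torch
import torch.nn.functional as F

def sample_ppp(utterance, seg_len):
    """'Poor-man's' positive pair: two random segments cut from the same
    utterance. Cheap to obtain, but both views share channel and content,
    which is exactly the diversity problem the paper points out."""
    t = utterance.size(-1)
    i = torch.randint(0, t - seg_len + 1, (1,)).item()
    j = torch.randint(0, t - seg_len + 1, (1,)).item()
    return utterance[..., i:i + seg_len], utterance[..., j:j + seg_len]

def contrastive_loss(z1, z2, temperature=0.07):
    """InfoNCE over a batch of embedding pairs (z1[i], z2[i]); every other
    sample in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                    # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

    Per the abstract, swapping `sample_ppp` for a sampler that pairs cluster-mates across different utterances is where the robustness gain comes from; the loss itself can stay unchanged.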

    Self-supervised Speaker Recognition with Loss-gated Learning

    In self-supervised learning for speaker recognition, pseudo labels are useful as supervision signals. However, a speaker recognition model does not always benefit from pseudo labels, owing to their unreliability. In this work, we observe that a speaker recognition network tends to model data with reliable labels faster than data with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL, trained on the VoxCeleb2 dataset without any labels, achieves an equal error rate of 1.66% on the VoxCeleb1 original test set. Code has been made available at: https://github.com/TaoRuijie/Loss-Gated-Learning. Comment: 5 pages, 3 figures
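    The gating idea translates almost directly into code. Below is a minimal PyTorch sketch of one LGL-style training step, assuming a classifier head over pseudo-label classes; the fixed gate value and the batch layout are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def loss_gated_step(model, optimizer, batch, pseudo_labels, gate=2.0):
    """One loss-gated learning step: keep only the samples the network
    already fits well (per-sample loss below a fixed gate), on the
    assumption that their pseudo labels are reliable."""
    logits = model(batch)                                 # (B, num_classes)
    per_sample = F.cross_entropy(logits, pseudo_labels,
                                 reduction="none")        # (B,) one loss per sample
    mask = per_sample < gate                              # gate out unreliable labels
    if mask.any():
        loss = per_sample[mask].mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return mask.float().mean().item()                     # fraction of samples kept
```

    The returned keep-fraction is a useful diagnostic: as training progresses and the encoder fits the reliably labeled data, more of each batch should pass the gate.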

    Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification

    Knowledge distillation (KD) is used to enhance automatic speaker verification performance by ensuring consistency between a large teacher network and a lightweight student network at the embedding level or the label level. However, conventional label-level KD overlooks significant knowledge from non-target speakers, particularly their classification probabilities, which can be crucial for automatic speaker verification. In this paper, we first demonstrate that training with a larger number of non-target speakers improves the performance of automatic speaker verification models. Inspired by this finding about the importance of non-target speakers' knowledge, we modify the conventional label-level KD by disentangling and emphasizing the classification probabilities of non-target speakers during knowledge distillation. The proposed method is applied to three different student model architectures and achieves an average 13.67% improvement in EER on the VoxCeleb dataset compared to embedding-level and conventional label-level KD methods. Comment: Submitted to ICASSP 202
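    To make the "disentangle and emphasize" step concrete, here is a hedged PyTorch sketch of a label-level KD term restricted to non-target speakers: the target class is dropped from both teacher and student distributions, the remainder is renormalized, and the resulting KL divergence is weighted up. The temperature `T` and emphasis weight `beta` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def non_target_kd_loss(student_logits, teacher_logits, target, T=4.0, beta=2.0):
    """KL divergence between teacher and student over non-target speakers
    only: mask out the target class, renormalize, and emphasize with beta."""
    B, C = student_logits.shape
    mask = F.one_hot(target, C).bool()                    # (B, C), True at target class
    p_s = F.softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # Drop the target class and renormalize over the C - 1 non-target speakers.
    p_s_nt = p_s.masked_select(~mask).view(B, C - 1)
    p_t_nt = p_t.masked_select(~mask).view(B, C - 1)
    p_s_nt = p_s_nt / p_s_nt.sum(dim=-1, keepdim=True)
    p_t_nt = p_t_nt / p_t_nt.sum(dim=-1, keepdim=True)
    kl = F.kl_div(p_s_nt.log(), p_t_nt, reduction="batchmean") * (T * T)
    return beta * kl                                      # beta > 1 emphasizes non-target knowledge
```

    In practice a term like this would be added to the usual classification loss on the target class, so the target and non-target knowledge are distilled as separate, separately weighted signals.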

    USED: Universal Speaker Extraction and Diarization

    Speaker extraction and diarization are two crucial enabling techniques for speech applications. Speaker extraction aims to extract a target speaker's voice from a multi-talker mixture, while speaker diarization demarcates speech segments by speaker, identifying "who spoke when". Previous studies have typically treated the two tasks independently. However, they share a similar objective: disentangling the speakers, in the spectral domain for the former and in the temporal domain for the latter. It is logical to believe that the speaker turns obtained from speaker diarization can benefit speaker extraction, while the extracted speech offers more accurate speaker turns than the mixture speech. In this paper, we propose a unified framework called Universal Speaker Extraction and Diarization (USED). We extend an existing speaker extraction model to simultaneously extract the waveforms of all speakers. We also employ a scenario-aware differentiated loss function to address the problem of sparsely overlapped speech in real-world conversations. We show that the USED model significantly outperforms the baselines on both speaker extraction and diarization tasks, in both highly overlapped and sparsely overlapped scenarios. Audio samples are available at https://ajyy.github.io/demo/USED/. Comment: Submitted to ICASSP 202
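    The abstract does not spell out the scenario-aware differentiated loss, but one plausible reading for sparsely overlapped speech is to treat a speaker's active and silent regions differently. The PyTorch sketch below is a guess at that pattern rather than the authors' method: active regions are scored with SI-SDR against the reference, while silent regions are penalized for residual energy in the estimate. Here `active` is assumed to be a per-speaker 0/1 voice-activity mask (the kind the diarization branch would supply), and `alpha` is an illustrative weight.

```python
import torch

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR between estimate and reference, both (B, T)."""
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) / (ref.pow(2).sum(-1, keepdim=True) + eps) * ref
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)

def scenario_aware_loss(est, ref, active, alpha=0.5):
    """Differentiated loss for sparsely overlapped speech: maximize SI-SDR
    where the speaker is active, penalize residual energy where silent."""
    loss_active = -si_sdr(est * active, ref * active).mean()
    silence_energy = (est * (1 - active)).pow(2).mean()   # output should be quiet here
    return loss_active + alpha * silence_energy
```

    Splitting the objective this way matches the stated motivation: in real conversations most frames have at most one active speaker, so a single mixture-wide reconstruction loss would under-penalize leakage during each speaker's silent stretches.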