477 research outputs found

    Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

    Training speaker-discriminative and robust speaker verification systems without speaker labels remains challenging and worth exploring. Previous studies have noted a substantial performance disparity between self-supervised and fully supervised approaches. In this paper, we propose an effective Self-Distillation network with Ensemble Prototypes (SDEP) to facilitate self-supervised speaker representation learning. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the SDEP framework in speaker verification. SDEP achieves a new SOTA on the VoxCeleb1 speaker verification evaluation benchmark (i.e., equal error rates of 1.94%, 1.99%, and 3.77% for trials Vox1-O, Vox1-E, and Vox1-H, respectively) without using any speaker labels in the training phase. Code will be publicly available at https://github.com/alibaba-damo-academy/3D-Speaker. Comment: arXiv admin note: text overlap with arXiv:2211.0416
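
The equal error rate quoted above is the operating point at which the false-accept rate and the false-reject rate coincide. The following is a minimal sketch of how it can be computed from verification scores; the function name `equal_error_rate` and the threshold sweep over observed scores are illustrative assumptions, not the evaluation code used by the SDEP authors.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER from raw trial scores (hypothetical helper, not from the paper)."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score. A trial is accepted if score >= threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    # False-accept rate: fraction of non-target trials that are accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False-reject rate: fraction of target trials that are rejected.
    frr = np.array([(target < t).mean() for t in thresholds])
    # Take the threshold where the two rates are closest and average them.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)

if __name__ == "__main__":
    # Toy scores only; real benchmarks use many thousands of trials.
    print(equal_error_rate([0.9, 0.6, 0.3], [0.8, 0.4, 0.1]))
```

With few trials the two rates may never cross exactly, which is why the sketch averages them at the closest threshold; standard toolkits usually interpolate instead.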

    Analysis of Improved Particle Swarm Algorithm in Wireless Sensor Network Localization

    Localization is central to the practical application of wireless sensor networks (WSNs). To localize nodes efficiently and accurately, the article constructs an objective function from the target node location constraints and a maximum likelihood function, and it avoids premature convergence by using a PSO algorithm based on chaos search and backward learning. A node-flipping fuzzy detection method based on linear fitting is proposed to detect the node-flipping fuzzy phenomenon. This detection method is combined with the localization algorithm, and the final WSN localization algorithm is obtained after multi-threshold processing. The analysis shows that the MTLFPSO algorithm used in the paper outperforms other PSO algorithms, with a highest accuracy of 83.1%. Different threshold values affect the positive and error detection rates of different WSNs. For type 1 WSNs, the 3-node network has the highest positive detection rate under the same threshold value, followed by the 4-node network; when the threshold value is 7.5 (3 ), the positive detection rate of the 3-node network is 97.8%. The number of anchor nodes and the communication radius both affect the number of definable nodes and the relative localization error: the lowest relative localization error of the MTLFPSO algorithm is 3.4% across different numbers of anchor nodes and 2.5% across different communication radii. The article's method achieves accurate and efficient localization of WSNs.
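
As a rough illustration of PSO-based localization, the sketch below estimates a node position from anchor ranges by minimizing the sum of squared range residuals, which corresponds to maximum likelihood under an assumed Gaussian ranging noise. It is a plain PSO only: the chaos search, backward learning, flip detection, and multi-threshold steps of MTLFPSO are not reproduced, and the function name, parameters, and coefficient values are assumptions made for this example.

```python
import numpy as np

def locate_with_pso(anchors, ranges, bounds, n_particles=40, n_iters=200, seed=0):
    """Plain PSO estimate of a 2D node position (illustrative, not MTLFPSO)."""
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    lo, hi = bounds
    rng = np.random.default_rng(seed)

    def cost(points):
        # Distance from every candidate point to every anchor, shape (particles, anchors).
        dist = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=2)
        # Sum of squared range residuals per candidate point.
        return ((dist - ranges[None, :]) ** 2).sum(axis=1)

    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                      # personal best positions
    best_cost = cost(pos)                      # personal best costs
    g = best_pos[np.argmin(best_cost)].copy()  # global best position
    w, c1, c2 = 0.7, 1.5, 1.5                  # assumed inertia and acceleration weights

    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)       # keep particles inside the search area
        c = cost(pos)
        improved = c < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = c[improved]
        g = best_pos[np.argmin(best_cost)].copy()
    return g

if __name__ == "__main__":
    anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
    true_pos = np.array([3.0, 7.0])
    ranges = np.linalg.norm(anchors - true_pos, axis=1)  # noise-free toy ranges
    print(locate_with_pso(anchors, ranges, bounds=(0.0, 10.0)))
```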

    3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

    Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some speakers speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diversely entangled speech representations, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io

    Pushing the limits of self-supervised speaker verification using regularized distillation framework

    Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. One regularization term guarantees the diversity of the embeddings, while the other decorrelates the variables of each embedding. The effectiveness of various data augmentation techniques is explored in both the time and frequency domains. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the regularized DINO framework in speaker verification. Our method achieves state-of-the-art speaker verification performance under a single-stage self-supervised setting on VoxCeleb. The code will be made publicly available.
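
The abstract does not give the form of the two regularization terms. As one plausible reading, the sketch below uses a variance hinge to encourage diversity across a batch and an off-diagonal covariance penalty to decorrelate embedding dimensions, in the style of VICReg. Both function names, the `target_std` and `eps` parameters, and the exact formulas are assumptions for illustration and may differ from the paper's definitions.

```python
import numpy as np

def diversity_penalty(emb, target_std=1.0, eps=1e-4):
    """Hinge on the per-dimension standard deviation across the batch (assumed form)."""
    std = np.sqrt(emb.var(axis=0) + eps)
    # The penalty is positive whenever a dimension varies less than target_std.
    return float(np.mean(np.maximum(0.0, target_std - std)))

def decorrelation_penalty(emb):
    """Sum of squared off-diagonal covariance entries, scaled by dimension (assumed form)."""
    n, d = emb.shape
    centered = emb - emb.mean(axis=0)
    cov = centered.T @ centered / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    collapsed = np.ones((64, 8))          # every embedding identical
    healthy = rng.normal(size=(4096, 8))  # diverse, roughly uncorrelated dimensions
    print(diversity_penalty(collapsed), diversity_penalty(healthy))
    print(decorrelation_penalty(healthy))
```

In a training loop these terms would be added to the DINO loss with weighting coefficients; the collapsed batch above shows why the diversity term matters, since identical embeddings incur nearly the maximum penalty.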

    Toward reliable signals decoding for electroencephalogram: A benchmark study to EEGNeX

    This study examines the efficacy of various neural network (NN) models in interpreting mental constructs via electroencephalogram (EEG) signals. By assessing 16 prevalent NN models and their variants across four brain-computer interface (BCI) paradigms, we gauged their information representation capability. Building on the findings of a comprehensive literature review, we proposed EEGNeX, a novel, purely ConvNet-based architecture. We pitted it against both existing cutting-edge strategies and the Mother of All BCI Benchmarks (MOABB), involving 11 distinct EEG motor imagination (MI) classification tasks, and found that EEGNeX surpasses other state-of-the-art methods. Notably, it shows up to 2.1%-8.5% improvement in classification accuracy in different scenarios with statistical significance (p < 0.05) compared to its competitors. This study not only provides deeper insights into designing efficient NN models for EEG data but also lays the groundwork for future explorations into the relationship between bioelectric brain signals and NN architectures. For the benefit of broader scientific collaboration, we have made all benchmark models, including EEGNeX, publicly available at https://github.com/chenxiachan/EEGNeX. Comment: 19 pages, 6 figures

    Inhibitors of the renin–angiotensin system: The potential role in the pathogenesis of COVID-19

    Coronavirus disease 2019 (COVID-19), which initially began in China, has spread to other countries of Asia, Europe, America, Africa and Oceania, with the number of confirmed and suspected cases increasing each day. According to recently published research, the majority of the severe cases were elderly, and many of them had at least one chronic disease, especially cardiovascular disease. Angiotensin-converting enzyme inhibitors/angiotensin receptor blockers (ACEIs/ARBs) are the most widely used drugs for cardiovascular diseases. The clinical effect of ACEIs/ARBs on patients with COVID-19 is still uncertain. This paper describes their potential role in the pathogenesis of COVID-19, which may provide useful advice for cardiologists and physicians.

    Multidimensional sound propagation in 3D high-order topological sonic insulator

    High-order topological insulators (TIs) extend the conventional bulk-boundary correspondence theory and raise interest in the search for innovative topological materials. To realize a high-order TI with a wide passband of 1D and 2D transportation modes, we design non-trivial and trivial 3D sonic crystals whose combination mimics the Su-Schrieffer-Heeger model. The high-order topological boundary states can be found at the interfaces, including the 0D corner state, 1D hinge state, and 2D surface state. The fabricated sample with bent two-dimensional and one-dimensional acoustic channels exhibits multidimensional sound propagation in space, and also verifies the transition between the complete band gap, hinge states, and surface states within the bulk band gap. Among them, the single-mode hinge state achieves a large relative bandwidth of 9.1%, in which sound is transported one-dimensionally without significant leakage into the surfaces or the bulk. The high-order topological states in the study pave the way for multidimensional sound manipulation in space. Comment: 21 pages, 7 figures
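
For readers unfamiliar with the Su-Schrieffer-Heeger (SSH) model mentioned above, the following sketch builds the textbook 1D tight-binding SSH chain and shows how boundary states appear only in the non-trivial phase. It is a generic illustration, not the authors' 3D sonic crystal; the function name `ssh_spectrum` and the hopping values are chosen for this example.

```python
import numpy as np

def ssh_spectrum(n_cells, v, w):
    """Eigenvalues of a finite open SSH chain.

    v is the intra-cell hopping and w the inter-cell hopping (textbook 1D model).
    """
    n = 2 * n_cells
    h = np.zeros((n, n))
    for i in range(n - 1):
        # Bonds alternate v, w, v, ... along the chain, starting and ending with v.
        t = v if i % 2 == 0 else w
        h[i, i + 1] = h[i + 1, i] = t
    return np.linalg.eigvalsh(h)

if __name__ == "__main__":
    non_trivial = ssh_spectrum(20, v=0.5, w=1.0)  # weak intra-cell bonds: edge states expected
    trivial = ssh_spectrum(20, v=1.0, w=0.5)      # strong intra-cell bonds: gapped spectrum
    print("smallest |E|, non-trivial:", np.sort(np.abs(non_trivial))[:2])
    print("smallest |E|, trivial:    ", np.sort(np.abs(trivial))[:2])
```

When the intra-cell hopping is the weaker one, two eigenvalues sit near zero inside the gap and are localized at the chain ends; swapping the hoppings removes them. Higher-order corner and hinge states generalize this boundary behavior to more dimensions.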

    MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals

    Brain signals are important quantitative data for understanding physiological activities and diseases of the human brain. Most existing studies focus on supervised learning methods, which, however, require high-cost clinical labels. In addition, the large difference in the clinical patterns of brain signals measured by invasive (e.g., SEEG) and non-invasive (e.g., EEG) methods has prevented a unified method. To handle these issues, we propose to study a self-supervised learning (SSL) framework for brain signals that can be applied to pre-train on either SEEG or EEG data. Intuitively, brain signals, generated by the firing of neurons, are transmitted among different connecting structures in the human brain. Inspired by this, we propose MBrain to learn implicit spatial and temporal correlations between different channels (i.e., contacts of the electrode, corresponding to different brain areas) as the cornerstone for uniformly modeling different types of brain signals. Specifically, we represent the spatial correlation by a graph structure, which is built with the proposed multi-channel CPC. We theoretically prove that optimizing the goal of multi-channel CPC can lead to a better predictive representation and apply the instantaneous-time-shift prediction task based on it. Then we capture the temporal correlation by designing the delayed-time-shift prediction task. Finally, a replace-discriminative-learning task is proposed to preserve the characteristics of each channel. Extensive experiments on seizure detection with large-scale real-world EEG and SEEG datasets demonstrate that our model outperforms several state-of-the-art time series SSL and unsupervised models and can be deployed in clinical practice.