Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision
Training speaker-discriminative and robust speaker verification systems
without speaker labels remains challenging and worth exploring. Previous
studies have noted a substantial performance gap between self-supervised
and fully supervised approaches. In this paper, we propose an effective
Self-Distillation network with Ensemble Prototypes (SDEP) to facilitate
self-supervised speaker representation learning. A range of experiments
conducted on the VoxCeleb datasets demonstrate the superiority of the SDEP
framework in speaker verification. SDEP achieves a new state of the art (SOTA)
on the VoxCeleb1 speaker verification evaluation benchmark (i.e., equal error
rates of 1.94%, 1.99%, and 3.77% on the Vox1-O, Vox1-E, and Vox1-H trials,
respectively) without using any speaker labels in the training phase. Code will
be publicly available at https://github.com/alibaba-damo-academy/3D-Speaker. Comment: arXiv admin note: text overlap with arXiv:2211.0416
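The results above are reported as equal error rates (EER). As a rough illustration of what that number measures, the sketch below computes EER from a list of trial scores by brute-force threshold sweeping; it is a generic NumPy sketch, not code from the 3D-Speaker repository, and the cosine scoring helper is an assumed, commonly used back end.

```python
import numpy as np

def compute_eer(scores, labels):
    """Return (eer, threshold) for a set of verification trials.

    scores: similarity score per trial (higher = more likely same speaker).
    labels: True for target (same-speaker) trials, False for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_target = labels.sum()
    n_nontarget = (~labels).sum()
    # Brute-force sweep: try every observed score as the accept threshold.
    thresholds = np.unique(scores)
    far = np.array([np.sum((scores >= t) & ~labels) / n_nontarget for t in thresholds])
    frr = np.array([np.sum((scores < t) & labels) / n_target for t in thresholds])
    # The EER is the point where false acceptances and false rejections balance.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2), float(thresholds[i])


def cosine_score(a, b):
    # Cosine similarity between two speaker embeddings (assumed scoring back end).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Production toolkits usually interpolate along the ROC curve instead of picking the closest observed threshold, so their EER values can differ slightly from this sweep on small trial lists.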
Analysis of Improved Particle Swarm Algorithm in Wireless Sensor Network Localization
WSN localization plays an important role in practical WSN applications. To localize WSN nodes efficiently and accurately, the article constructs an objective function from the target-node location constraints and the maximum likelihood function, and it avoids premature convergence by using a PSO algorithm based on chaos search and backward learning. Based on linear fitting, a node-flipping fuzzy detection method is proposed to detect node flip ambiguity. This detection method is combined with the localization algorithm, and the final WSN localization algorithm is obtained after multi-threshold processing. The analysis shows that, compared with other PSO algorithms, the MTLFPSO algorithm used in the paper performs better, reaching a highest accuracy of 83.1%. The threshold value affects the positive and false detection rates of different WSNs. For type 1 WSNs under the same threshold value, the 3-node network has the highest positive detection rate, followed by the 4-node network; when the threshold value is 7.5 (3 ), the positive detection rate of the 3-node network is 97.8%. The number of anchor nodes and the communication radius both affect the number of localizable nodes and the relative localization error: the lowest relative localization error of the MTLFPSO algorithm is 3.4% across different numbers of anchor nodes and 2.5% across different communication radii. The proposed method thus achieves accurate and efficient WSN localization.
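The abstract describes a maximum-likelihood objective minimized by a PSO variant with chaos search and backward learning. The sketch below is a minimal, hypothetical illustration under stated assumptions: Gaussian range noise (so the likelihood objective becomes a sum of squared range residuals), logistic-map chaos used only for initializing the swarm, and "backward learning" interpreted as opposition-based learning. It omits the flip-ambiguity detection and multi-threshold steps and is not the MTLFPSO algorithm itself; all function names and PSO weights are illustrative.

```python
import numpy as np

def range_objective(p, anchors, ranges):
    # Assumption: i.i.d. Gaussian range noise, so maximizing the likelihood is
    # equivalent to minimizing the sum of squared range residuals.
    d = np.linalg.norm(anchors - p, axis=1)
    return float(np.sum((d - ranges) ** 2))

def chaotic_init(n, dim, lo, hi, seed=0):
    # Chaotic initialization via the logistic map x <- 4x(1 - x) on (0, 1).
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.1, 0.9, size=dim)
    pts = np.empty((n, dim))
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        pts[i] = lo + x * (hi - lo)
    return pts

def pso_localize(anchors, ranges, lo, hi, n=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)

    def objective(p):
        return range_objective(p, anchors, ranges)

    pos = chaotic_init(n, anchors.shape[1], lo, hi, seed)
    # Assumption: "backward learning" read as opposition-based learning.
    # Each particle is compared with its opposite lo + hi - x; keep the better one.
    opp = lo + hi - pos
    use_opp = np.array([objective(o) < objective(p) for p, o in zip(pos, opp)])
    pos[use_opp] = opp[use_opp]
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5  # hypothetical inertia and acceleration weights
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard PSO velocity update toward personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest
```

For example, with four anchors at the corners of a 100 x 100 field and noise-free ranges to an unknown node, `pso_localize(anchors, ranges, 0.0, 100.0)` should return a position close to the true one. Chaos search is often also applied as a local search around the global best during iterations; the paper may use it that way rather than only for initialization.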
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Disentangling uncorrelated information in speech utterances is a crucial
research topic within the speech community. Different speech-related tasks focus
on extracting distinct speech representations while minimizing the effects of
other uncorrelated information. We present a large-scale speech corpus to
facilitate research on speech representation disentanglement. 3D-Speaker
contains over 10,000 speakers, each of whom is simultaneously recorded by
multiple Devices located at different Distances, and some speakers
speak multiple Dialects. The controlled combinations of multi-dimensional
audio data yield a matrix of diversely entangled speech representations,
thereby motivating intriguing methods to untangle them. The
multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating
large universal speech models and for experimenting with out-of-domain learning
and self-supervised learning methods. https://3dspeaker.github.io
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Training robust speaker verification systems without speaker labels has long
been a challenging task. Previous studies observed a large performance gap
between self-supervised and fully supervised methods. In this paper, we apply a
non-contrastive self-supervised learning framework called DIstillation with NO
labels (DINO) and propose two regularization terms applied to embeddings in
DINO. One regularization term guarantees the diversity of the embeddings, while
the other regularization term decorrelates the variables of each embedding. The
effectiveness of various data augmentation techniques is explored in both the
time and frequency domains. A range of experiments conducted on the VoxCeleb
datasets demonstrate the superiority of the regularized DINO framework in
speaker verification. Our method achieves state-of-the-art speaker
verification performance under a single-stage self-supervised setting on
VoxCeleb. The code will be made publicly available.
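The framework combines the DINO teacher-student objective with two embedding regularizers, one for diversity and one for decorrelation. The abstract does not give their exact form, so the sketch below uses VICReg-style variance and covariance penalties as an assumed stand-in, alongside the usual DINO cross-entropy with teacher centering and temperature sharpening. The temperatures and loss weights are hypothetical defaults, and in a real setup the teacher would be an exponential moving average of the student.

```python
import numpy as np

def softmax(x, temp):
    z = x / temp
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    # Teacher output is centered and sharpened (lower temperature), then used
    # as a fixed target distribution for the student (cross-entropy).
    p_t = softmax(teacher_out - center, t_t)
    log_p_s = np.log(softmax(student_out, t_s) + 1e-12)
    return float(-(p_t * log_p_s).sum(axis=1).mean())

def update_center(center, teacher_out, m=0.9):
    # EMA of the teacher's mean output, used to discourage collapse.
    return m * center + (1 - m) * teacher_out.mean(axis=0)

def variance_reg(emb, gamma=1.0, eps=1e-4):
    # Assumed diversity term: hinge on the per-dimension standard deviation,
    # penalizing embeddings that collapse toward a constant vector.
    std = np.sqrt(emb.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

def covariance_reg(emb):
    # Assumed decorrelation term: squared off-diagonal covariance entries,
    # normalized by the embedding dimension.
    n, d = emb.shape
    z = emb - emb.mean(axis=0)
    cov = z.T @ z / (n - 1)
    off = cov - np.diag(np.diag(cov))
    return float((off ** 2).sum() / d)

def regularized_dino_loss(student_out, teacher_out, center, student_emb,
                          lam_var=1.0, lam_cov=0.04):
    # Hypothetical weighting of the two regularizers.
    return (dino_loss(student_out, teacher_out, center)
            + lam_var * variance_reg(student_emb)
            + lam_cov * covariance_reg(student_emb))
```

Both penalties act on a batch of embeddings rather than on individual utterances, which is why they can measure collapse (low variance) and redundancy (correlated dimensions).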
Toward reliable signals decoding for electroencephalogram: A benchmark study to EEGNeX
This study examines the efficacy of various neural network (NN) models in
interpreting mental constructs via electroencephalogram (EEG) signals. Through
the assessment of 16 prevalent NN models and their variants across four
brain-computer interface (BCI) paradigms, we gauged their information
representation capability. Building on the findings of a comprehensive
literature review, we propose EEGNeX, a novel, purely ConvNet-based
architecture. We compared it against both existing cutting-edge strategies and
the Mother of All BCI Benchmarks (MOABB), involving 11 distinct EEG motor
imagery (MI) classification tasks, and found that EEGNeX surpasses other
state-of-the-art methods. Notably, it improves classification accuracy by
2.1%-8.5% in different scenarios compared to its competitors, with statistical
significance (p < 0.05). This study not only provides deeper insights into
designing efficient NN models for EEG data but also lays the groundwork for
future explorations into the relationship between bioelectric brain signals and
NN architectures. For the benefit of broader scientific collaboration, we have
made all benchmark models, including EEGNeX, publicly available at
https://github.com/chenxiachan/EEGNeX. Comment: 19 pages, 6 figures
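The abstract does not describe EEGNeX's layers. For readers unfamiliar with purely convolutional EEG decoders, the sketch below shows a generic EEGNet-style front end (temporal filters over each electrode, then a depthwise spatial filter across electrodes, then average pooling) in plain NumPy. It is an illustration of the general design pattern, not the EEGNeX architecture; the 22-channel, 500-sample input and all filter sizes are hypothetical.

```python
import numpy as np

def temporal_conv(x, kernels):
    # x: (channels, time); kernels: (F, k). Each temporal filter is applied to
    # every electrode independently ("valid" mode, so output length is T - k + 1).
    # np.convolve flips the kernel; for learned filters this only reorders weights.
    C, T = x.shape
    F, k = kernels.shape
    out = np.empty((F, C, T - k + 1))
    for f in range(F):
        for c in range(C):
            out[f, c] = np.convolve(x[c], kernels[f], mode="valid")
    return out

def depthwise_spatial(feat, spatial):
    # feat: (F, C, T'); spatial: (F, C). One spatial filter per temporal feature
    # map collapses the electrode axis.
    return np.einsum("fct,fc->ft", feat, spatial)

def avg_pool(feat, p):
    # Non-overlapping average pooling along time; trailing samples are dropped.
    F, T = feat.shape
    T2 = T // p
    return feat[:, : T2 * p].reshape(F, T2, p).mean(axis=2)
```

With a hypothetical 22-electrode, 500-sample trial, 8 temporal filters of length 64, and pooling factor 4, the feature map shrinks from (22, 500) to (8, 109); a classifier head would then operate on that map.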
Inhibitors of the renin–angiotensin system: The potential role in the pathogenesis of COVID-19
Coronavirus disease 2019 (COVID-19), which first emerged in China, has spread to other countries in Asia, Europe, America, Africa and Oceania, with the numbers of confirmed and suspected cases increasing each day. Recently published research found that the majority of severe cases were elderly and that many of them had at least one chronic disease, especially cardiovascular disease. Angiotensin-converting enzyme inhibitors/angiotensin receptor blockers (ACEIs/ARBs) are the most widely used drugs for cardiovascular diseases, but their clinical effect on patients with COVID-19 remains uncertain. This paper describes their potential role in the pathogenesis of COVID-19, which may provide useful guidance for cardiologists and physicians.
Multidimensional sound propagation in 3D high-order topological sonic insulator
High-order topological insulators (TIs) extend the conventional
bulk-boundary correspondence and have raised interest in searching for
innovative topological materials. To realize a high-order TI with a wide
passband of 1D and 2D transport modes, we design non-trivial and trivial
3D sonic crystals whose combination mimics the Su-Schrieffer-Heeger model. The
high-order topological boundary states can be found at the interfaces,
including 0D corner states, 1D hinge states, and 2D surface states. The
fabricated sample with bent two-dimensional and one-dimensional acoustic
channels exhibits multidimensional sound propagation in space, and also
verifies the transition between the complete band gap, hinge states, and
surface states within the bulk band gap. Notably, the single-mode hinge state
achieves a large relative bandwidth of 9.1%, in which sound propagates
one-dimensionally without significant leakage into the surfaces or the bulk. The
high-order topological states in this study pave the way for multidimensional
sound manipulation in space. Comment: 21 pages, 7 figures
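The design mimics the Su-Schrieffer-Heeger (SSH) model, whose trivial and non-trivial phases are set by which of two alternating couplings is stronger. As a minimal illustration, the sketch below diagonalizes the 1D tight-binding SSH chain with open ends and shows that near-zero-energy edge states appear only when the inter-cell hopping w exceeds the intra-cell hopping v. This is the textbook 1D model, not a simulation of the 3D sonic crystals; the chain length and hopping values are hypothetical.

```python
import numpy as np

def ssh_hamiltonian(n_cells, v, w):
    # Open SSH chain with sites ordered A1, B1, A2, B2, ...
    # Intra-cell hopping v couples A_i-B_i; inter-cell hopping w couples B_i-A_{i+1}.
    n = 2 * n_cells
    h = np.zeros((n, n))
    for i in range(n - 1):
        h[i, i + 1] = h[i + 1, i] = v if i % 2 == 0 else w
    return h

def spectrum(n_cells, v, w):
    # Eigenvalues of the real symmetric Hamiltonian, in ascending order.
    return np.linalg.eigvalsh(ssh_hamiltonian(n_cells, v, w))
```

In the non-trivial case (v < w) two eigenvalues sit exponentially close to zero, localized at the chain ends, while bulk states stay outside the gap of half-width |w - v|; swapping v and w (the trivial case) removes the edge states. Interfacing the two phases is the 1D analog of combining non-trivial and trivial crystals to host boundary states.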
MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals
Brain signals are important quantitative data for understanding the
physiological activities and diseases of the human brain. Most existing studies
focus on supervised learning methods, which require costly clinical labels.
In addition, the large differences between the clinical patterns of brain
signals measured by invasive (e.g., SEEG) and non-invasive (e.g., EEG) methods
have so far prevented a unified method. To address these issues, we study a
self-supervised learning (SSL) framework for brain signals that can be
applied to pre-train on either SEEG or EEG data. Intuitively, brain signals,
generated by the firing of neurons, are transmitted among different connected
structures in the human brain. Inspired by this, we propose MBrain to learn
implicit spatial and temporal correlations between different channels (i.e.,
contacts of the electrode, corresponding to different brain areas) as the
cornerstone for uniformly modeling different types of brain signals.
Specifically, we represent the spatial correlation by a graph structure, which
is built with the proposed multi-channel CPC (contrastive predictive coding).
We theoretically prove that optimizing the multi-channel CPC objective leads to
a better predictive representation, and apply an instantaneous-time-shift
prediction task based on it. We then capture the temporal correlation by
designing a delayed-time-shift prediction task. Finally, a
replace-discriminative-learning task is proposed to preserve the
characteristics of each channel. Extensive seizure-detection experiments on
both EEG and SEEG large-scale real-world datasets demonstrate that our model
outperforms several state-of-the-art time series SSL and unsupervised models
and can be deployed in clinical practice.
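MBrain builds on contrastive predictive coding, which is trained with the InfoNCE loss: a context representation must pick out its true future representation among negatives. The sketch below shows the standard single-channel InfoNCE with in-batch negatives and cosine-similarity scores; MBrain's multi-channel variant (how contexts from several channels are combined) is not specified in the abstract and is not reproduced here. The temperature is a hypothetical choice.

```python
import numpy as np

def info_nce(context, future, temperature=0.1):
    # context, future: (N, d). Row i of `future` is the positive for row i of
    # `context`; the other N - 1 rows in the batch serve as negatives.
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    z = future / np.linalg.norm(future, axis=1, keepdims=True)
    logits = c @ z.T / temperature          # (N, N) cosine-similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as the correct class.
    return float(-np.mean(np.diag(log_probs)))
```

Minimizing this loss maximizes a lower bound on the mutual information between context and future representations, which is the sense in which CPC yields a "predictive" representation.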