Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations
Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images
Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks
Automatically assessing emotional valence in human speech has historically
been a difficult task for machine learning algorithms. The subtle changes in
the voice of the speaker that are indicative of positive or negative emotional
states are often "overshadowed" by voice characteristics relating to emotional
intensity or emotional activation. In this work we explore a representation
learning approach that automatically derives discriminative representations of
emotional speech. In particular, we investigate two machine learning strategies
to improve classifier performance: (1) utilization of unlabeled data using a
deep convolutional generative adversarial network (DCGAN), and (2) multitask
learning. In our extensive experiments, we leverage a multitask-annotated
emotional corpus as well as a large unlabeled meeting corpus (around 100
hours). Our speaker-independent classification experiments show that the use
of unlabeled data in particular improves classifier performance, and that both
fully supervised baseline approaches are outperformed considerably. We improve
the classification of emotional valence on a discrete 5-point scale to 43.88%
and on a 3-point scale to 49.80%, which is competitive with state-of-the-art
performance.
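Multitask learning, strategy (2) above, is commonly realised as a weighted sum of per-task losses computed from separate heads on a shared representation. The sketch below is a hypothetical illustration of that general idea, not the authors' configuration. The task names `valence` and `activation`, the weights, and the helper names are assumptions, and the DCGAN component is not modelled.

```python
import math
from typing import Dict, List


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy for one example, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multitask_loss(
    logits_per_task: Dict[str, List[float]],
    targets: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses from heads sharing one encoder.

    Hypothetical helper: task names and weights are illustrative assumptions.
    """
    return sum(
        weights[task] * softmax_cross_entropy(logits, targets[task])
        for task, logits in logits_per_task.items()
    )


if __name__ == "__main__":
    # Hypothetical heads: 5-point valence and 3-point activation.
    logits = {"valence": [0.1, 0.2, 1.5, 0.3, -0.4], "activation": [0.0, 2.0, -1.0]}
    targets = {"valence": 2, "activation": 1}
    weights = {"valence": 1.0, "activation": 0.5}
    print(round(multitask_loss(logits, targets, weights), 4))
```

Sharing the encoder lets the auxiliary task act as a regulariser on the representation used for valence. The weights control how strongly each task shapes it.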
Disentangled representation learning for multilingual speaker recognition
The goal of this paper is to learn robust speaker representations for the
bilingual speaking scenario. The majority of the world's population speak at
least two languages; however, most speaker recognition systems fail to
recognise the same speaker when they speak in different languages.
Popular speaker recognition evaluation sets do not consider the bilingual
scenario, making it difficult to analyse the effect of bilingual speakers on
speaker recognition performance. In this paper, we publish a large-scale
evaluation set named VoxCeleb1-B derived from VoxCeleb that considers bilingual
scenarios.
We introduce an effective disentanglement learning strategy that combines
adversarial and metric learning-based methods. This approach addresses the
bilingual situation by disentangling language-related information from speaker
representation while ensuring stable speaker representation learning. Our
language-disentangled learning method uses only language pseudo-labels, without
any manually labelled language information.
Comment: Interspeech 202
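One common way to combine adversarial and metric learning for disentanglement is to minimise a speaker metric loss while maximising the loss of a language classifier that reads the same embedding. A gradient reversal layer has this effect on the encoder. The sketch below illustrates that general pattern and is not necessarily the paper's exact formulation. The triplet loss, cosine distance, `lam=0.1`, `margin=0.2`, and all function names are assumptions.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both embeddings are non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Metric term: same-speaker pair must be closer than a different-speaker pair by `margin`."""
    return max(0.0, cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Language-classifier loss against a (pseudo-)label, via log-sum-exp."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]


def encoder_objective(anchor: Sequence[float], positive: Sequence[float],
                      negative: Sequence[float], lang_logits: Sequence[float],
                      lang_pseudo_label: int, lam: float = 0.1,
                      margin: float = 0.2) -> float:
    """Loss minimised by the speaker encoder (hypothetical formulation).

    The language classifier itself minimises cross_entropy(...); the encoder
    sees the negated term, mirroring what a gradient reversal layer does.
    """
    metric = triplet_loss(anchor, positive, negative, margin)
    adversarial = cross_entropy(lang_logits, lang_pseudo_label)
    return metric - lam * adversarial
```

Negating the language term pushes the encoder toward embeddings from which the language pseudo-label is hard to predict. The triplet term keeps same-speaker embeddings close.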