
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Author: Dixon S
O'Connor B
Sound and Music Computing
Publication venue: SMC Network
Publication date: 12/06/2023

Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well-established among VC tasks, which has been shown to improve our model’s SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings to produce singer-specific variance encodings using contrastive learning. We subsequently trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs, and measured differences in SVC performance when using different latent regressor loss components. We found that using this loss with respect to SIEs leads to better performance than using it with respect to bottleneck embeddings: the converted audio is more natural and more specific to the target singers. The inclusion of this loss component has the advantage of explicitly forcing the network to reconstruct with timbral similarity, and also negates the effect of poor disentanglement in AutoVC’s bottleneck embeddings. We demonstrate peculiar diversity between computational and human evaluations on singer converted audio clips, which highlights the necessity of both. We also propose a pitch-matching mechanism between source and target singers to ensure these evaluations are not influenced by differences in pitch register.
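To make the idea of a latent regressor loss on singer identity embeddings more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a squared-error reconstruction term and an L1 latent term, combined with a weight; the function names, the `weight` parameter and the choice of norms are all assumptions made for illustration. In practice `emb_of_output` would come from re-encoding the converted audio with the SIE network, which is not modelled here.

```python
import numpy as np


def mse(a, b):
    """Mean squared difference between two arrays of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("shapes must match")
    return float(np.mean((a - b) ** 2))


def l1(a, b):
    """Mean absolute difference between two arrays of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("shapes must match")
    return float(np.mean(np.abs(a - b)))


def total_loss(mel_in, mel_out, emb_target, emb_of_output, weight=1.0):
    """Reconstruction loss plus a weighted latent regressor loss.

    emb_target:    embedding the decoder was conditioned on (hypothetical SIE).
    emb_of_output: embedding obtained by re-encoding the decoder output
                   (assumed to be supplied by the caller).
    """
    reconstruction = mse(mel_in, mel_out)
    latent_regressor = l1(emb_target, emb_of_output)
    return reconstruction + weight * latent_regressor
```

Swapping `emb_target` and `emb_of_output` for bottleneck codes would give the alternative variant the abstract compares against, under the same assumptions.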

Queen Mary Research Online

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Author: Baldrati Alberto
Bertini Marco
Cartella Giuseppe
Cornia Marcella
Cucchiara Rita
Morelli Davide
Publication date: 22/05/2023

The rapidly evolving fields of e-commerce and the metaverse continue to seek innovative approaches to enhance the consumer experience. At the same time, recent advancements in the development of diffusion models have enabled generative networks to create remarkably realistic images. In this context, image-based virtual try-on, which consists of generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions. This work introduces LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the Virtual Try-ON task. The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module that exploits learnable skip connections to enhance the generation process while preserving the model's characteristics. To effectively maintain the texture and details of the in-shop garment, we propose a textual inversion component that can map the visual features of the garment to the CLIP token embedding space and thus generate a set of pseudo-word token embeddings capable of conditioning the generation process. Experimental results on the Dress Code and VITON-HD datasets demonstrate that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task. Source code and trained models will be publicly released at: https://github.com/miccunifi/ladi-vton
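The textual inversion step described above, mapping garment features to pseudo-word token embeddings, can be illustrated with a toy sketch. This is not the LaDI-VTON code: it assumes a single linear projection, and the names `garment_to_pseudo_tokens`, `splice_tokens`, `weight`, `bias` and `position` are hypothetical. The real component would operate on CLIP visual features and CLIP's token embedding dimension, neither of which is modelled here.

```python
import numpy as np


def garment_to_pseudo_tokens(features, weight, bias, num_tokens):
    """Project one garment feature vector to `num_tokens` pseudo-word embeddings.

    features: shape (feature_dim,)
    weight:   shape (feature_dim, num_tokens * token_dim)
    bias:     shape (num_tokens * token_dim,)
    Returns an array of shape (num_tokens, token_dim).
    """
    features = np.asarray(features, dtype=float)
    projected = features @ weight + bias
    return projected.reshape(num_tokens, -1)


def splice_tokens(prompt_embeddings, pseudo_tokens, position):
    """Insert the pseudo-word embeddings into a prompt's token embeddings."""
    return np.concatenate(
        [prompt_embeddings[:position], pseudo_tokens, prompt_embeddings[position:]],
        axis=0,
    )
```

The spliced sequence would then be passed to the text encoder that conditions the diffusion model, again assuming the simplified setup described here.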

arXiv.org e-Print Archive

Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations

Author: Cardamone Dario
Di Cataldo Santa
Ficarra Elisa
Mascolini Alessio
Ponzio Francesco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for image representation learning in the case of fluorescent biological images.

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Institutional Research Information System University of Turin