Quaternion generative adversarial networks
Recent Generative Adversarial Networks (GANs) achieve outstanding results through large-scale training, employing models composed of millions of parameters that require extensive computational capabilities. Building such huge models undermines their replicability and increases training instability. Moreover, multi-channel data, such as images or audio, are usually processed by real-valued convolutional networks that flatten and concatenate the input, often losing intra-channel spatial relations. To address these issues of complexity and information loss, we propose a family of quaternion-valued generative adversarial networks (QGANs). QGANs exploit the properties of quaternion algebra, e.g., the Hamilton product, which allows channels to be processed as a single entity and internal latent relations to be captured, while reducing the overall number of parameters by a factor of 4. We show how to design QGANs and how to extend the proposed approach to advanced models. We compare the proposed QGANs with real-valued counterparts on several image generation benchmarks. Results show that QGANs obtain better FID scores than real-valued GANs and generate visually pleasing images. Furthermore, QGANs save up to 75% of the training parameters. We believe these results may pave the way to novel, more accessible GANs capable of improving performance while saving computational resources.
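The parameter saving comes from the structure of the Hamilton product: a single quaternion weight (4 real numbers) defines a full 4x4 real mixing matrix across the four channels. A minimal NumPy sketch of this weight sharing (illustrative only, not the paper's implementation):

```python
import numpy as np

def hamilton_weight(a, b, c, d):
    """Real-valued matrix implementing left Hamilton multiplication by
    the quaternion w = a + b*i + c*j + d*k. A quaternion layer stores
    only (a, b, c, d) yet mixes all four channels, which is where the
    4x parameter saving comes from."""
    return np.array([
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ])

# One quaternion "neuron": 4 parameters instead of the 16 that a
# real-valued dense 4x4 layer would need.
W = hamilton_weight(0.5, -0.1, 0.3, 0.2)
x = np.array([1.0, 2.0, 3.0, 4.0])  # e.g. the four channels of one sample
y = W @ x
```

Because the matrix is a scaled rotation, the output norm equals the product of the quaternion norm and the input norm, so the four channels are mixed without inflating or collapsing their joint energy.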
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved through efficient parameterized Kronecker products. In this article, we define the parameterization of hypercomplex convolutional layers and introduce the family of parameterized hypercomplex neural networks (PHNNs), lightweight and efficient large-scale models. Our method grasps the convolution rules and the filter organization directly from data, without requiring a rigidly predefined domain structure. PHNNs can operate in any user-defined or tuned domain, from 1D to nD, regardless of whether the algebra rules are preset. Such malleability allows multidimensional inputs to be processed in their natural domain without annexing further dimensions, as is done instead in quaternion neural networks (QNNs) for 3D inputs such as color images. As a result, the proposed family of PHNNs operates with 1/n of the free parameters of its real-domain analog. We demonstrate the versatility of this approach across multiple application domains by performing experiments on various image and audio datasets, in which our method outperforms real- and quaternion-valued counterparts.
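The parameterized hypercomplex idea builds each weight matrix as a sum of Kronecker products, W = Σᵢ Aᵢ ⊗ Fᵢ, where the Aᵢ encode the learned algebra rules and the Fᵢ are the learned filters. A minimal NumPy sketch under assumed shapes (names are illustrative, not the paper's actual code):

```python
import numpy as np

def ph_weight(A, F):
    """Parameterized hypercomplex weight: W = sum_i kron(A[i], F[i]).
    A: (n, n, n) learned "algebra" matrices; F: (n, k, k) learned filters.
    Stores n*(n*n + k*k) parameters instead of the (n*k)**2 of a dense
    real layer, i.e. roughly 1/n of the parameters when k >> n."""
    n = A.shape[0]
    return sum(np.kron(A[i], F[i]) for i in range(n))

rng = np.random.default_rng(0)
n, k = 4, 8
A = rng.standard_normal((n, n, n))
F = rng.standard_normal((n, k, k))
W = ph_weight(A, F)  # acts on an (n*k)-dimensional input
```

With n = 4 and k = 8 this stores 4·(16 + 64) = 320 parameters in place of the 1024 of a dense 32x32 real matrix, and the Hamilton product of quaternion networks is recovered as one fixed choice of the Aᵢ.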
Hypercomplex Image-to-Image Translation
Image-to-image translation (I2I) aims at transferring the content representation from an input domain to an output one, bouncing along different target domains. Recent I2I generative models, which achieve outstanding results in this task, comprise a set of diverse deep networks, each with tens of millions of parameters. Moreover, images are usually three-dimensional, being composed of RGB channels, and common neural models do not take correlations among dimensions into account, losing beneficial information. In this paper, we propose to leverage hypercomplex algebra properties to define lightweight I2I generative models capable of preserving pre-existing relations among image dimensions, thus exploiting additional input information. On manifold I2I benchmarks, we show how the proposed Quaternion StarGANv2 and parameterized hypercomplex StarGANv2 (PHStarGANv2) reduce parameter count and storage memory while ensuring high domain translation performance and good image quality, as measured by FID and LPIPS scores. Full code is available at: https://github.com/ispamm/HI2I
Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview
Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possibly robust to channel corruptions, can be addressed with deep generative models. This ICASSP special session overview paper discloses the semantic communication challenges from the machine learning perspective and unveils how deep generative models will significantly enhance semantic communication frameworks in dealing with real-world complex data, extracting and exploiting semantic information, and being robust to channel corruptions. Alongside establishing this emerging field, this paper charts novel research pathways for the next generative semantic communication frameworks.
Comment: Submitted to IEEE ICASSP
Diffusion models for audio semantic communication
Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and is prone to errors when trying to recover the transmitted bits. On the contrary, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver, without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that frames the communication problem as an inverse problem, therefore being robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple simultaneous degradations, including corruption noise and missing parts caused by the transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and under different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/
Comment: Submitted to IEEE ICASSP 202
Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation
Spatial audio methods are gaining growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field into spherical harmonics. In this paper, we propose a dual quaternion representation of the spatial sound field acquired through an array of two First Order Ambisonics (FOA) microphones. The audio signals are encapsulated in a dual quaternion that leverages quaternion algebra properties to exploit correlations among them. This augmented representation with six degrees of freedom (6DOF) provides more accurate coverage of the sound field, resulting in more precise sound localization and a more immersive audio experience. We evaluate our approach on a sound event localization and detection (SELD) benchmark. We show that our dual quaternion SELD model with temporal convolution blocks (DualQSELD-TCN) achieves better results than real- and quaternion-valued baselines thanks to our augmented representation of the sound field. Full code is available at: https://github.com/ispamm/DualQSELD-TCN
Comment: Paper under consideration at Elsevier Pattern Recognition Letters
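A dual quaternion q = qᵣ + ε q_d (with ε² = 0) naturally holds two four-channel signals at once, so the four capsules of each FOA microphone can fill one quaternion part each. A minimal sketch of this encoding and of the dual-quaternion product (the sample values and names are hypothetical, not from the paper):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions given as length-4 arrays."""
    a, b, c, d = p
    w, x, y, z = q
    return np.array([
        a*w - b*x - c*y - d*z,
        a*x + b*w + c*z - d*y,
        a*y - b*z + c*w + d*x,
        a*z + b*y - c*x + d*w,
    ])

def dq_mul(p, q):
    """Dual-quaternion product: (p_r + eps*p_d)(q_r + eps*q_d)
    = p_r*q_r + eps*(p_r*q_d + p_d*q_r), since eps**2 = 0."""
    p_r, p_d = p
    q_r, q_d = q
    return (qmul(p_r, q_r), qmul(p_r, q_d) + qmul(p_d, q_r))

# Hypothetical encoding: one time sample from each FOA microphone
# (W, X, Y, Z capsules) packed as the real and dual parts.
mic1 = np.array([0.9, 0.1, -0.2, 0.05])
mic2 = np.array([0.8, 0.15, -0.1, 0.0])
frame = (mic1, mic2)  # a single dual-quaternion sample
```

Because the dual part is carried along under every product with the real part, operations on such samples couple the two microphones' channels instead of treating them as eight independent real signals.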
Exploring the vaccine conversation on TikTok in Italy: beyond classic vaccine stances
TikTok, a social media platform for creating and sharing short videos, has seen a surge in popularity during the COVID-19 pandemic. To analyse the Italian vaccine conversation on TikTok, we downloaded a sample of videos with a high play count (Top Videos), identified through an unofficial Application Programming Interface (consistent with TikTok’s Terms of Service), and collected public videos from vaccine-sceptic users through snowball sampling (Vaccine Sceptics’ videos). The videos were analysed using qualitative and quantitative methods, in terms of vaccine stance, tone of voice, topic, conformity with TikTok style, and other characteristics. The final datasets consisted of 754 Top Videos (by 510 single users) plus 180 Vaccine Sceptics’ videos (by 29 single users), posted between January 2020 and March 2021. In 40.5% of the Top Videos the stance was promotional, 33.9% were indefinite-ironic, 11.3% were neutral, 9.7% were discouraging, and 3.1% were ambiguous (i.e. expressing an ambivalent stance towards vaccines); 43% of promotional videos were from healthcare professionals. More than 95% of the Vaccine Sceptics’ videos were discouraging. Multiple correspondence analysis showed that, compared to other stances, promotional videos were more frequently created by healthcare professionals and by females, and their most frequent topic was herd immunity. Discouraging videos were associated with a polemical tone of voice, and their topics were conspiracy and freedom of choice. Our analysis shows that Italian vaccine-sceptic users on TikTok are limited in number and vocality, and the large proportion of videos with an indefinite-ironic stance might imply that the incidence of affective polarisation could be lower on TikTok, compared to other social media, in the Italian context. Safety is the most frequent concern of users, and we recorded an interesting presence of healthcare professionals among the creators. TikTok should be considered as a medium for vaccine communication and for vaccine promotion campaigns.
Efficient Sound Event Localization and Detection in the Quaternion Domain
In recent years, several approaches have been proposed for the task of Sound Event Localization and Detection (SELD) with multiple overlapping sound events in the 3D sound field. However, accuracy improvements have often been achieved at the expense of more complex networks and a larger number of parameters. In this paper, we propose an efficient and lightweight Quaternion Temporal Convolutional Network for the SELD task (QSELD-TCN), which combines the advantages of quaternion-valued processing and the effectiveness of the Temporal Convolutional Network (TCN). The proposed approach represents the Ambisonic signal components as a single quaternion and, accordingly, uses quaternion-valued layers throughout the structure of the neural network. This results in a considerable saving of parameters with respect to the corresponding real-valued model. In particular, a quaternion implementation of the TCN block is presented, exploiting the TCN's ability to capture long-term dependencies and the effectiveness of quaternion convolutional layers in grasping correlations among input dimensions. The proposed approach requires less runtime and storage memory and achieves faster inference time with respect to state-of-the-art methods, making its implementation possible even on devices with limited resources.
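The parameter saving claimed above can be made concrete with a simple count: a quaternion convolution groups channels in fours and stores 4 real weights per filter position instead of the 16 a real convolution would need for the same 4-to-4 channel mixing. A back-of-the-envelope sketch (layer sizes are illustrative, not taken from the paper):

```python
def conv_params(c_in, c_out, k, quaternion=False):
    """Weight count of a 2-D conv layer with k x k kernels. Quaternion
    filters share parameters across groups of 4 channels via the
    Hamilton product: 4 weights per filter position instead of 16."""
    per_pos = (c_in // 4) * (c_out // 4) * 4 if quaternion else c_in * c_out
    return per_pos * k * k

real_p = conv_params(64, 64, 3)        # standard real-valued conv
quat_p = conv_params(64, 64, 3, True)  # quaternion conv: 4x fewer weights
```

The same count applied layer by layer is what yields the roughly fourfold reduction of the quaternion model relative to its real-valued counterpart.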