9 research outputs found

    Quaternion generative adversarial networks

    Get PDF
    Recent Generative Adversarial Networks (GANs) achieve outstanding results through large-scale training, employing models with millions of parameters that demand extensive computational resources. Building such huge models undermines their replicability and increases training instability. Moreover, multi-channel data, such as images or audio, are usually processed by real-valued convolutional networks that flatten and concatenate the input, often losing intra-channel spatial relations. To address these issues of complexity and information loss, we propose a family of quaternion-valued generative adversarial networks (QGANs). QGANs exploit the properties of quaternion algebra, e.g., the Hamilton product, which allows the channels to be processed as a single entity and their internal latent relations to be captured, while reducing the overall number of parameters by a factor of 4. We show how to design QGANs and how to extend the proposed approach to advanced models. We compare the proposed QGANs with their real-valued counterparts on several image generation benchmarks. Results show that QGANs obtain better FID scores than real-valued GANs and generate visually pleasing images. Furthermore, QGANs save up to 75% of the training parameters. We believe these results may pave the way to novel, more accessible GANs capable of improving performance while saving computational resources.
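
    To make the parameter saving concrete, here is a minimal NumPy sketch of a quaternion linear layer built on the Hamilton product; the function name and shapes are illustrative assumptions, not taken from the paper's code. The four weight blocks are shared across all four output components, so a map between 4k-dimensional features needs 4k^2 weights instead of 16k^2, the factor-of-4 reduction mentioned above.

    ```python
    import numpy as np

    def hamilton_linear(q, W):
        """Quaternion linear layer via the Hamilton product.

        q : (4, k)    input split into components (r, x, y, z)
        W : (4, k, k) four shared weight blocks (W_r, W_x, W_y, W_z)
        """
        r, x, y, z = q
        Wr, Wx, Wy, Wz = W
        return np.stack([
            Wr @ r - Wx @ x - Wy @ y - Wz @ z,  # real part
            Wx @ r + Wr @ x - Wz @ y + Wy @ z,  # i part
            Wy @ r + Wz @ x + Wr @ y - Wx @ z,  # j part
            Wz @ r - Wy @ x + Wx @ y + Wr @ z,  # k part
        ])

    k = 8
    q = np.random.randn(4, k)           # e.g. R, G, B plus a zero channel
    W = np.random.randn(4, k, k) * 0.1
    print(hamilton_linear(q, W).shape)  # (4, 8), using 4*k*k not 16*k*k weights
    ```

    The same weight-sharing pattern carries over to convolutional layers by replacing the matrix products with convolutions.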

    PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions

    Get PDF
    Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved by the use of efficient parameterized Kronecker products. In this article, we define the parameterization of hypercomplex convolutional layers and introduce the family of parameterized hypercomplex neural networks (PHNNs), lightweight and efficient large-scale models. Our method grasps the convolution rules and the filter organization directly from data, without requiring a rigidly predefined domain structure. PHNNs can operate in any user-defined or tuned domain, from 1D to nD, regardless of whether the algebra rules are preset. Such malleability allows multidimensional inputs to be processed in their natural domain without appending further dimensions, as is done instead in quaternion neural networks (QNNs) for 3D inputs such as color images. As a result, the proposed family of PHNNs operates with 1/n of the free parameters of its analog in the real domain. We demonstrate the versatility of this approach across multiple application domains through experiments on various image and audio datasets, in which our method outperforms real- and quaternion-valued counterparts.
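
    The parameterized Kronecker construction mentioned above can be sketched in a few lines: the layer weight is a sum of Kronecker products between small learned "algebra" matrices, which replace fixed quaternion multiplication rules, and learned filter blocks. A hedged NumPy illustration, with names and dimensions that are assumptions rather than the paper's code:

    ```python
    import numpy as np

    def phm_weight(A, F):
        """Build a parameterized hypercomplex layer weight.

        A : (n, n, n)        learned algebra matrices (multiplication rules)
        F : (n, d//n, k//n)  learned filter blocks

        W = sum_i kron(A[i], F[i]) acts like a dense (d, k) matrix but has
        only n^3 + d*k/n free parameters, roughly d*k/n for realistic sizes.
        """
        return sum(np.kron(A[i], F[i]) for i in range(A.shape[0]))

    n, d, k = 4, 64, 64
    A = np.random.randn(n, n, n)
    F = np.random.randn(n, d // n, k // n)
    W = phm_weight(A, F)
    print(W.shape, A.size + F.size, d * k)  # (64, 64) 1088 4096
    ```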

    Hypercomplex Image-to-Image Translation

    Get PDF
    Image-to-image translation (I2I) aims at transferring the content representation from an input domain to an output one, moving across different target domains. Recent I2I generative models, which gain outstanding results in this task, comprise a set of diverse deep networks, each with tens of millions of parameters. Moreover, images are usually three-dimensional, being composed of RGB channels, and common neural models do not take the correlation among dimensions into account, losing beneficial information. In this paper, we propose to leverage the properties of hypercomplex algebra to define lightweight I2I generative models that preserve pre-existing relations among image dimensions, thus exploiting additional input information. On multiple I2I benchmarks, we show how the proposed Quaternion StarGANv2 and parameterized hypercomplex StarGANv2 (PHStarGANv2) reduce the number of parameters and the amount of storage memory while ensuring high domain translation performance and good image quality, as measured by FID and LPIPS scores. Full code is available at: https://github.com/ispamm/HI2I
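
    As a rough, back-of-the-envelope illustration of where the parameter and storage savings come from (the numbers below are hypothetical, not taken from the paper): swapping a standard convolution for its parameterized hypercomplex counterpart with n = 4 divides the weight count, and hence the stored model size, by roughly four.

    ```python
    # Hypothetical 3x3 convolution, 256 -> 256 channels, with PHM at n = 4.
    c_in, c_out, ksz, n = 256, 256, 3, 4
    real_params = c_in * c_out * ksz * ksz   # standard real-valued conv
    phm_params = real_params // n + n ** 3   # filter blocks + algebra matrices
    print(real_params, phm_params)           # 589824 vs 147520: ~75% fewer
    ```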

    Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

    Full text link
    Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its core challenge, extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, ideally robust to channel corruptions, can be addressed with deep generative models. This ICASSP special session overview paper presents the semantic communication challenges from the machine learning perspective and shows how deep generative models can significantly enhance semantic communication frameworks in dealing with real-world complex data, extracting and exploiting semantic information, and being robust to channel corruptions. Alongside establishing this emerging field, the paper charts novel research pathways for the next generative semantic communication frameworks.
    Comment: Submitted to IEEE ICASSP.

    Diffusion models for audio semantic communication

    Full text link
    Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and be prone to errors when trying to recover the transmitted bits. By contrast, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver, without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that treats the communication problem as an inverse problem, therefore being robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple degradations at once, including corruption noise and missing parts caused by transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and under different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/
    Comment: Submitted to IEEE ICASSP.
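
    Schematically, the framework sends a low-dimensional latent over a noisy channel and restores it at the receiver with a conditional reverse diffusion process. The sketch below is a toy illustration under assumed interfaces: `awgn_channel`, `receive`, and `denoise_step` are placeholders, not the paper's code, and `denoise_step` stands in for one reverse step of a conditional diffusion model.

    ```python
    import numpy as np

    def awgn_channel(z, snr_db):
        """Corrupt a transmitted latent with white Gaussian noise at a given SNR."""
        p_noise = np.mean(z ** 2) / (10 ** (snr_db / 10))
        return z + np.random.randn(*z.shape) * np.sqrt(p_noise)

    def receive(z_rx, semantics, denoise_step, steps=50):
        """Treat reception as an inverse problem: start from the corrupted
        latent and iteratively denoise, conditioning on the semantics."""
        z = z_rx
        for t in reversed(range(steps)):
            z = denoise_step(z, t, semantics)
        return z

    z = np.random.randn(64)                  # low-dimensional audio latent
    z_rx = awgn_channel(z, snr_db=10)
    # Placeholder denoiser that shrinks toward the received latent.
    restored = receive(z_rx, "speech", lambda z, t, s: 0.9 * z + 0.1 * z_rx)
    ```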

    Dual Quaternion Ambisonics Array for Six-Degree-of-Freedom Acoustic Representation

    Full text link
    Spatial audio methods are attracting growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field into spherical harmonics. In this paper, we propose a dual quaternion representation of the spatial sound field acquired through an array of two First Order Ambisonics (FOA) microphones. The audio signals are encapsulated in a dual quaternion that leverages the properties of quaternion algebra to exploit correlations among them. This augmented representation with six degrees of freedom (6DOF) provides a more accurate coverage of the sound field, resulting in more precise sound localization and a more immersive audio experience. We evaluate our approach on a sound event localization and detection (SELD) benchmark. We show that our dual quaternion SELD model with temporal convolution blocks (DualQSELD-TCN) achieves better results than real- and quaternion-valued baselines thanks to our augmented representation of the sound field. Full code is available at: https://github.com/ispamm/DualQSELD-TCN
    Comment: Paper under consideration at Elsevier Pattern Recognition Letters.
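
    For intuition, the two FOA recordings can be stacked as the primal and dual parts of one dual quaternion signal, giving quaternion-valued layers direct access to cross-microphone correlations. The packing below is an assumption made for the sketch, not necessarily the paper's exact encoding:

    ```python
    import numpy as np

    def to_dual_quaternion(foa_a, foa_b):
        """Pack two First Order Ambisonics recordings, each with the four
        W, X, Y, Z spherical-harmonic channels over T samples, into the
        primal (mic A) and dual (mic B) parts of a dual quaternion signal."""
        assert foa_a.shape == foa_b.shape and foa_a.shape[0] == 4
        return np.stack([foa_a, foa_b])  # (2, 4, T)

    T = 1024
    dq = to_dual_quaternion(np.random.randn(4, T), np.random.randn(4, T))
    print(dq.shape)  # (2, 4, T): 8 channels organised as primal + dual parts
    ```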

    Exploring the vaccine conversation on TikTok in Italy: beyond classic vaccine stances

    Get PDF
    TikTok, a social media platform for creating and sharing short videos, has seen a surge in popularity during the COVID-19 pandemic. To analyse the Italian vaccine conversation on TikTok, we downloaded a sample of videos with a high play count ("Top Videos"), identified through an unofficial Application Programming Interface (consistent with TikTok’s Terms of Service), and collected public videos from vaccine-sceptic users through snowball sampling ("Vaccine Sceptics’ videos"). The videos were analysed with qualitative and quantitative methods in terms of vaccine stance, tone of voice, topic, conformity with TikTok style, and other characteristics. The final datasets consisted of 754 Top Videos (by 510 unique users) plus 180 Vaccine Sceptics’ videos (by 29 unique users), posted between January 2020 and March 2021. In 40.5% of the Top Videos the stance was promotional, 33.9% were indefinite-ironic, 11.3% were neutral, 9.7% were discouraging, and 3.1% were ambiguous (i.e. expressing an ambivalent stance towards vaccines); 43% of promotional videos were from healthcare professionals. More than 95% of the Vaccine Sceptics’ videos were discouraging. Multiple correspondence analysis showed that, compared to other stances, promotional videos were more frequently created by healthcare professionals and by females, and their most frequent topic was herd immunity. Discouraging videos were associated with a polemical tone of voice, and their topics were conspiracy and freedom of choice. Our analysis shows that Italian vaccine-sceptic users on TikTok are limited in number and vocality, and the large proportion of videos with an indefinite-ironic stance might imply that the incidence of affective polarisation could be lower on TikTok, compared to other social media, in the Italian context. Safety is the most frequent concern of users, and we recorded a notable presence of healthcare professionals among the creators. TikTok should be considered as a medium for vaccine communication and for vaccine promotion campaigns.

    Efficient Sound Event Localization and Detection in the Quaternion Domain

    No full text
    In recent years, several approaches have been proposed for the task of Sound Event Localization and Detection (SELD) with multiple overlapping sound events in the 3D sound field. However, accuracy improvements have often been achieved at the expense of more complex networks and a larger number of parameters. In this paper, we propose an efficient and lightweight Quaternion Temporal Convolutional Network for the SELD task (QSELD-TCN), which combines the advantages of quaternion-valued processing with the effectiveness of the Temporal Convolutional Network (TCN). The proposed approach represents the Ambisonic signal components as a single quaternion and, accordingly, uses quaternion-valued layers throughout the structure of the neural network. This results in a considerable saving of parameters with respect to the corresponding real-valued model. In particular, a quaternion implementation of the TCN block is presented, exploiting the TCN's ability to capture long-term dependencies and the effectiveness of quaternion convolutional layers in grasping correlations among the input dimensions. The proposed approach requires less runtime and storage memory and achieves faster inference than state-of-the-art methods, making it deployable even on devices with limited resources.
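
    For intuition, a quaternion TCN layer can be pictured as a dilated causal convolution whose taps apply the Hamilton product, so four weight blocks are reused across the four quaternion components. The following minimal sketch uses assumed shapes and a kernel size of 2; it is an illustration, not the paper's implementation.

    ```python
    import numpy as np

    def qconv1d(x, W, dilation=1):
        """Dilated causal quaternion 1-D convolution, kernel size 2.

        x : (4, C, T)    quaternion signal: 4 components, C channels, T steps
        W : (2, 4, C, C) one quaternion weight (4 blocks of CxC) per tap
        """
        def hamilton(Wq, q):
            r, i, j, k = q
            Wr, Wi, Wj, Wk = Wq
            return np.stack([
                Wr @ r - Wi @ i - Wj @ j - Wk @ k,
                Wi @ r + Wr @ i - Wk @ j + Wj @ k,
                Wj @ r + Wk @ i + Wr @ j - Wi @ k,
                Wk @ r - Wj @ i + Wi @ j + Wr @ k,
            ])
        pad = np.zeros_like(x[:, :, :dilation])
        x_past = np.concatenate([pad, x[:, :, :-dilation]], axis=-1)  # causal shift
        return hamilton(W[0], x_past) + hamilton(W[1], x)

    y = qconv1d(np.random.randn(4, 8, 64), np.random.randn(2, 4, 8, 8) * 0.1,
                dilation=2)
    print(y.shape)  # (4, 8, 64)
    ```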

    Contributory presentations/posters

    No full text