36 research outputs found

Advances in complex systems and their applications to cybersecurity

Author: Comminiello D.
Krzemien A.
Sanchez Lasheras F.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2019

Cybersecurity is one of the fastest growing and largest technology sectors and is increasingly recognized as a major issue in many industries, so companies are increasing their security budgets to guarantee the security of their processes. Successful attacks on the security of information systems could lead to safety, environmental, production, and quality problems. One of the most harmful aspects of attacks and intrusions is the ever-changing nature of attack technologies and strategies, which increases the difficulty of protecting computer systems. As a result, advanced systems are required to deal with the ever-increasing complexity of attacks in order to protect systems and information.

Archivio della ricerca- Università di Roma La Sapienza

Compressing deep-quaternion neural networks with targeted regularisation

Author: Comminiello D.
Scardapane S.
Uncini A.
Vecchi R.
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2020

In recent years, hyper-complex deep networks (such as complex-valued and quaternion-valued neural networks, QVNNs) have received renewed interest in the literature. They find applications in multiple fields, ranging from image reconstruction to 3D audio processing. Similar to their real-valued counterparts, quaternion neural networks require custom regularisation strategies to avoid overfitting. In addition, for many real-world applications and embedded implementations, there is a need to design sufficiently compact networks, with few weights and neurons. However, the problem of regularising and/or sparsifying QVNNs has not been properly addressed in the literature as of now. In this study, the authors show how to address both problems by designing targeted regularisation strategies, which can minimise the number of connections and neurons of the network during training. To this end, they investigate two extensions of l1 and structured regularisations to the quaternion domain. In the authors' experimental evaluation, they show that these tailored strategies significantly outperform classical (real-valued) regularisation approaches, resulting in small networks especially suitable for low-power and real-time applications.

Archivio della ricerca- Università di Roma La Sapienza
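
The abstract above does not give the form of the quaternion regularisers, so the following is only a minimal sketch of one plausible reading: an l1-style penalty applied to the norm of each quaternion weight, so that all four components of a weight are pushed to zero together. The function names and the 4-tuple weight layout are assumptions made for illustration, not the authors' implementation.

```python
import math

def quaternion_l1_penalty(weights):
    """Sum of quaternion norms (hypothetical helper).

    Each weight is a 4-tuple (r, i, j, k). Penalising the norm treats the
    four components as one unit, so a weight is removed as a whole.
    """
    return sum(math.sqrt(r * r + i * i + j * j + k * k)
               for r, i, j, k in weights)

def componentwise_l1_penalty(weights):
    """Classical real-valued l1: every component is penalised on its own."""
    return sum(abs(c) for q in weights for c in q)

# Three illustrative quaternion weights; the second one is already pruned.
weights = [(3.0, 0.0, 4.0, 0.0), (0.0, 0.0, 0.0, 0.0), (1.0, 2.0, 2.0, 4.0)]

print(quaternion_l1_penalty(weights))     # norms 5 + 0 + 5
print(componentwise_l1_penalty(weights))  # 7 + 0 + 9
```

The componentwise variant can zero a single component and leave the rest of the quaternion in place, which removes no connection from the network; the norm-based variant avoids that.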

Visual odometry with depth-wise separable convolution and quaternion neural networks

Author: Comminiello D.
De Magistris G.
Napoli C.
Starczewski J. T.
Publication venue: CEUR-WS
Publication date: 01/01/2023

Monocular visual odometry is a fundamental problem in computer vision and has been extensively studied in the literature. The vast majority of visual odometry algorithms are based on a standard pipeline consisting of feature detection, feature matching, motion estimation and local optimization. Only recently have deep learning approaches shown cutting-edge performance, replacing the standard pipeline with an end-to-end solution. One of the main advantages of deep learning approaches over the standard methods is the reduced inference time, which is an important requirement for applying visual odometry in real time. Less emphasis, however, has been placed on memory requirements and training efficiency. The memory footprint, in particular, is important for real-world applications such as robot navigation or autonomous driving, where the devices have limited memory resources. In this paper we tackle both aspects by introducing novel architectures based on a Depth-Wise Separable Convolutional Neural Network and a deep Quaternion Recurrent Convolutional Neural Network. In particular, we obtain equal or better accuracy with respect to the other state-of-the-art methods on the KITTI VO dataset with a reduction of the number of parameters and a speed-up in the inference time.

Archivio della ricerca- Università di Roma La Sapienza
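
To make the parameter reduction mentioned in the abstract above concrete, here is a small sketch comparing the weight count of a standard convolution with that of a depth-wise separable one (a per-channel k x k filter followed by a 1 x 1 pointwise convolution). The layer sizes are illustrative assumptions, biases are ignored, and nothing here is taken from the paper's actual architecture.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
full = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(full, separable, round(full / separable, 2))
```

For a fixed kernel size the saving grows with the number of output channels, since the ratio of the two counts is 1/c_out + 1/k^2.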

Semantic Communications Based on Adaptive Generative Models and Information Bottleneck

Author: Barbarossa S.
Comminiello D.
Di Lorenzo P.
Grassucci E.
Pezone F.
Sardellitti S.
Publication date: 05/09/2023

Semantic communications represent a significant breakthrough with respect to the current communication paradigm, as they focus on recovering the meaning behind the transmitted sequence of symbols, rather than the symbols themselves. In semantic communications, the goal of the destination is not to recover a list of symbols identical to the transmitted ones, but rather to recover a message that is semantically equivalent to the message emitted by the source. This paradigm shift introduces many degrees of freedom to the encoding and decoding rules that can be exploited to make the design of communication systems much more efficient. In this paper, we present an approach to semantic communication building on three fundamental ideas: 1) represent data over a topological space as a formal way to capture semantics, as expressed through relations; 2) use the information bottleneck principle as a way to identify relevant information and adapt the information bottleneck online, as a function of the wireless channel state, in order to strike an optimal trade-off between transmit power, reconstruction accuracy and delay; 3) exploit probabilistic generative models as a general tool to adapt the transmission rate to the wireless channel state and make it possible to regenerate the transmitted images or run classification tasks at the receiver side. Comment: To appear in IEEE Communications Magazine, special issue on Semantic Communications: Transmission beyond Shannon, 202

arXiv.org e-Print Archive
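
For readers unfamiliar with the information bottleneck principle named in the abstract above, its textbook form is shown here. The symbols are the standard ones and are not taken from the paper: X is the source, Y the relevant variable, T the compressed representation, and beta a trade-off weight. How the paper adapts this trade-off to the wireless channel state is not specified in the abstract and is not reproduced here.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y), \qquad \beta > 0
```

A larger beta favours keeping information about Y, while a smaller beta favours stronger compression of X.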

SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

Author: Comminiello Danilo
Comunità Marco
Gramaccioni Riccardo F.
Postolache Emilian
Reiss Joshua D.
Rodolà Emanuele
Publication date: 23/10/2023

Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract the onsets of repetitive actions from a video, which are then used, in conjunction with audio or textual embeddings, to condition a diffusion model trained to generate a new synchronized sound effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.

arXiv.org e-Print Archive

Quaternion Anti-Transfer Learning for Speech Emotion Recognition

Author: Comminiello D.
Guizzo E.
Tarroni G.
Weyde T.
Publication venue: IEEE
Publication date: 01/01/2023

This study explores the benefits of anti-transfer learning with quaternion neural networks for robust, effective, and efficient speech emotion recognition. Anti-transfer learning selectively promotes task invariance through the introduction of a deep feature loss at training time. It has been shown to improve the performance of speech emotion recognition models by encouraging the independence of emotion predictions from specific uttered words and characteristics of the speaker’s voice. However, the improved accuracy comes at a cost of increased computation time and memory requirements. In order to reduce the resource demand of anti-transfer, we propose to exploit quaternion-valued processing. We design, implement, and evaluate the use of quaternion anti-transfer learning on the basis of the VGG16 architecture and quaternion embeddings on multiple datasets for different speech emotion recognition task setups. The effectiveness of this approach depends on the layer where it is applied, with early layers offering a good compromise between performance gain and resource requirements. Our results show that anti-transfer in the quaternion domain can enhance generalisation while reducing the model’s demand for computation and memory

City Research Online

Archivio della ricerca- Università di Roma La Sapienza

Group sparse regularization for deep neural networks

Author: Comminiello D.
Hussain A.
Scardapane S.
Uncini A.
Publication venue: Elsevier
Publication date: 02/07/2016

In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neural network, (ii) the number of neurons for each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are traditionally dealt with separately, we propose an efficient regularized formulation enabling their simultaneous parallel execution, using standard optimization routines. Specifically, we extend the group Lasso penalty, originally proposed in the linear regression literature, to impose group-level sparsity on the network’s connections, where each group is defined as the set of outgoing weights from a unit. Depending on the specific case, the weights can be related to an input variable, to a hidden neuron, or to a bias unit, thus performing simultaneously all the aforementioned tasks in order to obtain a compact network. We carry out an extensive experimental evaluation, in comparison with classical weight decay and Lasso penalties, both on a toy dataset for handwritten digit recognition, and multiple realistic mid-scale classification benchmarks. Comparative results demonstrate the potential of our proposed sparse group Lasso penalty in producing extremely compact networks, with a significantly lower number of input features, with a classification accuracy which is equal or only slightly inferior to standard regularization terms

arXiv.org e-Print Archive

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

Repository@Napier

Archivio della ricerca- Università di Roma La Sapienza
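
The group-level penalty described in the abstract of "Group sparse regularization for deep neural networks" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each row of the hypothetical matrix `W` holds the outgoing weights of one unit, the square root of the group size is used as the group weight (a common convention for group Lasso), and the sparse variant simply adds a plain l1 term.

```python
import math

def group_lasso_penalty(W):
    """Group Lasso over rows: W[i] holds the outgoing weights of unit i."""
    return sum(math.sqrt(len(row)) * math.sqrt(sum(w * w for w in row))
               for row in W)

def sparse_group_lasso_penalty(W):
    """Group Lasso plus an l1 term on every individual weight."""
    l1 = sum(abs(w) for row in W for w in row)
    return group_lasso_penalty(W) + l1

def active_units(W, tol=1e-8):
    """Indices of units whose outgoing weights are not all (near) zero."""
    return [i for i, row in enumerate(W)
            if math.sqrt(sum(w * w for w in row)) > tol]

# Hypothetical layer with 3 units feeding 2 units; unit 1 is fully pruned.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, 1.0]]

print(group_lasso_penalty(W))
print(sparse_group_lasso_penalty(W))
print(active_units(W))
```

When a row is driven to zero, the corresponding unit no longer influences the next layer, so it can be dropped: an input row removes a feature and a hidden row removes a neuron, which is how one penalty can cover the tasks listed in the abstract.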

Learning Speech Emotion Representations in the Quaternion Domain

Author: Comminiello D.
Guizzo E.
Scardapane S.
Weyde T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2023

The modeling of human emotion expression in speech signals is an important, yet challenging task. The high resource demand of speech emotion recognition models, combined with the general scarcity of emotion-labelled data, are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach to jointly circumvent these difficulties. Our method, named RH-emo, is a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monaural spectrograms, enabling the use of quaternion-valued networks for speech emotion recognition tasks. RH-emo is a hybrid real/quaternion autoencoder network that consists of a real-valued encoder in parallel to a real-valued emotion classifier and a quaternion-valued decoder. On the one hand, the classifier permits optimization of each latent axis of the embeddings for the classification of a specific emotion-related characteristic: valence, arousal, dominance, and overall emotion. On the other hand, quaternion reconstruction enables the latent dimension to develop intra-channel correlations that are required for an effective representation as a quaternion entity. We test our approach on speech emotion recognition tasks using four popular datasets: IEMOCAP, RAVDESS, EmoDB, and TESS, comparing the performance of three well-established real-valued CNN architectures (AlexNet, ResNet-50, VGG) and their quaternion-valued equivalents fed with the embeddings created with RH-emo. We obtain a consistent improvement in the test accuracy for all datasets, while drastically reducing the resource demand of the models. Moreover, we performed additional experiments and ablation studies that confirm the effectiveness of our approach.

City Research Online

L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

Author: Comminiello D.
Guizzo E.
Marinoni C.
Masiero B.
Pennese M.
Ren X.
Uncini A.
Zhang C.
Zheng X.
Publication venue: 345 E 47th St, New York, NY 10017, USA
Publication date: 01/01/2022

The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics as the L3DAS21 datasets, but with an extended number of data points and added constraints that improve the baseline model's efficiency and overcome the major difficulties encountered by the participants of the previous challenge. We updated the baseline model of Task 1, using the architecture that ranked first in the previous challenge edition. We wrote a new supporting API, improving its clarity and ease of use. Finally, we present and discuss the results submitted by all participants. L3DAS22 Challenge website: www.l3das.com/icassp2022

Archivio della ricerca- Università di Roma La Sapienza

Introducing complex functional link polynomial filters

Author: Carini A.
Comminiello D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The paper introduces a novel class of complex nonlinear filters, the complex functional link polynomial (CFLiP) filters. These filters present many interesting properties. They are a sub-class of linear-in-the-parameters nonlinear filters. They satisfy all the conditions of the Stone-Weierstrass theorem and thus are universal approximators for causal, time-invariant, discrete-time, finite-memory, complex, continuous systems defined on a compact domain. The CFLiP basis functions separate the magnitude and phase of the input signal. Moreover, CFLiP filters include many families of nonlinear filters with orthogonal basis functions. The experimental results show that they are capable of modeling the nonlinearities of high power amplifiers of telecommunication systems with better accuracy than most of the filters currently used for this purpose.

Archivio istituzionale della ricerca - Università di Trieste

Archivio istituzionale della ricerca - Università di Urbino

Archivio della ricerca- Università di Roma La Sapienza