
A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications

Authors: Chafe, Chris; Mezza, Alessandro Ilic; Rottondi, Cristina; Verma, Prateek
Publication date: 01/01/2020

Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims to revolutionize the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network. Ensuring realistic conditions for music performance, however, constitutes a significant engineering challenge due to extremely strict requirements in terms of audio quality and, most importantly, network delay. To minimize the end-to-end delay experienced by the musicians, typical implementations of NMP applications use uncompressed, bidirectional audio streams and leverage UDP as the transport protocol. Because UDP is connectionless and unreliable, audio packets lost in transit are not retransmitted and thus cause glitches in the receiver's audio playout. This article describes a technique for predicting the content of lost packets in real time using a deep learning approach. The ability to conceal errors in real time can help mitigate the audio impairments caused by packet losses, thus improving the quality of audio playout in real-world scenarios.
Comment: 8 pages, 2 figures

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano
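The abstract describes the failure mode (lost UDP packets are never retransmitted) but not the receiver logic around it. As a minimal sketch, assuming each packet carries a sequence number and a fixed-size frame (the hypothetical `FRAME_LEN`), a receiver can detect gaps in sequence order and hand each one to a pluggable concealment function that sees only past audio. The names `playout`, `conceal_silence` and `conceal_repeat` are illustrative assumptions; the paper's deep learning predictor is not reproduced here, and the two baselines shown are the simple silence and repetition strategies that NMP systems commonly use.

```python
from typing import Callable, Dict

import numpy as np

FRAME_LEN = 4  # hypothetical samples per packet; real NMP frames are larger

# A concealer maps the playout history so far to one replacement frame.
Concealer = Callable[[np.ndarray], np.ndarray]


def conceal_silence(history: np.ndarray) -> np.ndarray:
    """Baseline: fill a lost packet with zeros."""
    return np.zeros(FRAME_LEN)


def conceal_repeat(history: np.ndarray) -> np.ndarray:
    """Baseline: repeat the most recent frame, or silence if there is none yet."""
    if len(history) < FRAME_LEN:
        return np.zeros(FRAME_LEN)
    return history[-FRAME_LEN:].copy()


def playout(received: Dict[int, np.ndarray], first_seq: int, last_seq: int,
            conceal: Concealer) -> np.ndarray:
    """Assemble the playout stream in sequence-number order.

    Any sequence number missing from `received` is a gap (no retransmission),
    so `conceal` must fill it using only previously played samples.
    """
    history = np.zeros(0)
    for seq in range(first_seq, last_seq + 1):
        frame = received.get(seq)
        if frame is None:
            frame = conceal(history)  # a learned predictor would plug in here
        history = np.concatenate([history, frame])
    return history


# Packet 2 is lost in transit.
frames = {0: np.array([1.0, 2.0, 3.0, 4.0]),
          1: np.array([5.0, 6.0, 7.0, 8.0]),
          3: np.array([9.0, 9.0, 9.0, 9.0])}
out_silence = playout(frames, 0, 3, conceal_silence)
out_repeat = playout(frames, 0, 3, conceal_repeat)
```

Keeping the concealer as a callback keeps the timing-critical receive loop separate from the prediction model, so a baseline and a learned predictor can be swapped without touching the loop.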

A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

Authors: Aironi, Carlo; Cornell, Samuele; Serafini, Luca; Squartini, Stefano
Publication date: 28/07/2023

Packet loss is a major cause of voice quality degradation in VoIP transmissions, with a serious impact on intelligibility and user experience. This paper describes a system based on a generative adversarial approach that aims to repair the fragments lost during the transmission of audio streams. Inspired by the powerful image-to-image translation capability of Generative Adversarial Networks (GANs), we propose bin2bin, an improved pix2pix framework that translates magnitude spectrograms of audio frames with lost packets into uncorrupted speech spectrograms. To better preserve structural information after spectrogram translation, this paper combines two STFT-based loss functions with the traditional GAN objective. Furthermore, we employ a modified PatchGAN structure as the discriminator, and we reduce the concealment time through a suitable initialization of the phase reconstruction algorithm. Experimental results show that the proposed method has clear advantages over current state-of-the-art methods, as it better handles both high packet loss rates and large gaps.
Comment: Accepted at EUSIPCO - 31st European Signal Processing Conference, 202

arXiv.org e-Print Archive
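The abstract does not name its two STFT-based losses. A common pairing for spectrogram-domain generators is spectral convergence plus an L1 distance between log magnitudes; the sketch below assumes that pairing purely to illustrate how such terms can be added to an adversarial objective. `EPS`, the weight `lam`, and all function names are assumptions, not bin2bin's definitions.

```python
import numpy as np

EPS = 1e-7  # assumed floor that keeps log() and the division finite


def spectral_convergence(mag_pred: np.ndarray, mag_true: np.ndarray) -> float:
    """Frobenius-norm error relative to the target magnitude spectrogram."""
    return float(np.linalg.norm(mag_true - mag_pred) / (np.linalg.norm(mag_true) + EPS))


def log_magnitude_l1(mag_pred: np.ndarray, mag_true: np.ndarray) -> float:
    """Mean absolute difference between log magnitudes."""
    return float(np.mean(np.abs(np.log(mag_true + EPS) - np.log(mag_pred + EPS))))


def generator_loss(adv_loss: float, mag_pred: np.ndarray, mag_true: np.ndarray,
                   lam: float = 1.0) -> float:
    """Adversarial term plus the two STFT terms, weighted by an assumed lam."""
    stft_terms = spectral_convergence(mag_pred, mag_true) + log_magnitude_l1(mag_pred, mag_true)
    return adv_loss + lam * stft_terms


rng = np.random.default_rng(0)
target = rng.random((257, 10)) + 0.1  # toy magnitude spectrogram (bins x frames)
damaged = target.copy()
damaged[:, 4:6] = 0.0                 # two frames zeroed, as if their packets were lost
perfect = generator_loss(0.0, target, target)
degraded = generator_loss(0.0, damaged, target)
```

The relative (spectral convergence) term emphasizes high-energy bins, while the log term gives weight to low-energy detail, which is one common argument for using both.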

Using Autoregressive Models for Real-Time Packet Loss Concealment in Networked Music Performance Applications

Authors: Bianco, Andrea; Huang, Yuen; Rottondi, Cristina; Sacchetto, Matteo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2022

In Networked Music Performances (NMP), concealing the effects of lost or late packets on the quality of the playback audio stream is of pivotal importance to mitigate the resulting audio artifacts. Traditional packet loss concealment techniques implemented in standard audio codecs can be leveraged only at the price of increased mouth-to-ear latency, which may easily exceed the strict delay requirements of NMP interactions. This paper investigates the adoption of a low-complexity prediction technique based on autoregressive models to fill audio gaps caused by missing packets. Numerical results show that the proposed approach outperforms the packet loss concealment methods normally implemented in NMP systems, which typically fill audio gaps with silence or with a repetition of the last received audio segment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
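To make the autoregressive idea concrete, the sketch below fits AR coefficients to the samples received before a gap and extrapolates recursively across it, then compares the result with the silence and repetition baselines named in the abstract. Least-squares estimation is an assumption made for brevity (Yule-Walker or Burg estimation are common alternatives), and the abstract does not say which estimator, model order, or window the authors use. The two-tone test signal is also an assumption, chosen because an order-4 AR model represents it exactly.

```python
import numpy as np


def fit_ar(history: np.ndarray, order: int) -> np.ndarray:
    """Least-squares AR coefficients a, with x[n] ~ a[0]*x[n-1] + ... + a[order-1]*x[n-order]."""
    # Each row holds the `order` samples preceding x[n], most recent first.
    X = np.array([history[n - order:n][::-1] for n in range(order, len(history))])
    y = history[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


def ar_extrapolate(history: np.ndarray, coeffs: np.ndarray, n_missing: int) -> np.ndarray:
    """Fill a gap of n_missing samples, feeding each prediction back as input."""
    order = len(coeffs)
    recent = list(history[-order:])  # chronological order
    preds = []
    for _ in range(n_missing):
        nxt = float(np.dot(coeffs, recent[::-1]))  # most recent sample first
        preds.append(nxt)
        recent = recent[1:] + [nxt]
    return np.array(preds)


def rms(e: np.ndarray) -> float:
    return float(np.sqrt(np.mean(e ** 2)))


# Toy signal: two tones, which an order-4 AR model can represent exactly.
n = np.arange(600)
x = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.13 * n)
history, truth = x[:512], x[512:]  # the last 88 samples are "lost"

coeffs = fit_ar(history, order=4)
ar_fill = ar_extrapolate(history, coeffs, len(truth))
silence_fill = np.zeros(len(truth))
repeat_fill = history[-len(truth):]  # repeat the last received segment

ar_err = rms(ar_fill - truth)
sil_err = rms(silence_fill - truth)
rep_err = rms(repeat_fill - truth)
```

Real music is not a sum of a few stationary tones, so in practice the order would be higher and the model would be refit on a recent window before each gap; the per-gap cost stays low because it is one small least-squares solve plus a short recursion.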

Error resilience and concealment techniques for high-efficiency video coding

Author: Carreira, Joao F.M.
Publication date: 01/01/2018

This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study on error robustness revealed that HEVC has weak protection against network losses, with a significant impact on video quality. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, improving decoded video quality without compromising compression efficiency. The second contribution of this thesis is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, to reduce mismatched motion predictions at the decoder. The problem of error concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, in which the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimization. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case.
A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream, and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains compared with other existing methods as well as those implemented in the HEVC reference software. Furthermore, the proposed methods also achieve a better trade-off between coding efficiency and error robustness.
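The abstract names a constrained Lagrangian rate-distortion optimization for distributing reference pictures but does not give its formulation. The sketch below assumes the textbook cost J = D + λR and models the constraint as a hypothetical cap (`max_per_ref`) on how many blocks may predict from any one reference, so that losing a single picture damages fewer blocks. The greedy solver, the cap, and every name here are illustrative assumptions, not the thesis's algorithm.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (distortion D, rate R) when predicting from one reference


def rd_cost(d: float, r: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return d + lam * r


def assign_references(blocks: Sequence[Sequence[Candidate]], lam: float,
                      max_per_ref: int) -> List[int]:
    """Greedily give each block the cheapest reference that still has capacity."""
    n_refs = len(blocks[0])
    usage = [0] * n_refs
    choice: List[int] = []
    for cands in blocks:
        ranked = sorted(range(n_refs), key=lambda k: rd_cost(*cands[k], lam))
        for k in ranked:
            if usage[k] < max_per_ref:
                usage[k] += 1
                choice.append(k)
                break
        else:
            raise ValueError("n_refs * max_per_ref is smaller than the number of blocks")
    return choice


# 4 blocks, 2 candidate references; reference 0 is cheaper for every block.
blocks = [[(1.0, 10.0), (2.0, 12.0)]] * 4
unconstrained = assign_references(blocks, lam=0.1, max_per_ref=4)
constrained = assign_references(blocks, lam=0.1, max_per_ref=2)
```

Without the cap every block predicts from the cheapest reference; with it, half the blocks move to the second reference at a slightly higher J, trading coding efficiency for fewer predictions that depend on a single picture.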

Wireless End-to-End Image Transmission System using Semantic Communications

Authors: Fernando, Anil; Gowrisetty, Vishnu; Lokumarambage, Maheshi; Rajatheva, Nandana; Rezaei, Hossein; Sivalingam, Thushan
Publication date: 10/04/2023

Semantic communication is considered the future of mobile communication: it aims to go beyond the classical Shannon paradigm of communication by conveying the semantic meaning of the data rather than requiring its bit-by-bit reconstruction at the receiver. The semantic communication paradigm aims to address the limited bandwidth available for transmitting high-volume multimedia content in modern applications. Integrating AI technologies with 6G communication networks has paved the way for end-to-end communication systems based on semantic communication. In this study, we implement a semantic communication-based end-to-end image transmission system, and we discuss potential design considerations for developing semantic communication systems in conjunction with physical channel characteristics. A pre-trained GAN is used at the receiver to reconstruct a realistic image from the semantic segmentation map received at its input. The semantic segmentation model at the transmitter (encoder) and the GAN at the receiver (decoder) are trained on a common knowledge base, the COCO-Stuff dataset. The results show that transmitting the semantic segmentation map through the physical channel, instead of the ground-truth image as in conventional communication systems, yields a very large resource gain in the form of bandwidth savings. Furthermore, the research studies the effect of physical channel distortions and quantization noise on semantic communication-based multimedia content transmission.
Comment: Accepted for IEEE Access

arXiv.org e-Print Archive
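The abstract reports bandwidth savings without figures, so the arithmetic below is illustrative only and is not the paper's result. It compares the raw size of an 8-bit RGB image with a label map storing one class index per pixel, assuming fewer than 256 classes (182 is used as an illustrative count), and shows how a simple run-length code, an assumed stand-in for whatever source coding the system actually uses, shrinks a map with large uniform regions.

```python
import math
from typing import List, Tuple

import numpy as np


def raw_rgb_bits(height: int, width: int, bits_per_channel: int = 8) -> int:
    """Uncompressed size of an RGB image in bits."""
    return height * width * 3 * bits_per_channel


def raw_label_bits(height: int, width: int, num_classes: int) -> int:
    """Uncompressed size of a segmentation map with one class index per pixel."""
    return height * width * math.ceil(math.log2(num_classes))


def run_lengths(label_map: np.ndarray) -> List[Tuple[int, int]]:
    """Row-major run-length encoding as (class_id, run_length) pairs."""
    flat = label_map.ravel()
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(flat) + 1):
        if i == len(flat) or flat[i] != flat[start]:
            runs.append((int(flat[start]), i - start))
            start = i
    return runs


H = W = 256
rgb_bits = raw_rgb_bits(H, W)                         # 24 bits per pixel
label_bits = raw_label_bits(H, W, num_classes=182)    # 8 bits per pixel
toy_map = np.zeros((H, W), dtype=np.uint8)
toy_map[H // 2:, :] = 1                               # two uniform regions
runs = run_lengths(toy_map)
```

Before any compression the label map is only 3x smaller than the RGB image; most of a large practical gain would come from how compressible label maps are (two runs describe the toy map above), which is why the channel and quantization effects the paper studies matter for the reconstructed image.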