Scene Understanding with Sound using Artificial Intelligence Techniques

Abstract

This PhD thesis explores scene understanding from sound using artificial intelligence techniques. It addresses the challenge of extracting relevant information from audio in environments where other sensory inputs, such as vision, are limited or occluded. The work contributes novel methods and models for Acoustic Scene Classification (ASC), Sound Event Detection (SED), Unsupervised Anomalous Sound Detection (UASD), and speaker distance estimation, with a focus on reducing system complexity while maintaining high performance. The core of this research lies in the design of low-complexity deep learning models, such as lightweight convolutional networks and methods leveraging Chebyshev moments, applied to various sound recognition tasks. These models are evaluated in noisy environments and shown to be robust, offering state-of-the-art results while remaining computationally efficient. Beyond the theoretical contributions, the thesis explores practical applications of sound-based scene understanding in domains such as smart devices, security systems, and autonomous vehicles, enhancing human-computer interaction and safety. Future research directions include the integration of multi-modal sensory data and the development of more interpretable AI systems.
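The abstract mentions low-complexity features based on Chebyshev moments. As a rough illustration only (not the thesis's actual implementation), the sketch below computes low-order discrete Chebyshev moments of a 2-D time-frequency patch, such as a log-mel spectrogram. The helper names `chebyshev_basis` and `chebyshev_moments` are hypothetical, and the basis is built by QR-orthonormalising a Vandermonde matrix, which yields the orthonormal discrete Chebyshev polynomials up to sign:

```python
import numpy as np

def chebyshev_basis(N, K):
    """Orthonormal discrete Chebyshev polynomial basis on the grid {0, ..., N-1}.

    Built by QR-orthonormalising the monomial Vandermonde matrix; the
    resulting columns are the normalised discrete Chebyshev polynomials
    up to sign. Returns an (N, K) matrix whose columns are basis vectors.
    """
    x = np.arange(N, dtype=float)
    V = np.vander(x, K, increasing=True)  # columns: x^0, x^1, ..., x^(K-1)
    Q, _ = np.linalg.qr(V)                # orthonormalise the columns
    return Q

def chebyshev_moments(patch, K):
    """K x K matrix of low-order Chebyshev moments of a 2-D array
    (e.g. a log-mel spectrogram patch): T = Qy^T f Qx."""
    Ny, Nx = patch.shape
    Qy = chebyshev_basis(Ny, K)
    Qx = chebyshev_basis(Nx, K)
    return Qy.T @ patch @ Qx

# Toy usage: compress a 64 x 96 "spectrogram" patch into a K*K descriptor.
rng = np.random.default_rng(0)
patch = rng.standard_normal((64, 96))
feats = chebyshev_moments(patch, K=8).ravel()  # 64-dimensional feature vector
```

Keeping only the first K moments per axis acts as a smooth low-pass summary of the patch, which is one way such moments can serve as compact inputs to a lightweight classifier.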

Archivio della Ricerca - Università di Roma 3

Last time updated on 11/06/2025
