5 research outputs found

    A comparison of audio-based deep learning methods for detecting anomalous road events

    Get PDF
    Road surveillance systems play an important role in monitoring roads and safeguarding their users. Many of these systems are based on video streams acquired from urban video surveillance infrastructures, from which it is possible to reconstruct the dynamics of accidents and detect other events. However, such systems may lack accuracy in adverse environmental settings: for instance, poor lighting, weather conditions, and occlusions can reduce the effectiveness of automatic detection and consequently increase the rate of false or missed alarms. These issues can be mitigated by integrating such solutions with audio analysis modules, which can improve the ability to recognize distinctive events such as car crashes. For this purpose, in this work we propose a preliminary analysis of solutions based on Deep Learning techniques for the automatic identification of hazardous events through the analysis of audio spectrograms.
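    The audio spectrograms this abstract refers to can be sketched with a plain NumPy short-time Fourier transform. The frame length, hop size, and sampling rate below are illustrative choices, not parameters taken from the paper.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT of each frame; transpose so frequency bins are rows
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Example: one second of a synthetic 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (frequency_bins, time_frames) = (257, 61)
```

    An image like `spec` (often log-scaled) is what a convolutional network would consume for event classification.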

    Audio Signal Processing with Deep Neural Networks for the Detection of Dangerous Situations

    Get PDF
    In modern surveillance systems, solutions combining closed-circuit cameras with artificial intelligence techniques are the main tool for countering threats and dangers in a variety of environments: public spaces, private homes, offices, and critical facilities such as hospitals or schools. These systems are equipped with robust computer vision techniques that automatically recognize and detect objects and people in sequences of images. The goal is to predict the behavior of the observed elements in a given scenario, so as to increase the overall effectiveness of a surveillance system. However, image analysis can suffer significant performance drops in various circumstances, owing to the nature of video sensors and the limitations they introduce. This thesis project discusses the development of a system for recognizing dangerous situations whose input data are acquired from audio sensors. In recent years, audio surveillance has attracted great interest thanks to its flexibility of use, both for the diversity of situations in which it can be employed and for the possibility of combining it with its video counterpart in hybrid systems. The proposed system consists of a convolutional neural network whose architecture is strongly inspired by VGG19. Its input consists of images built from portions of the audio stream and transformed into time-frequency representations: the spectrogram, the Mel-scaled spectrogram, and the gammatonegram. The goal was to build a classification model for dangerous audio events, considering sounds such as breaking glass, gunshots, and screams.
A comparison was then carried out both between the performance obtained with the three representations and between the neural network and a standard classification technique such as the SV
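    Of the three time-frequency representations the thesis compares, the Mel-scaled spectrogram can be illustrated by building a triangular mel filterbank and applying it to a magnitude spectrum. The filter count, FFT size, and sampling rate below are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale (HTK formula)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_mels + 2 points: each filter spans (left edge, center, right edge)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, ctr):        # rising slope
            fb[i, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):        # falling slope
            fb[i, k] = (hi - k) / max(hi - ctr, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (40, 257): multiply by a (257, T) spectrogram to get mel bands
```

    Applying `fb` to a linear-frequency magnitude spectrogram compresses the fine high-frequency bins into perceptually spaced bands, which is what distinguishes the Mel-scaled spectrogram from the plain one.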

    Detecting Sounds of Interest in Roads with Deep Networks

    No full text
    Monitoring of public and private places is of great importance for the security of people and is usually done by means of surveillance cameras. In this paper we propose an approach for the monitoring of roads, to detect car crashes and tire skidding, based on the analysis of sound signals, which can complement or, in some cases, substitute video analytic systems. The system that we propose employs a MobileNet deep architecture, designed to run efficiently on embedded appliances and be deployed in distributed systems for road monitoring. We designed a recognition system based on the analysis of audio frames and tested it on the publicly available MIVIA road events data set. The performance results that we achieved (recognition rate higher than 99%) exceed those of existing methods, demonstrating that the proposed approach can be deployed on embedded devices in a distributed surveillance system.
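    Frame-level recognition as described here is typically turned into an event decision by aggregating per-frame classifications over a short window. The sketch below shows one simple majority-vote scheme over a toy window of made-up scores; it is not the paper's actual decision rule, and the class layout is a hypothetical example.

```python
import numpy as np

# Hypothetical per-frame class scores: rows are consecutive audio frames,
# columns are classes (0 = background, 1 = car crash, 2 = tire skidding).
# The values are made up for illustration.
frame_scores = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
])

# Per-frame decisions, then a majority vote over the window
decisions = frame_scores.argmax(axis=1)
labels, counts = np.unique(decisions, return_counts=True)
winner = int(labels[counts.argmax()])
print("detected class:", winner)  # → 1 (car crash) for this toy window
```

    Aggregating over several frames like this trades a little latency for robustness against isolated frame-level misclassifications.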

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    Get PDF
    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory in the setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gazing is the social signal to be considered.