
    CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

    Environmental sound classification (ESC) has been increasingly studied in recent years. Environmental sounds are part of daily life, and associating them with the surroundings we live in matters in several respects: ESC is used in smart-city management, location determination from environmental sounds, surveillance systems, machine hearing, and environment monitoring. ESC is, however, more difficult than classifying other sounds because many factors generate background noise, which makes the sounds harder to model and classify. The main aim of this study is therefore to develop a more robust convolutional neural network (CNN) architecture. For this purpose, 150 different CNN-based models were designed by varying the number of layers and the values of the tuning parameters used in the layers. The Urbansound8k environmental sound database was used to test the accuracy of the models. The sounds in this data set were first converted into a 32x32x3 image format. The proposed CNN model yielded an accuracy of up to 82.5%, higher than that of its classical counterpart. Given that little fine-tuning was performed, the obtained accuracy was found to be satisfactory and better than that of other studies on Urbansound8k when both accuracy and computational complexity are considered. The results also suggest that further improvement is possible owing to the low complexity of the proposed CNN architecture and its applicability in real-world settings.

    Transfer Learning for Improved Audio-Based Human Activity Recognition

    Human activities are accompanied by characteristic sound events, the processing of which might provide valuable information for automated human activity recognition. This paper presents a novel approach addressing the case where one or more human activities are associated with limited audio data, resulting in a potentially highly imbalanced dataset. Data augmentation is based on transfer learning; more specifically, the proposed method: (a) identifies the classes that are statistically close to the ones associated with limited data; (b) learns a multiple-input, multiple-output transformation; and (c) transforms the data of the closest classes so that they can be used for modeling the ones associated with limited data. Furthermore, the proposed framework includes a feature set extracted from signal representations of diverse domains, i.e., temporal, spectral, and wavelet. Extensive experiments demonstrate the relevance of the proposed data augmentation approach under a variety of generative recognition schemes.
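Step (a) above, finding the class that is statistically closest to a data-scarce class, can be sketched as follows. The abstract does not say which closeness measure the paper uses. This sketch assumes each class is summarized by a diagonal Gaussian over its feature vectors and compares classes with a symmetric Kullback-Leibler divergence. The class names and feature values are hypothetical.

```python
def mean_var(rows):
    """Per-dimension mean and variance of a list of feature vectors."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # A small floor keeps the variances strictly positive.
    variances = [sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
                 for d in range(dims)]
    return means, variances


def symmetric_kl(a, b):
    """Symmetric KL divergence between two diagonal Gaussians (means, variances)."""
    (ma, va), (mb, vb) = a, b
    total = 0.0
    for m1, v1, m2, v2 in zip(ma, va, mb, vb):
        total += 0.5 * (v1 / v2 + v2 / v1 - 2)
        total += 0.5 * (m1 - m2) ** 2 * (1 / v1 + 1 / v2)
    return total


def closest_class(scarce_rows, candidates):
    """Name of the candidate class closest to the data-scarce class."""
    ref = mean_var(scarce_rows)
    return min(candidates,
               key=lambda name: symmetric_kl(ref, mean_var(candidates[name])))


# Hypothetical two-dimensional features for three activity classes.
features = {
    "door_knock": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]],
    "footsteps": [[1.1, 2.2], [1.0, 1.9], [1.3, 2.0]],
    "vacuum": [[5.0, 7.0], [5.5, 6.5], [4.8, 7.2]],
}
scarce = features.pop("door_knock")  # treat this class as the one with limited data
nearest = closest_class(scarce, features)
```

Steps (b) and (c), learning and applying the transformation, are not sketched here because the abstract gives no detail on the form of the mapping.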

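    Characterization of the impact of acoustic events on equivalent sound levels and on citizens' perception for the creation of dynamic noise maps (Caracterització de l'impacte dels esdeveniments acústics en els nivells equivalents sonors i en la percepció dels ciutadans per a la confecció de mapes dinàmics de soroll)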

    Noise pollution has become a serious public health problem, causing various diseases and disorders. According to the World Health Organisation, one million healthy life years are lost in Western Europe every year due to exposure to environmental noise. To assess and manage environmental noise in the European Union, the Environmental Noise Directive (END) 2002/49/EC requires Member States to prepare and publish updated noise maps and the associated action plans every five years. This covers agglomerations of more than 100,000 inhabitants and major roads, railways and airports. Thanks to recent technological advances, the noise-mapping paradigm has changed substantially: sound level measurements can now be automated using wireless acoustic sensor networks to generate noise maps in real time. However, these networks cannot prevent a series of situations that bias the measured equivalent sound levels, so that the map is not faithful to the reality perceived by citizens, e.g., the sounds of birds, industry, car horns, sirens, conversations near the sensors, or weather phenomena such as rain and wind. This thesis studies the characterization of acoustic events for the creation of dynamic traffic noise maps. It begins by presenting the context of the thesis, the LIFE DYNAMAP project, which aims to measure traffic noise levels in two pilot areas and integrate them dynamically into a noise map that is updated in real time. It then presents an exhaustive analysis of the events found in the two areas, urban and suburban, and applies several characterizations to them. One of the measures presented is the impact on the equivalent sound level (Leq), which quantifies the bias that the presence of certain acoustic events introduces into traffic noise maps. Perceptual tests using psychoacoustic metrics are also proposed in order to adapt the characterization of these events to citizens' perception. The main objective of the thesis is to characterize the events of urban and suburban environments in order to offer noise maps that are more faithful to the reality perceived by citizens in relation to the soundscape in which they find themselves. Throughout, the thesis shows the importance of detecting sounds in an acoustic sensor network to prevent measurement errors in the equivalent levels, and the need to train the detection system with data obtained from the same sensors of the network.
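The idea of an event's impact on the equivalent sound level can be illustrated with a small sketch. It uses the standard energy-average definition of Leq over equally spaced short-term levels. The impact definition below (Leq of all frames minus Leq of the frames without the event) is an assumption for illustration and is not taken from the thesis. The level values are hypothetical.

```python
import math


def leq(levels_db):
    """Equivalent continuous level (dB) of equally spaced short-term levels."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def event_impact(levels_db, is_event):
    """Leq of all frames minus Leq of the frames not flagged as an event."""
    background = [level for level, flag in zip(levels_db, is_event) if not flag]
    return leq(levels_db) - leq(background)


# Hypothetical one-second levels: steady traffic at 60 dB, then a 2 s siren at 80 dB.
levels = [60.0] * 8 + [80.0] * 2
flags = [False] * 8 + [True] * 2
impact = event_impact(levels, flags)  # bias in dB introduced by the siren
```

Because Leq averages energy rather than decibel values, a short loud event can raise the level of the whole interval by many decibels. That is why such events need to be detected before the traffic noise level is computed.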

    Universal background modeling for acoustic surveillance of urban traffic

    Traffic congestion in modern cities is a growing problem with significant consequences for our daily lives. This work proposes a non-intrusive, passive monitoring framework based on the acoustic modality, which can be used either autonomously or as part of a multimodal system to provide valuable information to an intelligent transportation system. We consider a large number of audio classes that are typically encountered in urban areas. We combine a powerful audio representation mechanism based on time, frequency and wavelet domain features with universal background modeling, which leads to higher recognition accuracies and better detection rates (in terms of false alarm and miss probability rates) than commonly employed methodologies. The basic advantage of a class-specific model derived using the universal background modeling logic is its tolerance to data that belong to other sound classes. Another important feature of the proposed system is its ability to detect crash incidents, which, apart from their catastrophic impact on human life and property, have negative consequences for traffic flow. Our experiments are based on the concurrent usage of professional sound effect collections that include high-quality audio recordings. We thoroughly examine the performance of the proposed system on isolated sound events as well as continuous audio streams using confusion matrices and detection error trade-off curves.
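The false alarm and miss probability rates mentioned above are the two quantities that a detection error trade-off (DET) curve plots against each other, usually on normal-deviate axes. The sketch below shows one way to compute them from detector scores by sweeping a decision threshold. The scores are hypothetical and the function name is illustrative, not taken from the paper.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # A target scoring below the threshold is a miss.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # A non-target scoring at or above the threshold is a false alarm.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, miss, false_alarm))
    return points


# Hypothetical detector scores for one sound class.
target = [0.9, 0.8, 0.4, 0.7]      # clips that contain the event
nontarget = [0.1, 0.3, 0.5, 0.2]   # clips that do not
points = det_points(target, nontarget)
```

Raising the threshold trades false alarms for misses. The point where the two rates are equal (here at a threshold of 0.5) is often reported as the equal error rate.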