
    Long-term learning behavior in a recurrent neural network for sound recognition

    In this paper, the long-term learning properties of an artificial neural network model designed for sound recognition and computational auditory scene analysis are investigated. The model is designed to run for long periods of time (weeks to months) on low-cost hardware in a noise monitoring network, and builds upon previous work by the same authors. It consists of three neural layers, connected to each other by feedforward and feedback excitatory connections. It is shown that the different mechanisms that drive auditory attention emerge naturally from the way in which neural activation and intra-layer inhibitory connections are implemented in the model. Training of the artificial neural network follows the Hebb principle, which dictates that "cells that fire together, wire together", with some important modifications compared to standard Hebbian learning. Because the model is designed to be online for extended periods of time, the learning mechanisms also need to be adapted accordingly: learning must be strongly attention- and saliency-driven, so that available memory is not wasted on sounds that are of no interest to the human listener. The model also implements plasticity, in order to deal with new or changing input over time without catastrophically forgetting what it has already learned. In addition, it is shown that the implementation of short-term memory plays an important role in the long-term learning properties of the model. These properties are investigated and demonstrated by training on real urban sound recordings.
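    The learning rule described above can be illustrated with a minimal sketch. The Python/NumPy fragment below shows a plain Hebbian outer-product update gated by a saliency factor; the gating scalar, learning rate, decay term, and layer sizes are illustrative assumptions, not the modified rule actually used in the paper.

```python
import numpy as np

def hebbian_update(weights, pre, post, saliency, lr=0.01, decay=0.001):
    """One Hebbian step: strengthen weights between co-active units.

    weights : (n_post, n_pre) connection matrix
    pre     : (n_pre,) presynaptic activations
    post    : (n_post,) postsynaptic activations
    saliency: scalar in [0, 1] gating how strongly this event is learned
              (hypothetical stand-in for the attention/saliency drive
              described in the abstract).
    """
    # Classic "fire together, wire together" outer-product term,
    # scaled by the saliency gate so uninteresting sounds are ignored.
    dw = lr * saliency * np.outer(post, pre)
    # Mild decay keeps weights bounded (a crude stand-in for the paper's
    # plasticity / forgetting control, which is more elaborate).
    return (1.0 - decay) * weights + dw

# Toy usage: 8 input features, 4 output units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))
x = rng.random(8)      # presynaptic activity (e.g. spectral features)
y = W @ x              # postsynaptic activity (feedforward pass)
W = hebbian_update(W, x, y, saliency=0.8)
```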

    Robust sound event detection in bioacoustic sensor networks

    Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, collected by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings. (Comment: 32 pages, in English. Submitted to the PLOS ONE journal in February 2019; revised August 2019; published October 2019.)
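    As a reference for the short-term adaptation step, here is a minimal NumPy sketch of per-channel energy normalization as it is usually defined: a per-band first-order IIR smoother followed by adaptive gain control and root compression. The parameter values (smoothing coefficient, gain, bias, power) are common defaults and not necessarily those used by BirdVoxDetect; librosa also provides a ready-made librosa.pcen implementation.

```python
import numpy as np

def pcen(S, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a (n_mels, n_frames) mel spectrogram."""
    M = np.zeros_like(S)
    M[:, 0] = S[:, 0]
    # First-order IIR smoother per frequency band (the short-term context).
    for t in range(1, S.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * S[:, t]
    # Adaptive gain control followed by dynamic range compression.
    return (S / (eps + M) ** alpha + delta) ** r - delta ** r
```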

    Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

    Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events. (Comment: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis.)
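    A minimal PyTorch sketch of the kind of CRNN described above: a small convolutional front end pooled only along frequency, a recurrent layer over the resulting frame sequence, and a sigmoid output so that several event classes can be active in the same frame. The layer counts, channel widths, and the use of a GRU are illustrative choices, not the exact configuration of the paper.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal convolutional recurrent network for frame-wise sound event detection.

    Input:  (batch, 1, n_mels, n_frames) log-mel spectrograms.
    Output: (batch, n_frames, n_classes) frame-wise event probabilities,
            where several events may be active at once (polyphonic SED).
    """
    def __init__(self, n_mels=40, n_classes=6):
        super().__init__()
        # CNN front end: features invariant to small spectral/temporal shifts;
        # pooling only along frequency preserves the time resolution.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        # RNN back end: models the longer-term temporal context.
        self.rnn = nn.GRU(64 * (n_mels // 4), 64, batch_first=True,
                          bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):
        z = self.cnn(x)                              # (B, C, F', T)
        B, C, F, T = z.shape
        z = z.permute(0, 3, 1, 2).reshape(B, T, C * F)
        z, _ = self.rnn(z)
        return torch.sigmoid(self.fc(z))             # multi-label per frame

model = CRNN()
dummy = torch.randn(2, 1, 40, 100)   # 2 clips, 40 mel bands, 100 frames
print(model(dummy).shape)            # torch.Size([2, 100, 6])
```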

    Sound Levels Forecasting in an Acoustic Sensor Network Using a Deep Neural Network

    [EN] Wireless acoustic sensor networks are nowadays an essential tool for noise pollution monitoring and management in cities. The increased computing capacity of the nodes that form the network allows the addition of processing algorithms and artificial intelligence that provide more information about the sound sources and the environment, e.g., detecting sound events or calculating loudness. Several models to predict sound pressure levels in cities are available, mainly for road, railway and air traffic noise. However, these models are mostly based on auxiliary data, e.g., vehicle flow or street geometry, and predict long-term equivalent levels. Therefore, short-term forecasting of sound levels could be a helpful tool for urban planners and managers. In this work, a Long Short-Term Memory (LSTM) deep neural network technique is proposed to model the temporal behavior of sound levels at a given location, both sound pressure level and loudness level, in order to predict near-future values. The proposed technique can be trained for, and integrated in, every node of a sensor network to provide novel functionalities, e.g., a method of early warning against noise pollution and of backup in case of node or network malfunction. To validate this approach, one-minute equivalent sound levels, captured in a two-month measurement campaign by a node of a deployed network of acoustic sensors, have been used to train it and to obtain different forecasting models. Assessments of the developed LSTM models and of autoregressive integrated moving average (ARIMA) models were performed to predict sound levels for several time horizons, from 1 to 60 minutes. Comparison of the results shows that the LSTM models outperform the statistics-based models. In general, the LSTM models achieve predictions with a mean square error of less than 4.3 dB for sound pressure level and less than 2 phons for loudness. Moreover, the goodness of fit of the LSTM models and the behavior pattern of the data in terms of prediction of sound levels are satisfactory.
    This work was partially supported by the Fundacion Seneca del Centro de Coordinacion de la Investigacion de la Region de Murcia under Project 20813/PI/18.
    Navarro, J. M.; Martínez-España, R.; Bueno-Crespo, A.; Cecilia-Canales, J. M.; Martínez, R. (2020). Sound Levels Forecasting in an Acoustic Sensor Network Using a Deep Neural Network. Sensors, 20(3), 1-16. https://doi.org/10.3390/s20030903
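    A minimal Keras sketch of the forecasting setup described above: a sliding window of past one-minute levels is fed to a small LSTM that predicts the level of the next minute. The synthetic series, the 60-minute window, and the layer sizes are illustrative assumptions, not the configuration or data of the paper.

```python
import numpy as np
from tensorflow import keras

def make_windows(series, n_in=60, n_out=1):
    """Turn a 1-D series of one-minute equivalent levels (dBA) into
    supervised pairs: n_in past minutes -> n_out future minutes."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X)[..., None], np.array(y)

# Hypothetical stand-in data; the paper trains on two months of
# one-minute equivalent levels from a deployed sensor node.
levels = 55 + 5 * np.sin(np.linspace(0, 40, 5000)) + np.random.randn(5000)
X, y = make_windows(levels, n_in=60, n_out=1)

model = keras.Sequential([
    keras.Input(shape=(60, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),   # predicted level for the next minute
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
```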