17 research outputs found

    An Audio-Based Vehicle Classifier Using Convolutional Neural Network

    Audio-based event and scene classification have been getting more attention in recent years. Many examples of environmental noise detection, vehicle classification, and soundscape analysis have been developed using state-of-the-art deep learning techniques. The major noise source in urban and rural areas is road traffic noise. Environmental noise parameters for urban and rural small roads have not been investigated, for practical reasons. The purpose of this study is to develop an audio-based traffic classifier for rural and urban small roads, which have limited or no traffic flow data, to supply values for noise mapping and other noise metrics. An audio-based vehicle classifier, a convolutional neural network-based algorithm, was proposed using the Mel spectrogram of audio signals as an input feature. Different variations of the network were generated by changing the parameters of the convolutional layers and the length of the network. Filter size and number of filters were tested with a dataset prepared from various real-life traffic recordings and audio extracts from traffic videos. The precision of the networks was evaluated with common performance metrics. Further assessments were conducted with longer audio files, and the predictions of the system were compared with actual traffic flow. The results showed that convolutional neural networks can be used to classify road traffic noise sources and perform outstandingly for single- or double-lane roads.
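The abstract above uses the Mel spectrogram of an audio signal as the CNN input feature. As a rough from-scratch sketch of that conversion (not the authors' code; the sample rate, frame, hop, and filter counts below are illustrative defaults, not values from the paper):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # slice the waveform into overlapping Hann-windowed frames
    window = np.hanning(n_fft)
    frames = np.stack([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (frames, n_fft//2+1)
    # build a triangular Mel filterbank over the FFT bins
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, mid):
            fb[i - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[i - 1, k] = (hi - k) / max(hi - mid, 1)
    # apply the filterbank and compress with a log
    return np.log(power @ fb.T + 1e-10)                 # (frames, n_mels)
```

The resulting 2-D array of log-Mel energies is what a CNN would consume as its "image" input; in practice a library such as librosa computes the same thing in one call.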

    Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

    We present work on low-complexity acoustic scene classification (ASC) with multiple devices, namely subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples from multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, so there is a mismatch of recording devices across samples. We reduce the negative impact of this mismatch by using several effective strategies, including data augmentation (e.g., mix-up, spectrum correction, pitch shift) and the use of a multi-patch network structure and channel attention. Second, the model size should be smaller than a threshold (e.g., the 128 KB required by the DCASE2021 challenge). To meet this condition, we adopt a ResNet with both depthwise separable convolution and channel attention as the backbone network, and perform model compression. In summary, we propose a low-complexity ASC method using data augmentation and a lightweight ResNet. Evaluated on the official development and evaluation datasets, our method obtains classification accuracy scores of 71.6% and 66.7%, respectively, and Log-loss scores of 1.038 and 1.136, respectively. Our final model size is 110.3 KB, which is smaller than the maximum of 128 KB. Comment: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP)
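Two of the ingredients named above are easy to illustrate concretely. Mix-up blends two training examples and their labels with a Beta-distributed weight, and the low-complexity motivation for depthwise separable convolution is a simple parameter count. A minimal NumPy sketch; the alpha value and layer sizes are illustrative, not taken from the challenge entry:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their (one-hot) labels with a
    Beta(alpha, alpha)-distributed mixing weight."""
    rng = np.random.default_rng(0) if rng is None else rng
    lam = rng.beta(alpha, alpha)          # lam is in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def conv_params(k, c_in, c_out, separable=False):
    """Weight count of a k x k convolution, standard vs. depthwise separable."""
    if separable:
        return k * k * c_in + c_in * c_out   # depthwise + 1x1 pointwise
    return k * k * c_in * c_out
```

For a 3x3 layer with 64 input and 64 output channels, the standard form needs 36,864 weights while the separable form needs 4,672, which is the kind of reduction that makes a sub-128 KB model feasible.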

    Evaluation of classical machine learning techniques towards urban sound recognition embedded systems

    Automatic urban sound classification is a desirable capability for urban monitoring systems, allowing real-time monitoring of urban environments and recognition of events. Current embedded systems provide enough computational power to perform real-time urban audio recognition. Using such devices for edge computation when acting as nodes of Wireless Sensor Networks (WSN) drastically reduces the required bandwidth consumption. In this paper, we evaluate classical Machine Learning (ML) techniques for urban sound classification on embedded devices with respect to accuracy and execution time. This evaluation provides a realistic estimate of what can be expected when performing urban sound classification on such constrained devices. In addition, a cascade approach is proposed to combine ML techniques by exploiting embedded characteristics such as the pipeline or multi-thread execution present in current embedded devices. The accuracy of this approach is similar to that of the traditional solutions, but it additionally provides more flexibility to prioritize accuracy or timing.
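The cascade idea described above can be sketched schematically: a cheap classifier answers first, and only inputs on which it is not confident are passed to a slower, more accurate stage. The confidence threshold and the two stand-in classifiers below are hypothetical, not the paper's configuration:

```python
import numpy as np

def cascade_predict(x, fast_model, slow_model, confidence=0.8):
    """Run the cheap model first; fall back to the expensive model
    only when the cheap model's top class probability is too low."""
    probs = fast_model(x)
    if probs.max() >= confidence:
        return int(np.argmax(probs))
    return int(np.argmax(slow_model(x)))
```

On an embedded device the two stages can run on separate threads, so the slow model's latency is only paid for the ambiguous fraction of inputs.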

    CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

    Environmental sound classification (ESC) has been increasingly studied in recent years. The main reason is that environmental sounds are part of our daily life, and associating them with the environment we live in is important in several respects, as ESC is used in areas such as managing smart cities, determining location from environmental sounds, surveillance systems, machine hearing, and environment monitoring. ESC is, however, more difficult than other sound classification tasks because there are too many parameters that generate background noise, which makes the sounds more difficult to model and classify. The main aim of this study is therefore to develop a more robust convolutional neural network (CNN) architecture. For this purpose, 150 different CNN-based models were designed by changing the number of layers and the values of their tuning parameters. To test the accuracy of the models, the UrbanSound8K environmental sound database was used. The sounds in this dataset were first converted into an image format of 32x32x3. The proposed CNN model yielded an accuracy of as much as 82.5%, higher than its classical counterpart. As there was not much fine-tuning, the obtained accuracy is satisfactory compared to other studies on UrbanSound8K when both accuracy and computational complexity are considered. The results also suggest that further improvement is possible, given the low complexity of the proposed CNN architecture and its applicability in real-world settings.
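One plausible way to produce the 32x32x3 image format mentioned above is a nearest-neighbour resize of a (log-)spectrogram, normalized to [0, 1] and stacked into three identical channels. This is a guess at a minimal pipeline, not the paper's actual preprocessing:

```python
import numpy as np

def spec_to_image(spec, size=32):
    """Resize a 2-D spectrogram to (size, size, 3) by nearest-neighbour
    sampling, normalizing values to [0, 1]."""
    t, f = spec.shape
    rows = np.arange(size) * t // size           # sampled time indices
    cols = np.arange(size) * f // size           # sampled frequency indices
    small = spec[np.ix_(rows, cols)].astype(float)
    small = (small - small.min()) / (small.max() - small.min() + 1e-9)
    return np.stack([small, small, small], axis=-1)
```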

    Recognition of Handwritten Azerbaijani Letters using Convolutional Neural Networks

    Technology advancements have made it possible to fill out documents such as petitions and forms electronically. However, in some circumstances, hard copies of documents, which are difficult to share, store, and preserve due to their rigid dimensions, are still used to keep records in the conventional manner. It is therefore crucial to convert these written documents into digital media. From this viewpoint, the goal of this study is to investigate various methods for the digitalization of handwritten documents. In this study, image processing methods were used to pre-process the documents that were converted to image format. These operations include splitting the image of the document into lines, separating the lines into words and characters, and then classifying the characters. Convolutional Neural Networks, used for image recognition, are one of the deep learning techniques applied to the classification. The Extended MNIST dataset and a symbol dataset created from pre-existing documents were used to train the model. The success rate on the generated dataset was 88.72 percent.
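Splitting a document image into lines, as described above, is commonly done with a horizontal projection profile: rows containing ink belong to a text line, and empty rows mark the gaps between lines. A minimal sketch of that standard technique (not necessarily the authors' exact method):

```python
import numpy as np

def segment_lines(img, ink_threshold=0):
    """Return (start_row, end_row) spans of contiguous rows that contain ink.
    img: 2-D array where pixel values above ink_threshold count as ink."""
    ink_per_row = (img > ink_threshold).sum(axis=1)
    spans, start = [], None
    for i, count in enumerate(ink_per_row):
        if count > 0 and start is None:
            start = i                       # a text line begins
        elif count == 0 and start is not None:
            spans.append((start, i))        # the text line ends
            start = None
    if start is not None:                   # line touching the bottom edge
        spans.append((start, len(ink_per_row)))
    return spans
```

The same profile idea applied column-wise within each line span separates words and characters before they are fed to the classifier.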

    Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

    In this paper, we propose a model for the Environment Sound Classification (ESC) task that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with an attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT), and the Chromagram. Such multiple features have never been used before for signal or audio processing. We also employ a deeper CNN (DCNN) compared to previous models, consisting of spatially separable convolutions working on the time and feature domains separately, alongside attention modules that perform channel and spatial attention together. We use several data augmentation techniques to further boost performance. Our model achieves state-of-the-art performance on all three benchmark environment sound classification datasets, i.e. UrbanSound8K (97.52%), ESC-10 (95.75%) and ESC-50 (88.50%). To the best of our knowledge, this is the first time that a single environment sound classification model achieves state-of-the-art results on all three datasets. For the ESC-10 and ESC-50 datasets, the accuracy achieved by the proposed model exceeds the human accuracy of 95.7% and 81.3%, respectively. Comment: Re-checking result
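Channel attention of the kind mentioned above typically follows the squeeze-and-excitation pattern: global-average-pool each channel, pass the pooled vector through a small bottleneck, and rescale the channels by the resulting sigmoid weights. A NumPy sketch with untrained random weights; the feature-map and bottleneck sizes are illustrative, not the paper's:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention.
    x: (H, W, C) feature map; w1: (C, C//r); w2: (C//r, C)."""
    z = x.mean(axis=(0, 1))                 # squeeze: per-channel average
    h = np.maximum(z @ w1, 0.0)             # excitation: ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))     # per-channel sigmoid weights
    return x * s                            # rescale channels (broadcasts over H, W)
```

Because each weight in `s` lies in (0, 1), the module can only attenuate channels, letting the network learn which feature channels (e.g. MFCC vs. chroma) matter for a given input.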