Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
In this paper, we propose a new strategy for acoustic scene classification
(ASC), namely recognizing acoustic scenes by identifying distinct sound
events. This differs from existing strategies, which focus on characterizing
global acoustical distributions of audio or the temporal evolution of
short-term audio features, without analysis down to the level of sound events.
To identify distinct sound events for each scene, we formulate ASC in a
multi-instance learning (MIL) framework, where each audio recording is mapped
into a bag-of-instances representation. Here, instances can be seen as
high-level representations for sound events inside a scene. We also propose a
MIL neural network model that implicitly identifies distinct instances
(i.e., sound events). Furthermore, we propose two specially designed modules
that model the multi-temporal-scale and multi-modal natures of the sound events,
respectively. The experiments were conducted on the official development set of
the DCASE2018 Task1 Subtask B, and our best-performing model improves over the
official baseline by 9.4% (68.3% vs 58.9%) in terms of classification accuracy.
This study indicates that recognizing acoustic scenes by identifying distinct
sound events is effective and paves the way for future studies that combine
this strategy with previous ones.
Comment: code URL typo, code is available at
https://github.com/hackerekcah/distinct-events-asc.gi
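The bag-of-instances idea above can be sketched with a simple attention-based MIL pooling step: instance embeddings are scored, softmax-normalized, and combined into one bag-level embedding, so the most distinctive instances (sound events) dominate. This is a hypothetical NumPy illustration with made-up dimensions, not the authors' actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(instances, w, v):
    """Pool a bag of instance embeddings into one bag embedding.

    instances: (n_instances, d) -- high-level sound-event representations
    w: (d, h), v: (h,)          -- parameters of the attention scorer
    """
    scores = np.tanh(instances @ w) @ v          # one score per instance
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax attention weights
    return alpha @ instances, alpha              # (d,) bag embedding, (n,) weights

# a bag of 5 instances with 8-dim embeddings (illustrative sizes)
bag = rng.normal(size=(5, 8))
w = rng.normal(size=(8, 4))
v = rng.normal(size=4)
bag_emb, alpha = attention_mil_pool(bag, w, v)
```

The attention weights `alpha` are what makes the identification "implicit": no instance-level labels are needed, the weights are learned from bag-level (scene) supervision alone.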
CochlScene: Acquisition of acoustic scene data using crowdsourcing
This paper describes a pipeline for collecting acoustic scene data by using
crowdsourcing. The detailed process of crowdsourcing is explained, including
planning, validation criteria, and actual user interfaces. As a result of data
collection, we present CochlScene, a novel dataset for acoustic scene
classification. Our dataset consists of 76k samples collected from 831
participants in 13 acoustic scenes. We also propose a manual data split of
training, validation, and test sets to increase the reliability of the
evaluation results. Finally, we provide a baseline system for future research.
Comment: Accepted by APSIPA ASC 2022, 5 pages, 2 figures
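A common way to make such a split reliable is to assign entire participants to a split, so recordings from the same person (and device) never appear in both training and test. The hashing scheme below is an assumed illustration of that idea, not CochlScene's actual published split:

```python
import hashlib

def split_of(participant_id, ratios=(0.8, 0.1, 0.1)):
    """Deterministically map a participant to train/val/test.

    Hashing the participant id (rather than individual clips) keeps all
    recordings from one person in a single split, avoiding speaker and
    device leakage across splits. The ratios are illustrative.
    """
    h = int(hashlib.md5(str(participant_id).encode()).hexdigest(), 16)
    u = (h % 10_000) / 10_000              # pseudo-uniform value in [0, 1)
    if u < ratios[0]:
        return "train"
    if u < ratios[0] + ratios[1]:
        return "val"
    return "test"

# 831 participants, as in the dataset description
splits = {pid: split_of(pid) for pid in range(831)}
```

Because the assignment is a pure function of the participant id, the split is reproducible without storing any extra metadata.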
Learning Audio Sequence Representations for Acoustic Event Classification
Acoustic Event Classification (AEC) has become a significant task for
machines to perceive the surrounding auditory scene. However, extracting
effective representations that capture the underlying characteristics of the
acoustic events is still challenging. Previous methods mainly focused on
designing the audio features in a 'hand-crafted' manner. Interestingly,
data-learnt features have been recently reported to show better performance. Up
to now, these were only considered on the frame-level. In this paper, we
propose an unsupervised learning framework to learn a vector representation of
an audio sequence for AEC. This framework consists of a Recurrent Neural
Network (RNN) encoder and an RNN decoder, which respectively transform the
variable-length audio sequence into a fixed-length vector and reconstruct the
input sequence from the generated vector. After training the encoder-decoder, we
feed the audio sequences to the encoder and then take the learnt vectors as the
audio sequence representations. Compared with previous methods, the proposed
method not only handles audio streams of arbitrary length but also learns
the salient information of the sequence. Extensive evaluation on a large
acoustic event database shows that the learnt audio sequence representation
yields a significant improvement over state-of-the-art hand-crafted
sequence features for AEC.
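The key property of the encoder, mapping frame sequences of any length to one fixed-length vector, can be sketched with an untrained NumPy RNN cell. Dimensions are illustrative assumptions; the paper's actual encoder-decoder is trained by reconstruction:

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_encode(x, Wx, Wh, b):
    """Run a simple RNN over frame-level features x of shape (T, d)
    and return the final hidden state as a fixed-length vector."""
    h = np.zeros(Wh.shape[0])
    for frame in x:
        h = np.tanh(frame @ Wx + h @ Wh + b)   # recurrent state update
    return h                                    # shape (hidden,) for any T

d, hidden = 13, 32                      # e.g. 13 frame-level features
Wx = rng.normal(scale=0.1, size=(d, hidden))
Wh = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)

short_clip = rng.normal(size=(20, d))   # 20 frames
long_clip = rng.normal(size=(57, d))    # 57 frames
v_short = rnn_encode(short_clip, Wx, Wh, b)
v_long = rnn_encode(long_clip, Wx, Wh, b)
```

Both clips yield vectors of the same dimensionality, which is exactly why the representation can replace hand-crafted fixed-size sequence features downstream.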
Classification of Coconut Maturity Level Using Deep Learning Based on Acoustic Features
*ABSTRACT*
-------
This research discusses the use of coconut knocking sounds to classify maturity level based on acoustic features. Classifying coconut maturity by listening to the knocking sound manually depends heavily on the listener's ability to judge ripeness, so a system that performs the classification automatically is needed. Acoustic features are explored by varying the features extracted from the frequency and time domains as input to the deep learning models. The extracted features include Mel-Frequency Cepstral Coefficients (MFCC) and Power-Normalized Cepstral Coefficients (PNCC) from the frequency domain, and Amplitude Envelope (AE), Zero Crossing Rate (ZCR), and RMS Energy from the time domain. In this study, Long Short-Term Memory (LSTM) and Deep Neural Network (DNN) models were used. The results show that the LSTM and DNN models achieved 92.86% and 89.29% accuracy, respectively, with frequency-domain features.
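The time-domain features named above (AE, ZCR, RMS) can be computed directly with NumPy. This sketch uses a synthetic decaying tone as a stand-in for a knock recording; the frame and hop sizes are assumptions, not the paper's settings:

```python
import numpy as np

def frame_signal(signal, size=1024, hop=512):
    """Slice a 1-D signal into overlapping frames of shape (n, size)."""
    n = 1 + (len(signal) - size) // hop
    return np.stack([signal[i*hop : i*hop + size] for i in range(n)])

def amplitude_envelope(frames):
    return np.max(np.abs(frames), axis=1)       # AE: peak amplitude per frame

def zero_crossing_rate(frames):
    # fraction of consecutive sample pairs whose sign changes
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rms_energy(frames):
    return np.sqrt(np.mean(frames**2, axis=1))  # RMS energy per frame

sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
knock = np.sin(2 * np.pi * 440 * t) * np.exp(-5 * t)  # synthetic decaying "knock"

frames = frame_signal(knock)
ae = amplitude_envelope(frames)
zcr = zero_crossing_rate(frames)
rms = rms_energy(frames)
```

The frequency-domain features (MFCC, PNCC) would typically come from an audio library such as librosa; they are omitted here to keep the sketch dependency-free.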