Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
In this paper, we propose a new strategy for acoustic scene classification
(ASC), namely recognizing acoustic scenes by identifying distinct sound
events. This differs from existing strategies, which focus on characterizing
global acoustical distributions of audio or the temporal evolution of
short-term audio features, without analysis down to the level of sound events.
To identify distinct sound events for each scene, we formulate ASC in a
multi-instance learning (MIL) framework, where each audio recording is mapped
into a bag-of-instances representation. Here, instances can be seen as
high-level representations for sound events inside a scene. We also propose a
MIL neural network model that implicitly identifies distinct instances
(i.e., sound events). Furthermore, we propose two specially designed modules
that model the multi-temporal-scale and multi-modal natures of the sound events,
respectively. The experiments were conducted on the official development set of
the DCASE2018 Task1 Subtask B, and our best-performing model improves over the
official baseline by 9.4% (68.3% vs 58.9%) in terms of classification accuracy.
This study indicates that recognizing acoustic scenes by identifying distinct
sound events is effective and paves the way for future studies that combine
this strategy with previous ones.
Comment: code URL typo, code is available at
https://github.com/hackerekcah/distinct-events-asc.gi
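The bag-of-instances idea above can be sketched with a simple attention-based MIL pooling step: instance embeddings are scored, softmax-normalized, and combined into one bag-level embedding, so the most distinctive instances (sound events) dominate. This is a hypothetical NumPy illustration with made-up dimensions, not the authors' actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(instances, w, v):
    """Pool a bag of instance embeddings into one bag embedding.

    instances: (n_instances, d) -- high-level sound-event representations
    w: (d, h), v: (h,)          -- parameters of the attention scorer
    """
    scores = np.tanh(instances @ w) @ v          # one score per instance
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax attention weights
    return alpha @ instances, alpha              # (d,) bag embedding, (n,) weights

# a bag of 5 instances with 8-dim embeddings (illustrative sizes)
bag = rng.normal(size=(5, 8))
w = rng.normal(size=(8, 4))
v = rng.normal(size=4)
bag_emb, alpha = attention_mil_pool(bag, w, v)
```

The attention weights `alpha` are what makes the identification "implicit": no instance-level labels are needed, the weights are learned from bag-level (scene) supervision alone.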
CochlScene: Acquisition of acoustic scene data using crowdsourcing
This paper describes a pipeline for collecting acoustic scene data by using
crowdsourcing. The detailed process of crowdsourcing is explained, including
planning, validation criteria, and actual user interfaces. As a result of data
collection, we present CochlScene, a novel dataset for acoustic scene
classification. Our dataset consists of 76k samples collected from 831
participants in 13 acoustic scenes. We also propose a manual data split of
training, validation, and test sets to increase the reliability of the
evaluation results. Finally, we provide a baseline system for future research.
Comment: Accepted by APSIPA ASC 2022, 5 pages, 2 figures
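A common way to make such a split reliable is to assign entire participants to a split, so recordings from the same person (and device) never appear in both training and test. The hashing scheme below is an assumed illustration of that idea, not CochlScene's actual published split:

```python
import hashlib

def split_of(participant_id, ratios=(0.8, 0.1, 0.1)):
    """Deterministically map a participant to train/val/test.

    Hashing the participant id (rather than individual clips) keeps all
    recordings from one person in a single split, avoiding speaker and
    device leakage across splits. The ratios are illustrative.
    """
    h = int(hashlib.md5(str(participant_id).encode()).hexdigest(), 16)
    u = (h % 10_000) / 10_000              # pseudo-uniform value in [0, 1)
    if u < ratios[0]:
        return "train"
    if u < ratios[0] + ratios[1]:
        return "val"
    return "test"

# 831 participants, as in the dataset description
splits = {pid: split_of(pid) for pid in range(831)}
```

Because the assignment is a pure function of the participant id, the split is reproducible without storing any extra metadata.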
Learning Audio Sequence Representations for Acoustic Event Classification
Acoustic Event Classification (AEC) has become a significant task for
machines to perceive the surrounding auditory scene. However, extracting
effective representations that capture the underlying characteristics of the
acoustic events is still challenging. Previous methods mainly focused on
designing the audio features in a 'hand-crafted' manner. Interestingly,
data-learnt features have been recently reported to show better performance. Up
to now, these were only considered on the frame-level. In this paper, we
propose an unsupervised learning framework to learn a vector representation of
an audio sequence for AEC. This framework consists of a Recurrent Neural
Network (RNN) encoder and an RNN decoder, which respectively transform the
variable-length audio sequence into a fixed-length vector and reconstruct the
input sequence from the generated vector. After training the encoder-decoder, we
feed the audio sequences to the encoder and then take the learnt vectors as the
audio sequence representations. Compared with previous methods, the proposed
method not only handles audio streams of arbitrary length but also learns
the salient information of the sequence. Extensive evaluation on a large
acoustic event database shows that the learnt audio sequence representation
yields a significant improvement over state-of-the-art hand-crafted
sequence features for AEC.
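The key property of the encoder, mapping frame sequences of any length to one fixed-length vector, can be sketched with an untrained NumPy RNN cell. Dimensions are illustrative assumptions; the paper's actual encoder-decoder is trained by reconstruction:

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_encode(x, Wx, Wh, b):
    """Run a simple RNN over frame-level features x of shape (T, d)
    and return the final hidden state as a fixed-length vector."""
    h = np.zeros(Wh.shape[0])
    for frame in x:
        h = np.tanh(frame @ Wx + h @ Wh + b)   # recurrent state update
    return h                                    # shape (hidden,) for any T

d, hidden = 13, 32                      # e.g. 13 frame-level features
Wx = rng.normal(scale=0.1, size=(d, hidden))
Wh = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)

short_clip = rng.normal(size=(20, d))   # 20 frames
long_clip = rng.normal(size=(57, d))    # 57 frames
v_short = rnn_encode(short_clip, Wx, Wh, b)
v_long = rnn_encode(long_clip, Wx, Wh, b)
```

Both clips yield vectors of the same dimensionality, which is exactly why the representation can replace hand-crafted fixed-size sequence features downstream.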
Classification of Coconut Maturity Level Using Deep Learning Based on Acoustic Features
*ABSTRACT*
-------
This research discusses the use of coconut knocking sounds to classify maturity level based on acoustic features. Classifying coconut maturity by listening to the knocking sound manually depends heavily on the listener's ability to judge ripeness, so a system that performs the classification automatically is needed. Acoustic features are explored by varying the features extracted from the frequency and time domains as input to the deep learning models. The extracted features include Mel-Frequency Cepstral Coefficients (MFCC) and Power-Normalized Cepstral Coefficients (PNCC) from the frequency domain, and Amplitude Envelope (AE), Zero Crossing Rate (ZCR), and RMS Energy from the time domain. In this study, Long Short-Term Memory (LSTM) and Deep Neural Network (DNN) models were used. The results show that the LSTM and DNN models achieved 92.86% and 89.29% accuracy, respectively, with frequency-domain features.
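The time-domain features named above (AE, ZCR, RMS) can be computed directly with NumPy. This sketch uses a synthetic decaying tone as a stand-in for a knock recording; the frame and hop sizes are assumptions, not the paper's settings:

```python
import numpy as np

def frame_signal(signal, size=1024, hop=512):
    """Slice a 1-D signal into overlapping frames of shape (n, size)."""
    n = 1 + (len(signal) - size) // hop
    return np.stack([signal[i*hop : i*hop + size] for i in range(n)])

def amplitude_envelope(frames):
    return np.max(np.abs(frames), axis=1)       # AE: peak amplitude per frame

def zero_crossing_rate(frames):
    # fraction of consecutive sample pairs whose sign changes
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def rms_energy(frames):
    return np.sqrt(np.mean(frames**2, axis=1))  # RMS energy per frame

sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
knock = np.sin(2 * np.pi * 440 * t) * np.exp(-5 * t)  # synthetic decaying "knock"

frames = frame_signal(knock)
ae = amplitude_envelope(frames)
zcr = zero_crossing_rate(frames)
rms = rms_energy(frames)
```

The frequency-domain features (MFCC, PNCC) would typically come from an audio library such as librosa; they are omitted here to keep the sketch dependency-free.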