A note on a Marčenko-Pastur type theorem for time series
In this note we develop an extension of the Marčenko-Pastur theorem to
time series models with temporal correlations. The limiting spectral
distribution (LSD) of the sample covariance matrix is characterised by an
explicit equation for its Stieltjes transform depending on the spectral density
of the time series. A numerical algorithm is then given to compute the density
functions of these LSDs.
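For orientation, the classical Marčenko-Pastur law (the uncorrelated case that the note extends) has a closed-form density. The sketch below evaluates it and checks numerically that it integrates to one. It is not the note's algorithm for correlated time series; the function names, the aspect ratio `y = p/n` and the midpoint-rule integrator are illustrative assumptions.

```python
import math


def mp_density(x, y, sigma2=1.0):
    """Classical Marchenko-Pastur density for aspect ratio y = p/n in (0, 1].

    Assumption: i.i.d. entries with variance sigma2 (no temporal correlation).
    """
    a = sigma2 * (1.0 - math.sqrt(y)) ** 2  # lower edge of the support
    b = sigma2 * (1.0 + math.sqrt(y)) ** 2  # upper edge of the support
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2.0 * math.pi * sigma2 * y * x)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule quadrature (hypothetical helper for this sketch)."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h


y = 0.5
a = (1.0 - math.sqrt(y)) ** 2
b = (1.0 + math.sqrt(y)) ** 2
total_mass = integrate(lambda x: mp_density(x, y), a, b)
mean = integrate(lambda x: x * mp_density(x, y), a, b)
print(total_mass, mean)  # both should be close to 1 for sigma2 = 1
```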
Sound event localization and classification using WASN in Outdoor Environment
Deep learning-based sound event localization and classification is an
emerging research area within wireless acoustic sensor networks. However,
current methods for sound event localization and classification typically rely
on a single microphone array, making them susceptible to signal attenuation and
environmental noise, which limits their monitoring range. Moreover, methods
using multiple microphone arrays often focus solely on source localization,
neglecting the aspect of sound event classification. In this paper, we propose
a deep learning-based method that employs multiple features and attention
mechanisms to estimate the location and class of the sound source. We introduce a
Soundmap feature to capture spatial information across multiple frequency
bands. We also use the Gammatone filter to generate acoustic features more
suitable for outdoor environments. Furthermore, we integrate attention
mechanisms to learn channel-wise relationships and temporal dependencies within
the acoustic features. To evaluate our proposed method, we conduct experiments
using simulated datasets with different noise levels and monitoring-area
sizes, as well as different arrays and source positions. The experimental
results demonstrate the superiority of our proposed method over
state-of-the-art methods in both sound event classification and sound source
localization tasks. We also provide further analysis to explain the reasons for
the observed errors.
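The abstract does not say how its Gammatone features are computed. As a rough illustration of the filter itself, the sketch below builds the standard fourth-order gammatone impulse response with the Glasberg-Moore equivalent rectangular bandwidth (ERB). The sampling rate, duration, bandwidth factor 1.019 and all function names are assumptions for this example, not details taken from the paper.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg-Moore formula), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    """Peak-normalised gammatone impulse response:
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc).
    """
    b = 1.019 * erb(fc_hz)
    n_samples = int(round(duration_s * fs_hz))
    ir = []
    for k in range(n_samples):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


ir = gammatone_ir(fc_hz=1000.0, fs_hz=16000.0)
print(len(ir), erb(1000.0))
```

A filterbank would repeat this for a set of centre frequencies and convolve each response with the microphone signal.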
A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition
Environmental sound recognition (ESR) is an emerging research topic in audio
pattern recognition. Many tasks have been proposed that rely on computational
systems for ESR in real-life applications. However, current systems are usually
designed for individual tasks, and are not robust and applicable to other
tasks. Cross-task systems, which promote unified knowledge modeling across
various tasks, have not been thoroughly investigated. In this paper, we propose
a cross-task system for three different tasks of ESR: acoustic scene
classification, urban sound tagging, and anomalous sound detection. An
architecture named SE-Trans is presented that uses attention mechanism-based
Squeeze-and-Excitation and Transformer encoder modules to learn channel-wise
relationships and temporal dependencies of the acoustic features. FMix is
employed as the data augmentation method that improves the performance of ESR.
Evaluations for the three tasks are conducted on the recent databases of DCASE
challenges. The experimental results show that the proposed cross-task system
achieves state-of-the-art performance on all tasks. Further analysis
demonstrates that the proposed cross-task system can effectively utilize
acoustic knowledge across different ESR tasks.
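The channel-wise attention mentioned above can be illustrated with a generic Squeeze-and-Excitation step: average each channel over time, pass the result through a small bottleneck, and rescale the channels by the resulting gates. This is a minimal pure-Python sketch of the general technique, not the SE-Trans architecture; the weight shapes, the omission of biases and the random weights are assumptions.

```python
import math
import random


def squeeze_excite(feature_map, w1, w2):
    """Generic Squeeze-and-Excitation over a list of C channels of T values.

    w1 has shape (C // r) x C and w2 has shape C x (C // r); biases are omitted.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average over time for each channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every channel by its gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, feature_map)]
    return scaled, gates


random.seed(0)
C, T, r = 4, 6, 2  # channels, time steps, reduction ratio (all hypothetical)
x = [[random.random() for _ in range(T)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
scaled, gates = squeeze_excite(x, w1, w2)
print(gates)
```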
Boosting the Discriminant Power of Naive Bayes
Naive Bayes has been widely used in many applications because of its
simplicity and its ability to handle both numerical and categorical data.
However, it does not model correlations between features, which limits its
performance. In addition, noise and outliers in real-world datasets
greatly degrade the classification performance. In this paper, we propose a
feature augmentation method employing a stack auto-encoder to reduce the noise
in the data and boost the discriminant power of naive Bayes. The proposed stack
auto-encoder consists of two auto-encoders for different purposes. The first
encoder shrinks the initial features to derive a compact feature representation
in order to remove the noise and redundant information. The second encoder
boosts the discriminant power of the features by expanding them into a
higher-dimensional space in which samples of different classes can be better
separated. By integrating the proposed feature
augmentation method with the regularized naive Bayes, the discrimination power
of the model is greatly enhanced. The proposed method is evaluated on a set of
machine-learning benchmark datasets. The experimental results show that the
proposed method significantly and consistently outperforms the state-of-the-art
naive Bayes classifiers.
Comment: Accepted by 2022 International Conference on Pattern Recognition
A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes
In many classification models, data is discretized to better estimate its
distribution. Existing discretization methods often aim to maximize the
discriminant power of discretized data, while overlooking the fact that the
primary target of data discretization in classification is to improve the
generalization performance. As a result, the data tend to be over-split into
many small bins since the data without discretization retain the maximal
discriminant information. Thus, we propose a Max-Dependency-Min-Divergence
(MDmD) criterion that maximizes both the discriminant information and
generalization ability of the discretized data. More specifically, the
Max-Dependency criterion maximizes the statistical dependency between the
discretized data and the classification variable while the Min-Divergence
criterion explicitly minimizes the JS-divergence between the training data and
the validation data for a given discretization scheme. The proposed MDmD
criterion is technically appealing, but it is difficult to reliably estimate
the high-order joint distributions of attributes and the classification
variable. We hence further propose a more practical solution,
Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute
is discretized separately, by simultaneously maximizing the discriminant
information and the generalization ability of the discretized data. The
proposed MRmD is compared with the state-of-the-art discretization algorithms
under the naive Bayes classification framework on 45 machine-learning benchmark
datasets. It significantly outperforms all the compared methods on most of the
datasets.
Comment: Under major revision of Pattern Recognition
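To make the Min-Divergence idea concrete, the sketch below bins one attribute with a given set of cut points and measures the JS-divergence between the training and validation histograms. It illustrates only the divergence term, not the MRmD criterion or its search procedure; the data, cut points and function names are made up for the example.

```python
import math
from bisect import bisect_right


def histogram(values, cut_points):
    """Relative frequencies of values over the bins defined by sorted cut points."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1
    return [c / len(values) for c in counts]


def kl(p, q):
    """KL divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical attribute values for a training and a validation split.
train = [0.1, 0.4, 0.6, 0.9, 1.2, 1.7]
valid = [0.05, 0.15, 0.55, 0.65, 1.1, 1.3]

coarse = [1.0]
fine = [0.25, 0.5, 0.75, 1.0, 1.5]
js_coarse = js_divergence(histogram(train, coarse), histogram(valid, coarse))
js_fine = js_divergence(histogram(train, fine), histogram(valid, fine))
print(js_coarse, js_fine)
```

In this toy data the finer scheme yields a larger train/validation divergence than the coarse one, which is the over-splitting effect the abstract describes.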
SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Anomalous sound detection for machine condition monitoring has great
potential in the development of Industry 4.0. However, anomalous machine
sounds are usually unavailable under normal operating conditions. Therefore,
models must learn acoustic representations from normal sounds during
training and detect anomalous sounds at test time. In this article, we
propose a self-supervised dual-path Transformer (SSDPT) network to detect
anomalous sounds in machine monitoring. The SSDPT network splits the acoustic
features into segments and employs several DPT blocks for time and frequency
modeling. DPT blocks use attention modules to alternately model the interactive
information about the frequency and temporal components of the segmented
acoustic features. To address the lack of anomalous sounds, we adopt
a self-supervised learning approach to train the network with normal sounds.
Specifically, this approach randomly masks and reconstructs the acoustic
features, and jointly classifies machine identity information to improve the
performance of anomalous sound detection. We evaluated our method on the
DCASE 2021 Task 2 dataset. The experimental results show that the SSDPT network
achieves a significant increase in the harmonic mean AUC score compared with
current state-of-the-art methods of anomalous sound detection.
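The harmonic mean AUC reported above aggregates per-machine AUC values so that a weak machine type pulls the overall score down more than an arithmetic mean would. The sketch below computes a pairwise AUC from anomaly scores and takes the harmonic mean across machines. The scores and machine names are invented, and the exact evaluation protocol of the challenge is not reproduced here.

```python
from statistics import harmonic_mean


def auc(normal_scores, anomalous_scores):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomalous_scores))


# Hypothetical anomaly scores per machine type: (normal clips, anomalous clips).
scores = {
    "fan": ([0.1, 0.2, 0.3], [0.25, 0.8]),
    "pump": ([0.2, 0.4], [0.5, 0.9]),
}
per_machine_auc = {name: auc(normal, anom) for name, (normal, anom) in scores.items()}
hmean_auc = harmonic_mean(per_machine_auc.values())
print(per_machine_auc, hmean_auc)
```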