Evaluation of 2D Acoustic Signal Representations for Acoustic-Based Machine Condition Monitoring
Acoustic-based machine condition monitoring (MCM) offers an attractive alternative to conventional MCM approaches such as vibration analysis and lubrication monitoring. Classifying anomalous machine operating sounds, however, requires an effective 2D representation of the acoustic signal, and this paper explores that question. A baseline convolutional neural network (CNN) is implemented and trained on rolling element bearing acoustic fault data. Three representations are compared: the log-spectrogram, the short-time Fourier transform (STFT) magnitude and the log-Mel spectrogram. The results establish the log-Mel spectrogram and the log-spectrogram as promising candidates for further exploration.
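A minimal sketch of how the three representations can be computed with librosa; the file name and the frame parameters (n_fft, hop_length, n_mels) are illustrative assumptions, not the paper's settings:

```python
import numpy as np
import librosa

y, sr = librosa.load("bearing_recording.wav", sr=None)   # hypothetical file

stft = librosa.stft(y, n_fft=1024, hop_length=512)
stft_mag = np.abs(stft)                                   # STFT magnitude
log_spec = librosa.amplitude_to_db(stft_mag, ref=np.max)  # log-spectrogram

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)            # log-Mel spectrogram
```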
Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks
In this paper, we propose a new input representation for a convolutional neural network aimed at estimating music structure boundaries. For this task, previous works used a network performing late fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity lag matrix. We propose instead to use square sub-matrices centered on the main diagonals of several self-similarity matrices, each one computed from a different audio descriptor, and to combine them along the depth of the input layer. We show that this representation improves the results over the use of the self-similarity lag matrix, and that using the depth of the input layer provides a convenient way to perform early fusion of audio representations.
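A rough sketch of the proposed input construction, under our own assumptions about descriptors and patch size: self-similarity matrices are computed from several descriptors, square sub-matrices centered on the main diagonal are extracted per frame, and the per-descriptor patches are stacked along the channel (depth) axis:

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix for a (time, dim) feature sequence."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def diagonal_patches(ssm, half_width):
    """Square sub-matrices centered on the main diagonal, one per frame."""
    n = ssm.shape[0]
    pad = np.pad(ssm, half_width, mode="constant")
    w = 2 * half_width + 1
    return np.stack([pad[i:i + w, i:i + w] for i in range(n)])

# Hypothetical descriptors (e.g. MFCC and chroma sequences of equal length):
mfcc = np.random.randn(500, 20)
chroma = np.random.randn(500, 12)

patches = [diagonal_patches(self_similarity(f), half_width=32)
           for f in (mfcc, chroma)]
cnn_input = np.stack(patches, axis=-1)   # (frames, 65, 65, n_descriptors)
```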
Detection and restoration of click degraded audio based on high-order sparse linear prediction
Clicks are short-duration defects that affect most archived audio media. Linear prediction (LP) modeling for the representation and restoration of audio signals corrupted by click degradation has been extensively studied. When the time locations of the samples affected by clicks are known, high-order sparse linear prediction has been shown to yield significant restoration improvements over conventional LP-based approaches. For the practical use of such methods, however, identifying the time locations of the affected samples is critical. In this paper, the use of high-order sparse linear prediction for both the detection and the restoration of click-degraded audio is proposed. Results in terms of click duration estimation, SNR improvement and perceptual audio quality show that the proposed approach outperforms state-of-the-art LP-based approaches.
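One common way to realize sparse LP is L1-regularized least squares; the sketch below uses scikit-learn's Lasso as a stand-in for the paper's estimator, with illustrative order and regularization values, and flags clicks by thresholding the prediction residual:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_lp_residual(x, order=100, alpha=1e-3):
    """Residual of a high-order sparse LP fit; large spikes suggest clicks.
    The order and alpha values are illustrative, not the paper's settings."""
    # Regression matrix: predict x[n] from x[n-1] ... x[n-order].
    X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
    y = x[order:]
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    model.fit(X, y)
    return y - model.predict(X)

x = np.random.randn(44100)                # stand-in for a degraded audio frame
res = sparse_lp_residual(x)
clicks = np.abs(res) > 4 * np.std(res)    # simple residual thresholding
```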
Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms
Music recommendation systems have emerged as a vital component for enhancing user experience and satisfaction in music streaming services, which now dominate music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, in particular for the underpinning task of music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced solution: the Automatic Music Genre Classification (AMGC) system. While traditional machine learning techniques have shown potential in genre classification, they rely heavily on manually engineered features and feature selection and fail to capture the full complexity of music data. Deep learning architectures such as the traditional Convolutional Neural Network (CNN), on the other hand, are effective at capturing spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study uses visual spectrograms as input and proposes a hybrid model that combines the strengths of the Residual Neural Network (ResNet) and the Gated Recurrent Unit (GRU). The model is designed to provide a more comprehensive analysis of music data and hence potentially more accurate genre classification, offering a route to improved music recommender systems.
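A minimal PyTorch sketch of such a hybrid: a ResNet backbone extracts a 2D feature map from the spectrogram, frequency is pooled away, and a bidirectional GRU models the remaining time axis. The layer sizes and the use of resnet18 are our assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetBiGRU(nn.Module):
    """A sketch of a ResNet + Bi-GRU hybrid; hyperparameters are assumed."""
    def __init__(self, n_genres=10, hidden=128):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                              bias=False)               # 1-channel spectrograms
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # keep 2D map
        self.gru = nn.GRU(512, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_genres)

    def forward(self, spec):                            # spec: (B, 1, freq, time)
        fmap = self.backbone(spec)                      # (B, 512, f', t')
        seq = fmap.mean(dim=2).transpose(1, 2)          # pool freq -> (B, t', 512)
        out, _ = self.gru(seq)                          # temporal modeling
        return self.head(out.mean(dim=1))               # genre logits

logits = ResNetBiGRU()(torch.randn(2, 1, 128, 431))     # toy batch of log-Mels
```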
Notes on the use of variational autoencoders for speech and audio spectrogram modeling
Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for modeling speech/audio spectrograms. In these papers, little theoretical support is given for the chosen data representation, the decoder likelihood function, or the corresponding cost function used for training the VAE. Yet a sound theoretical statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling, providing insights into the choice and interpretability of the data representation and model parameterization.
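As an illustration of that framework, the Itakura-Saito divergence between an observed power spectrogram and the decoder's output variance is, up to a constant, the negative log-likelihood of the zero-mean complex Gaussian model underlying IS-NMF. A sketch in PyTorch, with variable names of our choosing rather than the paper's:

```python
import torch

def itakura_saito(S, var, eps=1e-8):
    """IS divergence between a power spectrogram S and the decoder variance
    `var`; under a zero-mean complex Gaussian model this is the negative
    log-likelihood up to an additive constant."""
    r = S / (var + eps)
    return (r - torch.log(r + eps) - 1.0).sum()

def vae_loss(S, var, mu, logvar):
    """Negative ELBO: IS reconstruction term plus the usual Gaussian KL.
    A sketch of the framework discussed above, not the authors' code."""
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
    return itakura_saito(S, var) + kl
```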
Auditory time-frequency masking: psychoacoustical data and application to audio representations
In this paper, the results of psychoacoustical experiments on auditory time-frequency (TF) masking are presented, using stimuli (masker and target) with maximal concentration in the TF plane. The target was shifted relative to the masker along the time axis, the frequency axis, or both. The results show that a simple superposition of spectral and temporal masking functions does not provide an accurate representation of the measured TF masking function. This confirms the inaccuracy of the simple models of TF masking currently implemented in some perceptual audio codecs. In the context of audio signal processing, the present results constitute a crucial basis for predicting auditory masking in TF representations of sounds. An algorithm is proposed that removes the inaudible components in the wavelet transform of a sound while causing no audible difference to the original sound after re-synthesis. Preliminary results are promising, although further development is required.
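A deliberately crude sketch of that final idea, assuming PyWavelets and a fixed relative threshold; the actual algorithm derives its irrelevance threshold from the measured TF masking data rather than the constant used here:

```python
import numpy as np
import pywt

def remove_inaudible(x, wavelet="db8", level=6, rel_db=-40.0):
    """Zero wavelet coefficients far below the local masker level, then
    re-synthesize. The fixed relative threshold is our placeholder for the
    psychoacoustically derived masking threshold used in the paper."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    out = []
    for c in coeffs:
        thr = np.max(np.abs(c)) * 10 ** (rel_db / 20.0)
        out.append(np.where(np.abs(c) < thr, 0.0, c))
    return pywt.waverec(out, wavelet)

y_clean = remove_inaudible(np.random.randn(16384))   # toy signal
```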
Final Research Report on Auto-Tagging of Music
Deliverable D4.7, a research report, covers the work achieved by IRCAM up to M36 on the “auto-tagging of music”. The software libraries resulting from the research have been integrated into the Fincons/HearDis! Music Library Manager or are used by TU Berlin. The final software libraries are described in D4.5.
The research work on auto-tagging has concentrated on four aspects:
1) Further improving IRCAM’s machine-learning system ircamclass. This has been done by developing the new MASSS audio features and by integrating audio augmentation and audio segmentation into ircamclass. The system has then been applied to train HearDis! “soft” features (Vocals-1, Vocals-2, Pop-Appeal, Intensity, Instrumentation, Timbre, Genre, Style). This is described in Part 3.
2) Developing two sets of “hard” features (i.e. related to musical or musicological concepts) as specified by HearDis! (for integration into the Fincons/HearDis! Music Library Manager) and TU Berlin (as input for the prediction model of the GMBI attributes). Such features are either derived from previously estimated higher-level concepts (such as structure, key or chord succession) or obtained by developing new signal processing algorithms (such as HPSS, illustrated in the sketch after this list, or main melody estimation). This is described in Part 4.
3) Developing audio features to characterize the audio quality of a music track. The goal is to describe the quality of the audio independently of its apparent encoding. This is then used to estimate audio degradation or the decade of a music track, and in turn to ensure that playlists contain tracks with similar audio quality. This is described in Part 5.
4) Developing innovative algorithms to extract specific audio features to improve music mixes. So far, innovative techniques (based on various blind audio source separation algorithms and convolutional neural networks) have been developed for singing voice separation, singing voice segmentation, music structure boundary estimation, and DJ cue-region estimation. This is described in Part 6.
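As referenced in item 2, a minimal illustration of HPSS (harmonic-percussive source separation) using librosa's median-filtering implementation; this is a generic sketch with a hypothetical input file, not IRCAM's own algorithm:

```python
import librosa

y, sr = librosa.load("track.wav", sr=None)    # hypothetical input track
S = librosa.stft(y)
H, P = librosa.decompose.hpss(S)              # harmonic / percussive STFTs
y_harmonic = librosa.istft(H)
y_percussive = librosa.istft(P)
```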
Matching Pursuits with Random Sequential Subdictionaries
Matching pursuits are a class of greedy algorithms commonly used in signal processing to solve the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be computationally costly for large dictionaries and limits their competitiveness in coding applications. We propose using a non-adaptive random sequence of subdictionaries in the decomposition process, thus parsing a large dictionary in a probabilistic fashion with no additional projection cost and no parameter estimation. A theoretical model based on order statistics is provided, along with experimental evidence showing that the novel algorithm can be used efficiently on sparse approximation problems. An application to audio signal compression with multiscale time-frequency dictionaries is presented, along with a discussion of the complexity and practical implementations.
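A toy sketch of the idea: at each matching pursuit iteration, the atom search is restricted to a randomly drawn subdictionary, so only a fraction of the projections are computed. The subdictionary size, the dictionary construction and the iteration count below are our assumptions:

```python
import numpy as np

def mp_random_subdicts(x, dictionary, n_iter=100, sub_size=256, seed=0):
    """Matching pursuit where each atom-selection step searches only a
    random subdictionary, cutting the projections per iteration. A sketch
    of the idea described above; details differ from the paper."""
    rng = np.random.default_rng(seed)
    residual = x.copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        sub = rng.choice(dictionary.shape[1], size=sub_size, replace=False)
        proj = dictionary[:, sub].T @ residual       # projections on subset only
        best = sub[np.argmax(np.abs(proj))]
        c = dictionary[:, best] @ residual           # assumes unit-norm atoms
        coeffs[best] += c
        residual -= c * dictionary[:, best]
    return coeffs, residual

D = np.linalg.qr(np.random.randn(512, 512))[0]       # unit-norm toy atoms
D = np.hstack([D, np.roll(D, 1, axis=0)])            # overcomplete (512 x 1024)
x = D[:, :5] @ np.random.randn(5)                    # 5-sparse test signal
coeffs, res = mp_random_subdicts(x, D)
```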
Adaptive Noise Reduction for Sound Event Detection Using Subband-Weighted NMF
Sound event detection in real-world environments suffers from interference by non-stationary and time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). First, a noise dictionary is learned from the input noisy signal using robust NMF, which supports adaptation to noise variations. The estimated noise dictionary is then used, in combination with a pre-trained event dictionary, to build a supervised source separation framework. Second, to improve separation quality, we extend the basic NMF model to a weighted form, with the aim of varying the relative importance of the different components when separating a target sound event from noise. With properly designed weights, the separation process is forced to rely more on the dominant event components, while the noise is greatly suppressed. The proposed method is evaluated on the dataset of the rare sound event detection task of the DCASE 2017 challenge and achieves results comparable to the top-ranking system based on convolutional recurrent neural networks (CRNNs). The proposed weighted NMF method shows excellent noise reduction ability and improves the F-score by 5% compared to the unweighted approach.
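A sketch of the weighted separation step under our own simplifications: activations for a fixed stacked dictionary (pre-trained event atoms plus noise atoms learned from the input) are estimated with multiplicative updates for a weighted Euclidean cost. The paper's actual weighting scheme and divergence may differ:

```python
import numpy as np

def weighted_nmf_activations(V, W, weights, n_iter=200, eps=1e-9):
    """Estimate activations H for a fixed dictionary W by multiplicative
    updates minimizing || weights * (V - W H) ||^2, where per-frequency
    weights vary the importance of different components. A sketch only."""
    F, T = V.shape
    H = np.abs(np.random.rand(W.shape[1], T))
    L = weights.reshape(F, 1)                  # subband weights, broadcast
    for _ in range(n_iter):
        num = W.T @ (L * V)
        den = W.T @ (L * (W @ H)) + eps
        H *= num / den
    return H

# Hypothetical shapes: 513 bins, 100 frames, 20 event + 10 noise atoms.
V = np.abs(np.random.rand(513, 100))           # noisy magnitude spectrogram
W = np.abs(np.random.rand(513, 30))            # stacked event + noise dictionary
H = weighted_nmf_activations(V, W, weights=np.linspace(1.0, 2.0, 513))
```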
Enhancing film sound design using audio features, regression models and artificial neural networks
Making the link between human emotion and music is challenging. Our aim was to produce an efficient system that emotionally rates songs from multiple genres. To achieve this, we conducted a series of online self-report studies using Russell's circumplex model. The first study (n = 44) identified audio features that map to arousal and valence for 20 songs; from this, we constructed a set of linear regressors. The second study (n = 158) measured the efficacy of our system, using 40 new songs to create a ground truth. Results show our approach may be effective at emotionally rating music, particularly in the prediction of valence.
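A minimal sketch of the modeling step, assuming scikit-learn and placeholder features and ratings (the study's actual feature set is not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: 20 songs x 5 audio features, e.g. tempo, RMS energy,
# spectral centroid, flux, rolloff; ratings are mean self-reports per song.
X = np.random.rand(20, 5)
arousal = np.random.rand(20)
valence = np.random.rand(20)

# One linear regressor per circumplex dimension, as in the approach above.
arousal_model = LinearRegression().fit(X, arousal)
valence_model = LinearRegression().fit(X, valence)

predicted_valence = valence_model.predict(np.random.rand(40, 5))  # new songs
```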