
    NMF-based temporal feature integration for acoustic event classification

    Proceedings of: 14th Annual Conference of the International Speech Communication Association. Lyon, France, 25-29 August 2013. In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks based on the combination of the temporal feature integration technique called Filter Bank Coefficients (FC) and Non-Negative Matrix Factorization (NMF). FC aims to capture the dynamic structure of the short-term features by summarizing the periodogram of each short-term feature dimension in several frequency bands using a predefined filter bank. As the commonly used filter bank was devised for other tasks (such as music genre classification), it can be suboptimal for AEC. To overcome this drawback, we propose an unsupervised method based on NMF for learning filters that collect the most relevant temporal information in the short-term features for AEC. The experiments show that the features obtained with this method achieve significant improvements in the classification performance of a Support Vector Machine (SVM) based AEC system in comparison with the baseline FC features. This work has been partially supported by the Spanish Government grants TSI-020110-2009-103, IPT-120000-2010-24 and TEC2011-26807.
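    The NMF step at the heart of such a front-end can be sketched with plain multiplicative updates. The sketch below factorizes a nonnegative matrix V (periodogram bins × training examples, here random toy data) as V ≈ WH, with the columns of W playing the role of learned filters over modulation frequency; the rank, iteration count, and Euclidean loss are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Euclidean-loss NMF via Lee-Seung multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data standing in for periodograms of short-term feature trajectories:
# 64 modulation-frequency bins x 40 training examples, nonnegative.
rng = np.random.default_rng(1)
V = np.abs(rng.standard_normal((64, 40)))
W, H = nmf(V, rank=4)
# Each column of W acts as a learned filter over modulation frequency.
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In practice the learned columns of W would replace the predefined filter bank when summarizing each feature dimension's periodogram.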

    NMF-Based Spectral Analysis for Acoustic Event Classification Tasks

    Proceedings of: 6th International Conference on Non-Linear Speech Processing (NOLISP 2013). Mons, Belgium, June 19-21, 2013. In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks. First, we study the spectral content of different acoustic events by applying Non-Negative Matrix Factorization (NMF) to their spectral magnitude and compare it with the structure of speech spectra. Second, from the findings of this study, we propose a new parameterization for AEC, which extends the conventional Mel Frequency Cepstral Coefficients (MFCC) and is based on high-pass filtering of the acoustic event spectra. The influence of different frequency scales on the classification rate of the whole system is also studied. The evaluation of the proposed features for AEC shows that relative error reductions of about 12% at segment level and about 11% at target event level are achieved with respect to the conventional MFCC. This work has been partially supported by the Spanish Government grants TSI-020110-2009-103, IPT-120000-2010-24 and TEC2011-26807. Financial support from the Fundación Carolina and the Universidad Católica San Pablo, Arequipa, is also acknowledged.
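    As a rough illustration of a high-pass-then-cepstrum front-end, the sketch below weights a power spectrum by the magnitude-squared response of a first-order pre-emphasis filter and then takes a DCT-II of the log spectrum. The weighting, frequency scale, and coefficient count are hypothetical stand-ins; the paper's exact filtering and filterbank are not reproduced here.

```python
import numpy as np

def highpass_cepstra(power_spec, n_ceps=13, alpha=0.95, eps=1e-10):
    """Cepstral features from a high-pass-weighted power spectrum.

    The weight |1 - alpha*e^{-jw}|^2 = 1 + alpha^2 - 2*alpha*cos(w) is the
    magnitude-squared response of a first-order pre-emphasis filter, used
    here as an illustrative high-pass stand-in. A DCT-II of the log
    spectrum then yields the cepstral coefficients.
    """
    K = power_spec.shape[-1]
    w = np.pi * np.arange(K) / (K - 1)               # bin frequencies in [0, pi]
    hp = 1.0 + alpha ** 2 - 2.0 * alpha * np.cos(w)  # high-pass weighting
    log_spec = np.log(power_spec * hp + eps)
    n = np.arange(K)
    k = np.arange(n_ceps)
    basis = np.cos(np.pi / K * (n[None, :] + 0.5) * k[:, None])  # DCT-II rows
    return log_spec @ basis.T

# One toy frame with a flat power spectrum (1 frame x 64 bins -> 13 cepstra).
ceps = highpass_cepstra(np.ones((1, 64)))
```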

    Recognition of Harmonic Sounds in Polyphonic Audio using a Missing Feature Approach: Extended Report

    A method based on local spectral features and missing-feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm is proposed for identifying spectral regions that contain reliable information for each sound source, and bounded marginalization is then employed to treat the feature vector elements that are determined to be unreliable. The proposed method is tested on musical instrument sounds, owing to the extensive availability of data, but it can be applied to other sounds (e.g., animal or environmental sounds) whenever these are harmonic. In simulations, the proposed method clearly outperformed a baseline method on mixture signals.
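    Bounded marginalization can be illustrated for a diagonal Gaussian class model: reliable dimensions contribute an ordinary log-density, while each unreliable dimension contributes the probability mass between a floor (here 0) and the observed value, treating the observation as an upper bound on the clean feature. This is a minimal sketch; the actual masks, bounds, and models in the paper may differ.

```python
import math

def log_likelihood_bounded(x, mask, mu, sigma):
    """Log-likelihood under a diagonal Gaussian with bounded marginalization.

    Reliable dimensions (mask == 1) use the ordinary Gaussian log-density.
    Unreliable dimensions (mask == 0) are integrated over [0, x_d], i.e. the
    observation is an upper bound on the clean feature, with 0 as the floor.
    """
    ll = 0.0
    for xd, m, mud, sd in zip(x, mask, mu, sigma):
        if m:  # reliable: pointwise Gaussian log-density
            ll += -0.5 * math.log(2 * math.pi * sd ** 2) - (xd - mud) ** 2 / (2 * sd ** 2)
        else:  # unreliable: log P(0 <= X <= xd) under N(mud, sd^2)
            hi = 0.5 * (1 + math.erf((xd - mud) / (sd * math.sqrt(2))))
            lo = 0.5 * (1 + math.erf((0.0 - mud) / (sd * math.sqrt(2))))
            ll += math.log(max(hi - lo, 1e-300))
    return ll

# One reliable and one unreliable dimension under a standard-normal model.
ll = log_likelihood_bounded([1.0, 2.0], [1, 0], [0.0, 0.0], [1.0, 1.0])
```

Classification then picks the class model maximizing this score, so unreliable time-frequency regions penalize only models inconsistent with the observed energy bound.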

    A review of automatic drum transcription

    In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. In particular, the detection and classification of drum sound events by computational methods is considered an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, a categorization of existing techniques, and an evaluation of several state-of-the-art systems. To provide more insight into the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods' technical details and drum-specific variations, and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.
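    NMF-based ADT methods typically fix a set of drum spectral templates W and estimate only the activation matrix H from the spectrogram V, reading onsets off the activation rows. A minimal sketch with KL-divergence multiplicative updates follows; the templates and data are synthetic toys, not any of the reviewed systems.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate activations H with the template matrix W held fixed,
    using KL-divergence multiplicative updates for V ~= W @ H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]   # denominator W^T * 1, per template
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (col_sums + eps)
    return H

# Toy setup: three hypothetical drum templates and a known activation pattern.
rng = np.random.default_rng(0)
W = np.abs(rng.standard_normal((40, 3)))   # spectral templates (40 bins)
H_true = np.zeros((3, 20))
H_true[0, [2, 10]] = 1.0                   # e.g. kick at frames 2 and 10
H_true[1, 5] = 1.0                         # e.g. snare at frame 5
H_true[2, 15] = 1.0                        # e.g. hi-hat at frame 15
V = W @ H_true                             # synthetic "spectrogram"
H = nmf_activations(V, W)
# Onset candidates would be peaks of each activation row above a threshold.
```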

    Automatic music transcription: challenges and future directions

    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper, we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

    Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

    In this paper, we study the usefulness of various matrix factorization methods for learning features for the Acoustic Scene Classification (ASC) problem. A common way of addressing ASC has been to engineer features capable of capturing the specificities of acoustic environments. Instead, we show that better representations of the scenes can be learned automatically from time-frequency representations using matrix factorization techniques. We mainly focus on extensions of Principal Component Analysis and Nonnegative Matrix Factorization, including sparse, kernel-based, convolutive and a novel supervised dictionary learning variant. An experimental evaluation is performed on two of the largest ASC datasets available in order to compare and discuss the usefulness of these methods for the task. We show that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets. Furthermore, the introduction of a novel nonnegative supervised matrix factorization model and of Deep Neural Networks trained on spectrograms allows us to reach further improvements.

    Acoustic Features for Environmental Sound Analysis

    Most of the time, it is nearly impossible to differentiate between particular types of sound events from a waveform alone. Therefore, frequency-domain and time-frequency-domain representations have been used for years, providing representations of sound signals that are more in line with human perception. However, these representations are usually too generic and often fail to describe the specific content that is present in a sound recording. A great deal of work has been devoted to designing features that allow such specific information to be extracted, leading to a wide variety of hand-crafted features. In recent years, owing to the increasing availability of medium-scale and large-scale sound datasets, an alternative approach to feature extraction has become popular: so-called feature learning. Finally, processing the amount of data at hand nowadays can quickly become overwhelming, so it is of paramount importance to be able to reduce the size of the dataset in the feature space. This chapter describes the general processing chain used to convert a sound signal into a feature vector that can be efficiently exploited by a classifier, as well as the relation to features used for speech and music processing.

    Sparse and Nonnegative Factorizations For Music Understanding

    In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes. Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier. 
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures.
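    The AMP idea can be sketched with random-hyperplane LSH: hash the dictionary once, and at each pursuit step search only the atoms that share a bucket with the residual, falling back to a full search when the buckets are empty. Table counts, bit widths, and the toy dictionary below are illustrative choices, not the dissertation's settings.

```python
import numpy as np

class LSHIndex:
    """Random-hyperplane LSH over a dictionary of unit-norm atoms (columns)."""
    def __init__(self, atoms, n_tables=8, n_bits=6, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_tables, n_bits, atoms.shape[0]))
        self.tables = []
        for t in range(n_tables):
            table = {}
            bits = self.planes[t] @ atoms > 0      # (n_bits, n_atoms) sign pattern
            for j in range(atoms.shape[1]):
                table.setdefault(bits[:, j].tobytes(), []).append(j)
            self.tables.append(table)

    def candidates(self, x):
        """Atoms sharing a bucket with x in at least one table."""
        cands = set()
        for t, table in enumerate(self.tables):
            cands.update(table.get((self.planes[t] @ x > 0).tobytes(), []))
        return sorted(cands)

def amp(signal, atoms, index, n_iter=10):
    """Approximate matching pursuit: each step searches only the LSH
    candidate set for the atom best correlated with the residual,
    falling back to a full search when no candidates hash together."""
    residual = signal.astype(float).copy()
    coefs = {}
    for _ in range(n_iter):
        cand = index.candidates(residual) or list(range(atoms.shape[1]))
        scores = atoms[:, cand].T @ residual
        j = cand[int(np.argmax(np.abs(scores)))]
        c = atoms[:, j] @ residual
        coefs[j] = coefs.get(j, 0.0) + c
        residual -= c * atoms[:, j]                # subtract the projection
    return coefs, residual

# Toy dictionary of 64 random unit-norm atoms in 32 dimensions.
rng = np.random.default_rng(0)
atoms = rng.standard_normal((32, 64))
atoms /= np.linalg.norm(atoms, axis=0)
signal = 2.0 * atoms[:, 5] + 1.0 * atoms[:, 17]    # sparse ground truth
coefs, residual = amp(signal, atoms, LSHIndex(atoms))
```

Note that sign-based hashing sends x and -x to different buckets, which is why the full-search fallback matters when pursuing with signed correlations.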

    Advanced informatics for event detection and temporal localization

    PhD thesis. The primary objective of a Sound Event Detection (SED) system is to detect the presence of an acoustic event (i.e., audio tagging) and to return the onset and offset of the identified acoustic event within an audio clip (i.e., temporal localization). Such a system is promising for wildlife and biodiversity monitoring, surveillance, and smart-home applications. However, developing a system that is adept at both subtasks is not trivial. It can be hindered by the need for a large amount of strongly labeled data, where the event tags and the corresponding onsets and offsets are known with certainty. This is a limiting factor, as strongly labeled data is challenging to collect and is prone to annotation errors due to the ambiguity in the perception of onsets and offsets. In this thesis, we propose to address the lack of strongly labeled data by using pseudo strongly labeled data, where the event tags are known with certainty while the corresponding onsets and offsets are estimated. While Nonnegative Matrix Factorization can be used directly for SED, albeit with limited accuracy, we show that it can be a useful tool for pseudo labeling. We further show that pseudo strongly labeled data estimated using our proposed methods can improve the accuracy of an SED system developed using deep learning approaches. Subsequent work then focused on improving the SED system as a whole rather than a single subtask. This led to the proposal of a novel student-teacher training framework that incorporates a noise-robust loss function, a new cyclic training scheme, an improved depthwise separable convolution, a triple instance-level temporal pooling approach, and an improved Transformer encoding layer. Together with synthetic strongly labeled data and a large corpus of unlabeled data, we show that an SED system developed using our proposed method achieves state-of-the-art performance.
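    The pseudo-labeling step can be illustrated independently of the model that produces the frame-wise scores: given an activation envelope for an event whose clip-level tag is known, threshold it and keep sufficiently long runs as pseudo onset/offset pairs. The function name, threshold, and minimum duration below are illustrative, not the thesis's actual procedure.

```python
import numpy as np

def pseudo_strong_labels(activation, threshold=0.5, min_len=3):
    """Turn a frame-wise activation envelope for a clip-level-tagged event
    into pseudo (onset, offset) frame pairs: threshold the envelope, then
    keep only runs of at least `min_len` consecutive active frames."""
    active = activation >= threshold
    segments = []
    start = None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                         # run begins
        elif not a and start is not None:
            if i - start >= min_len:
                segments.append((start, i))   # run ends, long enough to keep
            start = None
    if start is not None and len(active) - start >= min_len:
        segments.append((start, len(active))) # run extends to the clip end
    return segments

# Toy envelope with two clear active regions.
env = np.array([0., .1, .8, .9, .7, .2, .1, .9, .9, .9, .9, .0])
segs = pseudo_strong_labels(env)   # -> [(2, 5), (7, 11)]
```

The resulting (onset, offset) pairs can then serve as pseudo strong labels for training a frame-level detector.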