5 research outputs found

    Model Based Sparse Feature Extraction for Biomedical Signal Classification

    Get PDF
    This article focuses on model based sparse feature extraction of biomedical signals for classification problems, which stems from sparse representation in modern signal processing. In the presented work, a novel approach based on sparse principal component analysis (SPCA) is proposed to extract signal features. This method involves partitioning signals and utilizing SPCA to select only a limited number of signal segments in order to construct signal principal components during the training stage. For signal classification purposes, a set of regression models based on sparse principal components of the selected training signal segments is constructed. Within this approach, model residuals are estimated and used as signal features for classification. The applications of the proposed approach are demonstrated by using both the synthetic data and real EEG signals. The high classification accuracy results suggest that the proposed methods may be useful for automatic event detection using long-term observational signals. keywords: Sparse Principal Component Analysis, Sparse Feature Extraction, Signal Classification, Long-term Signal

    Automatic classification of drum sounds with indefinite pitch

    Get PDF
    Automatic classification of musical instruments is an important task for music transcription as well as for professionals such as audio designers, engineers and musicians. Unfortunately, only a limited amount of effort has been conducted to automatically classify percussion instrument in the last years. The studies that deal with percussion sounds are usually restricted to distinguish among the instruments in the drum kit such as toms vs. snare drum vs. bass drum vs. cymbals. In this paper, we are interested in a more challenging task of discriminating sounds produced by the same percussion instrument. Specifically, sounds from different drums cymbals types. Cymbals are known to have indefinite pitch, nonlinear and chaotic behavior. We also identify how the sound of a specific cymbal was produced (e.g., roll or choke movements performed by a drummer). We achieve an accuracy of 96.59% for cymbal type classification and 91.54% in a classification problem with 12 classes which represent the cymbal type and the manner or region that the cymbals are struck. Both results were obtained with Support Vector Machine algorithm using the Line Spectral Frequencies as audio descriptor. We believe that our results can be useful for a more detailed automatic drum transcription and for other related applications as well for audio professionals.Fundação de Amparo a Pesquisa e Desenvolvimento do Estado de São Paulo (FAPESP) (grants 2011/17698-5

    Représentations parcimonieuses pour les signaux multivariés

    Get PDF
    Dans cette thèse, nous étudions les méthodes d'approximation et d'apprentissage qui fournissent des représentations parcimonieuses. Ces méthodes permettent d'analyser des bases de données très redondantes à l'aide de dictionnaires d'atomes appris. Etant adaptés aux données étudiées, ils sont plus performants en qualité de représentation que les dictionnaires classiques dont les atomes sont définis analytiquement. Nous considérons plus particulièrement des signaux multivariés résultant de l'acquisition simultanée de plusieurs grandeurs, comme les signaux EEG ou les signaux de mouvements 2D et 3D. Nous étendons les méthodes de représentations parcimonieuses au modèle multivarié, pour prendre en compte les interactions entre les différentes composantes acquises simultanément. Ce modèle est plus flexible que l'habituel modèle multicanal qui impose une hypothèse de rang 1. Nous étudions des modèles de représentations invariantes : invariance par translation temporelle, invariance par rotation, etc. En ajoutant des degrés de liberté supplémentaires, chaque noyau est potentiellement démultiplié en une famille d'atomes, translatés à tous les échantillons, tournés dans toutes les orientations, etc. Ainsi, un dictionnaire de noyaux invariants génère un dictionnaire d'atomes très redondant, et donc idéal pour représenter les données étudiées redondantes. Toutes ces invariances nécessitent la mise en place de méthodes adaptées à ces modèles. L'invariance par translation temporelle est une propriété incontournable pour l'étude de signaux temporels ayant une variabilité temporelle naturelle. Dans le cas de l'invariance par rotation 2D et 3D, nous constatons l'efficacité de l'approche non-orientée sur celle orientée, même dans le cas où les données ne sont pas tournées. En effet, le modèle non-orienté permet de détecter les invariants des données et assure la robustesse à la rotation quand les données tournent. Nous constatons aussi la reproductibilité des décompositions parcimonieuses sur un dictionnaire appris. Cette propriété générative s'explique par le fait que l'apprentissage de dictionnaire est une généralisation des K-means. D'autre part, nos représentations possèdent de nombreuses invariances, ce qui est idéal pour faire de la classification. Nous étudions donc comment effectuer une classification adaptée au modèle d'invariance par translation, en utilisant des fonctions de groupement consistantes par translation.In this thesis, we study approximation and learning methods which provide sparse representations. These methods allow to analyze very redundant data-bases thanks to learned atoms dictionaries. Being adapted to studied data, they are more efficient in representation quality than classical dictionaries with atoms defined analytically. We consider more particularly multivariate signals coming from the simultaneous acquisition of several quantities, as EEG signals or 2D and 3D motion signals. We extend sparse representation methods to the multivariate model, to take into account interactions between the different components acquired simultaneously. This model is more flexible that the common multichannel one which imposes a hypothesis of rank 1. We study models of invariant representations: invariance to temporal shift, invariance to rotation, etc. Adding supplementary degrees of freedom, each kernel is potentially replicated in an atoms family, translated at all samples, rotated at all orientations, etc. So, a dictionary of invariant kernels generates a very redundant atoms dictionary, thus ideal to represent the redundant studied data. All these invariances require methods adapted to these models. Temporal shift-invariance is an essential property for the study of temporal signals having a natural temporal variability. In the 2D and 3D rotation invariant case, we observe the efficiency of the non-oriented approach over the oriented one, even when data are not revolved. Indeed, the non-oriented model allows to detect data invariants and assures the robustness to rotation when data are revolved. We also observe the reproducibility of the sparse decompositions on a learned dictionary. This generative property is due to the fact that dictionary learning is a generalization of K-means. Moreover, our representations have many invariances that is ideal to make classification. We thus study how to perform a classification adapted to the shift-invariant model, using shift-consistent pooling functions.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

    Computational Modeling and Analysis of Multi-timbral Musical Instrument Mixtures

    Get PDF
    In the audio domain, the disciplines of signal processing, machine learning, psychoacoustics, information theory and library science have merged into the field of Music Information Retrieval (Music-IR). Music-IR researchers attempt to extract high level information from music like pitch, meter, genre, rhythm and timbre directly from audio signals as well as semantic meta-data over a wide variety of sources. This information is then used to organize and process data for large scale retrieval and novel interfaces. For creating musical content, access to hardware and software tools for producing music has become commonplace in the digital landscape. While the means to produce music have become widely available, significant time must be invested to attain professional results. Mixing multi-channel audio requires techniques and training far beyond the knowledge of the average music software user. As a result, there is significant growth and development in intelligent signal processing for audio, an emergent field combining audio signal processing and machine learning for producing music. This work focuses on methods for modeling and analyzing multi-timbral musical instrument mixtures and performing automated processing techniques to improve audio quality based on quantitative and qualitative measures. The main contributions of the work involve training models to predict mixing parameters for multi-channel audio sources and developing new methods to model the component interactions of individual timbres to an overall mixture. Linear dynamical systems (LDS) are shown to be capable of learning the relative contributions of individual instruments to re-create a commercial recording based on acoustic features extracted directly from audio. Variations in the model topology are explored to make it applicable to a more diverse range of input sources and improve performance. An exploration of relevant features for modeling timbre and identifying instruments is performed. Using various basis decomposition techniques, audio examples are reconstructed and analyzed in a perceptual listening test to evaluate their ability to capture salient aspects of timbre. These tests show that a 2-D decomposition is able to capture much more perceptually relevant information with regard to the temporal evolution of the frequency spectrum of a set of audio examples. The results indicate that joint modeling of frequencies and their evolution is essential for capturing higher level concepts in audio that we desire to leverage in automated systems.Ph.D., Electrical Engineering -- Drexel University, 201
    corecore