3,461 research outputs found
Learning sparse dictionaries for music and speech classification
The field of music and speech classification is quite mature, with researchers having largely settled on the best discriminative representation. In this regard, Zubair et al. showed that sparse coefficients, used along with an SVM to classify audio signals as music or speech, yield near-perfect classification. The proposed method goes one step further: instead of passing the sparse coefficients to a separate classifier, they are used directly with a dictionary learned via on-line dictionary learning for music-speech classification. This approach not only removes the redundancy of a separate classifier but also achieves complete discrimination of music and speech on the GTZAN music/speech dataset. Moreover, in place of a high-dimensional feature-vector space, which inherently leads to high computation time and complicated decision-boundary calculation on the part of the SVM, a dictionary of restricted size with limited computation serves the same purpose.
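The pipeline this abstract describes — learn one dictionary per class, then label a sample by which dictionary reconstructs it best — can be sketched roughly as follows. The data, dictionary size, and sparsity level are illustrative assumptions, and scikit-learn's `MiniBatchDictionaryLearning` stands in for whichever on-line learner the paper uses:

```python
# Hedged sketch: classify frames as music or speech by reconstruction
# error under per-class dictionaries trained with online (mini-batch)
# dictionary learning. All data below is synthetic, not real audio.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Synthetic stand-ins for music/speech feature frames (assumed 64-dim).
music = rng.normal(0.0, 1.0, size=(200, 64))
speech = rng.normal(0.5, 1.0, size=(200, 64))

def learn_dict(X, n_atoms=32):
    # One dictionary per class, learned in mini-batches (on-line style).
    return MiniBatchDictionaryLearning(
        n_components=n_atoms, alpha=1.0, batch_size=16,
        transform_algorithm="omp", transform_n_nonzero_coefs=5,
        random_state=0).fit(X)

d_music, d_speech = learn_dict(music), learn_dict(speech)

def recon_error(model, X):
    codes = model.transform(X)          # sparse coefficients
    recon = codes @ model.components_   # reconstruction from atoms
    return np.linalg.norm(X - recon, axis=1)

def classify(X):
    # Assign each frame to the dictionary that reconstructs it best.
    return np.where(recon_error(d_music, X) < recon_error(d_speech, X),
                    "music", "speech")

preds = classify(np.vstack([music[:5], speech[:5]]))
print(preds)
```

Classifying by per-class reconstruction error is what lets the dictionaries act as a stand-alone classifier, with no SVM stage.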
Learning Sparse Adversarial Dictionaries For Multi-Class Audio Classification
Audio events are quite often overlapping in nature, and more prone to noise
than visual signals. There has been increasing evidence for the superior
performance of representations learned using sparse dictionaries for
applications like audio denoising and speech enhancement. This paper
concentrates on modifying the traditional reconstructive dictionary learning
algorithms, by incorporating a discriminative term into the objective function
in order to learn class-specific adversarial dictionaries that are good at representing samples of their own class while being poor at representing samples belonging to any other class. We quantitatively demonstrate the effectiveness of our learned dictionaries as a stand-alone solution for both binary and multi-class audio classification problems. Comment: Accepted at the Asian Conference on Pattern Recognition (ACPR 2017).
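One plausible shape for such a class-specific objective, sketched with illustrative notation (the weight \(\lambda\), the sparsity level \(T\), and the exact form of the discriminative term are assumptions, not taken from the paper): the dictionary \(D_c\) for class \(c\) should reconstruct its own samples \(X_c\) well while reconstructing other classes' samples poorly:

```latex
\min_{D_c,\,A}\;
\underbrace{\|X_c - D_c A_c\|_F^2}_{\text{fit own class}}
\;-\;
\lambda \sum_{c' \neq c}
\underbrace{\|X_{c'} - D_c A_{c'}\|_F^2}_{\text{penalize fitting other classes}}
\quad \text{s.t.}\;\; \|a_i\|_0 \le T \;\;\forall i
```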
Classification of music genres using sparse representations in overcomplete dictionaries
This paper presents a simple, but efficient and robust, method for music genre classification that utilizes sparse representations in overcomplete dictionaries. The training step involves creating dictionaries, using the K-SVD algorithm, in which data corresponding to a particular music genre has a sparse representation. In the classification step, the Orthogonal Matching Pursuit (OMP) algorithm is used to separate feature vectors that consist only of Linear Predictive Coding (LPC) coefficients. The paper analyses in detail a popular case study from the literature, the ISMIR 2004 database. Using the presented method, the correct classification percentage over the 6 music genres is 85.59%, a result that is comparable with the best results published so far.
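A rough sketch of the train/classify split this abstract describes, with two stated caveats: scikit-learn provides no K-SVD, so its batch `DictionaryLearning` is substituted for it, and the LPC feature vectors are replaced by synthetic data:

```python
# Hedged sketch: per-genre overcomplete dictionaries (K-SVD in the
# paper; sklearn's DictionaryLearning substituted here) and OMP-based
# classification of feature vectors. Features below are synthetic
# stand-ins for LPC coefficients; the LPC order of 12 is assumed.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n_lpc = 12
genres = {"rock": rng.normal(0, 1, (80, n_lpc)),
          "jazz": rng.normal(1, 1, (80, n_lpc))}

# One overcomplete dictionary per genre (more atoms than feature dims).
dicts = {g: DictionaryLearning(n_components=24, transform_algorithm="omp",
                               transform_n_nonzero_coefs=4,
                               random_state=0).fit(X).components_
         for g, X in genres.items()}

def classify(x):
    # Pick the genre whose dictionary gives the smallest OMP residual.
    best, best_err = None, np.inf
    for g, D in dicts.items():
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=4,
                                        fit_intercept=False)
        omp.fit(D.T, x)                  # atoms as columns of D.T
        err = np.linalg.norm(x - D.T @ omp.coef_)
        if err < best_err:
            best, best_err = g, err
    return best

print(classify(genres["rock"][0]))
```

As in the music/speech paper above, the class decision comes from comparing sparse reconstruction residuals rather than from a trained discriminative classifier.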
INSTRUMENTATION-BASED MUSIC SIMILARITY USING SPARSE REPRESENTATIONS
Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus
We have developed a sparse mathematical representation of speech that
minimizes the number of active model neurons needed to represent typical speech
sounds. The model learns several well-known acoustic features of speech such as
harmonic stacks, formants, onsets and terminations, but we also find more
exotic structures in the spectrogram representation of sound such as localized
checkerboard patterns and frequency-modulated excitatory subregions flanked by
suppressive sidebands. Moreover, several of these novel features resemble
neuronal receptive fields reported in the Inferior Colliculus (IC), as well as
auditory thalamus and cortex, and our model neurons exhibit the same tradeoff
in spectrotemporal resolution as has been observed in IC. To our knowledge,
this is the first demonstration that receptive fields of neurons in the
ascending mammalian auditory pathway beyond the auditory nerve can be predicted
based on coding principles and the statistical properties of recorded sounds. Comment: For Supporting Information, see the PLoS website:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.100259
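The core idea — represent a sound with as few active model neurons (nonzero coefficients) as possible — can be illustrated in miniature. The dictionary and signal here are random stand-ins, not learned speech features:

```python
# Hedged illustration: sparse coding caps the number of active units
# used to encode a signal. Atoms and signal are synthetic assumptions.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(2)
D = rng.normal(size=(50, 128))          # 50 "model neurons" (atoms)
D /= np.linalg.norm(D, axis=1, keepdims=True)
signal = D[:3].sum(axis=0)              # mixture of 3 known atoms

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)
code = coder.transform(signal[None, :])[0]
print(int(np.count_nonzero(code)))      # at most 3 active units
```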