3,461 research outputs found
Learning sparse dictionaries for music and speech classification
The field of music and speech classification is quite mature, with researchers having largely settled on the best discriminative representation. In this regard, Zubair et al. showed that sparse coefficients, used along with an SVM to classify audio signals as music or speech, yield near-perfect classification. The proposed method goes one step further: instead of passing the sparse coefficients to a separate classifier, they are used directly with a dictionary learned via on-line dictionary learning for music-speech classification. This approach not only removes the redundancy of a separate classifier but also achieves complete discrimination of music and speech on the GTZAN music/speech dataset. Moreover, in place of a high-dimensional feature-vector space, which inherently leads to high computation time and complicated decision-boundary calculation on the part of the SVM, a dictionary of restricted size with limited computation serves the same purpose.
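The pipeline this abstract describes — learn one dictionary per class, then label a sample by which dictionary reconstructs it best — can be sketched roughly as follows. The data, dictionary size, and sparsity level are illustrative assumptions, and scikit-learn's `MiniBatchDictionaryLearning` stands in for whichever on-line learner the paper uses:

```python
# Hedged sketch: classify frames as music or speech by reconstruction
# error under per-class dictionaries trained with online (mini-batch)
# dictionary learning. All data below is synthetic, not real audio.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Synthetic stand-ins for music/speech feature frames (assumed 64-dim).
music = rng.normal(0.0, 1.0, size=(200, 64))
speech = rng.normal(0.5, 1.0, size=(200, 64))

def learn_dict(X, n_atoms=32):
    # One dictionary per class, learned in mini-batches (on-line style).
    return MiniBatchDictionaryLearning(
        n_components=n_atoms, alpha=1.0, batch_size=16,
        transform_algorithm="omp", transform_n_nonzero_coefs=5,
        random_state=0).fit(X)

d_music, d_speech = learn_dict(music), learn_dict(speech)

def recon_error(model, X):
    codes = model.transform(X)          # sparse coefficients
    recon = codes @ model.components_   # reconstruction from atoms
    return np.linalg.norm(X - recon, axis=1)

def classify(X):
    # Assign each frame to the dictionary that reconstructs it best.
    return np.where(recon_error(d_music, X) < recon_error(d_speech, X),
                    "music", "speech")

preds = classify(np.vstack([music[:5], speech[:5]]))
print(preds)
```

Classifying by per-class reconstruction error is what lets the dictionaries act as a stand-alone classifier, with no SVM stage.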
Learning Sparse Adversarial Dictionaries For Multi-Class Audio Classification
Audio events are quite often overlapping in nature, and more prone to noise
than visual signals. There has been increasing evidence for the superior
performance of representations learned using sparse dictionaries for
applications like audio denoising and speech enhancement. This paper
concentrates on modifying the traditional reconstructive dictionary learning
algorithms, by incorporating a discriminative term into the objective function
in order to learn class-specific adversarial dictionaries that are good at representing samples of their own class while being poor at representing samples belonging to any other class. We quantitatively demonstrate the effectiveness of our learned dictionaries as a stand-alone solution for both binary and multi-class audio classification problems. Comment: Accepted at the Asian Conference on Pattern Recognition (ACPR 2017).
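One plausible shape for such a class-specific objective, sketched with illustrative notation (the weight \(\lambda\), the sparsity level \(T\), and the exact form of the discriminative term are assumptions, not taken from the paper): the dictionary \(D_c\) for class \(c\) should reconstruct its own samples \(X_c\) well while reconstructing other classes' samples poorly:

```latex
\min_{D_c,\,A}\;
\underbrace{\|X_c - D_c A_c\|_F^2}_{\text{fit own class}}
\;-\;
\lambda \sum_{c' \neq c}
\underbrace{\|X_{c'} - D_c A_{c'}\|_F^2}_{\text{penalize fitting other classes}}
\quad \text{s.t.}\;\; \|a_i\|_0 \le T \;\;\forall i
```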
Classification of music genres using sparse representations in overcomplete dictionaries
This paper presents a simple, but efficient and robust, method for music genre classification that utilizes sparse representations in overcomplete dictionaries. The training step involves creating dictionaries, using the K-SVD algorithm, in which data corresponding to a particular music genre has a sparse representation. In the classification step, the Orthogonal Matching Pursuit (OMP) algorithm is used to separate feature vectors that consist only of Linear Predictive Coding (LPC) coefficients. The paper analyses in detail a popular case study from the literature, the ISMIR 2004 database. Using the presented method, the correct classification percentage over the 6 music genres is 85.59%, a result that is comparable with the best results published so far.
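A rough sketch of the train/classify split this abstract describes, with two stated caveats: scikit-learn provides no K-SVD, so its batch `DictionaryLearning` is substituted for it, and the LPC feature vectors are replaced by synthetic data:

```python
# Hedged sketch: per-genre overcomplete dictionaries (K-SVD in the
# paper; sklearn's DictionaryLearning substituted here) and OMP-based
# classification of feature vectors. Features below are synthetic
# stand-ins for LPC coefficients; the LPC order of 12 is assumed.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n_lpc = 12
genres = {"rock": rng.normal(0, 1, (80, n_lpc)),
          "jazz": rng.normal(1, 1, (80, n_lpc))}

# One overcomplete dictionary per genre (more atoms than feature dims).
dicts = {g: DictionaryLearning(n_components=24, transform_algorithm="omp",
                               transform_n_nonzero_coefs=4,
                               random_state=0).fit(X).components_
         for g, X in genres.items()}

def classify(x):
    # Pick the genre whose dictionary gives the smallest OMP residual.
    best, best_err = None, np.inf
    for g, D in dicts.items():
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=4,
                                        fit_intercept=False)
        omp.fit(D.T, x)                  # atoms as columns of D.T
        err = np.linalg.norm(x - D.T @ omp.coef_)
        if err < best_err:
            best, best_err = g, err
    return best

print(classify(genres["rock"][0]))
```

As in the music/speech paper above, the class decision comes from comparing sparse reconstruction residuals rather than from a trained discriminative classifier.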
INSTRUMENTATION-BASED MUSIC SIMILARITY USING SPARSE REPRESENTATIONS
Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus
We have developed a sparse mathematical representation of speech that
minimizes the number of active model neurons needed to represent typical speech
sounds. The model learns several well-known acoustic features of speech such as
harmonic stacks, formants, onsets and terminations, but we also find more
exotic structures in the spectrogram representation of sound such as localized
checkerboard patterns and frequency-modulated excitatory subregions flanked by
suppressive sidebands. Moreover, several of these novel features resemble
neuronal receptive fields reported in the Inferior Colliculus (IC), as well as
auditory thalamus and cortex, and our model neurons exhibit the same tradeoff
in spectrotemporal resolution as has been observed in IC. To our knowledge,
this is the first demonstration that receptive fields of neurons in the
ascending mammalian auditory pathway beyond the auditory nerve can be predicted
based on coding principles and the statistical properties of recorded sounds. Comment: For Supporting Information, see the PLoS website:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.100259
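The core idea — represent a sound with as few active model neurons (nonzero coefficients) as possible — can be illustrated in miniature. The dictionary and signal here are random stand-ins, not learned speech features:

```python
# Hedged illustration: sparse coding caps the number of active units
# used to encode a signal. Atoms and signal are synthetic assumptions.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(2)
D = rng.normal(size=(50, 128))          # 50 "model neurons" (atoms)
D /= np.linalg.norm(D, axis=1, keepdims=True)
signal = D[:3].sum(axis=0)              # mixture of 3 known atoms

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)
code = coder.transform(signal[None, :])[0]
print(int(np.count_nonzero(code)))      # at most 3 active units
```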