
    Music genre classification using On-line Dictionary Learning

    In this paper, an approach for music genre classification based on sparse representation of MARSYAS features is proposed. The MARSYAS feature descriptor, consisting of timbral-texture, pitch-related and beat-related features, is used for the classification of music genre. On-line Dictionary Learning (ODL) is used to obtain sparse representations of the features and to build a dictionary for each musical genre. We demonstrate the efficacy of the proposed framework on the Latin Music Database (LMD), which consists of over 3000 tracks spanning 10 genres, namely Axé, Bachata, Bolero, Forró, Gaúcha, Merengue, Pagode, Salsa, Sertaneja and Tango.
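The per-genre dictionary scheme described above can be sketched with scikit-learn's `MiniBatchDictionaryLearning` (an online dictionary learner): fit one dictionary per genre, then classify a feature vector by which dictionary reconstructs it with the least error. All data, dimensions, genre names and the reconstruction-error rule below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
# toy stand-ins for MARSYAS feature vectors (hypothetical data, 30-dim)
genre_feats = {g: rng.normal(size=(200, 30)) for g in ["axe", "salsa"]}

# learn one dictionary per genre via online dictionary learning
dicts = {}
for genre, X in genre_feats.items():
    dl = MiniBatchDictionaryLearning(n_components=20, alpha=1.0, random_state=0)
    dl.fit(X)
    dicts[genre] = dl.components_

def classify(x):
    # pick the genre whose dictionary reconstructs x with the least error
    best, best_err = None, np.inf
    for genre, D in dicts.items():
        code = sparse_encode(x[None, :], D, algorithm="lasso_lars", alpha=1.0)
        err = np.linalg.norm(x - code @ D)
        if err < best_err:
            best, best_err = genre, err
    return best
```

A design note: scoring by residual error per class dictionary is a common way to turn unsupervised dictionary learning into a classifier; the paper may use a different decision rule.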

    Learnable MFCCs for Speaker Verification

    We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted flexibly to data. In practice, we formulate data-driven versions of the four linear transforms of a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). Reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort. Comment: Accepted to ISCAS 202
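The four transforms named in the abstract (windowing, DFT, mel filterbank, DCT) form the standard static MFCC pipeline that the paper makes learnable. A minimal NumPy/SciPy sketch of that static pipeline, where frame size, filter count and all constants are illustrative choices rather than values from the paper:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return fb

def mfcc(frame, sr=16000, n_filters=26, n_ceps=13):
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)                  # 1) windowing
    power = np.abs(np.fft.rfft(windowed)) ** 2            # 2) DFT -> power spectrum
    mel_e = mel_filterbank(n_filters, n_fft, sr) @ power  # 3) mel filterbank
    return dct(np.log(mel_e + 1e-10), type=2, norm="ortho")[:n_ceps]  # 4) DCT
```

In the paper's learnable variant, each of these four fixed linear maps would instead be a trainable matrix initialized from its classical form; the sketch above shows only the static baseline.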

    Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research

    Speaker diarization is the task of determining who speaks when in an audio recording. Psychotherapy research often relies on labor-intensive manual diarization. Unsupervised methods are available but yield higher error rates. We present a method for supervised speaker diarization based on random forests. It can be considered a compromise between the commonly used, labor-intensive manual coding and fully automated procedures. The method is validated on the EMRAI synthetic speech corpus and is made publicly available. It yields low diarization error rates (M: 5.61%, STD: 2.19). Supervised speaker diarization is a promising method for psychotherapy research and similar fields.
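The general idea of supervised diarization can be sketched with scikit-learn: train a random forest on frames from a manually labeled portion of a recording, predict labels for the remaining frames, and smooth the frame-level decisions. The synthetic features, two-speaker labels and majority-vote smoothing below are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# hypothetical per-frame acoustic features (e.g. 13 MFCCs) with speaker
# labels from a manually annotated portion of the session
X_train = rng.normal(size=(500, 13))
y_train = rng.integers(0, 2, size=500)   # 0 = speaker A, 1 = speaker B

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# label the remaining frames of the recording
X_rest = rng.normal(size=(100, 13))
frame_labels = clf.predict(X_rest)

def smooth(labels, win=5):
    # simple sliding-window majority vote to remove spurious label flips
    out = labels.copy()
    for i in range(len(labels)):
        seg = labels[max(0, i - win // 2): i + win // 2 + 1]
        out[i] = np.bincount(seg).argmax()
    return out

diarization = smooth(frame_labels)
```

The smoothing step reflects the usual practice of imposing minimum-duration constraints on speaker turns; the paper's actual post-processing may differ.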

    ASR Systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness

    This paper analyzes Automatic Speech Recognition (ASR) systems suitable for use in noisy environments and suggests optimal configurations under various noise conditions. The behavior of standard parameterization techniques was analyzed from the viewpoint of robustness against background noise, for Mel-frequency cepstral coefficients (MFCC), Perceptual linear predictive (PLP) coefficients, and modified forms combining the main blocks of PLP and MFCC. The second part is devoted to the analysis and contribution of modified techniques incorporating frequency-domain noise suppression and voice activity detection (VAD). These techniques were tested on signals recorded in real noisy environments, within a Czech digit recognition task and on the AURORA databases. Finally, the contributions of special VAD selective training and of MLLR adaptation of acoustic models were studied for various signal features.
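As an illustration of frequency-domain noise suppression of the kind the paper evaluates, here is a basic magnitude spectral-subtraction sketch in NumPy. The frame size, overlap, noise-estimation window and spectral floor are illustrative choices, not the paper's configuration:

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=10):
    """Basic magnitude spectral subtraction: estimate the noise spectrum
    from the first frames (assumed speech-free), subtract it from every
    frame's magnitude, and resynthesize by overlap-add."""
    hop = frame_len // 2
    win = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * win
              for i in range(0, len(signal) - frame_len, hop)]
    specs = [np.fft.rfft(f) for f in frames]
    noise_mag = np.mean([np.abs(s) for s in specs[:noise_frames]], axis=0)

    out = np.zeros(len(signal))
    for k, s in enumerate(specs):
        # subtract the noise estimate, keep a small spectral floor
        mag = np.maximum(np.abs(s) - noise_mag, 0.05 * noise_mag)
        clean = mag * np.exp(1j * np.angle(s))   # reuse the noisy phase
        out[k * hop: k * hop + frame_len] += np.fft.irfft(clean, frame_len) * win
    return out
```

Reusing the noisy phase is standard in this family of methods, since the ear is relatively insensitive to short-time phase distortion.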

    The Teager-Kaiser Energy Cepstral Coefficients as an Effective Structural Health Monitoring Tool

    Recently, features and techniques from speech processing have begun to attract attention in the Structural Health Monitoring (SHM) community, in the context of vibration analysis. In particular, Cepstral Coefficients (CCs) have proved apt at discerning the response of a damaged structure from that of a given undamaged baseline. Previous works relied on the Mel-Frequency Cepstral Coefficients (MFCCs). This approach, while efficient and still very common in applications such as speech and speaker recognition, has since been followed by more advanced and competitive techniques for the same aims. The Teager-Kaiser Energy Cepstral Coefficients (TECCs) are one of these alternatives. These features are closely related to MFCCs but offer useful additional properties, such as improved robustness to noise. The goal of this paper is to introduce TECCs for damage detection purposes, highlighting their competitiveness with closely related features. Promising results were obtained on both numerical and experimental data.
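The operator at the core of TECCs is the discrete Teager-Kaiser energy operator, psi[x(n)] = x(n)^2 - x(n-1)*x(n+1), which replaces the plain spectral energy in the cepstral pipeline. A simplified TECC-style sketch follows; note that linearly spaced frequency bands stand in here for the perceptual (mel or Gammatone) filterbank used in practice, and all constants are illustrative:

```python
import numpy as np
from scipy.fft import dct

def tkeo(x):
    # discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def tecc(frame, n_bands=26, n_ceps=13):
    """TECC-style features: band-limit the frame, take the mean
    Teager-Kaiser energy per band instead of the spectral energy,
    then log-compress and apply the DCT as in MFCC extraction."""
    n = len(frame)
    spec = np.fft.rfft(frame * np.hamming(n))
    # linearly spaced bands (a simplification of the perceptual filterbank)
    edges = np.linspace(0, len(spec), n_bands + 1).astype(int)
    energies = []
    for b in range(n_bands):
        mask = np.zeros_like(spec)
        mask[edges[b]:edges[b + 1]] = spec[edges[b]:edges[b + 1]]
        band = np.fft.irfft(mask, n)               # band-limited time signal
        energies.append(np.mean(np.abs(tkeo(band))) + 1e-12)
    return dct(np.log(energies), type=2, norm="ortho")[:n_ceps]
```

Because the Teager-Kaiser operator tracks instantaneous energy of an oscillation rather than averaged spectral power, features built on it tend to degrade more gracefully under additive noise, which is the robustness property the abstract highlights.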