Cepstrum Coefficient Features Analysis for Multilingual Speaker Identification System
Cepstrum coefficient feature analysis plays a crucial role in the overall performance of a multilingual speaker identification system. The objective of this research is to investigate the results obtained when Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) are combined as feature components for the front-end processing of a multilingual speaker identification system. The combined MFCC and GFCC features are proposed to improve the reliability of such a system; in recent studies, GFCC features have shown strong robustness to noise and acoustic change. The main idea is to integrate MFCC and GFCC features to improve overall identification performance. The experiment was carried out on a recently collected multilingual speech database to analyse GFCC and MFCC. The database consists of speech recorded from 100 speakers, male and female, in three languages: Hindi, Marathi, and Rajasthani. The features extracted from the speech signals of the multiple languages are examined, and the results provide an empirical comparison of the combined MFCC-GFCC features against their individual counterparts. Average language-independent multilingual speaker identification rates of 84.66% (MFCC), 93.22% (GFCC), and 94.77% (combined features) were achieved.
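The GFCC front end described above can be sketched with SciPy's gammatone filter: an ERB-spaced filterbank, per-band frame energies, log compression, and a DCT. All parameters below (filter count, frequency range, frame sizes) are illustrative defaults, not the paper's configuration:

```python
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fftpack import dct

def gfcc(signal, sr, n_filters=32, n_coeffs=13, frame_len=400, hop=160):
    """Sketch of gammatone frequency cepstral coefficient extraction."""
    # ERB-rate scale (Glasberg & Moore) for the filter centre frequencies
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    inv_erb = lambda e: (10 ** (e / 21.4) - 1.0) / 4.37e-3
    centres = inv_erb(np.linspace(erb(100.0), erb(0.9 * sr / 2), n_filters))

    # Filter the signal through each gammatone channel, then frame energies
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty((n_frames, n_filters))
    for j, fc in enumerate(centres):
        b, a = gammatone(fc, 'fir', fs=sr)      # FIR gammatone (SciPy >= 1.6)
        y = lfilter(b, a, signal)
        for i in range(n_frames):
            frame = y[i * hop : i * hop + frame_len]
            energies[i, j] = np.sum(frame ** 2) + 1e-12

    # Log-compress and decorrelate with a DCT, as for MFCCs
    return dct(np.log(energies), type=2, axis=1, norm='ortho')[:, :n_coeffs]
```

Concatenating these with MFCC vectors frame-by-frame gives the kind of combined feature the abstract evaluates.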
PNCC FEATURES FOR ROBUST TEXT-INDEPENDENT SPEAKER RECOGNITION
Automatic speaker recognition has been the subject of intense research throughout the past decade. However, the performance of state-of-the-art algorithms degrades drastically in the presence of noise. This article focuses on the application of a technique called Power-Normalized Cepstral Coefficients (PNCC) to text-independent speaker recognition. The objective of this study is to evaluate this technique in comparison with the conventional Mel Frequency Cepstral Coefficients (MFCC) technique and the Gammatone Frequency Cepstral Coefficients (GFCC) technique.
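A drastically simplified sketch of the PNCC idea, keeping two of its ingredients: medium-time power normalization and a 1/15 power-law nonlinearity in place of the usual log. The full PNCC pipeline also includes asymmetric noise suppression and temporal masking, omitted here; the filterbank and frame parameters are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def simple_pncc(sig, sr, n_fft=512, hop=160, n_bands=26, n_coeffs=13, m=2):
    """Simplified PNCC: mel-band powers, medium-time normalization,
    1/15 power law, DCT."""
    # Short-time power spectrum with a Hann window
    win = np.hanning(n_fft)
    n_frames = 1 + (len(sig) - n_fft) // hop
    spec = np.array([np.abs(np.fft.rfft(sig[i*hop:i*hop+n_fft] * win)) ** 2
                     for i in range(n_frames)])

    # Triangular mel filterbank
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda v: 700 * (10 ** (v / 2595) - 1)
    edges = imel(np.linspace(mel(0), mel(sr / 2), n_bands + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_bands, n_fft // 2 + 1))
    for j in range(n_bands):
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        fb[j, l:c] = np.linspace(0, 1, c - l, endpoint=False)
        fb[j, c:r] = np.linspace(1, 0, r - c, endpoint=False)
    power = spec @ fb.T + 1e-12

    # Medium-time power: average over 2m+1 neighbouring frames
    pad = np.pad(power, ((m, m), (0, 0)), mode='edge')
    medium = np.mean([pad[i:i + n_frames] for i in range(2 * m + 1)], axis=0)

    # Normalize by medium-time power, then 1/15 power law instead of log
    return dct((power / medium) ** (1 / 15),
               type=2, axis=1, norm='ortho')[:, :n_coeffs]
```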
Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system
The robustness of a speaker identification system over an additive-noise channel is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. In clean environments the identification system works well; in noisy environments, additive noise degrades it. To address the problem of additive noise and achieve high accuracy, an algorithm for feature extraction based on speech enhancement and combined features is presented. In this paper, a wavelet-thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the system's robustness against different types of additive noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed a performance improvement for the proposed feature-extraction algorithm compared with conventional features over most noise types and different SNRs.
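The feature-warping (FW) step above can be sketched as rank-based Gaussianization of each feature stream over a sliding window, following the common recipe; the window length and rank-to-quantile mapping below are standard choices, not necessarily the paper's exact configuration:

```python
import numpy as np
from scipy.stats import norm

def feature_warp(feats, win=301):
    """Warp each feature stream to a standard-normal distribution
    over a sliding window of `win` frames (rank-based Gaussianization)."""
    T, D = feats.shape
    half = win // 2
    out = np.empty_like(feats, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        window = feats[lo:hi]
        n = hi - lo
        for d in range(D):
            # 1-based rank of the current value within the window
            rank = 1 + np.sum(window[:, d] < feats[t, d])
            # Map the mid-rank quantile through the inverse normal CDF
            out[t, d] = norm.ppf((rank - 0.5) / n)
    return out
```

Applied per utterance, this flattens channel and slowly varying noise effects before the warped features reach the UBM-GMM back end.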
Robust cepstral feature for bird sound classification
Birds are excellent environmental indicators and may signal the sustainability of an ecosystem; they provide provisioning, regulating, and supporting services. Birdlife-conservation research therefore always takes centre stage. Given the airborne nature of birds and the density of tropical forest, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the cepstral features that classify bird sounds most accurately. Fifteen (15) endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features were extracted: linear prediction cepstral coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC); each was used separately for classification with a support vector machine (SVM). A comparison of their prediction results demonstrates that the model using GTCC features, at 93.3% accuracy, outperforms the models using MFCC and LPCC features, showing the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as in eco-tourism and wildlife management.
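The classification stage can be illustrated with scikit-learn's SVC on synthetic per-clip feature vectors standing in for averaged cepstral features; the cluster geometry, class count, and kernel settings are invented purely for the demonstration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for per-clip averaged cepstral vectors: three
# "species", each a Gaussian cluster in a 13-D feature space
n_per_class, n_dims = 60, 13
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(n_per_class, n_dims))
               for mu in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
# Standardize, then fit an RBF-kernel SVM, as is typical for cepstral inputs
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

In the study itself the three feature types would each be fed through a pipeline like this and their test accuracies compared.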
Faked Speech Detection with Zero Knowledge
Audio is one of the most used ways of human communication, but at the same time it can easily be misused to trick people. With the AI revolution, the related technologies are now accessible to almost everyone, making it simple for criminals to commit crimes and forgeries. In this work, we introduce a neural network method to develop a classifier that will blindly classify an input audio as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. The proposed model was trained on a set of important features extracted from a large dataset of audios, yielding a classifier that was tested on the same set of features from different audios. The data was extracted from two raw datasets composed especially for this work: an all-English dataset and a mixed (Arabic plus English) dataset. These datasets have been made available, in raw form, through GitHub for the use of the research community at https://github.com/SaSs7/Dataset. For comparison, the audios were also classified through human inspection, with native speakers as subjects. The results were interesting and exhibited formidable accuracy.
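A minimal sketch of such a blind real-vs-mimicked classifier, using scikit-learn's MLPClassifier on synthetic feature vectors; the feature distributions are fabricated purely to illustrate the train/test workflow, not the paper's actual features or architecture:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic stand-ins for per-utterance feature vectors; "mimicked" audio
# is modelled as a shifted cluster purely for illustration
X_real = rng.normal(0.0, 1.0, size=(150, 20))
X_fake = rng.normal(1.5, 1.0, size=(150, 20))
X = np.vstack([X_real, X_fake])
y = np.array([0] * 150 + [1] * 150)     # 0 = real, 1 = mimicked

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
# Small feed-forward network; no reference audio is seen at test time,
# which is the sense of "blind" classification in the abstract
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```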
ROBUST HYBRID FEATURES BASED TEXT INDEPENDENT SPEAKER IDENTIFICATION SYSTEM OVER NOISY ADDITIVE CHANNEL
Robustness of speaker identification systems over additive noise is crucial for real-world applications. In this paper, two robust features, Power Normalized Cepstral Coefficients (PNCC) and Gammatone Frequency Cepstral Coefficients (GFCC), are combined to improve the robustness of a speaker identification system over different types of noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching and classification to identify the claimed speakers. Evaluation results show that the proposed hybrid features improve the performance of the identification system compared with conventional features over most noise types and different signal-to-noise ratios.
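A simplified sketch of the matching stage: one maximum-likelihood GMM per enrolled speaker, scored against the test frames. A full UBM-GMM system would instead MAP-adapt each speaker model from a shared universal background model; the synthetic features below are invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic frame-level features for two enrolled speakers (hypothetical data)
train = {
    'spk_a': rng.normal(0.0, 1.0, size=(500, 13)),
    'spk_b': rng.normal(2.0, 1.0, size=(500, 13)),
}
# Enroll: fit one GMM per speaker on that speaker's frames
models = {spk: GaussianMixture(n_components=4, random_state=0).fit(feats)
          for spk, feats in train.items()}

def identify(test_feats):
    # Pick the speaker whose model gives the highest average log-likelihood
    return max(models, key=lambda spk: models[spk].score(test_feats))

test = rng.normal(2.0, 1.0, size=(200, 13))   # drawn like spk_b
```

With hybrid PNCC+GFCC features, the frame vectors would simply be the concatenation of the two feature sets.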
DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Degraded Audio Signals
Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-frequency and Gammatone filterbanks, to characterize speech audio. The design of these filterbanks is based on domain knowledge and limited empirical observations; the resultant features, therefore, may not generalize well to different types of audio degradation. In this work, we propose a deep learning-based technique to induce the filterbank design from vast amounts of speech audio, the purpose being to extract features robust to degradations in the input audio. First, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Second, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Third, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. Experimental results on the VoxCeleb2, NIST SRE 2008 and 2010, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of audio degradations, multilingual speech data, and speech of varying duration. The DeepVOX features also improve the performance of existing speaker recognition algorithms such as xVector-PLDA and iVector-PLDA.
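The forward pass of a learned time-domain filterbank can be sketched in NumPy: convolve the waveform with a bank of kernels, rectify, average-pool per frame, and log-compress. The kernels here are random stand-ins; in DeepVOX they are learned end-to-end under the triplet-mining objective rather than fixed like Mel or Gammatone filters:

```python
import numpy as np

def conv_filterbank(signal, filters, frame_len=400, hop=160):
    """Forward pass of a time-domain convolutional filterbank:
    convolve, rectify, average-pool per frame, log-compress."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, len(filters)))
    for j, h in enumerate(filters):
        y = np.convolve(signal, h, mode='same') ** 2    # filter + rectify
        for i in range(n_frames):
            frame = y[i * hop : i * hop + frame_len]
            feats[i, j] = np.log(np.mean(frame) + 1e-12)  # pool + compress
    return feats

rng = np.random.default_rng(0)
filters = rng.standard_normal((24, 64)) * 0.1   # 24 random 64-tap kernels
sig = rng.standard_normal(16000)                # 1 s of audio at 16 kHz
F = conv_filterbank(sig, filters)
```

Replacing the random kernels with trained ones, and the pooling with the network's own nonlinearity, recovers the shape of the DeepVOX front end.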