96 research outputs found
An automatic speaker recognition system.
by Yu Chun Kei. Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. Bibliography: leaves 86-88
Comparative study to realize an automatic speaker recognition system
In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain informative features of minimum dimension from the input signals, we created an adaptive operator that helps identify the speaker's voice quickly and efficiently. We test the efficiency and performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which researchers widely use for feature extraction. The experimental results show the importance of the adaptive operator, which gives added value to the proposed approach. The system achieved 96.8% accuracy using the Fourier transform as the compression method and 98.1% using correlation as the compression method.
A reduced search algorithm for speaker recognition
In this work, a reduced search algorithm for vector quantization codebooks is applied as a way to reduce the risk of wrong decisions in an automatic speaker recognition system. Instead of a full search, the algorithm uses the geometrical properties of the vector space to restrict the search to those codebooks which are closer to the vector under test. The speaker recognition system is intended to identify a suspect among a small group of persons from low-quality recordings, working as a text-independent automatic speaker recognition system. It was found that the alternative search algorithm can be used to reduce the risk of wrong decisions, which is especially important in forensic applications.
Approach to comparative analysis of artificial voice through an automatic speaker recognition system
The fast pace of life in developed countries has pushed companies to meet the need for much more effective use and control of software and hardware. Voice has therefore been adopted to ease everyday communication, from checking the day's weather forecast to interacting with sophisticated systems that use artificial intelligence to carry out more complex tasks, such as requesting that a text be read aloud.
This last point is the focus of the present work, which undertakes a comparative study of synthetic voice samples obtained from three free applications (SPIK-AI, NUANCE VOCALIZER and Play HT) using BATVOX 4.1, the automatic speaker recognition system used by most forensic acoustics laboratories around the world. Our objective is to evaluate its ability to discriminate this type of utterance and to determine whether the results obtained are valid enough to consider its use.
The experiment reveals, on the one hand, that most of the artificial voice samples do not meet the system's requirements, either because of their audio format or because of mismatches with the available reference populations. On the other hand, for the usable samples, although the results are not inconsistent, the discrimination capability is not fully adequate, so its use with this type of speech is not recommended. Máster Universitario en Ciencias Policiale
Application of an Annular/Sphere Search Algorithm for Speaker Recognition
In this work, an alternative search algorithm for vector quantization codebooks is applied as a way to improve the performance of an automatic speaker recognition system. The search algorithm is based on geometrical properties of the vector space, defining annular and spherical regions instead of using a full search. The speaker recognition system is intended to identify a suspect among a small group of persons from low-quality recordings, working as a text-independent automatic speaker recognition system. Because the recognition rate is extremely important in forensic applications, good discrimination algorithms can reduce the risk of bad decisions. The performance of the system under such conditions is reported. Despite the few speaker samples used for training, a high recognition rate was obtained, improving on the recognition rate of the full search method.
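A minimal sketch of how an annular constraint can prune a codebook search, assuming Euclidean distance and using only the triangle inequality. The abstract does not give the authors' exact procedure, so the function names and the visiting order below are illustrative assumptions, not the published algorithm.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def full_search(codebook, x):
    """Baseline: compare x with every codeword."""
    best = min(range(len(codebook)), key=lambda i: dist(codebook[i], x))
    return best, len(codebook)

def annular_search(codebook, x):
    """Nearest codeword using a norm-based annular constraint.

    Returns (index, number_of_distance_computations).
    """
    norms = [norm(c) for c in codebook]  # could be precomputed once per codebook
    xn = norm(x)
    # Visit codewords in order of how close their norm is to ||x||.
    order = sorted(range(len(codebook)), key=lambda i: abs(norms[i] - xn))
    best_i, best_d, evaluated = -1, math.inf, 0
    for i in order:
        # Triangle inequality: | ||c|| - ||x|| | <= ||c - x||.
        # Once the norm gap reaches the best distance found so far,
        # no remaining codeword can be strictly closer.
        if abs(norms[i] - xn) >= best_d:
            break
        d = dist(codebook[i], x)
        evaluated += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, evaluated
```

The ring of candidate norms shrinks as the best distance (the sphere radius around the test vector) shrinks, so the sketch returns a codeword at the same minimum distance as the full search while usually computing fewer distances.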
Study of Speaker Recognition Systems
Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice. It is one of the most useful and popular biometric recognition techniques in the world, especially in areas where security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition and a number of related activities.
Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker.
The process of speaker recognition consists of two modules: feature extraction and feature matching. Feature extraction is the process in which we extract a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves identifying the unknown speaker by comparing the features extracted from his/her voice input with those from a set of known speakers.
Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features and matching them with a stored template. Cepstral coefficient calculation and mel-frequency cepstral coefficients (MFCC) are applied for feature extraction. The VQLBG (Vector Quantization via Linde-Buzo-Gray), DTW (Dynamic Time Warping) and GMM (Gaussian Mixture Modelling) algorithms are used for template generation and feature matching.
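The framing, windowing and short-term spectrum steps described above can be sketched as follows. This is an illustration only: the frame length, hop size and Hamming window are assumed values not given in the abstract, and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames (an incomplete tail is dropped)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) DFT for clarity)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def short_term_spectra(signal, frame_len=64, hop=32):
    """Frame the signal, window each frame, and return one spectrum per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([x * w for x, w in zip(frame, window)])
            for frame in frame_signal(signal, frame_len, hop)]
```

In a full system, each magnitude spectrum would then pass through a mel filterbank, a logarithm and a cosine transform to give MFCCs; those stages are omitted here.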
Enhancing speaker verification accuracy with deep ensemble learning and inclusion of multifaceted demographic factors
Effective speaker identification is essential for achieving robust speaker recognition in real-world applications such as mobile devices, security, and entertainment while ensuring high accuracy. However, deep learning models trained on large datasets with diverse demographic and environmental factors may lead to increased misclassification and longer processing times. This study proposes incorporating ethnicity and gender information as critical parameters in a deep learning model to enhance accuracy. Two convolutional neural network (CNN) models classify gender and ethnicity, followed by a Siamese deep learning model trained with critical parameters and additional features for speaker verification. The proposed model was tested using the VoxCeleb 2 database, which includes over one million utterances from 6,112 celebrities. In an evaluation after 500 epochs, equal error rate (EER) and minimum decision cost function (minDCF) showed notable results, scoring 1.68 and 0.10, respectively. The proposed model outperforms existing deep learning models, demonstrating improved performance in terms of reduced misclassification errors and faster processing times
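For readers unfamiliar with the equal error rate reported above, the following sketch shows one simple way to estimate it from verification scores. The threshold sweep and the toy scores are assumptions for illustration; the abstract does not describe how the authors computed their figure.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A higher score means "more likely the same speaker". The EER is taken
    at the threshold where the false acceptance and false rejection rates
    are closest to each other.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

With genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.6]`, one trial of each kind falls on the wrong side of the best threshold, so the function returns 0.25.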
Real time speaker recognition using MFCC and VQ
Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information contained in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat. Many organizations, such as banks, institutions and industries, currently use this technology to provide greater security for their vast databases. Speaker recognition mainly involves two modules: feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker's voice signal that can later be used to represent that speaker. Feature matching is the actual procedure of identifying the unknown speaker by comparing the features extracted from his/her voice input with those already stored in our speech database. In feature extraction we find the mel-frequency cepstrum coefficients, which are based on the known variation of the human ear's critical bandwidths with frequency; these are vector quantized using the LBG algorithm, resulting in a speaker-specific codebook.
In feature matching we find the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion we decide whether to accept or reject the unknown speaker's identity. The system I implemented in my work is 80% accurate in recognizing the correct speaker. In the second phase we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations.
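The codebook training and distortion-based matching described above can be sketched as follows. This is a simplified illustration, not the author's implementation: the additive split, the fixed number of refinement passes and all function names are assumptions, and plain feature vectors stand in for MFCCs.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Train a codebook by LBG-style splitting; size should be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair
        # (additive perturbation, a simplification chosen for this sketch).
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each vector to its nearest codeword, then recompute centroids.
            cells = [[] for _ in codebook]
            for v in vectors:
                j = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))
                cells[j].append(v)
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook

def vq_distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(c, v) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest VQ distortion."""
    return min(codebooks, key=lambda name: vq_distortion(vectors, codebooks[name]))
```

For verification rather than identification, the distortion against the claimed speaker's codebook would instead be compared with a threshold to accept or reject the claim.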
Application of Artificial Neural Networks with Radial Basis Function for Music Genre Recognition
Artificial intelligence can be applied in many areas of life. One way to implement it is through artificial neural networks (ANNs), and a well-known ANN method is the radial basis function (RBF). The radial basis function neural network (RBF ANN) is known as a three-layer feedforward network that can solve classification or pattern recognition problems. In this study, an RBF ANN is used to classify music into genres according to its closeness to the target. The genres used in this study are campursari, keroncong, pop, and rock, with three durations for each piece of music: 2 seconds, 5 seconds, and 10 seconds. The hidden layer uses 56 neurons. The input to the RBF ANN consists of *.mp3 files downloaded from the internet, which are then converted to *.wav format and processed with mel-frequency cepstrum coefficients (MFCC) extraction. This technique extracts the audio features contained in the music data. Seven coefficients are used for each music sample. The program simulation results show that the RBF ANN classifies music with its highest accuracy, 75%, on test data with a duration of 10 seconds.
Keywords: genre, artificial neural network, artificial intelligence, mel-frequency cepstrum coefficients, music, radial basis function
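A minimal sketch of a three-layer RBF network of the kind described above: an input layer, Gaussian hidden units, and a linear output layer. The abstract does not state how centers, widths or weights were chosen, so the delta-rule training, the parameter values and the function names here are assumptions for illustration.

```python
import math

def rbf_features(x, centers, width):
    """Hidden layer: one Gaussian unit per center."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * width ** 2))
            for c in centers]

def train_output_weights(X, labels, centers, width, n_classes, lr=0.5, epochs=200):
    """Fit the linear output layer with the delta rule on one-hot targets."""
    W = [[0.0] * len(centers) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(X, labels):
            h = rbf_features(x, centers, width)
            for k in range(n_classes):
                out = sum(w * hj for w, hj in zip(W[k], h))
                err = (1.0 if k == y else 0.0) - out
                W[k] = [w + lr * err * hj for w, hj in zip(W[k], h)]
    return W

def predict(x, centers, width, W):
    """Return the index of the class with the highest output score."""
    h = rbf_features(x, centers, width)
    scores = [sum(w * hj for w, hj in zip(row, h)) for row in W]
    return scores.index(max(scores))
```

In the setting of the abstract, each input `x` would be a vector of seven MFCC values, `centers` would hold 56 hidden-unit centers, and `n_classes` would be 4 (one per genre); the sketch itself works on any small numeric dataset.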