
    An automatic speaker recognition system.

    by Yu Chun Kei. Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. Bibliography: leaves 86-88.

    Comparative study to realize an automatic speaker recognition system

    In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain informative features of minimal dimension from the input signals, we created an adaptive operator that helps identify the speaker's voice quickly and efficiently. We tested the efficiency and performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which researchers widely use for feature extraction. The experimental results show the importance of the adaptive operator, which adds value to the proposed approach. The system achieved 96.8% accuracy using the Fourier transform as the compression method and 98.1% using correlation as the compression method.
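
    As a rough illustration of using the Fourier transform as a compression step, the sketch below keeps only the magnitudes of the lowest-order DFT coefficients of a feature vector. The function name `fourier_compress` and the number of retained coefficients `k` are hypothetical. The paper's adaptive operator and its correlation-based compression are not described in enough detail to reproduce, so this is a sketch under that assumption, not the authors' method.

```python
import numpy as np

def fourier_compress(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical compression: keep |DFT| of the first k coefficients.

    Illustrates 'Fourier transform as a compression method' only; the
    adaptive orthogonal operator from the paper is NOT reproduced here.
    """
    if k < 1:
        raise ValueError("k must be positive")
    spectrum = np.fft.rfft(features)   # real-input DFT, length n // 2 + 1
    return np.abs(spectrum[:k])        # low-order magnitudes as compact features

# Example: a 256-sample vector containing 5 sine cycles, reduced to 16 values.
x = np.sin(2 * np.pi * 5 * np.arange(256) / 256)
z = fourier_compress(x, 16)
```

    Discarding phase and high-order bins trades detail for dimension; how many bins to keep would have to be tuned on real speaker data.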

    A reduced search algorithm for speaker recognition

    In this work, a reduced search algorithm for vector quantization codebooks is applied as a way to reduce the risk of wrong decisions in an automatic speaker recognition system. Instead of a full search, the algorithm exploits the geometrical properties of the vector space, restricting the search to the codebooks closest to the vector under test. The system works as a text-independent automatic speaker recognition system and is intended to identify a suspect among a small group of people using low-quality recordings. The reduced search algorithm was found to lower the risk of wrong decisions, which is especially important in forensic applications.
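
    One way such codebook preselection might look is sketched below, assuming each speaker is modelled by a VQ codebook of D-dimensional feature vectors. The helper names, the parameter `m`, and the preselection rule (distance between the mean test vector and each codebook's centroid) are assumptions for illustration; the paper's actual geometric criterion may differ.

```python
import numpy as np

def vq_distortion(frames: np.ndarray, codebook: np.ndarray) -> float:
    # Mean squared Euclidean distance from each frame to its nearest codeword.
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

def reduced_search_identify(frames, codebooks, m=2):
    """Hypothetical reduced search: score only the m codebooks nearest the test data.

    codebooks maps speaker name -> (K, D) array of codewords. Preselection by
    centroid distance is an assumption made for this sketch.
    """
    test_mean = frames.mean(axis=0)
    ranked = sorted(codebooks,
                    key=lambda s: np.linalg.norm(codebooks[s].mean(axis=0) - test_mean))
    candidates = ranked[:m]
    scores = {s: vq_distortion(frames, codebooks[s]) for s in candidates}
    return min(scores, key=scores.get), scores

# Synthetic example: three speakers with well-separated codebooks.
rng = np.random.default_rng(0)
codebooks = {name: center + rng.normal(size=(8, 2))
             for name, center in [("A", 0.0), ("B", 5.0), ("C", 10.0)]}
test = 5.0 + 0.3 * rng.normal(size=(50, 2))
speaker, scores = reduced_search_identify(test, codebooks, m=2)
```

    Only `m` full distortion computations are performed instead of one per enrolled speaker; a poor preselection rule, however, can exclude the true speaker before scoring.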

    Approach to comparative analysis of artificial voice through an automatic speaker recognition system

    The accelerated pace of life in developed countries has led companies to respond to the need for much more effective use and control of software and hardware. This is how voice has come to be used to facilitate everyday communication, from checking the day's weather forecast to interacting with sophisticated systems that use artificial intelligence to perform more complex tasks, such as reading a text aloud on request. The present work focuses on this last point: it carries out a comparative study of synthetic voice samples obtained from three free applications (SPIK-AI, NUANCE VOCALIZER and Play HT), using BATVOX 4.1, the automatic speaker recognition system used by most forensic acoustics laboratories around the world. Our objective is to evaluate its discrimination capability for this type of speech and to determine whether the results obtained are sufficiently valid to justify its use. The experiment reveals, on the one hand, that most of the artificial voice samples do not meet the system's requirements, either because of their audio format or because of mismatches with the available reference populations. On the other hand, for the usable samples, although the results are not inconsistent, the discrimination capability is not entirely adequate, so its use with this type of speech is not recommended. Master's Degree in Police Sciences (Máster Universitario en Ciencias Policiales).

    Application of an Annular/Sphere Search Algorithm for Speaker Recognition

    In this work, an alternative search algorithm for vector quantization codebooks is applied to improve the performance of an automatic speaker recognition system. Instead of a full search, the algorithm exploits geometrical properties of the vector space, defining annular and spherical search regions. The system works as a text-independent automatic speaker recognition system and is intended to identify a suspect among a small group of people using low-quality recordings. Because the recognition rate required in forensic applications is critical, good discrimination algorithms can reduce the risk of wrong decisions. The performance of the system under such conditions is reported. Despite the few speaker samples used for training, a high recognition rate was obtained, improving on the full search method.
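
    The annular part of such a search can be sketched with the triangle inequality: for a query x and codeword c, ||x - c|| >= | ||x|| - ||c|| |, so once a codeword at distance d is known, only codewords whose norm lies in the annulus (||x|| - d, ||x|| + d) can be closer. The class name `AnnularSearch` is hypothetical, the spherical regions mentioned above are not modelled, and this is a generic sketch of annular elimination rather than the paper's exact algorithm.

```python
import numpy as np

class AnnularSearch:
    """Exact nearest-codeword search with norm-based (annular) elimination."""

    def __init__(self, codebook: np.ndarray):
        norms = np.linalg.norm(codebook, axis=1)
        self.order = np.argsort(norms)          # sorted position -> original index
        self.codebook = codebook[self.order]
        self.norms = norms[self.order]

    def nearest(self, x: np.ndarray):
        """Return (original index, distance, number of distances computed)."""
        if len(self.norms) == 0:
            raise ValueError("empty codebook")
        xn = float(np.linalg.norm(x))
        start = int(np.searchsorted(self.norms, xn))
        best_i, best_d, evaluated = -1, np.inf, 0
        # Walk towards larger norms; stop once the codeword leaves the annulus.
        for i in range(start, len(self.norms)):
            if self.norms[i] - xn >= best_d:
                break
            d = float(np.linalg.norm(x - self.codebook[i]))
            evaluated += 1
            if d < best_d:
                best_i, best_d = i, d
        # Walk towards smaller norms with the same stopping rule.
        for i in range(start - 1, -1, -1):
            if xn - self.norms[i] >= best_d:
                break
            d = float(np.linalg.norm(x - self.codebook[i]))
            evaluated += 1
            if d < best_d:
                best_i, best_d = i, d
        return int(self.order[best_i]), best_d, evaluated

rng = np.random.default_rng(1)
codebook = rng.normal(size=(256, 12))
searcher = AnnularSearch(codebook)
x = rng.normal(size=12)
idx, dist, n_eval = searcher.nearest(x)
```

    The result always matches a full search; the number of skipped distance computations depends on how spread out the codeword norms are, and in high dimensions norms tend to concentrate, which limits the savings.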

    Study of Speaker Recognition Systems

    Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice. It is one of the most useful and popular biometric recognition techniques, especially in areas where security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition, and a number of related activities. Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker produced a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. The speaker recognition process consists of two modules: feature extraction and feature matching. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching identifies the unknown speaker by comparing the features extracted from his/her voice input with those of a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features, and matching them with a stored template. Cepstral coefficients and mel-frequency cepstral coefficients (MFCCs) are used for feature extraction. The VQLBG (vector quantization via Linde-Buzo-Gray), DTW (dynamic time warping), and GMM (Gaussian mixture modelling) algorithms are used for template generation and feature matching.
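
    The framing, windowing, short-term FFT and MFCC steps can be sketched with NumPy alone. The parameter values (16 kHz audio, 400-sample frames, a 160-sample hop, a 512-point FFT, 26 mel filters, 13 coefficients, a Hamming window) are common choices assumed for illustration, not values taken from the study; the truncation step and the VQLBG/DTW/GMM matching stages are not shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Sketch of an MFCC front end; all default parameters are assumptions."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    # 1. Framing: overlapping frames of frame_len samples, hop samples apart.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)   # 2. Hamming window
    # 3. Short-term FFT -> power spectrum, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Triangular filters equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_energies = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II of the log filterbank energies gives the cepstral coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1) / (2 * n_mels))
    return log_energies @ dct.T

sr = 16000
t = np.arange(sr) / sr
coeffs = mfcc(0.5 * np.sin(2 * np.pi * 440 * t), sr)   # one second of a 440 Hz tone
```

    Each row of `coeffs` is one frame's feature vector; these rows are what a VQ codebook, DTW template or GMM would then be built from.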

    Enhancing speaker verification accuracy with deep ensemble learning and inclusion of multifaceted demographic factors

    Effective speaker identification is essential for achieving robust speaker recognition in real-world applications such as mobile devices, security, and entertainment while ensuring high accuracy. However, deep learning models trained on large datasets with diverse demographic and environmental factors may lead to increased misclassification and longer processing times. This study proposes incorporating ethnicity and gender information as critical parameters in a deep learning model to enhance accuracy. Two convolutional neural network (CNN) models classify gender and ethnicity, followed by a Siamese deep learning model trained with critical parameters and additional features for speaker verification. The proposed model was tested using the VoxCeleb 2 database, which includes over one million utterances from 6,112 celebrities. In an evaluation after 500 epochs, equal error rate (EER) and minimum decision cost function (minDCF) showed notable results, scoring 1.68 and 0.10, respectively. The proposed model outperforms existing deep learning models, demonstrating improved performance in terms of reduced misclassification errors and faster processing times
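
    The equal error rate (EER) quoted above is the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). A minimal sketch of estimating it from verification scores follows, assuming higher scores mean "same speaker" and using a simple threshold sweep rather than the interpolated ROC used by some evaluation toolkits. The function name is hypothetical; minDCF is not computed because it additionally requires a target prior and miss/false-alarm costs, which the abstract does not state.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over every observed score.

    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(target < t).mean() for t in thresholds])      # genuine trials rejected
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Perfectly separated scores versus overlapping scores.
eer_sep, thr_sep = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
eer_ovl, _ = equal_error_rate([0.6, 0.4], [0.5, 0.3])
```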

    Real time speaker recognition using MFCC and VQ

    Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat. Many organizations, such as banks, institutions, and industries, currently use this technology to provide greater security for their large databases. Speaker recognition mainly involves two modules: feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker's voice signal that can later be used to represent that speaker. Feature matching is the actual procedure of identifying the unknown speaker by comparing the features extracted from his/her voice input with those already stored in our speech database. In feature extraction, we compute the mel-frequency cepstral coefficients (MFCCs), which are based on the known variation of the human ear's critical bandwidths with frequency; these are vector quantized using the LBG algorithm, yielding a speaker-specific codebook. In feature matching, we compute the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion, we decide whether to accept or reject the unknown speaker's identity. The system I implemented recognizes the correct speaker with 80% accuracy. In the second phase, we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations.
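
    The LBG codebook training and distortion-based accept/reject decision described above can be sketched as follows. The splitting factor `eps`, the convergence tolerance, the power-of-two codebook size, and the acceptance threshold are assumptions chosen for illustration, and the synthetic Gaussian "features" stand in for real MFCC frames; none of these values come from the work itself.

```python
import numpy as np

def vq_distortion(features, codebook):
    # Average squared distance from each frame to its nearest codeword.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

def lbg(features, size, eps=0.01, tol=1e-3, max_iter=100):
    """Linde-Buzo-Gray codebook training by repeated splitting.

    size is assumed to be a power of two (1 -> 2 -> 4 -> ... codewords).
    """
    codebook = features.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            distortion = d.min(axis=1).mean()
            # Move each codeword to the centroid of its cell (empty cells unchanged).
            for j in range(codebook.shape[0]):
                members = features[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

def verify(features, codebook, threshold):
    """Accept the identity claim if distortion is below a hypothetical threshold."""
    return vq_distortion(features, codebook) <= threshold

rng = np.random.default_rng(0)
train_a = 3.0 + rng.normal(size=(200, 13))
train_b = -3.0 + rng.normal(size=(200, 13))
cb_a, cb_b = lbg(train_a, 8), lbg(train_b, 8)
test_a = 3.0 + rng.normal(size=(50, 13))
accepted = verify(test_a, cb_a, threshold=50.0)    # genuine claim
rejected = not verify(test_a, cb_b, threshold=50.0)  # impostor claim
```

    In practice the threshold would be tuned on held-out genuine and impostor trials rather than fixed in advance.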

    APPLICATION OF A RADIAL BASIS FUNCTION ARTIFICIAL NEURAL NETWORK FOR MUSIC GENRE RECOGNITION

    Artificial intelligence can be applied in many areas of life. One way to apply it is through artificial neural networks (ANNs). A well-known ANN method is the radial basis function (RBF) method. The RBF neural network (RBF ANN) is known as a three-layer feedforward neural network that can solve classification or pattern recognition problems. In this study, an RBF ANN is used to classify music into genres based on closeness to the target. The genres used in this study are campursari, keroncong, pop, and rock, with three clip durations for each piece of music: 2 seconds, 5 seconds, and 10 seconds. The hidden layer contains 56 neurons. The input data for the RBF ANN are *.mp3 files downloaded from the internet, which are then converted to *.wav format and processed with mel-frequency cepstrum coefficient (MFCC) extraction. This technique extracts the audio features present in the music data. Seven coefficients are used for each music sample. The simulation results show that the RBF ANN classifies music with its highest accuracy, 75%, on the 10-second test data. Keywords: genre, artificial neural network, artificial intelligence, mel-frequency cepstrum coefficients, music, radial basis function.
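
    A three-layer RBF classifier of the kind described (inputs, 56 Gaussian hidden units, one linear output per genre) can be sketched as below. Choosing the centres at random from the training data, using a single shared width, and fitting the output weights by least squares are assumptions of this sketch, not the study's training procedure; the synthetic 7-dimensional clusters merely stand in for the 7 MFCC coefficients per clip.

```python
import numpy as np

class RBFNetwork:
    """Input -> Gaussian hidden layer -> linear output layer (one unit per class)."""

    def __init__(self, n_hidden=56, width=1.0, seed=0):
        self.n_hidden = n_hidden
        self.width = width
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        # Gaussian activations of every hidden unit plus a constant bias column.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        H = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.hstack([H, np.ones((len(X), 1))])

    def fit(self, X, y):
        self.classes = np.unique(y)
        k = min(self.n_hidden, len(X))
        # Assumption: centres drawn at random from the training vectors.
        self.centres = X[self.rng.choice(len(X), size=k, replace=False)]
        targets = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot
        self.weights, *_ = np.linalg.lstsq(self._design(X), targets, rcond=None)
        return self

    def predict(self, X):
        scores = self._design(X) @ self.weights
        return self.classes[scores.argmax(axis=1)]

# Synthetic stand-in data: 4 genres, 7 features per clip.
rng = np.random.default_rng(42)
genres = np.array(["campursari", "keroncong", "pop", "rock"])
X_train = np.vstack([3.0 * g + 0.5 * rng.normal(size=(30, 7)) for g in range(4)])
y_train = np.repeat(genres, 30)
X_test = np.vstack([3.0 * g + 0.5 * rng.normal(size=(10, 7)) for g in range(4)])
y_test = np.repeat(genres, 10)
model = RBFNetwork(n_hidden=56, width=1.0).fit(X_train, y_train)
accuracy = (model.predict(X_test) == y_test).mean()
```

    Because the clusters here are cleanly separated, the accuracy on this toy data says nothing about real genre recognition, where MFCC features of different genres overlap heavily.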