393 research outputs found

    Development of the Feature Extractor for Speech Recognition

    Final-degree project carried out in collaboration with the University of Maribor. This diploma work continues previous research by other authors, the Voice Operating Intelligent Wheelchair (VOIC) [1]. It presents the development of a voice-controlled wheelchair designed for physically disabled people who cannot control their movements, and describes the basic components of the speech recognition and wheelchair control system. In detail, a speech recognizer comprises two distinct blocks: a Feature Extractor and a Recognizer. The present work targets the realization of an adequate Feature Extractor block that uses a standard LPC Cepstrum coder, which translates the incoming speech into a trajectory in the LPC Cepstrum feature space, followed by a Self-Organizing Map, which classifies the coder output to produce optimal trajectory representations of words in reduced-dimension feature spaces. Experimental results indicate that trajectories in such reduced-dimension spaces can provide reliable representations of spoken words. The Recognizer block is left for future researchers. The main contributions of this work are the study and adoption of a new technology for development, and the realization of applications such as a voice recorder and player and a complete Feature Extractor system.
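    The LPC Cepstrum coder mentioned above can be sketched as follows. This is a minimal illustration of the textbook autocorrelation-method LPC analysis and the standard LPC-to-cepstrum recursion, not code from the thesis; all function names and parameter values are ours.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LPC normal equations for the predictor polynomial
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[:i][::-1]
        err *= (1.0 - k * k)
    return a

def lpc_cepstrum(frame, order=10, n_ceps=10):
    """Cepstral coefficients c[1..n_ceps] of the LPC model spectrum of one
    windowed frame (n_ceps <= order in this simplified sketch)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = levinson_durbin(r, order)
    c = np.zeros(n_ceps + 1)
    for m in range(1, n_ceps + 1):
        # standard recursion: c_m = -a_m - (1/m) * sum_{k<m} k c_k a_{m-k}
        c[m] = -a[m] - sum(k * c[k] * a[m - k] for k in range(1, m)) / m
    return c[1:]
```

    Each frame's cepstral vector becomes one point of the word's trajectory in feature space, which the Self-Organizing Map then maps onto a reduced-dimension grid.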

    Speech recognition in noisy car environment based on OSALPC representation and robust similarity measuring techniques

    The performance of existing speech recognition systems degrades rapidly in the presence of background noise. The OSALPC (one-sided autocorrelation linear predictive coding) representation of the speech signal has proved attractive for speech recognition because of its simplicity and its high recognition performance, relative to standard LPC, in severe conditions of additive white noise. The aim of this paper is twofold: (1) to show that OSALPC also achieves good performance on real noisy speech (in a car environment), and (2) to explore its combination with several robust similarity-measuring techniques, showing that its performance improves with cepstral liftering, dynamic features, and multilabeling.
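    The core OSALPC idea, applying LPC analysis to the causal part of the autocorrelation sequence instead of to the waveform itself, can be sketched as follows; the parameter values and the noise comment are our illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def osalpc(frame, order=8):
    """LPC analysis applied to the causal (one-sided) part of the frame's
    autocorrelation sequence rather than to the waveform itself; additive
    noise mainly perturbs the sequence near lag zero, so higher lags keep
    a cleaner picture of the spectral envelope."""
    n = len(frame)
    causal = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0..n-1
    # Standard autocorrelation-method LPC, now run on the causal sequence.
    r = np.correlate(causal, causal, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):  # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * a[:i][::-1]
        err *= (1.0 - k * k)
    return a[1:]
```

    The resulting coefficients can then feed the same cepstral conversion, liftering, and dynamic-feature steps used with ordinary LPC.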

    Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition

    The article presents a robust representation of speech based on AR modelling of the causal part of the autocorrelation sequence. In noisy speech recognition, this new representation achieves better results than several other related techniques.

    Semi-continuous hidden Markov models for speech recognition


    Evaluation of preprocessors for neural network speaker verification


    Perceptual grounding of cepstral distance measures for speech processing applications

    Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are computed with a filter bank whose filters are equally spaced on the perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters of the mel cepstral analysis. The aim of this work is to examine the compatibility of the MFCC measure with human perception for different values of these parameters. By analysing the mel filter bank parameters, it is found that a filter bank with 24 bands, 220-mel bandwidth, and a band overlap coefficient equal to or greater than one gives optimal spectral distortion (SD) distance measures. For such a mel filter bank, the difference between vowels becomes audible at a full-length mel cepstral SD RMS measure above 0.4-0.5 dB. Furthermore, we show that using a truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be questionable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results showed a high correlation between SD distances calculated from the aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor; the rare exceptions where aliasing is present are analysed separately.
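    The pipeline behind such a cepstral distance can be sketched as follows. The 24-band, 12-coefficient setup follows the abstract; the sampling rate, FFT size, and triangular filters spanning the full band are our illustrative simplifications (the paper instead fixes a 220-mel bandwidth per filter).

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=24, n_fft=512, sr=16000):
    """Triangular filters with centres equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid] = np.linspace(0.0, 1.0, mid - lo, endpoint=False)
        fb[i, mid:hi] = np.linspace(1.0, 0.0, hi - mid, endpoint=False)
    return fb

def mel_cepstrum(power_spec, fb, n_ceps=12):
    """Log mel-band energies followed by a DCT-II, truncated to n_ceps."""
    energies = np.log(fb @ power_spec + 1e-10)
    n = len(energies)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (np.arange(n) + 0.5) / n)
    return basis @ energies

def cepstral_distance(c1, c2):
    """Euclidean MFCC distance; the paper's SD measures quantify how well
    such a distance tracks perceived spectral difference."""
    return np.linalg.norm(c1 - c2)
```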


    Development of automatic speech recognition system for voice activated Ground Control system

    This paper details the development of a speech recognition system for a voice-activated Ground Control Station (GCS). The speech recognition is implemented in MATLAB and the results are validated against the Hidden Markov Model Toolkit (HTK), an open-source tool for speech recognition. The menu items of Mission Planner, a typical open-source GCS used for flying Micro Air Vehicles (MAVs), are used for the experiments.

    Real time speaker recognition using MFCC and VQ

    Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat; organizations such as banks, institutions, and industries currently use this technology to provide greater security for their vast databases.

    Speaker recognition involves two main modules: feature extraction and feature matching. Feature extraction extracts a small amount of data from the speaker's voice signal that can later be used to represent that speaker. Feature matching identifies the unknown speaker by comparing the features extracted from his or her voice input with those already stored in the speech database. In feature extraction we compute the Mel Frequency Cepstrum Coefficients, which are based on the known variation of the human ear's critical bandwidths with frequency; these are vector quantized using the LBG algorithm, yielding a speaker-specific codebook. In feature matching we compute the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database, and based on this distortion we decide whether to accept or reject the unknown speaker's identity. The system implemented in this work recognizes the correct speaker with 80% accuracy.

    In the second phase we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board, analyse the workload, and identify the most time-consuming operations.
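    The LBG codebook training and VQ-distortion matching described above can be sketched as follows; the codebook size, splitting factor, and iteration counts are our illustrative choices, not the thesis parameters.

```python
import numpy as np

def lbg_codebook(features, n_codewords=16, eps=0.01, n_iter=20):
    """Train a speaker-specific VQ codebook by LBG binary splitting with
    k-means refinement. features: (n_frames, dim) array of MFCC vectors."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        # split each codeword into a slightly perturbed pair
        codebook = np.vstack([codebook * (1.0 + eps), codebook * (1.0 - eps)])
        for _ in range(n_iter):
            d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                members = features[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    """Average distance from each frame to its nearest codeword; the claimed
    identity is accepted when this falls below a chosen threshold."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

    At enrolment, one codebook is trained per speaker; at test time, the unknown utterance is scored against every stored codebook and matched to the one with the lowest distortion.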

    SPEECH RECOGNITION FOR CONNECTED WORD USING CEPSTRAL AND DYNAMIC TIME WARPING ALGORITHMS

    Speech Recognition (SR) has become an important tool for people with physical disabilities when handling Home Automation (HA) appliances. This technology is expected to improve the daily life of the elderly and the disabled so that they remain in control of their lives, continue to live independently, and stay involved in learning and social life. The goal of the research is to address the constraints of current Malay SR, which is still in its infancy, with limited research on Malay words, especially for HA applications. Since most previous work was confined to wired microphones, the use of a wireless microphone makes this an important area of the research. The research developed SR word models for five (5) Malay words and five (5) English words as commands to activate and deactivate home appliances.
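    The dynamic time warping matching named in the title can be sketched as follows; this is the generic textbook formulation over cepstral feature sequences, not the authors' exact implementation.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences (each of
    shape n_frames x dim), using the basic step pattern. Warping absorbs
    differences in speaking rate between a command and its stored template."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # deletion
                                 cost[i, j - 1],      # insertion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

    A spoken command is then recognized as the stored word template whose DTW distance to the input's cepstral sequence is smallest.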