81 research outputs found
Time-frequency shift-tolerance and counterpropagation network with applications to phoneme recognition
Human speech signals are inherently multi-component non-stationary signals. Recognition schemes for classification of non-stationary signals generally require some kind of temporal alignment to be performed. Examples of techniques used for temporal alignment include hidden Markov models and dynamic time warping. Attempts to incorporate temporal alignment into artificial neural networks have resulted in the construction of time-delay neural networks. The nonstationary nature of speech requires a signal representation that is dependent on time. Time-frequency signal analysis is an extension of conventional time-domain and frequency-domain analysis methods. Researchers have reported on the effectiveness of time-frequency representations to reveal the time-varying nature of speech. In this thesis, a recognition scheme is developed for temporal-spectral alignment of nonstationary signals by performing preprocessing on the time-frequency distributions of the speech phonemes. The resulting representation is independent of any amount of time-frequency shift and is time-frequency shift-tolerant (TFST). The proposed scheme does not require time alignment of the signals and has the additional merit of providing spectral alignment, which may have importance in recognition of speech from different speakers. A modification to the counterpropagation network is proposed that is suitable for phoneme recognition. The modified network maintains the simplicity and competitive mechanism of the counterpropagation network and has additional benefits of fast learning and good modelling accuracy. The temporal-spectral alignment recognition scheme and modified counterpropagation network are applied to the recognition task of speech phonemes. Simulations show that the proposed scheme has potential in the classification of speech phonemes which have not been aligned in time. 
To facilitate the research, an environment to perform time-frequency signal analysis and recognition using artificial neural networks was developed. The environment provides tools for time-frequency signal analysis and simulations of the counterpropagation network.
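The abstract does not say which preprocessing makes the representation shift-tolerant. As a minimal sketch of the general idea, and not the thesis's method, the example below uses one well-known property: the magnitude of a 2-D discrete Fourier transform does not change when its input is circularly shifted along either axis. The toy matrix, the function names, and the naive DFT are all assumptions made for illustration; a real system would use an FFT library and a genuine time-frequency distribution.

```python
import cmath


def dft2_magnitude(tfd):
    """Magnitude of the 2-D discrete Fourier transform of a small matrix.

    Naive implementation for illustration only; far too slow for real data.
    """
    rows, cols = len(tfd), len(tfd[0])
    out = []
    for u in range(rows):
        row = []
        for v in range(cols):
            acc = 0j
            for t in range(rows):
                for f in range(cols):
                    angle = -2 * cmath.pi * (u * t / rows + v * f / cols)
                    acc += tfd[t][f] * cmath.exp(1j * angle)
            row.append(abs(acc))
        out.append(row)
    return out


def circular_shift(tfd, dt, df):
    """Circularly shift a matrix by dt rows (time) and df columns (frequency)."""
    rows, cols = len(tfd), len(tfd[0])
    return [[tfd[(t - dt) % rows][(f - df) % cols] for f in range(cols)]
            for t in range(rows)]


# Hypothetical toy distribution: rows are time frames, columns are frequency bins.
toy_tfd = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
shifted_tfd = circular_shift(toy_tfd, dt=1, df=1)

original_features = dft2_magnitude(toy_tfd)
shifted_features = dft2_magnitude(shifted_tfd)

# Largest element-wise difference between the two feature matrices.
max_difference = max(
    abs(a - b)
    for row_a, row_b in zip(original_features, shifted_features)
    for a, b in zip(row_a, row_b)
)
print(max_difference)
```

The two feature matrices agree up to floating-point rounding even though the inputs differ, which is the sense in which such a feature is tolerant to time-frequency shifts. The invariance here holds for circular shifts only; shifts that move energy off the edge of the distribution would need separate handling.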
Image processing methods to segment speech spectrograms for word level recognition
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and the acoustic speech representation in the statistical mapping causes Hidden Markov Model (HMM) methods to fail to match the acoustic speech signals, and consequently leads to classification errors. These errors in the low-level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR needs additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low-level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high-level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
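Dynamic time warping, mentioned above as a possible extension, can be sketched in a few lines. The example is the textbook algorithm applied to plain numeric sequences; how SIR-CODE segments would be encoded as sequences is not stated in the abstract, so the inputs and the absolute-difference cost used here are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the three
    standard steps: match, insertion, and deletion.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning the first i items
    # of seq_a with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b repeats
                cost[i][j - 1],      # seq_b advances, seq_a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical feature tracks: the same pattern spoken slowly and quickly.
slow = [1.0, 2.0, 2.0, 3.0]
fast = [1.0, 2.0, 3.0]
different = [1.0, 2.0, 4.0]

print(dtw_distance(slow, fast))       # same shape, different timing
print(dtw_distance(fast, different))  # different final value
```

Because the warping path may repeat elements, the slow and fast versions align at zero cost, while a sequence with a different final value does not.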
Recent Advances in Signal Processing
The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
Informational masking of speech depends on masker spectro-temporal variation but not on its coherence
The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100–200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to constant at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not its spectro-temporal coherence.
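The segment manipulations described above can be illustrated with a simplified sketch. It assumes a frequency contour sampled at regular intervals and a fixed segment length, and it ignores the amplitude rise/fall ramps and the depth manipulation; the contour values and function names are hypothetical and are not taken from the study.

```python
import random
from statistics import geometric_mean


def split_into_segments(contour, segment_length):
    """Divide a frequency contour into abutting, non-overlapping segments."""
    return [contour[i:i + segment_length]
            for i in range(0, len(contour), segment_length)]


def flatten_to_geometric_mean(segment):
    """Replace a time-varying segment by a constant at its geometric mean."""
    level = geometric_mean(segment)
    return [level] * len(segment)


def reorder(segments, rng):
    """Return the segments in random order (the 'incoherent' condition)."""
    shuffled = segments[:]
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical competitor contour in Hz, three samples per segment.
contour_hz = [1000.0, 1100.0, 1210.0, 1500.0, 1400.0, 1300.0, 900.0, 950.0, 1000.0]

segments = split_into_segments(contour_hz, 3)                       # coherent, time-varying
constant_segments = [flatten_to_geometric_mean(s) for s in segments]  # coherent, constant
incoherent = reorder(segments, random.Random(0))                    # random order

print(constant_segments[0])
print(incoherent)
```

The geometric rather than arithmetic mean places the constant level at the midpoint of the segment on a logarithmic frequency scale.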
The relationship between phonological processing and lexical acquisition in a foreign language. A study on Polish primary school students learning English
Faculty of English (Wydział Anglistyki). Keywords: phonological short-term memory, phonological processing, lexical acquisition, second language acquisition. The aim of this project is to investigate the phonological factors in word learning. Literature on language acquisition indicates that phonological processing and phonological short-term memory might play an important role in acquiring new vocabulary. However, this topic is still understudied. In particular, there is a lack of studies on the relationship between phonological processing and word learning in a foreign language. Another problem is that the concept of phonological processing is not very well defined in itself.
This dissertation provides a review of studies on phonological processing and offers a definition of the concept. It then describes a study in which 44 Polish 9-year-olds, who were learning English as a second language at school, were tested on several measures of phonological processing in both Polish and English, and on measures of phonological short-term memory. The participants were also asked to perform experimental novel word learning tasks in their native language (Polish), their second language (English) and a completely foreign language (LX), and they were tested on the progress they made in terms of English vocabulary acquisition over the school year. The results point to a relationship between phonological processing and foreign word learning.
- …