Fractal based speech recognition and synthesis
Transmitting a linguistic message is most often the primary purpose of speech communication, and it is the recognition of this message by machine that would be most useful.
This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes and makes use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a Least Squares Method as well as a novel Neural Network algorithm is employed to derive the recognition performance of the speech data.
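As an illustrative sketch only (the abstract does not specify which estimator the thesis uses), Higuchi's algorithm is one widely used way to compute the fractal dimension of a one-dimensional signal such as a speech frame:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lk = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            # normalised curve length for this starting offset m and scale k
            l = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / ((len(idx) - 1) * k * k)
            lengths.append(l)
        lk.append(np.mean(lengths))
    # the slope of log L(k) versus log(1/k) estimates the fractal dimension
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(lk), 1)
    return slope
```

A straight line yields a dimension near 1, while white noise approaches 2; speech frames typically fall in between, which is what makes the measure usable as a feature.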
The second part of this work studies the synthesis of spoken words, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension together with the phase of the speech signal, to ensure consistent intonation contours, natural-sounding speech synthesis is achievable at the word level. In order to extend the flexibility of this framework, we focused on the filtering and the compression of the phase to maintain and produce natural-sounding speech. A ‘naturalness level’ is achieved as a result of the fractal characteristics used in the synthesis process. Finally, a novel fractal-based speech synthesis system developed at De Montfort University is discussed.
Throughout our research, simulation experiments were performed on continuous speech data from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
The low bit-rate coding of speech signals
Hardware and algorithm architectures for real-time additive synthesis
Additive synthesis is a fundamental computer music synthesis paradigm tracing its origins to the work of Fourier and Helmholtz. A rudimentary implementation linearly combines harmonic sinusoids (or partials) to generate tones whose perceived timbral characteristics are a strong function of the partial amplitude spectrum. Having evolved over time, additive synthesis describes a collection of algorithms, each characterised by the time-varying linear combination of basis components to generate temporal evolution of timbre. Basis components include exactly harmonic partials, inharmonic partials with time-varying frequency, or non-sinusoidal waveforms, each with distinct spectral characteristics. Additive synthesis of polyphonic musical instrument tones requires a large number of independently controlled partials, incurring a large computational overhead whose investigation and reduction is a key motivator for this work. The thesis begins with a review of prevalent synthesis techniques, setting additive synthesis in context and introducing the spectrum modelling paradigm, which provides baseline spectral data to the additive synthesis process obtained from the analysis of natural sounds. We proceed to investigate recursive and phase-accumulating digital sinusoidal oscillator algorithms, defining specific metrics to quantify relative performance. The concepts of phase accumulation, table-lookup phase-amplitude mapping and interpolated fractional addressing are introduced and developed, and shown to underpin an additive synthesis subclass - wavetable lookup synthesis (WLS). WLS performance is simulated against specific metrics and parameter conditions peculiar to computer music requirements. We conclude by presenting processing architectures which accelerate the computational throughput of specific WLS operations and the sinusoidal additive synthesis model.
In particular, we introduce and investigate the concept of phase-domain processing and present several “pipeline-friendly” arithmetic architectures using this technique which implement the additive synthesis of sinusoidal partials.
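As a hedged software illustration of the phase-accumulation and interpolated table-lookup ideas described above (not the thesis's own hardware architectures; all names are illustrative), a wavetable oscillator can be sketched as:

```python
import numpy as np

TABLE_SIZE = 1024
# one cycle of a sine stored in a lookup table
SINE_TABLE = np.sin(2 * np.pi * np.arange(TABLE_SIZE) / TABLE_SIZE)

def wavetable_osc(freq, n_samples, sr=8000, table=SINE_TABLE):
    """Phase-accumulating oscillator with linearly interpolated table lookup."""
    phase = 0.0
    inc = freq * len(table) / sr          # phase increment per output sample
    out = np.empty(n_samples)
    for i in range(n_samples):
        i0 = int(phase)                   # integer part addresses the table
        i1 = (i0 + 1) % len(table)
        frac = phase - i0                 # fractional part drives interpolation
        out[i] = table[i0] + frac * (table[i1] - table[i0])
        phase = (phase + inc) % len(table)
    return out
```

An additive synthesiser in this subclass would sum many such oscillators, one per partial, each with its own frequency and amplitude envelope; the per-sample cost of that loop is exactly the overhead the thesis seeks to reduce.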
Identification of Transient Speech Using Wavelet Transforms
It is generally believed that abrupt stimulus changes, which in speech may be time-varying frequency edges associated with consonants, transitions between consonants and vowels, and transitions within vowels, are critical to the perception of speech by humans and to speech recognition by machines. Noise affects speech transitions more than it affects quasi-steady-state speech. I believe that identifying and selectively amplifying speech transitions may enhance the intelligibility of speech in noisy conditions. The purpose of this study is to evaluate the use of wavelet transforms to identify speech transitions. Using wavelet transforms may be computationally efficient and allow for real-time applications. The discrete wavelet transform (DWT), stationary wavelet transform (SWT) and wavelet packets (WP) are evaluated. Wavelet analysis is combined with variable frame rate processing to improve the identification process. Variable frame rate can identify time segments when speech feature vectors are changing rapidly and when they are relatively stationary. Energy profiles for words, which show the energy in each node of a speech signal decomposed using wavelets, are used to identify nodes that include predominantly transient information and nodes that include predominantly quasi-steady-state information, and these are used to synthesize transient and quasi-steady-state speech components. These speech components are estimates of the tonal and nontonal speech components, which Yoo et al. identified using time-varying band-pass filters. Comparison of spectra, a listening test and mean-squared errors between the transient components synthesized using wavelets and Yoo's nontonal components indicated that wavelet packets identified the best estimates of Yoo's components. An algorithm that incorporates variable frame rate analysis into wavelet packet analysis is proposed.
The development of this algorithm involves choosing a wavelet function and a decomposition level. The algorithm itself has four steps: wavelet packet decomposition; classification of terminal nodes; incorporation of variable frame rate processing; and synthesis of speech components. Combining wavelet analysis with variable frame rate analysis provides the best estimates of Yoo's speech components.
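As a minimal sketch of the wavelet machinery involved (the study uses full DWT/SWT/wavelet-packet decompositions over several levels; here only a one-level Haar transform is shown, and all names are illustrative), one decomposition step splits a signal into a low-pass approximation and a high-pass detail band, and the step is perfectly invertible:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: approximation (low-pass) and detail (high-pass)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # pairwise averages: quasi-steady-state content
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # pairwise differences: transient content
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT: perfect reconstruction of the input."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x
```

Recursing on both outputs (not just the approximation) gives a wavelet packet tree; classifying its terminal nodes by energy and resynthesising selected nodes is the shape of the four-step algorithm described above.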
Wavelet-based techniques for speech recognition
In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for time-frequency transformation of these signals. In addition, it has compactly supported basis functions, thereby reducing the amount of computation as opposed to the STFT, where an overlapping window is needed. [Continues.]
Machine Learning Based Dynamic Band Selection for Splitting Auditory Signals to Reduce Inner Ear Hearing Losses
Quality of hearing is severely impacted by signal losses that occur in the human inner ear, specifically in the region of the cochlea. Loudness recruitment, degraded frequency selectivity and auditory masking are the major outward effects of inner-ear hearing losses. Splitting auditory signals into frequency bands and presenting them dichotically to the two ears has become a comprehensive solution for reducing inner-ear hearing losses. However, these methods divide the input signal into a fixed number of frequency bands, which limits their applicability where signals have large variations in their spectral characteristics. To address this challenge, we have proposed a machine-learning-based intelligent band-selection algorithm to split auditory signals dynamically. The proposed algorithm analyses the spectral characteristics of the input speech signal to determine the optimum number of bands required to present the major acoustic cues of the signal effectively. A dynamic splitting algorithm then divides the signal efficiently for dichotic presentation. The proposed method has been evaluated on a large number of subjects with cochlear hearing impairment, across different age groups and genders. Qualitative and quantitative assessment showed a significant improvement in recognition score with a substantial reduction in response time.
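A hedged sketch of the underlying dichotic band-splitting idea (a fixed alternating split, i.e. the baseline the abstract improves upon, not the proposed machine-learning-based dynamic selection; function names are illustrative):

```python
import numpy as np

def dichotic_split(x, n_bands=8):
    """Split a signal's spectrum into n_bands contiguous bands and route
    alternate bands to the left and right channels (dichotic presentation)."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)  # band boundaries (bins)
    left = np.zeros_like(X)
    right = np.zeros_like(X)
    for b in range(n_bands):
        sl = slice(edges[b], edges[b + 1])
        if b % 2 == 0:
            left[sl] = X[sl]    # even bands to the left ear
        else:
            right[sl] = X[sl]   # odd bands to the right ear
    return np.fft.irfft(left, n=len(x)), np.fft.irfft(right, n=len(x))
```

Because the bands partition the spectrum, the two channels sum back to the original signal; the dynamic algorithm described above would instead choose `n_bands` and the band edges per frame from the signal's spectral characteristics.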
Communication Biophysics
Contains reports on six research projects. National Institutes of Health (Grant 5 PO1 NS13126); National Institutes of Health (Grant 5 RO1 NS18682); National Institutes of Health (Grant 5 RO1 NS20322); National Institutes of Health (Grant 5 R01 NS20269); National Institutes of Health (Grant 5 T32NS 07047); Symbion, Inc.; National Science Foundation (Grant BNS 83-19874); National Science Foundation (Grant BNS 83-19887); National Institutes of Health (Grant 6 RO1 NS 12846); National Institutes of Health (Grant 1 RO1 NS 21322)
Determination of articulatory parameters from speech waveforms