Apparatus And Quality Enhancement Algorithm For Mixed Excitation Linear Predictive (MELP) And Other Speech Coders
A system and method for enhancing the speech quality of the mixed excitation linear predictive (MELP) coder and other low bit-rate speech coders. The system and method employ a plosive analysis/synthesis method, which detects the frame containing a plosive signal, applies a simple model to synthesize the plosive signal, and adds the synthesized plosive to the coded speech. The system and method remain compatible with the existing MELP coder bit stream.
Georgia Tech Research Corporation
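The patent abstract does not specify how plosive frames are detected. As a minimal sketch of the general idea, a detector might flag a frame whose short-time energy jumps sharply relative to the previous frame; the `onset_ratio` threshold below is a hypothetical parameter, not taken from the patent:

```python
import numpy as np

def detect_plosive_frame(frame, prev_frame, onset_ratio=8.0):
    """Flag a frame as possibly containing a plosive when its
    short-time energy jumps sharply relative to the previous frame.
    Illustrative heuristic only; the patented detector is not
    described in the abstract."""
    e_prev = np.sum(prev_frame.astype(float) ** 2) + 1e-12  # avoid divide-by-zero
    e_cur = np.sum(frame.astype(float) ** 2)
    return e_cur / e_prev > onset_ratio
```

A synthesized plosive produced from such a detection could then be added to the decoder output without touching the transmitted bit stream, which is what keeps the scheme bitstream-compatible.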
IMPROVING THE AUTOMATIC RECOGNITION OF DISTORTED SPEECH
Automatic speech recognition has a wide variety of uses in this technological age, yet speech distortions present many difficulties for accurate recognition. The research presented provides solutions that counter the detrimental effects that some distortions have on the accuracy of automatic speech recognition. Two types of speech distortion are treated independently: distortions due to speech coding and distortions due to additive noise. Compensating for both types of distortion reduced recognition error.
Distortions due to the speech coding process are countered by recognizing the speech directly from the bitstream, thus eliminating the need to reconstruct the speech signal and the distortion that reconstruction causes. There is a relative difference of 6.7% between the recognition error rate of uncoded speech and that of speech reconstructed from MELP-encoded parameters. The relative difference between the recognition error rate of uncoded speech and that of encoded speech recognized directly from the MELP bitstream is 3.5%. This 3.2 percentage point difference is equivalent to the accurate recognition of an additional 334 words out of the 12,863 words spoken.
Distortions due to noise are offset through appropriate modification of an existing noise reduction technique called minimum mean-square error log spectral amplitude enhancement. A relative difference of 28% exists between the recognition error rate of clean speech and that of speech with additive noise. Applying a speech enhancement front end reduced this difference to 22.2%. This 5.8 percentage point difference is equivalent to the accurate recognition of an additional 540 words out of the 12,863 words spoken.
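The "relative difference" figures above compare error rates against a reference rate rather than subtracting them directly. As a small sketch (the error rates below are hypothetical, not taken from the thesis):

```python
def relative_difference(err_ref, err_test):
    """Relative difference between two word-error rates,
    expressed as a percentage of the reference rate."""
    return 100.0 * (err_test - err_ref) / err_ref

# Hypothetical example: a 10% reference error rate rising to 10.67%
# gives a 6.7% relative difference.
print(relative_difference(0.10, 0.1067))
```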
Advanced signal processing techniques for pitch synchronous sinusoidal speech coders
Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is and will remain the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the bandwidth required for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders employed in mobile and Internet applications use a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps, but due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. To achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher quality PS sinusoidal speech coder than was previously available.
A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.
EThOS - Electronic Theses Online Service, United Kingdom
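The core of any sinusoidal coder, pitch synchronous or not, is synthesizing each frame as a sum of harmonics of the pitch frequency. A minimal sketch of that synthesis step (the sample rate and frame length are illustrative defaults, not values from the thesis):

```python
import numpy as np

def synthesize_frame(f0, amps, phases, fs=8000, n=160):
    """Synthesize one frame as a sum of pitch harmonics, the core
    operation of a sinusoidal/harmonic speech model.
    amps[k] and phases[k] describe harmonic k+1 of the pitch f0."""
    t = np.arange(n) / fs
    frame = np.zeros(n)
    for k, (a, p) in enumerate(zip(amps, phases), start=1):
        frame += a * np.cos(2 * np.pi * k * f0 * t + p)  # k-th harmonic
    return frame
```

This is why accurate pitch and voicing estimates matter so much: every harmonic is placed at a multiple of the estimated f0, so a pitch error shifts the entire synthesized spectrum.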
Structure-Constrained Basis Pursuit for Compressively Sensing Speech
Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately, many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity. These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit/Basis Pursuit Denoising models only sparsity. We developed a Structure-Constrained Basis Pursuit (SCBP) that models the structure of somewhat sparse signals as upper and lower bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and gained improved quality over Basis Pursuit Denoising. When a single parameter (i.e., the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS at between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases by between 28.5% and 21.6% over the same range. SCBP can be applied to any somewhat sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples.
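The abstract does not give the authors' solver, but the idea of adding bound constraints to Basis Pursuit Denoising can be sketched with a projected ISTA iteration: soft-thresholded gradient descent on the l1-regularized least-squares objective, with the iterate clipped to the structural bounds at each step. This is a minimal illustration of the concept, not the paper's exact algorithm:

```python
import numpy as np

def scbp_ista(A, y, lam, lower, upper, n_iter=200):
    """Bound-constrained Basis Pursuit Denoising via projected ISTA:
    minimize ||Ax - y||^2 + lam * ||x||_1 subject to lower <= x <= upper.
    Illustrative sketch of the SCBP idea, not the authors' solver."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, L = squared spectral norm of A
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)                 # gradient of the quadratic term
        x = x - step * g                      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft threshold
        x = np.clip(x, lower, upper)          # structure constraints as bounds
    return x
```

The clipping step is where the "structure" enters: any prior knowledge of the signal's envelope (e.g. a sum-of-Gaussians bound, as in the abstract) becomes per-coefficient upper and lower limits.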
A Speech Intelligibility Estimation Method Based on Hidden Markov Model
This paper proposes a speech intelligibility estimation method based on the hidden Markov model (HMM) that is widely used for speech recognition. The HMM-based method is a kind of non-intrusive speech quality measurement, which means it operates without a reference speech signal. The log-likelihood score of the HMM is converted to a normalized intelligibility score. We estimate the speech intelligibility of standard digital speech coders. The experimental results show that the proposed HMM-based method gives better performance than the conventional non-intrusive speech intelligibility evaluation tool.
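The abstract does not specify how the log-likelihood is normalized. One common way to map an unbounded score onto a [0, 1] intelligibility scale is a logistic function; the `midpoint` and `slope` constants below are hypothetical calibration parameters, not values from the paper:

```python
import math

def normalize_loglik(loglik, midpoint=-50.0, slope=0.1):
    """Map an HMM log-likelihood to a [0, 1] intelligibility score
    with a logistic curve. midpoint/slope are hypothetical calibration
    constants; the paper's actual mapping is not given in the abstract."""
    return 1.0 / (1.0 + math.exp(-slope * (loglik - midpoint)))
```

In practice such constants would be fit against subjective intelligibility scores for a calibration set of coded speech.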
Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications
Ankara: Department of Electrical and Electronics Engineering and the Institute of Engineering and Sciences of Bilkent University, 1998. Thesis (Master's), Bilkent University, 1998. Includes bibliographical references (leaves 124-132).
In this thesis, two new speech variation measures for speech spectrum non-stationarity detection are proposed. These measures are based on the Line Spectrum Frequencies (LSF) and the spectral values at the LSF locations. They are formulated to be subjectively meaningful and mathematically tractable, and they have low computational complexity. To demonstrate the usefulness of the non-stationarity detector, two applications are presented. The first is an implicit speech segmentation system that detects non-stationary regions in the speech signal and obtains the boundaries of the speech segments. The second is a Variable Bit-Rate Mixed Excitation Linear Predictive (VBR-MELP) vocoder utilizing a novel voice activity detector to detect silent regions in the speech. This voice activity detector is designed to be robust to non-stationary background noise and provides efficient coding of silent sections and unvoiced utterances to decrease the bit rate. Simulation results are also presented.
Ertan, Ali Erdem, M.S.
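As a minimal illustration of the underlying idea (not the thesis's actual measures, which are more elaborate and also use the spectral values at the LSF locations), a frame-to-frame variation measure over LSF vectors might look like:

```python
import numpy as np

def lsf_variation(lsf_prev, lsf_cur):
    """Simple spectral-variation measure between consecutive frames:
    mean absolute change of the Line Spectrum Frequencies.
    A large value suggests a non-stationary (e.g. transitional) region.
    Illustrative only; the thesis defines its own measures."""
    return float(np.mean(np.abs(np.asarray(lsf_cur) - np.asarray(lsf_prev))))
```

Thresholding such a measure over time yields candidate segment boundaries for implicit segmentation, and low values over sustained stretches can mark frames a VBR coder may encode at a reduced rate.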
Speaker-dependent speech coding
Cataloged from PDF version of article.
With the rapid expansion of the Internet, it became feasible to have low-cost and secure telephone calls over it, and new digital speech compression standards were developed. Digital speech codecs can be used both in regular telephone networks and in Internet-based systems. For a secure call, speech data are first compressed by a digital speech codec, and the compressed packets are then sent encrypted using methods such as Triple Data Encryption Standard (3DES) [1] or Advanced Encryption Standard (AES) [2]. Compressing and encrypting require high processor performance and may even require the use of high-frequency processors and DSPs to encode the binary speech data. Current speech coders are speaker independent, i.e., they do not perform any speaker-specific operations; they do not even distinguish between male and female speakers. An interesting way to address this is to send speech after encoding it with a system that is tailored to a specific user. This system, which can also be called speaker-dependent speech encoding, provides a computationally efficient and relatively secure VoIP call with high quality and without any separate encryption, compared to standard speech codecs at the same bit rate. Despite the disadvantages of having to acquire all the speech characteristics of the users and the need for extra data space, it has advantages such as secure communication, because the speech characteristics of the speaker are unknown to other users, and the synthesized speech has higher quality than LPC-compressed speech at the same bit rate.
Kart, Mete Kemal, M.S.
Speech coding
Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. The original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. The end-to-end performance of a digital link therefore becomes essentially independent of the length and operating frequency bands of the link, so from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service-provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where the speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding.
This term is more generic in the sense that the coding techniques are equally applicable to any voice signal, whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or, equivalently, the bandwidth) and/or reduce storage requirements. In this document the terms speech and voice are used interchangeably.
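The simplest and oldest instance of the waveform branch of speech coding is companded quantization, as standardized in ITU-T G.711. A minimal sketch of μ-law companding, which compresses sample amplitudes before uniform quantization so that quiet speech keeps more resolution:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """mu-law companding (G.711 style): compress the amplitude of
    samples in [-1, 1] before uniform quantization."""
    x = np.clip(x, -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y, mu=255):
    """Inverse companding (expansion) back to linear amplitude."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

Parametric coders such as LPC-based vocoders sit at the other end of the spectrum described above: instead of companding and quantizing the waveform itself, they transmit analysis parameters from which the decoder synthesizes speech.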