
    A variable rate speech compressor for mobile applications

    One of the most promising speech coders at bit rates of 4.8 to 9.6 kbit/s is Code Excited Linear Prediction (CELP), which has dominated this range for the past three to four years. Its drawback, however, is its expensive implementation. As an alternative, Base-Band CELP (CELP-BB) was developed; as reported previously, it produces good-quality speech comparable to CELP at a complexity that can be implemented on a single chip. Its robustness was also improved to tolerate errors of up to 1.0% and to maintain intelligibility at 5.0% and more. Although CELP-BB produces good-quality speech at around 4.8 kbit/s, it has a fundamental problem when updating the pitch filter memory. A sub-optimal solution to this problem is proposed. Below 4.8 kbit/s, however, CELP-BB suffers from noticeable quantization noise as a result of the large vector dimensions used. Efficient representation of speech below 4.8 kbit/s is reported by introducing Sinusoidal Transform Coding (STC) to represent the LPC excitation, an approach called Sine Wave Excited LPC (SWELP). In this case, natural-sounding, good-quality synthetic speech is obtained at around 2.4 kbit/s.

    Real-Time Implementation Of LPC-10 Codec On TMS320C6713 DSP

    During the last two decades, various speech coding algorithms have been developed. Toll-quality speech occupies the band from 300 Hz to 3400 Hz. In general, the human speech signal can be classified as non-stationary because it fluctuates randomly over time. One important assumption that simplifies the analysis is that the speech signal is quasi-stationary over a short range (a frame). Frames of speech can be further classified as voiced or unvoiced, where the voiced part is treated as quasi-stationary and the unvoiced part as additive white Gaussian noise (AWGN). The quality of the synthesized signal is degraded significantly because the excitation of the voiced part is not equally spaced within the frame and the excitation of the unvoiced part is not exactly AWGN. This assumption produces a speech signal that sounds unnatural but is highly intelligible. A further reason is that a single frame can contain both voiced and unvoiced parts, and classifying such a frame as voiced or unvoiced by a rigid decision lowers the quality significantly. Speech compression, commonly referred to as speech coding, reduces the amount of redundancy and represents the speech signal by a set of parameters in order to achieve very low bit rates. One of these speech coding algorithms is linear predictive coding (LPC-10). This thesis implements LPC-10 analysis and synthesis in Matlab and C. LPC-10 is compared with other speech compression algorithms, namely pulse code modulation (PCM), differential pulse code modulation (DPCM), and code excited linear prediction (CELP), in terms of segmental signal-to-quantization-noise ratio (SEG-SQNR) and mean squared error (MSE) using Matlab simulation. LPC-10, the focus of the work, was implemented on the TMS320C6713 DSP board to test the algorithm in real time. The real-time implementation required converting the Matlab script into C code for the DSP board. Upon successful completion, the results from the TMS320C6713 DSP were compared with the simulated Matlab results in both graphical and tabular form.
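
The two comparison metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the 180-sample frame length, and the choice to skip silent or perfectly reconstructed frames are all assumptions made here.

```python
import math


def mse(reference, decoded):
    """Mean squared error between a reference signal and its decoded version."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def segmental_sqnr_db(reference, decoded, frame_len=180):
    """Average of per-frame signal-to-noise ratios, in dB.

    frame_len=180 is an assumed value (22.5 ms at 8 kHz sampling).
    Frames with zero signal energy or zero error are skipped, which is
    one common convention; other conventions clamp the per-frame value.
    """
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and noise energy")
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging in the dB domain per frame, rather than computing one ratio over the whole signal, keeps loud frames from masking errors in quiet ones, which is the usual motivation for the segmental form.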

    Audio Analysis/synthesis System

    A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method. Georgia Tech Research Corporation
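
The synthesis half of an overlap-add sinusoidal model can be illustrated with a small sketch. It is not the disclosed method: the analysis-by-synthesis parameter estimation is omitted, and the periodic Hann window, 50% overlap, parameter layout, and function names are assumptions chosen for the illustration.

```python
import math


def periodic_hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add_sinusoids(frames, frame_len, sample_rate):
    """Synthesize audio from per-frame sinusoid parameters by overlap-add.

    frames: one list per frame of (amplitude, frequency_hz, phase_rad)
            tuples, with the phase referenced to the first sample of the frame.
    frame_len must be even; consecutive frames overlap by half a frame.
    """
    if frame_len % 2 != 0:
        raise ValueError("frame_len must be even for 50% overlap")
    hop = frame_len // 2
    window = periodic_hann(frame_len)
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, partials in enumerate(frames):
        start = index * hop
        for n in range(frame_len):
            sample = sum(
                amp * math.cos(2.0 * math.pi * freq * n / sample_rate + phase)
                for amp, freq, phase in partials
            )
            output[start + n] += window[n] * sample
    return output
```

When every frame carries the same sinusoid with phases that line up across frame boundaries, the overlapped region reproduces that sinusoid exactly; changing the per-frame parameters is what allows modification without rewriting the synthesis loop.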

    Inverse Filtering Techniques in Speech Analysis

    This paper reviews certain speech analytical techniques to which the label 'inverse filtering' has been applied. The unifying features of these techniques are presented, namely: (1) a basis in the source-filter theory of speech production; (2) the use of a network whose transfer function is the inverse of the transfer function of one or a combination of the articulatory system filters to modify the speech wave, either in the time domain or in the frequency domain. However, their differences, which lie in the particular system filter being inverted and in the manner of realisation, provide a basis for the classification adopted in the paper, which is as follows: (1) inverse vocal tract analogue filtering; (2) inverse vocal tract digital filtering; (3) direct inverse glottal filtering; (4) linear predictive coding. An assessment of the comparative usefulness of inverse filtering in contemporary speech studies is given.
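
The idea of inverting a system filter can be shown in its simplest digital form, the linear-prediction case. The sketch below is only an illustration under assumptions: the predictor coefficients are given rather than estimated from speech, the sign convention s[n] = e[n] + sum_k a[k] * s[n-1-k] is one of several in use, and the function names are hypothetical.

```python
def all_pole_synthesis(excitation, coeffs):
    """Vocal-tract-style all-pole filter: s[n] = e[n] + sum_k a[k] * s[n-1-k]."""
    signal = []
    for n, e in enumerate(excitation):
        prediction = sum(
            coeffs[k] * signal[n - 1 - k]
            for k in range(len(coeffs))
            if n - 1 - k >= 0
        )
        signal.append(e + prediction)
    return signal


def inverse_filter(signal, coeffs):
    """FIR inverse of the filter above: e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    residual = []
    for n, s in enumerate(signal):
        prediction = sum(
            coeffs[k] * signal[n - 1 - k]
            for k in range(len(coeffs))
            if n - 1 - k >= 0
        )
        residual.append(s - prediction)
    return residual
```

Passing a pulse train through `all_pole_synthesis` and then through `inverse_filter` with the same coefficients returns the pulse train, which is the sense in which the second filter undoes the first and exposes the source.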

    Comparison of Wideband Earpiece Integrations in Mobile Phone

    Speech in telecommunication networks has traditionally been narrowband, ranging from 300 Hz to 3400 Hz. Wideband speech call services can be expected to gain a stronger foothold in the market in the coming years. This thesis introduces the basics of speech coding together with the adaptive multi-rate wideband (AMR-WB) codec. The wideband codec widens the speech band to the range from 50 Hz to 7000 Hz using a 16 kHz sampling frequency. In practice, the wider band improves speech intelligibility and makes speech sound more natural and comfortable to listen to. The main aim of this thesis is to compare two different wideband earpiece integrations in a mobile phone. The question is how much the end user benefits from a larger earpiece in a mobile phone. To determine earpiece performance, objective free-field measurements were made on the earpiece modules. Measurements were also made on the phone mounted on a head and torso simulator (HATS), both with the earpieces wired directly to a power amplifier and over the air during active calls on GSM and WCDMA networks. The objective measurements showed differences between the two integrations in frequency response and distortion, especially at low frequencies. Finally, a subjective listening test was carried out to see whether the end user notices the difference between the smaller and larger earpiece integrations, using narrowband and wideband speech samples. Based on the listening test results, users can tell the two integrations apart, and with the wideband speech codec a male speaker benefits more from the larger earpiece than a female speaker.

    Speech coding at medium bit rates using analysis by synthesis techniques


    Comparison of CELP speech coder with a wavelet method

    This thesis compares the speech quality of the Code Excited Linear Predictor (CELP, Federal Standard 1016) speech coder with a new wavelet method for compressing speech. The performance of the two is compared using subjective listening tests. The test signals are clean signals (i.e. with no background noise), speech signals with room noise, and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components the CELP standard performs better than the wavelet method, but for signals with room noise the wavelet method performs much better than CELP. For signals with artificial noise added, the results are mixed and depend on the noise level: CELP performs better at low noise levels and the wavelet method performs better at higher noise levels.

    Postfiltering techniques in low bit-rate speech coders

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaves 78-80). By Azhar K. Mustapha. M.Eng.

    Software and hardware implementation techniques for digital communications-related algorithms

    There are essentially three areas addressed in the body of this thesis. (a) The first is a theoretical investigation into the design and development of a practically realizable implementation of a maximum-likelihood detection process to deal with digital data transmission over HF radio links. These links exhibit multipath properties with delay spreads that can easily extend over 12 to 15 milliseconds. The project was sponsored by the Ministry of Defence through the auspices of the Science and Engineering Research Council. The primary objective was to transmit voice band data at a minimum rate of 2.4 kb/s continuously for long periods of time during the day or night. Computer simulation models of HF propagation channels were created to simulate atmospheric and multipath effects of transmission from London to Washington DC, Ankara, and as far as Melbourne, Australia. Investigations into HF channel estimation are not the subject of this thesis. The detection process assumed accurate knowledge of the channel. [Continues.]