149 research outputs found
Frequency Domain Methods for Coding the Linear Predictive Residual of Speech Signals
The most frequently used speech coding paradigm is ACELP, famous because it encodes speech with high quality, while consuming a small bandwidth. ACELP performs linear prediction filtering in order to eliminate the effect of the spectral envelope from the signal. The noise-like excitation is then encoded using algebraic codebooks. The search of this codebook, however, can not be performed optimally with conventional encoders due to the correlation between their samples. Because of this, more complex algorithms are required in order to maintain the quality. Four different transformation algorithms have been implemented (DCT, DFT, Eigenvalue decomposition and Vandermonde decomposition) in order to decorrelate the samples of the innovative excitation in ACELP. These transformations have been integrated in the ACELP of the EVS codec. The transformed innovative excitation is coded using the envelope based arithmetic coder. Objective and subjective tests have been carried out to evaluate the quality of the encoding, the degree of decorrelation achieved by the transformations and the computational complexity of the algorithms
Digital Signal Processing
Contains an introduction and reports on twenty research projects.National Science Foundation (Grant ECS 84-07285)U.S. Navy - Office of Naval Research (Contract N00014-81-K-0742)National Science Foundation FellowshipSanders Associates, Inc.U.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028)Canada, Bell Northern Research ScholarshipCanada, Fonds pour la Formation de Chercheurs et l'Aide a la Recherche Postgraduate FellowshipCanada, Natural Science and Engineering Research Council Postgraduate FellowshipU.S. Navy - Office of Naval Research (Contract N00014-81-K-0472)Fanny and John Hertz Foundation FellowshipCenter for Advanced Television StudiesAmoco Foundation FellowshipU.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028
Speaker-dependent speech coding
Cataloged from PDF version of article.With the rapid expansion of Internet, it became feasible to have low-cost and
secure telephone calls via internet. New digital speech compression standards
were developed. Digital speech codecs can be used both in regular telephone
networks and Internet based systems. Thereby, for a secure call, speech data are
firstly compressed by digital speech codecs, and then, these compressed
packages are sent in an encoded way through Data Encryption Standard
(3DES) [1], Advanced Encryption Standard (AES) [2] encoding methods.
Compressing and encoding processes require high processor performance and
they may even require the use of high frequency processors and DSP’s to
encode the binary speech data. Current speech coders are speaker independent,
i.e., they don’t perform any speaker specific operations. They do not even
distinguish between male and female speakers. An interesting way to solve this
problem is to send speech after encoding it with a system that is based on a
specific user. This system, which can be also called as speaker dependent speech
encoding, provides a computationally efficient and relatively secure VoIP call,
with high quality and without any encoding compared to the same bit rate
standard speech codecs. Despite the disadvantages of requirement of acquiring
all the speech characteristics of users and the need for extra data space, it has
advantages such as providing secure communication because speech
characteristics of the speaker is unknown to other users and the synthesized
speech has higher quality compared to a same bit rate LPC compressed speech.Kart, Mete KemalM.S
Recommended from our members
Speech coding
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the coding techniques are equally applicable to any voice signal whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) And/or reduce storage requirements In this document the terms speech and voice shall be used interchangeably
Novel Pitch Detection Algorithm With Application to Speech Coding
This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions
Recommended from our members
Stable 1-Norm Error Minimization Based Linear Predictors for Speech Modeling
In linear prediction of speech, the 1-norm error minimization criterion has been shown to provide a valid alternative to the 2-norm minimization criterion. However, unlike 2-norm minimization, 1-norm minimization does not guarantee the stability of the corresponding all-pole filter and can generate saturations when this is used to synthesize speech. In this paper, we introduce two new methods to obtain intrinsically stable predictors with the 1-norm minimization. The first method is based on constraining the roots of the predictor to lie within the unit circle by reducing the numerical range of the shift operator associated with the particular prediction problem considered. The second method uses the alternative Cauchy bound to impose a convex constraint on the predictor in the 1-norm error minimization. These methods are compared with two existing methods: the Burg method, based on the 1-norm minimization of the forward and backward prediction error, and the iteratively reweighted 2-norm minimization known to converge to the 1-norm minimization with an appropriate selection of weights. The evaluation gives proof of the effectiveness of the new methods, performing as well as unconstrained 1-norm based linear prediction for modeling and coding of speech
Frequency-domain bandwidth extension for low-delay audio coding applications
MPEG-4 Spectral Band Replication (SBR) is a sophisticated high-frequency reconstruction (HFR) tool for speech and natural audio which when used in conjunction with an audio codec delivers a broadband high-quality signal at a bit rate of 48 kbps or even below. The major drawback of this technique is that it significantly increases the delay of the underlying core codec. The idea of synthetic signal reconstruction is of particular interest also in real-time communications. There, a HFR method can be employed to further loosen the channel capacity requirements. In this thesis a delay-optimized derivative of SBR is elaborated, which can be used together with a low-delay speech and audio coder like the Fraunhofer ULD. The presented approach is based on a short-time subband representation of an acoustic signal of natural or artificial origin, and as such it utilizes a filter bank for the extraction and the manipulation of sound characteristics. The system delay for a combination of the ULD coder with the proposed low-delay bandwidth extension (LD-BWE) tool adds up to 12 ms at a sampling rate of 48 kHz. At the present stage, LD-BWE generates a subjectively confirmed excellent-quality highband replica at a simulated mean data rate of 12.8 kbps.MPEG-4 Spectral Band Replication (SBR) ist ein technisch ausgereiftes Verfahren zur Rückgewinnung von hochfrequenten Signalkomponenten für Sprache und natürliches Audio, das in Verbindung mit einem Audiocodec angewandt ein hochwertiges Breitbandsignal bei einer Bitrate von nicht mehr als 48 kbps liefert. Ein wesentlicher Nachteil dieser Methode ist, dass sie die Zeitverzögerung des darunter liegenden Kerncodecs maßgeblich vergrößert. Die Idee der synthetischen Signalwiederherstellung ist in Echtzeitkommunikation ebenso von besonderem Interesse. Ein derartiges Verfahren könnte dort eingesetzt werden, um die Anforderungen an die Kanalkapazität weiter zu lockern. In dieser Arbeit wird ein latenzoptimiertes Derivat von SBR ausgearbeitet, welches zusammen mit einem minimal verzögernden Sprach- und Audiocoder, wie dem Fraunhofer ULD, verwendet werden kann. Der vorgestellte Ansatz basiert auf einer Kurzzeit-Teilband-Darstellung eines akustischen Signals natürlichen oder künstlichen Ursprungs, und greift als solcher auf eine Filterbank zur Extraktion und Manipulation von Klangcharakteristika zurück. Die Verzögerungszeit des Gesamtsystems bestehend aus dem ULD-Coder und der vorgeschlagenen Bandbreitenerweiterung beläuft sich bei einer Abtastrate von 48 kHz auf 12 ms. Einem subjektiven Hörtest zufolge, erzeugt die neu entwickelte Bandbreitenerweiterung in ihrem derzeitigen Stadium eine Kopie des Hochbandes von hervorragender Qualität bei einer simulierten mittleren Datenrate von 12.8 kbps.Ilmenau, Techn. Univ., Masterarbeit, 201
- …