317 research outputs found

    Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

    A fast, efficient, and scalable algorithm, called "adaptive variable degree-k zero-trees" (AVDZ), is proposed in this paper for re-encoding perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high-quality speech. The quantization process takes basic perceptual considerations into account and achieves good subjective quality with low complexity. The performance of the proposed AVDZ algorithm is compared with two other zero-tree-based schemes: (1) the embedded zero-tree wavelet (EZW) coder and (2) set partitioning in hierarchical trees (SPIHT). Since EZW and SPIHT are designed for image compression, some modifications are incorporated into these schemes to better match them to audio signals. It is shown that the proposed modifications improve their performance by about 15-25%. Furthermore, the proposed AVDZ algorithm outperforms these modified versions in terms of both average output bit-rates and computation times.
    Comment: 30 pages (double-spaced), 15 figures, 5 tables, ISRN Signal Processing (in press)
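Zero-tree coders such as EZW, SPIHT, and AVDZ all exploit one observation: when a wavelet coefficient and all of its descendants are below the current threshold, the whole subtree can be coded with a single symbol. The sketch below illustrates that significance test in pure Python; the tree layout and symbol names are illustrative conventions from EZW-style coding, not the AVDZ bitstream.

```python
# Illustrative zero-tree significance pass (EZW-style symbols, not the
# AVDZ format): a subtree that is entirely insignificant at the current
# threshold collapses to one "zero-tree root" symbol.

def classify(tree, threshold):
    """Return (value, symbol) pairs for a coefficient tree.

    tree is a nested structure: (coefficient, [child subtrees]).
    Symbols: 'P' significant positive, 'N' significant negative,
    'ZT' zero-tree root (node and all descendants insignificant),
    'IZ' isolated zero (insignificant, but some descendant is significant).
    """
    coeff, children = tree

    def subtree_insignificant(node):
        c, kids = node
        return abs(c) < threshold and all(subtree_insignificant(k) for k in kids)

    out = []
    if abs(coeff) >= threshold:
        out.append((coeff, 'P' if coeff >= 0 else 'N'))
    elif subtree_insignificant(tree):
        out.append((coeff, 'ZT'))
        return out  # descendants are implied by the zero-tree symbol
    else:
        out.append((coeff, 'IZ'))
    for child in children:
        out.extend(classify(child, threshold))
    return out

# Example: one subtree falls entirely below threshold 16 and is pruned
tree = (40, [(3, [(2, []), (1, [])]), (-20, [(5, [])])])
symbols = [s for _, s in classify(tree, 16)]
```

Here the four coded symbols stand in for seven coefficients; deeper, wider trees give correspondingly larger savings, which is the source of the bit-rate gains the abstract reports.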

    Enhancement of Speech Intelligibility Using Speech Transients Extracted by a Wavelet Packet-Based Real-Time Algorithm

    Studies have shown that transient speech, which is associated with consonants, transitions between consonants and vowels, and transitions within some vowels, is an important cue for identifying and discriminating speech sounds. However, compared to the relatively steady-state vowel segments of speech, transient speech has much lower energy and thus is easily masked by background noise. Emphasis of transient speech can improve the intelligibility of speech in background noise, but methods demonstrating this improvement have either identified transient speech manually or proposed algorithms that cannot run in real time.
    We have developed an algorithm to automatically extract transient speech in real time. The algorithm uses a function, which we term the transitivity function, to characterize the rate of change of the wavelet coefficients of a wavelet packet transform representation of a speech signal. The transitivity function is large and positive when a signal is changing rapidly, and small when a signal is in steady state. Two definitions of the transitivity function, one based on short-time energy and the other on Mel-frequency cepstral coefficients (MFCCs), were evaluated experimentally, and the MFCC-based transitivity function produced better results. The extracted transient speech signal is combined with the original speech to create modified speech.
    To facilitate comparison of our transient and modified speech with speech processed by methods proposed by other researchers to emphasize transients, we developed three indices. The indices characterize the extent to which a speech modification/processing method emphasizes (1) a particular region of speech, (2) consonants relative to vowels, and (3) onsets and offsets of formants compared to steady-state formants. These indices are useful because they quantify differences in speech signals that are difficult to show using spectrograms, spectra, and time-domain waveforms.
    The transient extraction algorithm includes parameters which, when varied, influence the intelligibility of the extracted transient speech. The best values for these parameters were selected using psychoacoustic testing. Measurements of speech intelligibility in background noise using psychoacoustic testing showed that modified speech was more intelligible than original speech, especially at high noise levels (-20 and -15 dB). A method that automatically identifies and boosts unvoiced speech was also incorporated into the algorithm and evaluated; it did not yield additional intelligibility improvements.
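The transitivity idea above can be sketched compactly: compute a short-time feature per frame and take its rate of change across frames, so that onsets produce large positive values and steady-state regions values near zero. The thesis evaluates both an energy-based and an MFCC-based variant; the particular definition below (first difference of frame log-energy) is an assumption chosen for brevity.

```python
# Sketch of a transitivity-style measure: the frame-to-frame rate of
# change of a short-time feature (here, log-energy). This is an
# illustrative stand-in for the thesis's energy- and MFCC-based variants.
import math

def frame_log_energy(signal, frame_len, hop):
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame)
        feats.append(math.log(energy + 1e-12))  # floor avoids log(0)
    return feats

def transitivity(signal, frame_len=64, hop=32):
    """Large positive values mark onsets; values near zero mark steady state."""
    e = frame_log_energy(signal, frame_len, hop)
    return [e[i + 1] - e[i] for i in range(len(e) - 1)]

# Example: silence followed by a tone gives a sharp positive spike at the onset
sig = [0.0] * 256 + [math.sin(0.3 * n) for n in range(256)]
t = transitivity(sig)
peak = max(t)
```

A real-time implementation only needs the current and previous frame's feature vector, which is what makes the causal, low-latency operation described above feasible.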

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
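The spline-envelope idea above replaces a dense frame-by-frame description with a handful of control points, which is what makes the fixed-size parameter set possible. A minimal sketch follows; Catmull-Rom interpolation is used purely for simplicity, and the abstract does not commit to this particular spline family, so treat the choice as an assumption.

```python
# Sketch: a smooth amplitude envelope interpolated through a few control
# points, illustrating how spline curves can compactly approximate a
# spectral envelope. Catmull-Rom segments are an assumed, illustrative choice.

def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate the Catmull-Rom segment between p1 and p2 at t in [0, 1]."""
    return 0.5 * (2 * p1
                  + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)

def envelope(control_amps, samples_per_segment=8):
    """Interpolate a dense envelope through sparse control-point amplitudes."""
    # Duplicate the endpoints so every segment has four neighbours.
    pts = [control_amps[0]] + list(control_amps) + [control_amps[-1]]
    curve = []
    for i in range(len(control_amps) - 1):
        p0, p1, p2, p3 = pts[i], pts[i + 1], pts[i + 2], pts[i + 3]
        for s in range(samples_per_segment):
            curve.append(catmull_rom(p0, p1, p2, p3, s / samples_per_segment))
    curve.append(control_amps[-1])
    return curve

# Four control amplitudes describe the entire envelope shape
env = envelope([0.0, 1.0, 0.4, 0.1])
```

Because the curve passes exactly through its control points, two sounds of different length map onto each other simply by re-sampling the same parameter curves, which is the "direct mapping" property mentioned above.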

    Offline and real time noise reduction in speech signals using the discrete wavelet packet decomposition

    This thesis describes the development of an offline and real-time wavelet-based speech enhancement system that processes speech corrupted by varying amounts of white Gaussian noise and other noise types.
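Wavelet-domain noise reduction generally follows the recipe transform, threshold, inverse transform. The sketch below shows that pipeline with a one-level Haar transform and soft thresholding; the thesis uses wavelet *packet* decompositions and more elaborate threshold rules, so this is only the core mechanism, not the system described above.

```python
# Minimal wavelet denoising sketch: Haar transform -> soft-threshold the
# detail coefficients -> inverse transform. Illustrative only; the thesis
# uses the discrete wavelet *packet* decomposition.
import math

def haar_forward(x):
    """One-level Haar transform: (approximation, detail) coefficients."""
    approx = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return approx, detail

def haar_inverse(approx, detail):
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / math.sqrt(2))
        x.append((a - d) / math.sqrt(2))
    return x

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; small, noise-like ones vanish."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(x, thr):
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, thr))

# Small high-frequency wiggles are removed; the smooth trend survives
noisy = [1.0, 1.1, 2.0, 1.9, 3.0, 3.1, 4.0, 3.9]
clean = denoise(noisy, 0.2)
```

With the threshold at zero the transform pair reconstructs the input exactly, which is a convenient sanity check for any implementation of this scheme.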

    Linear and nonlinear adaptive filtering and their applications to speech intelligibility enhancement

    Single-Microphone Speech Enhancement Inspired by Auditory System

    Enhancing the quality of speech in noisy environments has been an active area of research, owing to the abundance of applications dealing with the human voice and the dependence of their performance on this quality. While early approaches in the field mostly addressed this problem in a purely statistical framework, in which the goal was to estimate speech from its sum with other independent processes (noise), during the last decade the attention of the scientific community has turned to the functionality of the human auditory system. Much effort has been put into bridging the gap between the performance of speech processing algorithms and that of the average human listener by borrowing models suggested for sound processing in the auditory system. In this thesis, we introduce algorithms for speech enhancement inspired by two of these models: the cortical representation of sounds and the hypothesized role of temporal coherence in auditory scene analysis.
    After an introduction to the auditory system and the speech enhancement framework, we first show how traditional speech enhancement techniques such as Wiener filtering can benefit at the feature extraction level from the discriminatory capabilities of the spectro-temporal representation of sounds in the cortex, i.e., the cortical model. We next focus on feature processing, as opposed to the extraction stage, in speech enhancement systems by taking advantage of models hypothesized for human attention in sound segregation. We demonstrate a mask-based enhancement method in which the temporal coherence of features is used as a criterion to elicit information about their sources and, more specifically, to form the masks needed to suppress the noise. Lastly, we explore how the two blocks for feature extraction and manipulation can be merged into one in a manner consistent with our knowledge of the auditory system. We do this through the use of regularized non-negative matrix factorization, which optimizes feature extraction while simultaneously accounting for temporal dynamics to separate noise from speech.
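The final stage described above rests on non-negative matrix factorization: a nonnegative spectrogram V is approximated as W @ H, with the columns of W acting as learned spectral features and the rows of H as their activations over time. A minimal, unregularized version with the classical multiplicative updates illustrates the mechanism; the temporal-coherence regularizer from the thesis is omitted here.

```python
# Minimal NMF with multiplicative updates (Lee-Seung style), pure Python.
# Unregularized and illustrative only; the thesis adds a temporal-dynamics
# regularizer on top of this basic factorization.
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(V, rank, iters=300, eps=1e-9, seed=0):
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H update: H <- H * (W^T V) / (W^T W H)
        WH, Wt = matmul(W, H), transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W update: W <- W * (V H^T) / (W H H^T)
        WH, Ht = matmul(W, H), transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

# A rank-1 matrix is recovered almost exactly by a rank-1 factorization
V = [[1.0, 2.0], [2.0, 4.0]]
W, H = nmf(V, rank=1)
approx = matmul(W, H)
```

The multiplicative form of the updates keeps every entry nonnegative by construction, which is why NMF features lend themselves to masking and source separation.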

    Acoustical measurements on stages of nine U.S. concert halls

    Wavelet Filter Banks in Perceptual Audio Coding

    This thesis studies the application of the wavelet filter bank (WFB) in perceptual audio coding by providing brief overviews of perceptual coding, psychoacoustics, wavelet theory, and existing wavelet coding algorithms. Furthermore, it describes the poor frequency localization property of the WFB and explores one filter design method, in particular, for improving channel separation between the wavelet bands. A wavelet audio coder has also been developed by the author to test the new filters. Preliminary tests indicate that the new filters provide some improvement over other wavelet filters when coding audio signals that are stationary-like and contain only a few harmonic components, and comparable results for other types of audio signals that contain many spectral and temporal components. It has been found that the WFB provides a flexible decomposition scheme through the choice of the tree structure and basis filter, but at the cost of poor localization properties. This flexibility can be a benefit in the context of audio coding, but the poor localization properties represent a drawback. Determining ways to fully utilize this flexibility, while minimizing the effects of poor time-frequency localization, is an area that is still very much open for research.
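The poor channel separation mentioned above can be seen directly in the frequency responses of short wavelet filters. The sketch below evaluates the magnitude response of the two-tap Haar analysis pair, the shortest possible example; its "lowpass" branch still passes substantial energy well into the upper half-band, which is exactly the localization problem that longer, better-designed filters aim to reduce.

```python
# Frequency response of the two-tap Haar filter pair, illustrating the
# weak stopband attenuation (poor channel separation) of short wavelet
# filters. The Haar pair is chosen as the simplest example, not as the
# filters studied in the thesis.
import cmath
import math

def magnitude_response(taps, omega):
    """|H(e^{j*omega})| for an FIR filter given by its taps."""
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps)))

lowpass = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # Haar analysis lowpass
highpass = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # Haar analysis highpass

# At three-quarters of Nyquist, deep inside the nominal stopband,
# the "lowpass" branch still passes a large fraction of the signal.
leak = magnitude_response(lowpass, 0.75 * math.pi)
```

The two responses remain power-complementary (|H_low|^2 + |H_high|^2 is constant across frequency), so the pair is a valid filter bank; the trade-off is purely in how sharply each channel is confined to its band.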