469 research outputs found

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality, owing to packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries; it therefore suffers error propagation when packets are lost and must transmit redundant information to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition.

    Canonical time-frequency, time-scale, and frequency-scale representations of time-varying channels

    Mobile communication channels are often modeled as linear time-varying filters or, equivalently, as time-frequency integral operators with finite support in time and frequency. Such a characterization inherently assumes the signals are narrowband and may not be appropriate for wideband signals. In this paper time-scale characterizations are examined that are useful in wideband time-varying channels, for which a time-scale integral operator is physically justifiable. A review of these time-frequency and time-scale characterizations is presented. Both the time-frequency and time-scale integral operators have a two-dimensional discrete characterization which motivates the design of time-frequency or time-scale rake receivers. These receivers have taps for both time and frequency (or time and scale) shifts of the transmitted signal. A general theory of these characterizations which generates, as specific cases, the discrete time-frequency and time-scale models is presented here. The interpretation of these models, namely, that they can be seen to arise from processing assumptions on the transmit and receive waveforms, is discussed. Out of this discussion a third model arises: a frequency-scale continuous channel model with an associated discrete frequency-scale characterization. Comment: To appear in Communications in Information and Systems - special issue in honor of Thomas Kailath's seventieth birthday

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    DWT-DCT-Based Data Hiding for Speech Bandwidth Extension

    The limited narrowband frequency range, about 300-3400 Hz, used in telephone network channels results in less intelligible and poor-quality telephony speech. To address this drawback, a novel robust speech bandwidth extension method using Discrete Wavelet Transform-Discrete Cosine Transform-Based Data Hiding (DWTDCTBDH) is proposed. In this technique, the missing speech information is embedded in the narrowband speech signal. The embedded information is recovered steadily at the receiver end to generate wideband speech of considerably better quality. The robustness of the proposed method to quantization and channel noise is confirmed by the mean square error test. The improvement in the quality of the reconstructed wideband speech over conventional methods is confirmed by subjective listening tests and objective tests.

    Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

    A fast, efficient and scalable algorithm, called "adaptive variable degree-k zero-trees" (AVDZ), is proposed in this paper for re-encoding perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high quality speech. The quantization process takes into account some basic perceptual considerations and achieves good subjective quality with low complexity. The performance of the proposed AVDZ algorithm is compared with two other zero-tree-based schemes: (1) embedded zero-tree wavelet (EZW) and (2) set partitioning in hierarchical trees (SPIHT). Since EZW and SPIHT are designed for image compression, some modifications are incorporated in these schemes to match them better to audio signals. It is shown that the proposed modifications can improve their performance by about 15-25%. Furthermore, it is concluded that the proposed AVDZ algorithm outperforms these modified versions in terms of both average output bit-rates and computation times. Comment: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing (in Press)

    On optimal design and applications of linear transforms

    Linear transforms are encountered in many fields of applied science and engineering. In the past, conventional block transforms provided acceptable answers to different practical problems. But now, under increasing competitive pressures, with the growing reservoir of theory and a corresponding development of computing facilities, a real demand has been created for methods that systematically improve performance. As a result the past two decades have seen the explosive growth of a class of linear transform theory known as multiresolution signal decomposition. The goal of this work is to design and apply these advanced signal processing techniques to several different problems. The optimal design of subband filter banks is considered first. Several design examples are presented for M-band filter banks. Conventional design approaches are found to present problems when the number of constraints increases. A novel optimization method is proposed using a step-by-step design of a hierarchical subband tree. This method is shown to yield performance improvements in applications such as subband image coding. The subband tree structuring is then discussed and generalized algorithms are presented. Next, attention is focused on the interference excision problem in direct sequence spread spectrum (DSSS) communications. The analytical and experimental performance of the DSSS receiver employing excision is presented. Different excision techniques are evaluated and ranked along with the proposed adaptive subband transform-based excisers. The robustness of the considered methods is investigated for either time-localized or frequency-localized interferers. A domain switchable excision algorithm is also presented. Finally, some of the ideas associated with the interference excision problem are utilized in the spectral shaping of a particular biological signal, namely heart rate variability. The improvements for the spectral shaping process are shown for time-frequency analysis. In general, this dissertation demonstrates the proliferation of new tools for digital signal processing.

    Filter-Bank-Based Narrowband Interference Detection and Suppression in Spread Spectrum Systems

    A filter-bank-based narrowband interference detection and suppression method is developed and its performance is studied in a spread spectrum system. The use of an efficient, complex, critically decimated perfect reconstruction filter bank with a highly selective subband filter prototype, in combination with a newly developed excision algorithm, offers a solution with efficient implementation and performance close to the theoretical limit derived as a function of the filter bank stopband attenuation. Methods to cope with the transient effects in the case of frequency hopping interference are also developed, and the resulting performance shows only minor degradation in comparison to the stationary case.