469 research outputs found
Scalable Speech Coding for IP Networks
The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality, owing to packet delay or loss.
Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries, so packet loss causes error propagation, and redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support.
This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using the frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec.
The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition.
Canonical time-frequency, time-scale, and frequency-scale representations of time-varying channels
Mobile communication channels are often modeled as linear time-varying
filters or, equivalently, as time-frequency integral operators with finite
support in time and frequency. Such a characterization inherently assumes the
signals are narrowband and may not be appropriate for wideband signals. In this
paper time-scale characterizations are examined that are useful in wideband
time-varying channels, for which a time-scale integral operator is physically
justifiable. A review of these time-frequency and time-scale characterizations
is presented. Both the time-frequency and time-scale integral operators have a
two-dimensional discrete characterization which motivates the design of
time-frequency or time-scale rake receivers. These receivers have taps for both
time and frequency (or time and scale) shifts of the transmitted signal. A
general theory of these characterizations which generates, as specific cases,
the discrete time-frequency and time-scale models is presented here. The
interpretation of these models, namely, that they can be seen to arise from
processing assumptions on the transmit and receive waveforms is discussed. Out
of this discussion a third model arises: a frequency-scale continuous channel
model with an associated discrete frequency-scale characterization. Comment: To appear in Communications in Information and Systems - special
issue in honor of Thomas Kailath's seventieth birthday
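A minimal sketch of the discrete time-frequency characterization mentioned in the abstract above, in which the received signal is a weighted sum of time-shifted and frequency-shifted copies of the transmitted signal. The function name `tf_channel`, the dictionary layout of the taps, and the Doppler grid size `n_freq` are assumptions made for illustration; they are not taken from the paper.

```python
import cmath

def tf_channel(signal, taps, n_freq):
    """Apply a discrete time-frequency channel to a sampled signal.

    taps maps (delay, doppler_index) -> complex gain (hypothetical layout).
    Output: r[n] = sum over taps of gain * s[n - delay] * exp(j*2*pi*doppler*n/n_freq)
    """
    out = []
    for n in range(len(signal)):
        acc = 0j
        for (delay, doppler), gain in taps.items():
            if 0 <= n - delay < len(signal):
                # Frequency shift of the delayed copy by doppler/n_freq cycles per sample
                phase = cmath.exp(2j * cmath.pi * doppler * n / n_freq)
                acc += gain * signal[n - delay] * phase
        out.append(acc)
    return out
```

A rake receiver of the kind the abstract describes would hold one tap per (delay, Doppler) pair on the same grid; that receiver is not shown here.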
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.
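As a rough illustration of how glottal-synchronous frames can be derived from GCIs, the sketch below cuts a signal into frames that run from the previous GCI to the next one, so each frame covers about two pitch periods. The function name and the two-period framing are assumptions made for illustration; GCI detection itself (for example by SIGMA or YAGA) is taken as given and is not shown.

```python
def glottal_synchronous_frames(signal, gcis):
    """Split a signal into frames aligned to glottal closure instants.

    gcis: sorted sample indices of GCIs, assumed to come from a detector.
    Each frame runs from the previous GCI up to (not including) the next GCI,
    so it contains the current GCI and spans roughly two pitch periods.
    """
    frames = []
    for prev, _cur, nxt in zip(gcis, gcis[1:], gcis[2:]):
        frames.append(signal[prev:nxt])
    return frames
```

Unlike fixed-length framing, the frame length here follows the local pitch period, which is the property the abstract says fixed frames may fail to exploit.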
DWT-DCT-Based Data Hiding for Speech Bandwidth Extension
The limited narrowband frequency range, about 300-3400 Hz, used in telephone network channels results in less intelligible, poor-quality telephony speech. To address this drawback, a novel robust speech bandwidth extension using Discrete Wavelet Transform-Discrete Cosine Transform Based Data Hiding (DWTDCTBDH) is proposed. In this technique, the missing speech information is embedded in the narrowband speech signal. The embedded information is recovered at the receiver end to generate wideband speech of considerably better quality. The robustness of the proposed method to quantization and channel noise is confirmed by the mean square error test. The improvement in the quality of the reconstructed wideband speech over conventional methods is confirmed by subjective listening and objective tests.
Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech
This paper proposes a fast, efficient and scalable algorithm for
re-encoding perceptually quantized wavelet-packet transform (WPT)
coefficients of audio and high quality speech, called "adaptive variable
degree-k zero-trees" (AVDZ). The quantization process is carried out by taking
into account some basic perceptual considerations, and achieves good subjective
quality with low complexity. The performance of the proposed AVDZ algorithm is
compared with two other zero-tree-based schemes: (1) embedded
zero-tree wavelet (EZW) and (2) set partitioning in hierarchical trees
(SPIHT). Since EZW and SPIHT are designed for image compression, some
modifications are incorporated in these schemes to better match them to
audio signals. It is shown that the proposed modifications can improve their
performance by about 15-25%. Furthermore, it is concluded that the proposed
AVDZ algorithm outperforms these modified versions in terms of both output
average bit-rates and computation times. Comment: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing
(in Press)
On optimal design and applications of linear transforms
Linear transforms are encountered in many fields of applied science and engineering. In the past, conventional block transforms provided acceptable answers to different practical problems. But now, under increasing competitive pressures, with the growing reservoir of theory and a corresponding development of computing facilities, a real demand has been created for methods that systematically improve performance. As a result the past two decades have seen the explosive growth of a class of linear transform theory known as multiresolution signal decomposition. The goal of this work is to design and apply these advanced signal processing techniques to several different problems.
The optimal design of subband filter banks is considered first. Several design examples are presented for M-band filter banks. Conventional design approaches are found to present problems when the number of constraints increases. A novel optimization method is proposed using a step-by-step design of a hierarchical subband tree. This method is shown to yield performance improvements in applications such as subband image coding. The subband tree structuring is then discussed and generalized algorithms are presented. Next, attention is focused on the interference excision problem in direct sequence spread spectrum (DSSS) communications. The analytical and experimental performance of the DSSS receiver employing excision is presented. Different excision techniques are evaluated and ranked along with the proposed adaptive subband transform-based excisers. The robustness of the considered methods is investigated for either time-localized or frequency-localized interferers. A domain switchable excision algorithm is also presented. Finally, some of the ideas associated with the interference excision problem are utilized in the spectral shaping of a particular biological signal, namely heart rate variability. The improvements for the spectral shaping process are shown for time-frequency analysis. In general, this dissertation demonstrates the proliferation of new tools for digital signal processing.
Filter-Bank-Based Narrowband Interference Detection and Suppression in Spread Spectrum Systems
A filter-bank-based narrowband interference detection and suppression method is developed and its performance is studied in a spread spectrum system. The use of an efficient, complex, critically decimated perfect reconstruction filter bank with a highly selective subband filter prototype, in combination with a newly developed excision algorithm, offers a solution with efficient implementation and performance close to the theoretical limit derived as a function of the filter bank stopband attenuation. Methods to cope with the transient effects in the case of frequency hopping interference are also developed, and the resulting performance shows only minor degradation in comparison to the stationary case.
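The general idea of transform-domain excision can be sketched as follows: transform the received block, zero the bins whose magnitude stands far above the rest, and transform back. This sketch swaps in a plain DFT for the critically decimated perfect reconstruction filter bank of the abstract, and the median-based threshold with `factor=4.0` is a hypothetical detection rule chosen for illustration, not the excision algorithm developed in the paper.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (inverse includes the 1/N scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def excise_narrowband(received, factor=4.0):
    """Zero DFT bins whose magnitude exceeds factor * median magnitude.

    Assumes a real-valued input block; returns the real part of the
    reconstructed block. The threshold rule is an illustrative assumption.
    """
    spectrum = dft(received)
    mags = sorted(abs(c) for c in spectrum)
    median = mags[len(mags) // 2]
    # Detection and suppression in one step: strong bins are set to zero
    cleaned = [0j if abs(c) > factor * median else c for c in spectrum]
    return [c.real for c in dft(cleaned, inverse=True)]
```

Because a spread spectrum signal is spread nearly evenly across bins while a narrowband interferer concentrates in a few, removing those few bins costs little signal energy. A DFT has poor stopband attenuation compared with a selective filter bank, which is the limitation the abstract's method addresses.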