548 research outputs found

    New Directions in Subband Coding

    Get PDF
    Two very different subband coders are described. The first is a modified dynamic bit-allocation-subband coder (D-SBC) designed for variable rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good quality speech. The second coder is a 16-kb/s waveform coder, based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processo

    Quality of Service Controlled Multimedia Transport Protocol

    Get PDF
    PhDThis research looks at the design of an open transport protocol that supports a range of services including multimedia over low data-rate networks. Low data-rate multimedia applications require a system that provides quality of service (QoS) assurance and flexibility. One promising field is the area of content-based coding. Content-based systems use an array of protocols to select the optimum set of coding algorithms. A content-based transport protocol integrates a content-based application to a transmission network. General transport protocols form a bottleneck in low data-rate multimedia communicationbsy limiting throughpuot r by not maintainingt iming requirementsT. his work presents an original model of a transport protocol that eliminates the bottleneck by introducing a flexible yet efficient algorithm that uses an open approach to flexibility and holistic architectureto promoteQ oS.T he flexibility andt ransparenccyo mesi n the form of a fixed syntaxt hat providesa seto f transportp rotocols emanticsT. he mediaQ oSi s maintained by defining a generic descriptor. Overall, the structure of the protocol is based on a single adaptablea lgorithm that supportsa pplication independencen, etwork independencea nd quality of service. The transportp rotocol was evaluatedth rougha set of assessmentos:f f-line; off-line for a specific application; and on-line for a specific application. Application contexts used MPEG-4 test material where the on-line assessmenuts eda modified MPEG-4 pl; yer. The performanceo f the QoSc ontrolledt ransportp rotocoli s often bettert hano thers chemews hen appropriateQ oS controlledm anagemenatl gorithmsa re selectedT. his is shownf irst for an off-line assessmenwt here the performancei s compared between the QoS controlled multiplexer,a n emulatedM PEG-4F lexMux multiplexers chemea, ndt he targetr equirements. The performanceis also shownt o be better in a real environmentw hen the QoS controlled multiplexeri s comparedw ith the real MPEG-4F lexMux scheme

    Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

    Full text link
    A fast, efficient and scalable algorithm is proposed, in this paper, for re-encoding of perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high quality speech and is called "adaptive variable degree-k zero-trees" (AVDZ). The quantization process is carried out by taking into account some basic perceptual considerations, and achieves good subjective quality with low complexity. The performance of the proposed AVDZ algorithm is compared with two other zero-tree-based schemes comprising: 1- Embedded Zero-tree Wavelet (EZW) and 2- The set partitioning in hierarchical trees (SPIHT). Since EZW and SPIHT are designed for image compression, some modifications are incorporated in these schemes for their better matching to audio signals. It is shown that the proposed modifications can improve their performance by about 15-25%. Furthermore, it is concluded that the proposed AVDZ algorithm outperforms these modified versions in terms of both output average bit-rates and computation times.Comment: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing (in Press

    Decoder Hardware Architecture for HEVC

    Get PDF
    This chapter provides an overview of the design challenges faced in the implementation of hardware HEVC decoders. These challenges can be attributed to the larger and diverse coding block sizes and transform sizes, the larger interpolation filter for motion compensation, the increased number of steps in intra prediction and the introduction of a new in-loop filter. Several solutions to address these implementation challenges are discussed. As a reference, results for an HEVC decoder test chip are also presented.Texas Instruments Incorporate

    The development of speech coding and the first standard coder for public mobile telephony

    Get PDF
    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the ??rst coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is de- ??ned what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders - including mobile telephony - will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of di??erent kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-??lter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed which brought the discrete-time ??lter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction tech nique as applied to speech, are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Speci??cally, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular- Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a speci??c explicitform RPE coder and an associated eÆcient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art singlechip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue brie y reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook

    Non-intrusive identification of speech codecs in digital audio signals

    Get PDF
    Speech compression has become an integral component in all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between different codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for identifying amongst speech codecs in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology developed in this research demonstrates techniques for analyzing an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match with the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification

    Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

    Get PDF
    The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most preponderant coding algorithms in voice over IP (VoIP) and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.Publicad

    Receptive Prosody Skills in Individuals with High Functioning Autism Spectrum Disorders

    Get PDF
    Prosodic differences have been noted in the speech production of individuals with Autism Spectrum Disorders (ASD); however, little is known regarding their ability to perceive and understand features of prosody. It has been determined that children with typical development (TD) can recognize and utilize the prosodic cue of contrastive stress to facilitate interpretation of spoken instructions (Arnold, 2008). We examined this skill in 12 children and adolescents with ASD, and 12 with TD through the analysis of eye fixations to objects during instructions with varying discourse statuses (given or new) and stress patterns (accented or unaccented). Results indicated that both the participants with TD and with ASD were able to perceive and interpret the prosodic cue of contrastive stress within the contextual communication task. No relationships between language, cognitive, or expressive prosody skills and receptive prosody skills were found. Possible explanations, and clinical implications are discussed
    • …
    corecore