35 research outputs found

    Low Delay Sparse and Mixed Excitation CELP Coders for Wideband Speech Coding

    Code Excited Linear Prediction (CELP) algorithms are proposed for compression of speech in the 8 kHz band at switched or variable bit rate and with algorithmic delay not exceeding 2 msec. Two structures of Low-Delay CELP coders are analyzed: low-delay sparse excitation and mixed excitation CELP. Sparse excitation is based on MP-MLQ and multilayer models. The mixed excitation CELP algorithm stems from the narrowband G.728 standard. As opposed to the G.728 LD-CELP coder, the mixed excitation codebook consists of pseudorandom vectors and sequences obtained with Long-Term Prediction (LTP). Variable rate coding consists in maximizing vector dimension while keeping the required speech quality. Good speech quality (MOS = 3.9 according to the PESQ algorithm) is obtained at an average bit rate of 33.5 kbit/sec.

    Speaker Recognition in the VoIP Environment

    This work describes the use of speaker recognition systems in the VoIP environment, their performance, and approaches to improving it. It describes the architecture of these systems, the metrics used to evaluate their performance, and the key components of VoIP from the point of view of speaker recognition. The creation of a simulated VoIP environment is described, system performance is evaluated on data from various kinds of VoIP environments, and the results are presented. The system is adapted and calibrated, and the benefits of both steps are assessed.

    A configurable vector processor for accelerating speech coding algorithms

    The growing demand for voice-over-packet (VoIP) services and multimedia-rich applications has made the efficient, real-time implementation of low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are designed to substantially reduce bandwidth requirements, thus enabling dense multichannel gateways in a small form factor. This, however, comes at a high computational cost, which mandates the use of very high performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable, extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced, which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)-SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor, which is tightly coupled to a Sparc-V8 compliant CPU, the optimization and simulation methodologies employed, and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths.

    Non-intrusive identification of speech codecs in digital audio signals

    Speech compression has become an integral component in all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between different codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for identifying amongst speech codecs in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology developed in this research demonstrates techniques for analyzing an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match with the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification
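
The matching step described in this abstract (profile an unknown signal, then compare it with each trained profile and pick the best match) can be sketched roughly as below. This is only an illustration: the abstract does not say which features or distance measure are used, so the three-value profiles, the codec names, and the Euclidean distance are all assumptions.

```python
import math

def nearest_codec(profile, trained_profiles):
    """Return the name of the trained profile closest to `profile`.

    Euclidean distance is an assumed similarity measure; the abstract
    does not specify how profiles are compared.
    """
    best_name, best_dist = None, math.inf
    for name, reference in trained_profiles.items():
        dist = math.dist(profile, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical trained profiles (placeholder values, not real codec data).
TRAINED = {
    "codec_a": [0.10, 0.80, 0.30],
    "codec_b": [0.70, 0.20, 0.50],
    "codec_c": [0.40, 0.40, 0.90],
}

# Hypothetical profile extracted from a signal of unknown origin.
unknown = [0.65, 0.25, 0.45]
best = nearest_codec(unknown, TRAINED)
print(best)
```

In practice the profile would be built from the codec noise components the abstract mentions, and a trained classifier could replace the simple nearest-profile rule.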

    Steganography integration into a low-bit rate speech codec

    Low bit-rate speech codecs are widely used in audio communications such as VoIP and mobile communications, so steganography in low bit-rate audio streams would have broad practical applications. In this paper, the authors propose a new algorithm for steganography in low bit-rate VoIP audio streams that integrates information hiding into the process of speech encoding. The proposed algorithm embeds data while pitch period prediction is conducted during low bit-rate speech encoding, thus maintaining synchronization between information hiding and speech encoding. The steganography algorithm not only achieves high speech quality and resists detection by steganalysis, but is also highly compatible with a standard low bit-rate speech codec, and data embedding and extraction cause no further delay. Testing shows that, with the proposed algorithm, the data embedding rate of the secret message can reach 4 bits/frame (133.3 bits/second).
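
The two figures quoted above, 4 bits/frame and 133.3 bits/second, together imply a frame rate and frame duration. The short calculation below makes that relationship explicit. The frame duration is inferred from the two numbers, not stated in the abstract, and the function name is hypothetical.

```python
# Figures quoted in the abstract.
BITS_PER_FRAME = 4
EMBED_RATE_BPS = 133.3

# Implied frame rate and frame duration (inferred, not stated in the abstract).
frames_per_second = EMBED_RATE_BPS / BITS_PER_FRAME
frame_ms = 1000.0 / frames_per_second

def embedding_rate(bits_per_frame, frame_duration_ms):
    """Embedding rate in bits/second for a given payload per frame."""
    return bits_per_frame * 1000.0 / frame_duration_ms

print(round(frame_ms, 1))                  # implied frame duration in ms
print(round(embedding_rate(4, 30.0), 1))   # rate for a 30 ms frame
```

The result is consistent with a codec frame of roughly 30 ms.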

    Adaptive header compression techniques for mobile multimedia networks

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    A comprehensive VoIP system with PSTN connectivity.

    Yuen Ka-nang. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 133-135). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1. Introduction
        1.1. Background
        1.2. Objectives
        1.3. Overview of Thesis
    Chapter 2. Network Aspect of the VoIP Technology
        2.1. VoIP Overview
        2.2. Elements in VoIP: Call Setup; Media Capture/Playback; Media Encoding/Decoding; Media Transportation
        2.3. Performance Factors Affecting VoIP: Network Bandwidth; Latency; Packet Loss; Voice Quality; Quality of Service (QoS)
        2.4. Different Requirements of Intranet VoIP and Internet VoIP: Packet Loss/Delay/Jitter; Interoperability; Available Bandwidth; Security Requirement
        2.5. Some Feasibility Investigations: Bandwidth Calculation; Simulation; Conclusion; Simulation Restrictions
    Chapter 3. Software Aspect of the VoIP Technology
        3.1. VoIP Client in JMF: Architecture; Incoming Voice Stream Handling; Outgoing Voice Stream Handling; Relation between Incoming/Outgoing Voice Stream Handling; Areas for Further Improvement
        3.2. Capture/Playback Enhanced VoIP Client: Architecture; Native Voice Playback Mechanism; Native Voice Capturing Mechanism
        3.3. Win32 C++ VoIP Client: Objectives; Architecture; Problems and Solutions in Implementation
        3.4. Win32 DirectSound C++ VoIP Client: Architecture; DirectSound Voice Playback Mechanism; DirectSound Voice Capturing Mechanism
        3.5. Testing VoIP Clients: Setup of Experiment; Experiment Results; Experiment Conclusion
        3.6. Real-time Voice Stream Mixing Server: Structure Overview; Experiment; Conclusion
    Chapter 4. Experimental Studies
        4.1. Pure IP-side VoIP-based Call Center - VoIP in Education: Architecture; Client Structure; Client Applet User Interface; Observations
        4.2. A Simple PBX Experiment: Structural Overview; PSTN Gateway Server Program; Problems and Solutions in Implementation; Experiment 1; Experiment 2
    Chapter 5. A Comprehensive VoIP Project - Graduate Second Phone (GSP)
        5.1. Overview: Background; Architecture; Technologies Used; Major Functions
        5.2. Client: Structure Overview; Connection Procedure; User Interface; Observations
        5.3. Gateway: Structure Overview; Connection Procedure; Caller ID Simulator; Observations
        5.4. Server: Structure Overview
        5.5. Details of Major Functions: Secure Local Voice Message Box; Call Distribution; Call Forward; Call Transfer
        5.6. Experiments: Secure Local Voice Message Box; Call Distribution; Call Forward; Call Transfer; Dial Out
        5.7. Observations
        5.8. Outlook
        5.9. Alternatives: Netmeeting; OpenH323
    Chapter 6. Conclusions
    Bibliography

    Voice Call Capacity Over Wireless Mesh Networks

    The goal of this thesis is to understand the voice call carrying capacity of an IEEE 802.11b/e based ad hoc network. We begin by modelling conversational speech and define a six-state semi-Markov voice model based on ITU-T Recommendation P.59. We perform a theoretical analysis of the voice model and compare it with results obtained via simulations. Using a Java-based IEEE 802.11 medium access layer simulator, we determine the upper bound on the number of voice calls carried by an ad hoc network. We use a linear topology with the ideal carrier sensing range and evaluate the number of calls carried using packet loss and packet delay as metrics. We observe that one-, two-, three- and four-hop 5.5 Mbps IEEE 802.11 wireless links have upper bounds of eight, six, five, and three voice calls respectively. We then consider a carrier sensing range and a path loss model and compare them with the ideal case. With a carrier sensing range and a path loss model, the number of calls carried by the linear networks is reduced: one-, two-, three- and four-hop 5.5 Mbps IEEE 802.11 wireless links support eight, five, four, and two voice calls respectively. We also find that adopting packet dropping policies at the nodes improves the call carrying capacity and quality of service of the network. In our simulations of a two-hop network under path loss conditions, adopting a time-delay-based packet dropping policy at the nodes increased the number of calls supported simultaneously from five to six. In a four-hop linear network, total packet loss is reduced by 20% with a random packet dropping policy and by 50% with a time-delay-based packet dropping policy. Although the number of calls supported does not change, the load on the network is reduced.
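
A time-delay-based packet dropping policy of the kind mentioned in this abstract could look roughly like the sketch below: a node discards any queued voice packet that has already waited longer than a delay budget, since it would arrive too late to be played out. The abstract gives no details of the actual policy, so the class name `DelayDropQueue`, the 150 ms budget used in the usage lines, and the drop-at-dequeue rule are all assumptions.

```python
from collections import deque

class DelayDropQueue:
    """FIFO queue that drops packets older than `max_delay_s` at dequeue time.

    Hypothetical illustration; not the policy implemented in the thesis.
    """

    def __init__(self, max_delay_s):
        self.max_delay_s = max_delay_s
        self._queue = deque()
        self.dropped = 0

    def enqueue(self, packet, created_at):
        # Store the creation time so the packet's age can be checked later.
        self._queue.append((created_at, packet))

    def dequeue(self, now):
        # Skip (and count) packets that have exceeded the delay budget.
        while self._queue:
            created_at, packet = self._queue.popleft()
            if now - created_at <= self.max_delay_s:
                return packet
            self.dropped += 1
        return None

# Usage with an assumed 150 ms delay budget.
q = DelayDropQueue(max_delay_s=0.15)
q.enqueue("old", created_at=0.0)
q.enqueue("fresh", created_at=0.10)
first = q.dequeue(now=0.20)  # "old" has waited 200 ms and is dropped
```

Dropping stale packets at intermediate nodes frees airtime for packets that can still meet their deadline, which fits the abstract's observation that network load falls.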

    Steganography Integration Into a Low-Bit Rate Speech Codec

    The development of speech coding and the first standard coder for public mobile telephony

    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders, including mobile telephony, will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed, which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction technique as applied to speech are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors, starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook.