335 research outputs found

    Quantisation mechanisms in multi-prototype waveform coding

    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding and proposes two prototype waveform quantisation algorithms for speech coding at a bit rate of 2.4 kb/s. Speech coders based on these algorithms have been found capable of producing coded speech with perceptual quality equivalent to that generated by the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm uses an open-loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised; at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation of both its magnitude and phase spectra. The SEW codebook search selects the codebook vector that best matches the SEW. The REW phase spectrum is not quantised but is recovered using Gaussian noise. The REW magnitude spectrum, on the other hand, can either be quantised at a certain update rate or simply derived from the behaviour of the SEW.
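
The SEW/REW split mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the decomposition is performed, so the approach here is an assumption: the SEW is taken as a moving average of the aligned PWs along the evolution (time) axis, and the REW is the remainder. The function name `decompose_pws` and the `window` parameter are hypothetical.

```python
def decompose_pws(pws, window=3):
    """Split aligned prototype waveforms into slowly and rapidly evolving parts.

    pws: list of equal-length lists of floats, one per extraction instant.
    Assumption: a moving average over `window` neighbouring PWs stands in
    for the low-pass filter along the evolution axis.
    """
    n = len(pws)
    half = window // 2
    sews, rews = [], []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        count = hi - lo
        # SEW: average of neighbouring PWs, sample by sample
        sew = [sum(pws[k][i] for k in range(lo, hi)) / count
               for i in range(len(pws[t]))]
        # REW: whatever the smoothed waveform does not capture
        rew = [p - s for p, s in zip(pws[t], sew)]
        sews.append(sew)
        rews.append(rew)
    return sews, rews
```

By construction, each PW equals the sum of its SEW and REW, so the two components can be quantised separately and added back at the receiver.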

    Efficient algorithms for scalable video coding

    A scalable video bitstream specifically designed for the needs of various client terminals, network conditions, and user demands is much desired in current and future video transmission and storage systems. The scalable extension of the H.264/AVC standard (SVC) has been developed to satisfy the new challenges posed by heterogeneous environments, as it permits a single video stream to be decoded fully or partially with variable quality, resolution, and frame rate in order to adapt to a specific application. This thesis presents novel improved algorithms for SVC, including: 1) a fast inter-frame and inter-layer coding mode selection algorithm based on motion activity; 2) a hierarchical fast mode selection algorithm; 3) a two-part Rate Distortion (RD) model targeting the properties of different prediction modes for the SVC rate control scheme; and 4) an optimised Mean Absolute Difference (MAD) prediction model. The proposed fast inter-frame and inter-layer mode selection algorithm is based on the empirical observation that a macroblock (MB) with slow movement is more likely to be best matched by one in the same resolution layer. However, for a macroblock with fast movement, motion estimation between layers is required. Simulation results show that the algorithm can reduce the encoding time by up to 40%, with negligible degradation in RD performance. The proposed hierarchical fast mode selection scheme comprises four levels and makes full use of inter-layer, temporal and spatial correlation as well as the texture information of each macroblock. Overall, the new technique demonstrates the same coding performance in terms of picture quality and compression ratio as that of the SVC standard, yet produces a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode selection algorithms, the proposed algorithm achieves a superior computational time reduction under very similar RD performance conditions. 
The existing SVC rate distortion model cannot accurately represent the RD properties of the prediction modes, because it is influenced by the use of inter-layer prediction. A separate RD model for inter-layer prediction coding in the enhancement layer(s) is therefore introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34 dB or produce an average saving in bit rate of up to 7.78%. Furthermore, the control accuracy is maintained to within 0.07% on average. As a MAD prediction error always exists and cannot be avoided, an optimised MAD prediction model for the spatial enhancement layers is proposed that considers the MAD from previous temporal frames and previous spatial frames together, to achieve a more accurate MAD prediction. Simulation results indicate that the proposed MAD prediction model reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation.
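
The motion-activity rule in the abstract above (slow macroblocks are matched within the same layer, fast ones also need inter-layer estimation) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the activity measure or the threshold, so `motion_activity`, `candidate_modes`, the mode labels and the default `threshold` of 2.0 are all hypothetical.

```python
import math

def motion_activity(motion_vectors):
    """Mean Euclidean magnitude of (dx, dy) motion vectors.

    Assumption: the vectors come from already-coded neighbouring
    macroblocks; the thesis may use a different activity measure.
    """
    if not motion_vectors:
        return 0.0
    total = sum(math.hypot(dx, dy) for dx, dy in motion_vectors)
    return total / len(motion_vectors)

def candidate_modes(motion_vectors, threshold=2.0):
    """Return the prediction sources worth searching for one macroblock."""
    if motion_activity(motion_vectors) < threshold:
        # Slow movement: search the same resolution layer only
        return ["same_layer"]
    # Fast movement: inter-layer motion estimation is also required
    return ["same_layer", "inter_layer"]
```

Skipping the inter-layer search for low-activity macroblocks is where an encoding-time saving of this kind would come from; the reported figures belong to the thesis, not to this sketch.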

    An investigation into the requirements for an efficient image transmission system over an ATM network

    This thesis looks into the problems arising in an image transmission system when transmitting over an ATM network. Two main areas were investigated: (i) an alternative coding technique to reduce the bit rate required; and (ii) concealment of errors due to cell loss, with emphasis on processing in the transform domain of DCT-based images. [Continues.]

    Advanced signal processing techniques for pitch synchronous sinusoidal speech coders

    Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is and will remain the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the required bandwidth for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders used in mobile and Internet applications employ a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps but, due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. In order to achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher quality PS sinusoidal speech coder than was previously available. 
A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.

    Algorithms and architectures for the multirate additive synthesis of musical tones

    In classical Additive Synthesis (AS), the output signal is the sum of a large number of independently controllable sinusoidal partials. The advantages of AS for music synthesis are well known, as is the high computational cost. This thesis is concerned with the computational optimisation of AS by multirate DSP techniques. In note-based music synthesis, the expected bounds of the frequency trajectory of each partial in a finite lifecycle tone determine critical time-invariant partial-specific sample rates which are lower than the conventional rate (in excess of 40 kHz), resulting in computational savings. Scheduling and interpolation (to suppress quantisation noise) for many sample rates is required, leading to the concept of Multirate Additive Synthesis (MAS), where these overheads are minimised by synthesis filterbanks which quantise the set of available sample rates. Alternative AS optimisations are also appraised. It is shown that a hierarchical interpretation of the QMF filterbank preserves AS generality and permits efficient context-specific adaptation of computation to required note dynamics. Practical QMF implementation and the modifications necessary for MAS are discussed. QMF transition widths can be logically excluded from the MAS paradigm, at a cost. Therefore a novel filterbank is evaluated where transition widths are physically excluded. Benchmarking of a hypothetical orchestral synthesis application provides a tentative quantitative analysis of the performance improvement of MAS over AS. The mapping of MAS into VLSI is opened by a review of sine computation techniques. Then the functional specification and high-level design of a conceptual MAS Coprocessor (MASC) are developed; the MASC functions with high autonomy in a loosely-coupled master-slave configuration with a Host CPU which executes filterbanks in software. 
Standard hardware optimisation techniques are used, such as pipelining, based upon the principle of an application-specific memory hierarchy which maximises MASC throughput.
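
As a point of reference for the abstract above, classical additive synthesis at a single sample rate can be sketched in a few lines. This is a minimal illustration, not the multirate method of the thesis: the function name `additive_synth` and the fixed (frequency, amplitude) representation of a partial are assumptions, and real partials would follow time-varying trajectories.

```python
import math

def additive_synth(partials, duration_s, sample_rate=44100):
    """Sum sinusoidal partials at one common sample rate.

    partials: iterable of (frequency_hz, amplitude) pairs, held constant
    over the note (an assumption made to keep the sketch short).
    """
    n = int(round(duration_s * sample_rate))
    out = [0.0] * n
    for freq, amp in partials:
        step = 2.0 * math.pi * freq / sample_rate  # phase increment per sample
        for i in range(n):
            out[i] += amp * math.sin(step * i)
    return out
```

The cost is one sine evaluation per partial per output sample, which is why synthesising each partial at a lower, partial-specific rate reduces the work.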

    Detection processes for digital satellite modems

    The aim of this study is to devise detectors for digital satellite modems that have tolerances to additive white Gaussian noise as close as possible to that of optimal detection, at a fraction of the equipment complexity required for optimal detection. Computer simulation tests and theoretical analyses are used to compare the proposed detectors. [Continues.]

    3D multiple description coding for error resilience over wireless networks

    Mobile communications has attracted growing interest from customers and service providers alike over the last one to two decades. Visual information is used in many application domains such as remote health care, video on demand, broadcasting and video surveillance. In order to enhance the visual effects of digital video content, depth perception needs to be provided along with the actual visual content. 3D video has earned significant interest from the research community in recent years, due to the tremendous impact it leaves on viewers and its enhancement of the user’s quality of experience (QoE). In the near future, 3D video is likely to be used in most video applications, as it offers a greater sense of immersion and perceptual experience. When 3D video is compressed and transmitted over error prone channels, the associated packet loss leads to visual quality degradation. When a picture is lost or corrupted so severely that the concealment result is not acceptable, the receiver typically pauses video playback and waits for the next INTRA picture to resume decoding. Error propagation caused by employing predictive coding may degrade the video quality severely. There are several ways to mitigate the effects of such transmission errors. One widely used technique in International Video Coding Standards is error resilience. The motivation behind this research work is that existing schemes for 2D colour video compression such as MPEG, JPEG and H.263 cannot be applied to 3D video content. 3D video signals contain depth as well as colour information and are bandwidth demanding, as they require the transmission of multiple high-bandwidth 3D video streams. On the other hand, the capacity of wireless channels is limited and wireless links are prone to various types of errors caused by noise, interference, fading, handoff, error bursts and network congestion. 
Given the maximum bit-rate budget to represent the 3D scene, the bit rate should be allocated optimally between texture and depth information so that rendering distortion/losses are minimised. To mitigate the effect of these errors on the perceptual 3D video quality, error resilience video coding needs to be investigated further to offer better quality of experience (QoE) to end users. This research work aims at enhancing the error resilience capability of compressed 3D video, when transmitted over mobile channels, using Multiple Description Coding (MDC) in order to improve the user’s quality of experience (QoE). Furthermore, this thesis examines the sensitivity of the human visual system (HVS) when employed to view 3D video scenes. The approach used in this study is to use subjective testing in order to rate people’s perception of 3D video under error free and error prone conditions through the use of a carefully designed bespoke questionnaire.

    Primitives and design of the intelligent pixel multimedia communicator

    Communication systems are an ever more essential component of our modern global society. Mobile communications systems are still in a state of rapid advancement and growth. Technology is constantly evolving at a rapid pace in ever more diverse areas, and the emerging mobile multimedia based communication systems offer new challenges for both current and future technologies. To realise the full potential of mobile multimedia communication systems there is a need to explore new options to solve some of the fundamental problems facing the technology. In particular, the complexity of such a system within an infrastructure framework that is inherently limited by its power sources and has very restricted transmission bandwidth demands new methodologies and approaches.