419 research outputs found

    A Cost Shared Quantization Algorithm and its Implementation for Multi-Standard Video CODECS

    The current trend of digital convergence creates the need for video encoder and decoder systems, known as codecs for short, that support multiple video standards on a single platform. In a modern video codec, quantization is a key unit used for video compression. In this thesis, a generalized quantization algorithm and its hardware implementation are presented to compute quantized coefficients for six different video codecs, including the emerging High Efficiency Video Coding (HEVC) codec. HEVC, the successor to H.264/MPEG-4 AVC, aims to substantially improve coding efficiency compared to AVC High Profile. The thesis presents a high-performance, circuit-shared architecture that can perform the quantization operation for HEVC, H.264/AVC, AVS, VC-1, MPEG-2/4 and Motion JPEG (MJPEG). Since HEVC is still in the drafting stage, the architecture was designed so that any final changes can be accommodated in the design. The proposed quantizer architecture is completely division-free, as the division operation is replaced by multiplication, shift and addition operations. The design was implemented on an FPGA and later synthesized in 0.18 μm CMOS technology. The results show that the proposed design satisfies the requirements of all codecs, with a maximum decoding capability of 60 fps for 1080p HD video at 187.3 MHz on a Xilinx Virtex4 LX60 FPGA. The scheme is also suitable for low-cost implementation in modern multi-codec systems.
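    The division-free idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's architecture: the function names, the 16-bit shift and the round-half-up rule are assumptions chosen for illustration. The reciprocal of the quantization step is precomputed once as a fixed-point multiplier (as it would be in a lookup table), so the per-coefficient path uses only a multiplication, an addition and a shift.

```python
def reciprocal_multiplier(step: int, shift: int) -> int:
    """Precompute round(2**shift / step) once, offline (hypothetical helper).

    This is the only place a division occurs; in hardware the result
    would typically live in a small lookup table.
    """
    return ((1 << shift) + step // 2) // step


def quantize(coeff: int, multiplier: int, shift: int) -> int:
    """Quantize one coefficient using multiply, add and shift only."""
    magnitude = abs(coeff)
    rounding = 1 << (shift - 1)          # adds 0.5 in fixed point
    level = (magnitude * multiplier + rounding) >> shift
    return -level if coeff < 0 else level


if __name__ == "__main__":
    SHIFT = 16                            # assumed fixed-point precision
    mult = reciprocal_multiplier(10, SHIFT)
    for c in (100, -37, 4, 5, 0):
        print(c, "->", quantize(c, mult, SHIFT))
```

    With a larger shift the fixed-point reciprocal tracks true division more closely, at the cost of a wider multiplier; the sketch leaves that trade-off as a parameter.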

    Network streaming and compression for mixed reality tele-immersion

    Bulterman, D.C.A. [Promotor]; Cesar, P.S. [Copromotor]

    Distributed video coding for wireless video sensor networks: a review of the state-of-the-art architectures

    Distributed video coding (DVC) is a relatively new video coding architecture that originated from two fundamental theorems, namely Slepian–Wolf and Wyner–Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures, with a focus on understanding their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality, owing to packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. The CELP technique uses long-term prediction across frame boundaries, which causes error propagation in the case of packet loss and requires redundant information to be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently highly robust to packet loss. However, the original iLBC lacks some of the key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec is highly robust to packet loss and achieves speech quality equivalent to or higher than that of state-of-the-art codecs under the clean channel condition.

    Resource-Constrained Low-Complexity Video Coding for Wireless Transmission


    Audio Compression using a Modified Vector Quantization algorithm for Mastering Applications

    Audio data compression is used to reduce the transmission bandwidth and storage requirements of audio data. It is the second stage in the audio mastering process, with audio equalization being the first stage. Compression algorithms such as BSAC, MP3 and AAC are used as standards in this paper. The challenge faced in audio compression is compressing the signal at low bit rates. Previous algorithms that work well at low bit rates do not perform as well at higher bit rates, and vice versa. This paper proposes an altered form of the vector quantization algorithm that produces a scalable bit stream with a number of fine layers of audio fidelity. This modified vector quantization algorithm is used to build a scalable perceptual audio coder in which the quantization and encoding stages are responsible for removing the psychoacoustic and arithmetical redundancies, since practically all the data removed during the prediction phases at the encoder side is added back to the audio signal at the decoder stage. It is therefore the quantization phase that is modified to produce a scalable bit stream. The modified algorithm works well at both lower and higher bit rates. Subjective evaluations were carried out by audio professionals using the MUSHRA test, and the mean normalized scores at various bit rates were recorded and compared with those of the previous algorithms.

    A NOVEL JOINT PERCEPTUAL ENCRYPTION AND WATERMARKING SCHEME (JPEW) WITHIN JPEG FRAMEWORK

    Due to the rapid growth in internet and multimedia technologies, many new commercial applications such as video on demand (VOD), pay-per-view and real-time multimedia broadcast have emerged. To ensure the integrity and confidentiality of multimedia content, the content is usually watermarked and then encrypted, or vice versa. If the multimedia content needs to be watermarked and encrypted at the same time, the watermarking function needs to be performed first, followed by the encryption function. Hence, if the watermark needs to be extracted, the multimedia data must be decrypted first and the watermark extracted afterwards. This results in a large computational overhead. The solution provided in the literature for this problem is partial encryption, in which the media data are partitioned into two parts: one to be watermarked and the other to be encrypted. In addition, some multimedia applications (e.g. video on demand, Pay-TV, pay-per-view) allow a preview of the multimedia content, which involves "perceptual" encryption, wherein all or some selected part of the content is, perceptually speaking, distorted with an encryption key. Up to now, no joint perceptual encryption and watermarking scheme has been proposed in the literature. In this thesis, a novel Joint Perceptual Encryption and Watermarking (JPEW) scheme is proposed that is integrated within the JPEG standard. The design of JPEW involves the design and development of both perceptual encryption and watermarking schemes that are integrated in JPEG and feasible within the "partial" encryption framework. The perceptual encryption scheme exploits the energy distribution of the AC components and the DC component bitplanes of continuous-tone images, and is carried out by selectively encrypting these AC coefficients and DC component bitplanes. The encryption itself is based on a chaos-based permutation reported in an earlier work. Similarly, in contrast to traditional watermarking schemes, the proposed watermarking scheme makes use of the DC component of the image and is carried out by selectively substituting certain bitplanes of the DC components with watermark bits. Apart from the aforesaid JPEW, an additional perceptual encryption scheme, integrated in JPEG, has also been proposed. This scheme is outside the joint framework and implements perceptual encryption on a region of interest (ROI) by scrambling the DCT blocks of the chosen ROI. The performance of both the perceptual encryption and watermarking schemes is evaluated and compared with a Quantization Index Modulation (QIM) based watermarking scheme and a reversible Histogram Spreading (RHS) based perceptual encryption scheme. The results show that the proposed watermarking scheme is imperceptible, robust, and suitable for authentication. Similarly, the proposed perceptual encryption scheme outperforms the RHS-based scheme in terms of the number of operations required to achieve a given level of perceptual encryption, and provides control over the amount of perceptual encryption. The overall security of the JPEW has also been evaluated. Additionally, the performance of the proposed separate perceptual encryption scheme has been thoroughly evaluated in terms of security and compression efficiency. The scheme is found to be simpler to implement, to have an insignificant effect on compression ratios, and to provide more options for the selection of the control factor.
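    The bitplane substitution described in this abstract can be sketched roughly as follows. This is an illustration only, not the thesis's scheme: the abstract does not say which bitplanes are chosen or how coefficients are selected, so the function names, the single fixed `plane` argument and the assumption of non-negative integer DC values are all hypothetical.

```python
def embed_bits(dc_values, watermark_bits, plane):
    """Overwrite bit `plane` of the first len(watermark_bits) DC values.

    Assumes non-negative integer DC values (hypothetical simplification).
    """
    if len(watermark_bits) > len(dc_values):
        raise ValueError("more watermark bits than DC coefficients")
    mask = 1 << plane
    marked = list(dc_values)
    for i, bit in enumerate(watermark_bits):
        # Clear the target bit, then write the watermark bit in its place.
        marked[i] = (marked[i] & ~mask) | ((bit & 1) << plane)
    return marked


def extract_bits(dc_values, plane, count):
    """Read back `count` watermark bits from bit `plane`."""
    return [(value >> plane) & 1 for value in dc_values[:count]]


if __name__ == "__main__":
    dc = [120, 64, 33, 255]
    bits = [1, 0, 1, 0]
    marked = embed_bits(dc, bits, plane=1)
    print(marked, extract_bits(marked, 1, len(bits)))
```

    A lower `plane` changes each DC value less and so is harder to see, while a higher one survives more distortion; the sketch leaves that choice open.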

    Sparse representation for audio noise removal using zero-zone quantizers

    In zero-zone quantization, values falling in bins around zero are quantized to zero. This kind of quantization can be applied to orthogonal transforms to remove the unwanted or redundant part of a signal. Transforms reveal structures and properties of a signal, and hence careful application of a zero zone over the transform coefficients leads to noise removal. In this thesis, such quantizers are applied over Discrete Fourier Transform and Karhunen-Loeve Transform coefficients separately, and the outputs are compared. Further, localizing the zero zones to certain frequencies leads to better performance in terms of noise removal. PEAQ (Perceptual Evaluation of Audio Quality) scores have been used to measure the objective quality of the denoised signal.
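    As a rough illustration of the zero-zone idea applied to DFT coefficients, the sketch below zeroes every coefficient whose magnitude falls inside a threshold and then inverts the transform. It is not the thesis's method: the single uniform `threshold`, the naive O(n²) DFT and all function names are assumptions made to keep the example self-contained.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def zero_zone(spectrum, threshold):
    """Set coefficients with magnitude <= threshold to zero."""
    return [c if abs(c) > threshold else 0j for c in spectrum]


def denoise(signal, threshold):
    """Transform, apply the zero zone, and transform back."""
    return idft(zero_zone(dft(signal), threshold))


if __name__ == "__main__":
    n = 32
    clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    noisy = [s + 0.05 * (-1) ** t for t, s in enumerate(clean)]
    restored = denoise(noisy, threshold=4.0)
    print(max(abs(a - b) for a, b in zip(clean, restored)))
```

    A frequency-localized variant, as the abstract suggests, would use a different threshold per bin instead of one global value.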

    Apprentissage automatique pour le codage cognitif de la parole (Machine learning for cognitive speech coding)

    Since the 1980s, speech codecs have relied on short-term coding strategies that operate at the subframe or frame level (typically 5 to 20 ms). Researchers essentially adjusted and combined a limited number of available technologies (transform, linear prediction, quantization) and strategies (waveform matching, noise shaping) to build increasingly complex coding architectures. In this thesis, rather than relying on short-term coding strategies, we develop an alternative framework for speech compression by encoding speech attributes that are perceptually important characteristics of speech signals. To achieve this objective, we solve three problems of increasing complexity, namely classification, prediction and representation learning. Classification is a common element in modern codec designs. In a first step, we design a classifier to identify emotions, which are among the most complex long-term speech attributes. In a second step, we design a speech sample predictor, which is another common element in modern codec designs, to highlight the benefits of long-term and non-linear speech signal processing. Then, we explore latent variables, a space of speech representations, to encode both short-term and long-term speech attributes. Lastly, we propose a decoder network to synthesize speech signals from these representations, which constitutes our final step towards building a complete, end-to-end machine-learning-based speech compression method.
Although each development step proposed in this thesis could be part of a codec on its own, each step also provides insights and a foundation for the next step, until a fully machine-learning-based codec is reached. The first two steps, classification and prediction, provide new tools that could replace and improve elements of existing codecs. In the first step, we use a combination of a source-filter model and a liquid state machine (LSM) to demonstrate that features related to emotions can be easily extracted and classified using a simple classifier. In the second step, a single end-to-end network using long short-term memory (LSTM) is shown to produce speech frames with high subjective quality for packet loss concealment (PLC) applications. In the last steps, we build upon the results of the previous steps to design a fully machine-learning-based codec. An encoder network, formulated using a deep neural network (DNN) and trained on multiple public databases, extracts and encodes speech representations using prediction in a latent space. An unsupervised learning approach based on several principles of cognition is proposed to extract representations from both short and long frames of data using mutual information and contrastive loss. The ability of these learned representations to capture various short- and long-term speech attributes is demonstrated. Finally, a decoder structure is proposed to synthesize speech signals from these representations. Adversarial training is used as an approximation to subjective speech quality measures in order to synthesize natural-sounding speech samples. The high perceptual quality of the synthesized speech thus achieved shows that the extracted representations are efficient at preserving all sorts of speech attributes, and therefore that a complete compression method is demonstrated with the proposed approach.