    Motion compensation and very low bit rate video coding

    Recently, many activities of the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have led to new standards for very low bit-rate video coding, such as H.263 and MPEG-4, following the successful application of the international standards H.261 and MPEG-1/2 for video coding above 64 kbps. However, at very low bit-rates the classic block-matching-based DCT video coding scheme suffers seriously from blocking artifacts, which considerably degrade the quality of the reconstructed video frames. To solve this problem, this dissertation presents a new technique in which motion compensation is based on a dense motion field, and proposes four efficient new video coding algorithms for very low bit-rates based on it. (1) After studying model-based video coding algorithms, we propose an optical-flow-based video coding algorithm with thresholding techniques. A statistical model is established for the distribution of intensity differences between two successive frames, and four thresholds are used to control the bit-rate and the quality of the reconstructed frames. It outperforms typical model-based techniques in terms of complexity and reconstruction quality. (2) An efficient algorithm using DCT-coded optical flow. Dense motion fields are found to be well described by a first-order auto-regressive model and can therefore be compressed efficiently with the DCT, achieving a very low bit-rate and higher visual quality than H.263/TMN5. (3) A region-based discrete wavelet transform video coding algorithm. This algorithm uses a dense motion field, and regions are segmented according to their content significance. The DWT is applied to the residual images region by region, and bits are adaptively allocated to the regions. It improves the visual quality and PSNR of significant regions while maintaining a low bit-rate. (4) A segmentation-based video coding algorithm for stereo sequences. A correlation-feedback algorithm with a Kalman filter is utilized to improve the accuracy of the optical flow fields. Three criteria, associated with 3-D information, 2-D connectivity and motion vector fields, respectively, are defined for object segmentation. A chain code is utilized to code the shapes of the segmented objects. This algorithm achieves very high compression ratios, up to several thousand.
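
    To illustrate the idea behind algorithm (2), the sketch below DCT-codes one component of a smooth dense motion field; it is a minimal illustration, with block size and quantization step chosen arbitrarily rather than taken from the dissertation.

```python
# Sketch: DCT coding of a smooth dense motion field (one component).
# Block size and quantization step are illustrative assumptions.
import numpy as np
from scipy.fft import dctn, idctn

def compress_motion_field(mv, block=8, qstep=0.5):
    """DCT-code an H x W motion-field component with uniform quantization."""
    h, w = mv.shape
    rec = np.zeros((h, w))
    for y in range(0, h, block):
        for x in range(0, w, block):
            coef = dctn(mv[y:y+block, x:x+block], norm='ortho')
            coef = np.round(coef / qstep) * qstep      # quantize coefficients
            rec[y:y+block, x:x+block] = idctn(coef, norm='ortho')
    return rec

# A slowly varying field is well modeled as first-order auto-regressive,
# so its energy compacts into a few low-frequency DCT coefficients.
yy, xx = np.mgrid[0:64, 0:64]
field = 0.05 * xx + 0.02 * yy
rec = compress_motion_field(field)
print("max reconstruction error:", np.abs(rec - field).max())
```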

    On the design of fast and efficient wavelet image coders with reduced memory usage

    Image compression is of great importance in multimedia systems and applications because it drastically reduces bandwidth requirements for transmission and memory requirements for storage. Although earlier standards for image compression were based on the Discrete Cosine Transform (DCT), a more recently developed mathematical technique, the Discrete Wavelet Transform (DWT), has been found to be more efficient for image coding. Despite improvements in compression efficiency, wavelet image coders significantly increase memory usage and complexity when compared with DCT-based coders. A major reason for the high memory requirements is that the usual algorithm to compute the wavelet transform requires the entire image to be in memory. Although some proposals reduce the memory usage, they present problems that hinder their implementation. In addition, some wavelet image coders, like SPIHT (which has become a benchmark for wavelet coding), always need to hold the entire image in memory. Regarding the complexity of the coders, SPIHT can be considered quite complex because it performs bit-plane coding with multiple image scans. The wavelet-based JPEG 2000 standard is more complex still because it improves coding efficiency through time-consuming methods, such as an iterative optimization algorithm based on the Lagrange multiplier method and high-order context modeling. In this thesis, we aim to reduce memory usage and complexity in wavelet-based image coding while preserving compression efficiency. To this end, a run-length encoder and a tree-based wavelet encoder are proposed. In addition, a new algorithm to efficiently compute the wavelet transform is presented. This algorithm achieves low memory consumption using line-by-line processing, and it employs recursion to automatically determine the order in which the wavelet transform is computed, solving some synchronization problems that have not been tackled by previous proposals.
    Oliver Gil, JS. (2006). On the design of fast and efficient wavelet image coders with reduced memory usage [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1826
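
    The line-by-line computation relies on wavelet transforms built from lifting steps, which need only a few neighbouring samples at a time. The sketch below shows one level of the LeGall 5/3 lifting in one dimension; it is an illustration of the principle, not the thesis's recursive scheduling algorithm, and it uses wraparound at the boundaries instead of the symmetric extension used in practice.

```python
# Sketch: one level of the LeGall 5/3 lifting scheme in 1-D. Each step
# needs only adjacent samples, which is what permits line-by-line
# scheduling of a 2-D transform with a few buffered rows.
import numpy as np

def legall53_forward(x):
    x = np.asarray(x, dtype=float).copy()
    even, odd = x[0::2], x[1::2]              # views into x
    odd -= 0.5 * (even + np.roll(even, -1))   # predict -> high-pass band
    even += 0.25 * (np.roll(odd, 1) + odd)    # update  -> low-pass band
    return even, odd

lo, hi = legall53_forward(np.arange(16))
print("low-pass :", lo)
print("high-pass:", hi)   # ~zero on a ramp, except at the wrapped boundary
```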

    Context-based compression algorithms for text and image data.

    Wong Ling. Thesis (M.Phil.), Chinese University of Hong Kong, 1997. Includes bibliographical references.
    Contents:
    1. Introduction: motivation; original contributions; thesis structure
    2. Background: information theory; early compression (Huffman, Tunstall and arithmetic codes); modern techniques for compression (statistical modeling: context modeling and state-based modeling; dictionary-based compression: LZ compression; other techniques: block sorting, context tree weighting)
    3. Symbol Remapping: review of block sorting (forward and inverse transformations); ordering method; discussion
    4. Content Prediction: prediction and ranking schemes (content predictor, ranking technique); review of context sorting; a general framework for content prediction (baseline version, context-length merge); discussion
    5. Bounded-Length Block Sorting: block sorting with bounded context length (forward and reverse transformations); locally adaptive entropy coding; discussion
    6. Context Coding for Image Data: digital images and redundancy; model of a compression system (representation, quantization, lossless coding); the embedded zerotree wavelet coding (simple zerotree-like implementation; analysis of zerotree coding: linkage between coefficients, design of a uniform threshold quantizer with dead zone); extensions on wavelet coding (coefficient scanning); discussion
    7. Conclusions: future research
    Appendices: lossless compression results; image compression standards; human visual system characteristics; lossy compression results. Compression gallery: context-based wavelet coding; RD-OPT-based JPEG compression; SPIHT wavelet compression.
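
    The block-sorting transform reviewed in chapters 3 and 5 is the Burrows-Wheeler transform. A minimal sketch of the forward transform follows; the bounded-length variant of chapter 5 would compare rotations only on their first k symbols (with ties broken by position), whereas this baseline compares full rotations.

```python
# Sketch: forward block-sorting (Burrows-Wheeler) transform. Sorting
# clusters symbols that share a following context, so the output is
# highly compressible by a locally adaptive entropy coder.
def bwt_forward(s):
    assert "\0" not in s
    s += "\0"                                       # unique end sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)        # last column

print(repr(bwt_forward("banana")))
```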

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue is nevertheless that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, summarising the LLDs over a temporal block can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
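
    A minimal sketch of the BoAW pipeline described above: learn a codebook over training LLD frames, assign each frame of an instance to its nearest codeword, and form a normalised histogram. It assumes scikit-learn's KMeans as the codebook learner and random data in place of real LLD matrices; openXBOW offers many further options beyond this baseline.

```python
# Sketch: bag-of-audio-words from frame-wise LLDs (rows = frames,
# columns = descriptors). KMeans stands in for the codebook learner.
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(train_frames, size=64, seed=0):
    return KMeans(n_clusters=size, n_init=10, random_state=seed).fit(train_frames)

def bag_of_audio_words(frames, codebook):
    idx = codebook.predict(frames)                  # nearest codeword per frame
    hist = np.bincount(idx, minlength=codebook.n_clusters)
    return hist / hist.sum()                        # normalised term frequencies

rng = np.random.default_rng(0)
codebook = learn_codebook(rng.normal(size=(1000, 12)))  # e.g. 12 MFCCs per frame
print(bag_of_audio_words(rng.normal(size=(200, 12)), codebook))
```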

    Dimension reduction of image and audio space

    The reduction of data necessary for storage or transmission is a desirable goal in the digital video and audio domain. Compression schemes strive to reduce the amount of storage space or bandwidth necessary to keep or move the data. Data reduction can be accomplished so that visually or audibly unnecessary data is removed or recoded, thus aiding the compression phase of the data processing. The characterization and identification of data that can be successfully removed or reduced is the purpose of this work. New philosophy, theory and methods for data processing are presented towards the goal of data reduction. The philosophy and theory developed in this work establish a foundation for high-speed data reduction suitable for multimedia applications. The developed methods encompass motion detection and edge detection as features of the systems. The philosophy of energy flow analysis in video processing enables the consideration of noise in digital video data. Research into noise versus motion leads to an efficient and successful method of identifying motion in a sequence. Research into the underlying statistical properties of vector quantization provides insight into its performance characteristics and leads to successful improvements in application. The underlying statistical properties of the vector quantization process are analyzed, and three theorems are developed and proved. The theorems establish the statistical distributions and probability densities of various metrics of the vector quantization process. From these properties, an intelligent and efficient algorithm design is developed and tested. The performance improvements in both time and quality are established through algorithm analysis and empirical testing. The empirical results are presented.
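
    As a minimal illustration of the vector quantization process analyzed above, the sketch below maps vectors to their nearest codewords and records the per-vector distortion, the kind of metric whose distribution the three theorems characterize. The random codebook here stands in for a trained one.

```python
# Sketch: nearest-codeword vector quantization with per-vector distortion.
import numpy as np

def vq_encode(vectors, codebook):
    """Return the index of the nearest codeword and the squared error."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)                      # nearest codeword per vector
    return idx, d2[np.arange(len(vectors)), idx]

rng = np.random.default_rng(1)
codebook = rng.normal(size=(256, 16))            # 256 codewords for 4x4 blocks
vectors = rng.normal(size=(2000, 16))
idx, dist = vq_encode(vectors, codebook)
print("mean per-vector distortion:", dist.mean())
```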

    Digital Multimedia Forensics and Anti-Forensics

    As the use of digital multimedia content such as images and video has increased, so have the means and the incentive to create digital forgeries. Presently, powerful editing software allows forgers to create perceptually convincing digital forgeries. Accordingly, there is a great need for techniques capable of authenticating digital multimedia content. In response to this, researchers have begun developing digital forensic techniques capable of identifying digital forgeries. These forensic techniques operate by detecting imperceptible traces left by editing operations in digital multimedia content. In this dissertation, we propose several new digital forensic techniques to detect evidence of editing in digital multimedia content. We begin by identifying the fingerprints left by pixel value mappings and show how these can be used to detect the use of contrast enhancement in images. We use these fingerprints to perform a number of additional forensic tasks, such as identifying cut-and-paste forgeries, detecting the addition of noise to previously JPEG-compressed images, and estimating the contrast enhancement mapping used to alter an image. Additionally, we consider the problem of multimedia security from the forger's point of view. We demonstrate that an intelligent forger can design anti-forensic operations to hide editing fingerprints and fool forensic techniques. We propose an anti-forensic technique to remove compression fingerprints from digital images and show that this technique can be used to fool several state-of-the-art forensic algorithms. We examine the problem of detecting frame deletion in digital video and develop both a technique to detect frame deletion and an anti-forensic technique to hide frame deletion fingerprints. We show that this anti-forensic operation leaves behind fingerprints of its own and propose a technique to detect the use of frame deletion anti-forensics. The ability of a forensic investigator to detect both editing and the use of anti-forensics results in a dynamic interplay between the forger and the forensic investigator. We develop a game-theoretic framework to analyze this interplay and identify the set of actions that each party will rationally choose. Additionally, we show that anti-forensics can be used to protect against reverse engineering. To demonstrate this, we propose an anti-forensic module that can be integrated into digital cameras to protect color interpolation methods.
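
    As a hedged sketch of the pixel-value-mapping fingerprint idea: a contrast enhancement mapping applied to 8-bit data leaves peaks and gaps in the grey-level histogram, which appear as high-frequency energy in the histogram's Fourier transform. The energy ratio and cutoff below are illustrative choices, not the dissertation's exact detector.

```python
# Sketch: detect contrast enhancement via the high-frequency content of
# the grey-level histogram (peaks/gaps left by a pixel value mapping).
import numpy as np

def hist_high_freq_ratio(img8, cutoff=32):
    hist = np.bincount(img8.ravel(), minlength=256).astype(float)
    spectrum = np.abs(np.fft.fft(hist))[:128]       # one-sided spectrum
    return spectrum[cutoff:].sum() / (spectrum[1:].sum() + 1e-12)

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)
gamma = (255 * (img / 255.0) ** 0.6).astype(np.uint8)  # a pixel value mapping
print("original:", hist_high_freq_ratio(img))
print("enhanced:", hist_high_freq_ratio(gamma))        # noticeably higher
```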

    Colour image coding with wavelets and matching pursuit

    This thesis considers sparse approximation of still images as the basis of a lossy compression system. The Matching Pursuit (MP) algorithm is presented as a method particularly suited for application in lossy scalable image coding. Its multichannel extension, capable of exploiting inter-channel correlations, is found to be an efficient way to represent colour data in RGB colour space. Known problems with MP, namely the high computational complexity of encoding and dictionary design, are tackled by finding an appropriate partitioning of an image. The idea of performing MP in the spatio-frequency domain after a transform such as the Discrete Wavelet Transform (DWT) is explored. The main challenge, though, is to encode the image representation obtained after MP into a bit-stream. Novel approaches for encoding the atomic decomposition of a signal and for colour amplitude quantisation are proposed and evaluated. The image codec that has been built is capable of competing with scalable coders such as JPEG 2000 and SPIHT in terms of compression ratio.
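
    A minimal sketch of the core MP loop over a generic unit-norm dictionary follows; the thesis builds its multichannel extension, image partitioning and atom encoding on top of this greedy iteration.

```python
# Sketch: matching pursuit over a dictionary D whose columns are
# unit-norm atoms. Each step picks the atom most correlated with the
# residual and subtracts its contribution.
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    residual = x.astype(float).copy()
    atoms = []                                   # (atom index, amplitude)
    for _ in range(n_atoms):
        corr = D.T @ residual                    # correlate with all atoms
        k = int(np.abs(corr).argmax())           # best-matching atom
        atoms.append((k, corr[k]))
        residual -= corr[k] * D[:, k]
    return atoms, residual

rng = np.random.default_rng(3)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
x = rng.normal(size=64)
atoms, r = matching_pursuit(x, D)
print("fraction of energy left in residual:", (r @ r) / (x @ x))
```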

    A DWT based perceptual video coding framework: concepts, issues and techniques

    The work in this thesis explores DWT-based video coding through the introduction of a novel DWT (Discrete Wavelet Transform) / MC (Motion Compensation) / DPCM (Differential Pulse Code Modulation) video coding framework, which adopts EBCOT as the coding engine for both the intra- and the inter-frame coder. An adaptive switching mechanism between the frame/field coding modes is investigated for this framework. The Low-Band-Shift (LBS) is employed for MC in the DWT domain. The LBS-based MC is shown to provide consistent improvement in the Peak Signal-to-Noise Ratio (PSNR) of the coded video over simple Wavelet Tree (WT) based MC. Adaptive Arithmetic Coding (AAC) is adopted to code the motion information. The context set of the Adaptive Binary Arithmetic Coding (ABAC) for the inter-frame data is redesigned based on statistical analysis. To further improve the perceived picture quality, a Perceptual Distortion Measure (PDM) based on a human vision model is used in the EBCOT of the intra-frame coder. A visibility assessment of the quantization error of the various subbands in the DWT domain is performed through subjective tests. In summary, these findings resolve the issues arising from the proposed perceptual video coding framework. They include: a working DWT/MC/DPCM video coding framework with superior coding efficiency on sequences with translational or head-shoulder motion; an adaptive switching mechanism between frame and field coding modes; an effective LBS-based MC scheme in the DWT domain; a methodology for the context design for entropy coding of the inter-frame data; a PDM which replaces the MSE inside the EBCOT coding engine for the intra-frame coder, improving the perceived quality of intra-frames; and a visibility assessment of the quantization errors in the DWT domain.
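
    The need for the LBS stems from the shift-variance of the decimated DWT, illustrated below in one dimension: a one-sample shift of the input changes the subband coefficients, so MC in the DWT domain must keep both sampling phases at each level, which is what the LBS provides. This is a toy illustration with Haar filters, not the thesis's implementation.

```python
# Sketch: the decimated DWT is shift-variant, so subband coefficients of
# a frame and of its shifted version differ. Haar filters for brevity.
import numpy as np

def haar_analysis(x):
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low band (approximation)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high band (detail)
    return lo, hi

x = np.sin(0.4 * np.arange(32))
lo_even, _ = haar_analysis(x)
lo_odd, _ = haar_analysis(np.roll(x, 1))      # same signal, shifted by one
print("low-band change under a 1-sample shift:",
      np.abs(lo_even - lo_odd).max())         # far from zero: shift-variant
```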

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation, namely the Constant Matrix Multiplication (CMM) operation. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement over state-of-the-art algorithms, with future potential for even greater savings.
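
    A minimal sketch of the forward SA-DCT's column stage follows (the full transform repeats the same variable-length DCT along the rows of the column results): pixels inside the shape mask are shifted to the top of each column and transformed with a DCT of matching length. This illustrates the computation that the hardware accelerates, not the architecture itself.

```python
# Sketch: column stage of the forward SA-DCT on an 8x8 block with an
# arbitrary shape mask. Each column segment gets a DCT-II of its length.
import numpy as np

def dct_matrix(n):
    k = np.arange(n)[:, None]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)                     # orthonormal DC row
    return m

def sa_dct_columns(block, mask):
    out = np.zeros(block.shape)
    for j in range(block.shape[1]):
        seg = block[mask[:, j], j]           # shift segment to the top edge
        if seg.size:
            out[:seg.size, j] = dct_matrix(seg.size) @ seg
    return out

rng = np.random.default_rng(4)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
mask = rng.random((8, 8)) > 0.3              # arbitrary object shape
print(sa_dct_columns(block, mask).round(1))
```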