126 research outputs found
Improved Image Partitioning for Compression and Representation using the Lab Color Space in the LAR Image Codec
International audienceThe LAR codec is an advanced image compression method relying on a quadtree partitioning of the image. The partitioning strongly impacts the LAR codec efficiency and enables both compression and representation efficiency. In order to increase the perceptual representation abilities without penalizing the compression efficiency we introduce and evaluate two partitioning criteria working in the Lab color space. These criteria are confronted to the original criterion and their compression and robustness performances are analyzed
3D coding tools final report
Livrable D4.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.3 du projet. Son titre : 3D coding tools final repor
A family of stereoscopic image compression algorithms using wavelet transforms
With the standardization of JPEG-2000, wavelet-based image and video
compression technologies are gradually replacing the popular DCT-based methods. In
parallel to this, recent developments in autostereoscopic display technology is now
threatening to revolutionize the way in which consumers are used to enjoying the
traditional 2-D display based electronic media such as television, computer and
movies. However, due to the two-fold bandwidth/storage space requirement of
stereoscopic imaging, an essential requirement of a stereo imaging system is efficient
data compression.
In this thesis, seven wavelet-based stereo image compression algorithms are
proposed, to take advantage of the higher data compaction capability and better
flexibility of wavelets. [Continues.
Recent Advances in Signal Processing
The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity
Recommended from our members
Automatic assessment and enhancement of streaming video quality under bandwidth and dynamic range limitations
The explosion in the amount of video content being streamed over the internet in recent years has accelerated the demand for effective and efficient methods for assessing and improving the perceptual quality of images and videos while adhering to internet bandwidth and display dynamic range limitations. Objective models of perceptual quality have found extensive use in optimizing video compression and enhancement parameters to achieve desirable streaming fidelity. In this dissertation, we develop a variety of quality modeling and quality enhancement methods targeting the streaming of standard and high dynamic range (SDR/HDR) videos over the internet, subjected to compression and tone mapping. The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction, that now pervades the streaming and social media industry. However, since VMAF requires the evaluation of a heterogeneous set of quality models, it is computationally expensive. Given other advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines. Towards alleviating this burden, we first propose a novel Fusion of Unified Quality Evaluators (FUNQUE) framework, by enabling computation sharing and by using a transform that is sensitive to visual perception to boost accuracy. Further, we expand the FUNQUE framework to define a collection of improved low-complexity fused-feature models that advance the state-of-the-art of video quality performance with respect to both accuracy, by 4.2\% to 5.3\%, and computational efficiency, by factors of 3.8 to 11 times! High Dynamic Range (HDR) videos are able to represent wider ranges of contrasts and colors than Standard Dynamic Range (SDR) videos, giving more vivid experiences. Due to this, HDR videos are expected to grow into the dominant video modality of the future. However, HDR videos are incompatible with existing SDR displays, which form the majority of affordable consumer displays on the market. Because of this, HDR videos must be processed by tone-mapping them to reduced bit-depths to service a broad swath of SDR-limited video consumers. Here, we analyzed the impact of tone-mapping operators on the visual quality of streaming HDR videos by building the first large-scale subjectively annotated open-source database of compressed tone-mapped HDR videos, containing 15,000 tone-mapped sequences derived from 40 unique HDR source contents. The videos in the database were labeled with more than 750,000 subjective quality annotations, collected from more than 1,600 unique human observers. We envision that the new LIVE Tone-Mapped HDR (LIVE-TMHDR) database will enable significant progress on HDR video tone mapping and quality assessment in the future. To this end, we make the database freely available to the community at https://live.ece.utexas.edu/research/LIVE_TMHDR/index.html. Server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. To automate this process, we developed a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. By evaluating Cut-FUNQUE on the LIVE-TMHDR database, we show that it achieves state-of-the-art accuracy. Finally, the deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone mapping, where the desired output characteristics may be tuned using example input-output pairs.Electrical and Computer Engineerin
Tele-immersive display with live-streamed video.
Tang Wai-Kwan.Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.Includes bibliographical references (leaves 88-95).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- Applications --- p.3Chapter 1.2 --- Motivation and Goal --- p.6Chapter 1.3 --- Thesis Outline --- p.7Chapter 2 --- Background and Related Work --- p.8Chapter 2.1 --- Panoramic Image Navigation --- p.8Chapter 2.2 --- Image Mosaicing --- p.9Chapter 2.2.1 --- Image Registration --- p.10Chapter 2.2.2 --- Image Composition --- p.12Chapter 2.3 --- Immersive Display --- p.13Chapter 2.4 --- Video Streaming --- p.14Chapter 2.4.1 --- Video Coding --- p.15Chapter 2.4.2 --- Transport Protocol --- p.18Chapter 3 --- System Design --- p.19Chapter 3.1 --- System Architecture --- p.19Chapter 3.1.1 --- Video Capture Module --- p.19Chapter 3.1.2 --- Video Streaming Module --- p.23Chapter 3.1.3 --- Stitching and Rendering Module --- p.24Chapter 3.1.4 --- Display Module --- p.24Chapter 3.2 --- Design Issues --- p.25Chapter 3.2.1 --- Modular Design --- p.25Chapter 3.2.2 --- Scalability --- p.26Chapter 3.2.3 --- Workload distribution --- p.26Chapter 4 --- Panoramic Video Mosaic --- p.28Chapter 4.1 --- Video Mosaic to Image Mosaic --- p.28Chapter 4.1.1 --- Assumptions --- p.29Chapter 4.1.2 --- Processing Pipeline --- p.30Chapter 4.2 --- Camera Calibration --- p.33Chapter 4.2.1 --- Perspective Projection --- p.33Chapter 4.2.2 --- Distortion --- p.36Chapter 4.2.3 --- Calibration Procedure --- p.37Chapter 4.3 --- Panorama Generation --- p.39Chapter 4.3.1 --- Cylindrical and Spherical Panoramas --- p.39Chapter 4.3.2 --- Homography --- p.41Chapter 4.3.3 --- Homography Computation --- p.42Chapter 4.3.4 --- Error Minimization --- p.44Chapter 4.3.5 --- Stitching Multiple Images --- p.46Chapter 4.3.6 --- Seamless Composition --- p.47Chapter 4.4 --- Image Mosaic to Video Mosaic --- p.49Chapter 4.4.1 --- Varying Intensity --- p.49Chapter 4.4.2 --- Video Frame Management --- p.50Chapter 5 --- Immersive Display --- p.52Chapter 5.1 --- Human Perception System --- p.52Chapter 5.2 --- Creating Virtual Scene --- p.53Chapter 5.3 --- VisionStation --- p.54Chapter 5.3.1 --- F-Theta Lens --- p.55Chapter 5.3.2 --- VisionStation Geometry --- p.56Chapter 5.3.3 --- Sweet Spot Relocation and Projection --- p.57Chapter 5.3.4 --- Sweet Spot Relocation in Vector Representation --- p.61Chapter 6 --- Video Streaming --- p.65Chapter 6.1 --- Video Compression --- p.66Chapter 6.2 --- Transport Protocol --- p.66Chapter 6.3 --- Latency and Jitter Control --- p.67Chapter 6.4 --- Synchronization --- p.70Chapter 7 --- Implementation and Results --- p.71Chapter 7.1 --- Video Capture --- p.71Chapter 7.2 --- Video Streaming --- p.73Chapter 7.2.1 --- Video Encoding --- p.73Chapter 7.2.2 --- Streaming Protocol --- p.75Chapter 7.3 --- Implementation Results --- p.76Chapter 7.3.1 --- Indoor Scene --- p.76Chapter 7.3.2 --- Outdoor Scene --- p.78Chapter 7.4 --- Evaluation --- p.78Chapter 8 --- Conclusion --- p.83Chapter 8.1 --- Summary --- p.83Chapter 8.2 --- Future Directions --- p.84Chapter A --- Parallax --- p.8
Description-driven Adaptation of Media Resources
The current multimedia landscape is characterized by a significant diversity in terms of available media formats, network technologies, and device properties. This heterogeneity has resulted in a number of new challenges, such as providing universal access to multimedia content. A solution for this diversity is the use of scalable bit streams, as well as the deployment of a complementary system that is capable of adapting scalable bit streams to the constraints imposed by a particular usage environment (e.g., the limited screen resolution of a mobile device). This dissertation investigates the use of an XML-driven (Extensible Markup Language) framework for the format-independent adaptation of scalable bit streams. Using this approach, the structure of a bit stream is first translated into an XML description. In a next step, the resulting XML description is transformed to reflect a desired adaptation of the bit stream. Finally, the transformed XML description is used to create an adapted bit stream that is suited for playback in the targeted usage environment. The main contribution of this dissertation is BFlavor, a new tool for exposing the syntax of binary media resources as an XML description. Its development was inspired by two other technologies, i.e. MPEG-21 BSDL (Bitstream Syntax Description Language) and XFlavor (Formal Language for Audio-Visual Object Representation, extended with XML features). Although created from a different point of view, both languages offer solutions for translating the syntax of a media resource into an XML representation for further processing. BFlavor (BSDL+XFlavor) harmonizes the two technologies by combining their strengths and eliminating their weaknesses. The expressive power and performance of a BFlavor-based content adaptation chain, compared to tool chains entirely based on either BSDL or XFlavor, were investigated by several experiments. One series of experiments targeted the exploitation of multi-layered temporal scalability in H.264/AVC, paying particular attention to the use of sub-sequences and hierarchical coding patterns, as well as to the use of metadata messages to communicate the bit stream structure to the adaptation logic. BFlavor was the only tool to offer an elegant and practical solution for XML-driven adaptation of H.264/AVC bit streams in the temporal domain
Attention Driven Solutions for Robust Digital Watermarking Within Media
As digital technologies have dramatically expanded within the last decade, content recognition now plays a major role within the control of media. Of the current recent systems available, digital watermarking provides a robust maintainable solution to enhance media security. The two main properties of digital watermarking, imperceptibility and robustness, are complimentary to each other but by employing visual attention based mechanisms within the watermarking framework, highly robust watermarking solutions are obtainable while also maintaining high media quality. This thesis firstly provides suitable bottom-up saliency models for raw image and video. The image and video saliency algorithms are estimated directly from within the wavelet domain for enhanced compatibility with the watermarking framework. By combining colour, orientation and intensity contrasts for the image model and globally compensated object motion in the video model, novel wavelet-based visual saliency algorithms are provided. The work extends these saliency models into a unique visual attention-based watermarking scheme by increasing the watermark weighting parameter within visually uninteresting regions. An increased watermark robustness, up to 40%, against various filtering attacks, JPEG2000 and H.264/AVC compression is obtained while maintaining the media quality, verified by various objective and subjective evaluation tools. As most video sequences are stored in an encoded format, this thesis studies watermarking schemes within the compressed domain. Firstly, the work provides a compressed domain saliency model formulated directly within the HEVC codec, utilizing various coding decisions such as block partition size, residual magnitude, intra frame angular prediction mode and motion vector difference magnitude. Large computational savings, of 50% or greater, are obtained compared with existing methodologies, as the saliency maps are generated from partially decoded bitstreams. Finally, the saliency maps formulated within the compressed HEVC domain are studied within the watermarking framework. A joint encoder and a
frame domain watermarking scheme are both proposed by embedding data into the quantised transform residual data or wavelet coefficients, respectively, which exhibit low visual salience
Progressively communicating rich telemetry from autonomous underwater vehicles via relays
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution June 2012As analysis of imagery and environmental data plays a greater role in mission construction
and execution, there is an increasing need for autonomous marine vehicles
to transmit this data to the surface. Without access to the data acquired by a
vehicle, surface operators cannot fully understand the state of the mission. Communicating
imagery and high-resolution sensor readings to surface observers remains
a significant challenge â as a result, current telemetry from free-roaming
autonomous marine vehicles remains limited to âheartbeatâ status messages, with
minimal scientific data available until after recovery. Increasing the challenge, longdistance
communication may require relaying data across multiple acoustic hops
between vehicles, yet fixed infrastructure is not always appropriate or possible.
In this thesis I present an analysis of the unique considerations facing telemetry
systems for free-roaming Autonomous Underwater Vehicles (AUVs) used in exploration.
These considerations include high-cost vehicle nodes with persistent storage
and significant computation capabilities, combined with human surface operators
monitoring each node. I then propose mechanisms for interactive, progressive
communication of data across multiple acoustic hops. These mechanisms include
wavelet-based embedded coding methods, and a novel image compression scheme
based on texture classification and synthesis. The specific characteristics of underwater
communication channels, including high latency, intermittent communication,
the lack of instantaneous end-to-end connectivity, and a broadcast medium,
inform these proposals. Human feedback is incorporated by allowing operators to
identify segments of data thatwarrant higher quality refinement, ensuring efficient
use of limited throughput. I then analyze the performance of these mechanisms
relative to current practices.
Finally, I present CAPTURE, a telemetry architecture that builds on this analysis.
CAPTURE draws on advances in compression and delay tolerant networking to
enable progressive transmission of scientific data, including imagery, across multiple acoustic hops. In concert with a physical layer, CAPTURE provides an endto-
end networking solution for communicating science data from autonomous marine
vehicles. Automatically selected imagery, sonar, and time-series sensor data
are progressively transmitted across multiple hops to surface operators. Human
operators can request arbitrarily high-quality refinement of any resource, up to an
error-free reconstruction. The components of this system are then demonstrated
through three field trials in diverse environments on SeaBED, OceanServer and
Bluefin AUVs, each in different software architectures.Thanks to the National Science Foundation, and the
National Oceanic and Atmospheric Administration for
their funding of my education and this work
- âŠ