209 research outputs found
Light field image compression
Light field imaging based on a single-tier camera equipped with a micro-lens array has recently emerged as a practical and promising approach for future visual applications and services. However, successfully deploying light field imaging applications and services will require adequate coding solutions to efficiently handle the massive amount of data involved in these systems. In this context, this chapter presents some of the most recent light field image coding solutions that have been investigated. After a brief review of the current state of the art in image coding formats for light field photography, an experimental study of the rate-distortion performance of different coding formats and architectures is presented. Then, aiming at enabling faster deployment of light field applications and services in the consumer market, a scalable light field coding solution that provides backward compatibility with legacy display devices (e.g., 2D, 3D stereo, and 3D multiview) is presented. Furthermore, a light field coding scheme based on a sparse set of micro-images and the associated blockwise disparity is described. This coding scheme is scalable with three layers, such that rendering can be performed with the sparse micro-image set, the reconstructed light field image, and the decoded light field image.
Layer Selection in Progressive Transmission of Motion-Compensated JPEG2000 Video
MCJ2K (Motion-Compensated JPEG2000) is a video codec based on MCTF (Motion-Compensated Temporal Filtering) and J2K (JPEG2000). MCTF analyzes a sequence of images, generating a collection of temporal sub-bands, which are compressed with J2K. The R/D (Rate-Distortion) performance of MCJ2K is better than that of the MJ2K (Motion JPEG2000) extension, especially when there is a high level of temporal redundancy. MCJ2K codestreams can be served by standard JPIP (J2K Interactive Protocol) servers, because only standard J2K file formats are used. In bandwidth-constrained scenarios, an important issue in MCJ2K is determining how much data from each temporal sub-band must be transmitted to maximize the quality of the reconstructions at the client side. To solve this problem, we have proposed two rate-allocation algorithms that provide reconstructions that are progressive in quality. The first, OSLA (Optimized Sub-band Layers Allocation), determines the best progression of quality layers, but is computationally expensive. The second, ESLA (Estimated-Slope sub-band Layers Allocation), is sub-optimal in most cases, but much faster and more convenient for real-time streaming scenarios. An experimental comparison shows that, even when a straightforward motion compensation scheme is used, the R/D performance of MCJ2K is competitive not only with MJ2K, but also with other standard scalable video codecs.
Improving minimum rate predictors algorithm for compression of volumetric medical images
Medical imaging technologies are experiencing a growth in terms of usage and image
resolution, namely in diagnostics systems that require a large set of images, like CT or
MRI. Furthermore, legal requirements mandate that these scans be archived for several
years. These facts have led to an increase in storage costs for medical image databases and
institutions. Thus, a demand for more efficient compression tools, used for archiving and
communication, is arising.
Currently, the DICOM standard, which makes recommendations for medical communications
and image compression, recommends lossless encoders such as JPEG, RLE,
JPEG-LS and JPEG2000. However, none of these encoders includes inter-slice prediction
in its algorithm.
This dissertation presents research work on medical image compression using the
MRP encoder. MRP is one of the most efficient lossless image compression algorithms.
Several processing techniques are proposed to adapt the input medical images to the
encoder characteristics. Two of these techniques, namely changing the alignment of slices
for compression and a pixel-wise difference predictor, increased the compression efficiency
of MRP, by up to 27.9%.
Inter-slice prediction support was also added to MRP, using uni- and bi-directional techniques.
Also, the pixel-wise difference predictor was added to the algorithm. Overall, the
compression efficiency of MRP was improved by 46.1%. Thus, these techniques allow for
compression ratio savings of 57.1%, compared to DICOM encoders, and 33.2%, compared
to HEVC RExt Random Access. This makes MRP the most efficient of the encoders
under study.
Learned Scalable Video Coding For Humans and Machines
Video coding has traditionally been developed to support services such as
video streaming, videoconferencing, digital TV, and so on. The main intent was
to enable human viewing of the encoded content. However, with the advances in
deep neural networks (DNNs), encoded video is increasingly being used for
automatic video analytics performed by machines. In applications such as
automatic traffic monitoring, analytics such as vehicle detection, tracking, and
counting would run continuously, while human viewing could be required
occasionally to review potential incidents. To support such applications, a new
paradigm for video coding is needed that will facilitate efficient
representation and compression of video for both machine and human use in a
scalable manner. In this manuscript, we introduce the first end-to-end
learnable video codec that supports a machine vision task in its base layer,
while its enhancement layer supports input reconstruction for human viewing.
The proposed system is constructed based on the concept of conditional coding
to achieve better compression gains. Comprehensive experimental evaluations
conducted on four standard video datasets demonstrate that our framework
outperforms both state-of-the-art learned and conventional video codecs in its
base layer, while maintaining comparable performance on the human vision task
in its enhancement layer. We will provide the implementation of the proposed
system at www.github.com upon completion of the review process.
Comment: 14 pages, 16 figures
Scalable Video Coding for Humans and Machines
Video content is watched not only by humans, but increasingly also by
machines. For example, machine learning models analyze surveillance video for
security and traffic monitoring, search through YouTube videos for
inappropriate content, and so on. In this paper, we propose a scalable video
coding framework that supports machine vision (specifically, object detection)
through its base layer bitstream and human vision via its enhancement layer
bitstream. The proposed framework includes components from both conventional
and Deep Neural Network (DNN)-based video coding. The results show that on
object detection, the proposed framework achieves 13-19% bit savings compared
to state-of-the-art video codecs, while remaining competitive in terms of
MS-SSIM on the human vision task.
Comment: 6 pages, 5 figures, IEEE MMSP 202