26 research outputs found
Non-disruptive use of light fields in image and video processing
In the age of computational imaging, cameras capture not only an image but also additional data. This additional data can be used for photo-realistic rendering, enabling numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to bring immersive experiences closer to reality, and it has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats and the incompatibility of the tools needed to process and enable the data, light fields are not exploited to their full potential. This dissertation approaches the integration of light field data into image and video processing. Towards this goal, it addresses the representation of light fields using advanced file formats designed for 2D image assemblies, to facilitate asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, the large amount of data captured is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Video Coding for Machines (VCM) aims to compress visual signals for machine
analysis. However, existing methods only consider a few machines, neglecting
the majority. Moreover, the machine perceptual characteristics are not
effectively leveraged, leading to suboptimal compression efficiency. In this
paper, we introduce Satisfied Machine Ratio (SMR) to address these issues. SMR
statistically measures the quality of compressed images and videos for machines
by aggregating satisfaction scores from them. Each score is calculated based on
the difference in machine perceptions between original and compressed images.
Targeting image classification and object detection tasks, we build two
representative machine libraries for SMR annotation and construct a large-scale
SMR dataset to facilitate SMR studies. We then propose an SMR prediction model
based on the correlation between deep feature differences and SMR.
Furthermore, we introduce an auxiliary task to increase the prediction accuracy
by predicting the SMR difference between two images at different quality
levels. Extensive experiments demonstrate that using the SMR models
significantly improves compression performance for VCM, and the SMR models
generalize well to unseen machines, traditional and neural codecs, and
datasets. In summary, SMR enables perceptual coding for machines and advances
VCM from specificity to generality. Code is available at
https://github.com/ywwynm/SMR
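The aggregation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the binary satisfaction rule (a machine is "satisfied" when its predicted label is unchanged after compression), the toy threshold classifiers, and all function names are assumptions made purely for illustration.

```python
from typing import Callable, Sequence

# Hypothetical type: a "machine" maps an image (here a flat list of
# pixel intensities) to a class label.
Machine = Callable[[Sequence[float]], int]


def satisfaction_score(machine: Machine,
                       original: Sequence[float],
                       compressed: Sequence[float]) -> float:
    """Assumed rule: 1.0 if the machine's prediction is unchanged, else 0.0."""
    return 1.0 if machine(original) == machine(compressed) else 0.0


def satisfied_machine_ratio(machines: Sequence[Machine],
                            original: Sequence[float],
                            compressed: Sequence[float]) -> float:
    """Aggregate per-machine satisfaction scores into a single ratio."""
    if not machines:
        raise ValueError("at least one machine is required")
    scores = [satisfaction_score(m, original, compressed) for m in machines]
    return sum(scores) / len(scores)


def make_threshold_machine(threshold: float) -> Machine:
    """Toy classifier: label 1 if mean brightness exceeds the threshold."""
    def machine(image: Sequence[float]) -> int:
        return int(sum(image) / len(image) > threshold)
    return machine


# A small "machine library" of four toy classifiers.
machines = [make_threshold_machine(t) for t in (0.2, 0.4, 0.5, 0.8)]
original = [0.6, 0.6, 0.6, 0.6]        # mean brightness 0.60
compressed = [0.45, 0.45, 0.45, 0.45]  # mean brightness 0.45 after "compression"

smr = satisfied_machine_ratio(machines, original, compressed)
print(smr)  # 0.75: only the machine with threshold 0.5 changes its label
```

In this toy setup, three of the four machines give the same label for both images, so the ratio is 0.75. A real machine library would consist of trained classifiers or detectors, and the satisfaction rule would depend on the task.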
JOINT CODING OF MULTIMODAL BIOMEDICAL IMAGES USING CONVOLUTIONAL NEURAL NETWORKS
The massive volume of data generated daily by the gathering of medical images with
different modalities might be difficult to store in medical facilities and share through
communication networks. To alleviate this issue, efficient compression methods
must be implemented to reduce the amount of storage and transmission resources
required in such applications. However, since the preservation of all image details
is highly important in the medical context, the use of lossless image compression
algorithms is of utmost importance.
This thesis presents the research results on a lossless compression scheme designed
to encode both computerized tomography (CT) and positron emission tomography
(PET) images. Different techniques, such as image-to-image translation, intra prediction,
and inter prediction are used. Redundancies between both image modalities are
also investigated. To perform the image-to-image translation approach, we resort to
lossless compression of the original CT data and apply a cross-modality image translation
generative adversarial network to obtain an estimation of the corresponding
PET.
Two approaches were implemented and evaluated to determine a PET residue
that will be compressed along with the original CT. In the first method, the
residue resulting from the differences between the original PET and its estimation
is encoded, whereas in the second method, the residue is obtained using the encoder's
inter-prediction coding tools. Thus, instead of compressing two independent
picture modalities, i.e., both images of the original PET-CT pair, the proposed method
independently encodes only the CT, alongside the PET residue.
Along with the proposed pipeline, a post-processing optimization algorithm that
modifies the estimated PET image by altering the contrast and rescaling the image
is implemented to maximize the compression efficiency.
Four different versions (subsets) of a publicly available PET-CT pair dataset
were tested. The first proposed subset was used to demonstrate that the concept
developed in this work is capable of surpassing the traditional compression schemes.
The obtained results showed gains of up to 8.9% using HEVC. On the other
hand, JPEG2k proved not to be the most suitable, as it failed to obtain good results,
reaching only a -9.1% compression gain. For the remaining (more challenging) subsets,
the results reveal that the proposed refined post-processing scheme attains,
when compared to conventional compression methods, up to 6.33% compression gain
using HEVC, and 7.78% using VVC.
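The first residue method described above (encoding the difference between the original PET and its estimate) can be sketched as follows. This is only an illustration under stated assumptions: images are flat lists of integer samples, the PET estimate is taken as given rather than produced by a translation network, and the compression-gain formula (relative size reduction against a baseline) is an assumed definition, not one stated in the abstract.

```python
from typing import List


def compute_residue(pet: List[int], pet_estimate: List[int]) -> List[int]:
    """Sample-wise difference between the original PET and its estimate."""
    if len(pet) != len(pet_estimate):
        raise ValueError("images must have the same number of samples")
    return [p - e for p, e in zip(pet, pet_estimate)]


def reconstruct_pet(pet_estimate: List[int], residue: List[int]) -> List[int]:
    """Decoder side: add the residue back onto the estimate."""
    return [e + r for e, r in zip(pet_estimate, residue)]


def compression_gain_percent(baseline_bits: int, proposed_bits: int) -> float:
    """Assumed definition: relative size reduction versus the baseline.

    A negative value means the proposed scheme produced a larger output.
    """
    return 100.0 * (baseline_bits - proposed_bits) / baseline_bits


# Toy data standing in for an original PET image and its estimate.
pet = [10, 12, 200, 198]
pet_estimate = [9, 12, 190, 205]

residue = compute_residue(pet, pet_estimate)
restored = reconstruct_pet(pet_estimate, residue)

print(residue)                               # [1, 0, 10, -7]
print(restored == pet)                       # True: the round trip is lossless
print(compression_gain_percent(1000, 900))   # 10.0 (hypothetical sizes)
```

Because the CT is compressed losslessly, a decoder can in principle regenerate the same PET estimate from the decoded CT, so transmitting the residue is enough to recover the PET exactly. The residue values are typically smaller than the original samples when the estimate is good, which is what makes them cheaper to encode.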
3D coding tools final report
Deliverable D4.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.3 of the project. Its title: 3D coding tools final report
Livrable D4.2 of the PERSEE project : Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architecture
Deliverable D4.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D4.2 of the project. Its title: Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architecture
Video compression algorithms for HEVC and beyond
PhD thesis. Due to the increasing number of new services and devices that allow the creation, distribution and consumption of video content, the amount of video information being transmitted all over the world is constantly growing. Video compression technology is essential to cope with the ever-increasing volume of digital video data being distributed in today's networks, as more efficient video compression techniques allow support for higher volumes of video data under the same memory/bandwidth constraints. This is especially relevant with the introduction of new and more immersive video formats associated with significantly higher amounts of data. In this thesis, novel techniques for improving the efficiency of current and future video coding technologies are investigated. Several aspects that influence the way conventional video coding methods work are considered. In particular, the properties and limitations of the Human Visual System are exploited to tune the performance of video encoders towards better subjective quality. Additionally, it is shown how the visibility of specific types of visual artefacts can be prevented during the video encoding process, in order to avoid subjective quality degradations in the compressed content. Techniques for higher video compression efficiency are also explored, with the aim of improving the compression capabilities of state-of-the-art video coding standards. Finally, the application of video coding technologies to practical use-cases is considered. Accurate estimation models are devised to control the encoding time and bit rate associated with compressed video signals, in order to meet specific encoding time and transmission time restrictions.