Livrable D2.2 of the PERSEE project : Analyse/Synthese de Texture
Deliverable D2.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D2.2 of the project. Its title: Analyse/Synthese de Texture (Texture Analysis/Synthesis).
RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos
Obtaining ground-truth flow labels from a video is challenging, since
manual annotation of pixel-wise flow labels is prohibitively expensive and
laborious. Moreover, existing approaches try to adapt models trained on
synthetic datasets to authentic videos, which inevitably suffers from domain
discrepancy and hinders performance in real-world applications. To solve
these problems, we propose RealFlow, an Expectation-Maximization based
framework that can create large-scale optical flow datasets directly from any
unlabeled realistic videos. Specifically, we first estimate optical flow
between a pair of video frames, and then synthesize a new image from this pair
based on the predicted flow. Thus the new image pairs and their corresponding
flows can be regarded as a new training set. In addition, we design a Realistic
Image Pair Rendering (RIPR) module that adopts softmax splatting and
bi-directional hole filling techniques to alleviate the artifacts of the image
synthesis. In the E-step, RIPR renders new images to create a large quantity of
training data. In the M-step, we utilize the generated training data to train
an optical flow network, which can be used to estimate optical flows in the
next E-step. Over these iterations, the capability of the flow network
gradually improves, and with it the accuracy of the estimated flow and the
quality of the synthesized dataset. Experimental results show that RealFlow
outperforms previous dataset-generation methods by a considerable margin.
Moreover, based on the generated dataset, our approach achieves
state-of-the-art performance on two standard benchmarks compared with both
supervised and unsupervised optical flow methods. Our code and dataset are
available at https://github.com/megvii-research/RealFlow. Comment: ECCV 2022 Oral.
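The E-step/M-step loop described above can be sketched in miniature. Everything below is an illustrative stand-in, not the paper's implementation: a single gain parameter plays the role of the flow network, and a pixel-value shift stands in for the RIPR renderer. Only the structure of the loop, in which synthesized pairs carry exact flow labels by construction, matches the description.

```python
def estimate_flow(gain, f1, f2):
    # Toy stand-in for a flow network: one gain applied to the frame difference.
    return [gain * (b - a) for a, b in zip(f1, f2)]

def render_new_view(f1, flow):
    # Placeholder for the RIPR renderer (softmax splatting + bi-directional
    # hole filling in the paper): here, just displace pixel values by the flow.
    return [p + d for p, d in zip(f1, flow)]

def realflow_em(frame_pairs, gain, steps=2):
    for _ in range(steps):
        # E-step: synthesize pairs whose flow labels are exact by construction.
        dataset = []
        for f1, f2 in frame_pairs:
            flow = estimate_flow(gain, f1, f2)
            dataset.append((f1, render_new_view(f1, flow), flow))
        # M-step: least-squares refit of the toy "network" on that data.
        num = sum(d * (b - a) for f1, new, fl in dataset
                  for a, b, d in zip(f1, new, fl))
        den = sum((b - a) ** 2 for f1, new, fl in dataset
                  for a, b in zip(f1, new))
        if den:
            gain = num / den
    return gain
```

Because each synthesized pair is rendered from the predicted flow, its label is exact for that pair, so refitting on the synthesized set pulls the estimator toward consistency even when the initial estimate is poor.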
Rate-Distortion Analysis of Multiview Coding in a DIBR Framework
Depth image based rendering techniques for multiview applications have been
recently introduced for efficient view generation at arbitrary camera
positions. Encoding rate control must therefore consider both texture and depth
data. Because depth and texture images have different structures and play
different roles in the rendered views, distributing the available bit budget
between them requires careful analysis. Information loss due to
texture coding affects the value of pixels in synthesized views, while errors in
depth information lead to shifts in objects or unexpected patterns at their
boundaries. In this paper, we address the problem of efficient bit allocation
between textures and depth data of multiview video sequences. We adopt a
rate-distortion framework based on a simplified model of depth and texture
images. Our model preserves the main features of depth and texture images.
Unlike most recent solutions, our method avoids rendering at encoding
time for distortion estimation, so the encoding complexity is not
increased. In addition, our model is independent of the underlying
inpainting method used at the decoder. Experiments confirm our theoretical
results and the efficiency of our rate allocation strategy.
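As a toy illustration of model-based bit allocation that needs no rendering at encoding time, the sketch below splits a total rate budget between texture and depth using the generic high-rate approximation D(R) = a·2^(−2R). Both the model and the brute-force search over splits are assumptions for illustration; the paper's own distortion models for depth and texture are different.

```python
def model_distortion(a, rate):
    # Generic high-rate model: distortion halves twice per extra bit.
    return a * 2.0 ** (-2.0 * rate)

def allocate_bits(total_rate, a_texture, a_depth, grid=1000):
    # Brute-force search over texture/depth rate splits; no view rendering
    # is needed because distortion comes from the closed-form model.
    best = None
    for i in range(grid + 1):
        r_t = total_rate * i / grid          # candidate texture rate
        r_d = total_rate - r_t               # remaining budget for depth
        d = model_distortion(a_texture, r_t) + model_distortion(a_depth, r_d)
        if best is None or d < best[0]:
            best = (d, r_t, r_d)
    return best  # (total model distortion, texture rate, depth rate)
```

With equal model constants the optimum is a 50/50 split; making the texture model constant larger shifts bits toward texture, as expected.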
Direction Hole-Filling Method for a 3D View Generator
Depth image-based rendering (DIBR) technology is an approach to creating a virtual
3D image from one single 2D image. A desired view can be synthesised at the receiver side using
depth images to make transmission and storage efficient. While this technique has many
advantages, one of the key challenges is how to fill the holes caused by disocclusion regions and
wrong depth values in the warped left/right images. A common means of reducing the size
and number of holes is to smooth the depth image, but smoothing introduces geometric
distortions and degrades the depth image quality. This study proposes a hole-filling
method based on the oriented texture direction. Parallax correction is first applied
to mitigate wrong depth values. Texture direction information is then probed in the
background pixels where holes appear after warping. Finally, holes in the warped image
are filled according to these directions. Experimental results show that the algorithm
preserves depth information and greatly reduces geometric distortion.
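A minimal sketch of direction-guided filling, under strong simplifying assumptions: the holes and a single fill direction are given directly rather than derived from warping and texture-direction probing, and filling simply propagates the nearest background pixel along that direction.

```python
def fill_holes(img, holes, direction):
    # Fill each hole pixel from the background pixel that lies opposite
    # the fill direction (dy, dx), stepping past any intervening holes.
    dy, dx = direction
    out = [row[:] for row in img]
    for y, x in holes:
        sy, sx = y - dy, x - dx            # candidate background source
        while (sy, sx) in holes:           # keep stepping until non-hole
            sy, sx = sy - dy, sx - dx
        out[y][x] = out[sy][sx]
    return out
```

Filling along the probed texture direction, rather than with an isotropic average, is what lets the method extend background structure into the disocclusion instead of blurring it.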
Livrable D5.2 of the PERSEE project : 2D/3D Codec architecture
Deliverable D5.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D5.2 of the project. Its title: 2D/3D Codec architecture.
Automatic 3DS Conversion of Historical Aerial Photographs
In this paper we present a method for the generation of 3D stereo (3DS) pairs from sequences of historical aerial photographs. The goal of our work is to provide a stereoscopic display when the existing exposures form a monocular sequence. Each input image is processed using its neighbours and a synthetic image is rendered, which, together with the original one, forms a stereo pair. Promising results on real images taken from a historical photo archive are shown, corroborating the viability of generating 3DS data from monocular footage.
Application for light field inpainting
Light Field (LF) imaging is a multimedia technology that can provide a more immersive experience when visualizing multimedia content, with higher levels of realism than conventional imaging technologies. The technology is particularly promising for Virtual Reality (VR), since its 4-dimensional LF representation displays real-world scenes in a way that lets users experience the captured scenes from every position and angle. For these reasons, LF is a fast-growing technology with many topics to explore; LF inpainting is the one explored in this dissertation.
Image inpainting is an editing technique that synthesizes alternative content to fill holes in an image. It is commonly used to fill missing parts of a scene and restore damaged images such that the modifications are correct and visually realistic. Applying traditional 2D inpainting techniques straightforwardly to LFs is very unlikely to produce an inpainting that is consistent across all 4 dimensions. Usually, to inpaint 4D LF content, a 2D inpainting algorithm fills a particular point of view, and a 4D inpainting propagation algorithm then propagates the inpainted result to the whole 4D LF data.
Based on this idea of 4D inpainting propagation, some 4D LF inpainting techniques have recently been proposed in the literature. This dissertation therefore proposes to design and implement an LF inpainting application that can be used by anyone who wishes to work in this field and/or to explore and edit LFs.
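The 2D-inpaint-then-propagate pipeline described above can be sketched as follows. Both stages are naive stand-ins chosen for illustration: the 2D inpainter is a plain neighbour average, and propagation assumes zero disparity between views. Real LF inpainting uses far stronger 2D methods and disparity-aware propagation.

```python
def inpaint_2d(view, holes):
    # Naive 2D inpainting: replace each hole with the average of its
    # known 4-neighbours (stand-in for a real 2D inpainting algorithm).
    out = [row[:] for row in view]
    for y, x in holes:
        neigh = [view[y + dy][x + dx]
                 for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if (y + dy, x + dx) not in holes]
        out[y][x] = sum(neigh) / len(neigh)
    return out

def inpaint_lightfield(lf, holes, center=(0, 0)):
    # Step 1: 2D-inpaint the central view only.
    filled = inpaint_2d(lf[center], holes)
    # Step 2: propagate the inpainted pixels to every other angular view.
    out = {uv: [row[:] for row in view] for uv, view in lf.items()}
    out[center] = filled
    for uv in lf:
        if uv == center:
            continue
        for y, x in holes:
            out[uv][y][x] = filled[y][x]   # zero-disparity copy (assumption)
    return out
```

Inpainting once and propagating, rather than inpainting each view independently, is what keeps the result angularly consistent across the 4D LF.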
3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose from Monocular Video
Depth and ego-motion estimations are essential for the localization and
navigation of autonomous robots and autonomous driving. Recent studies make it
possible to learn per-pixel depth and ego-motion from unlabeled
monocular video. A novel unsupervised training framework is proposed with 3D
hierarchical refinement and augmentation using explicit 3D geometry. In this
framework, the depth and pose estimations are hierarchically and mutually
coupled to refine the estimated pose layer by layer. An intermediate view
image is synthesized by warping the pixels of an image with the
estimated depth and coarse pose. Then, the residual pose transformation can be
estimated from the new view image and the image of the adjacent frame to refine
the coarse pose. The iterative refinement is implemented in a differentiable
manner, so that the whole framework can be optimized jointly.
Meanwhile, a new image augmentation method is proposed for pose estimation:
by synthesizing a new view image, it augments the pose in 3D
space while producing a new augmented 2D image. Experiments on KITTI demonstrate
that our depth estimation achieves state-of-the-art performance and even
surpasses recent approaches that utilize other auxiliary tasks. Our visual
odometry outperforms all recent unsupervised monocular learning-based methods
and achieves performance competitive with the geometry-based method ORB-SLAM2
with back-end optimization. Comment: 10 pages, 7 figures, under review.
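The coarse-pose/residual-pose idea can be illustrated in one dimension, where assuming constant depth collapses a pose to an integer circular shift (an assumption for illustration; the method itself operates on SE(3) poses and per-pixel depth): warp by the coarse pose, estimate the leftover motion against the adjacent frame, and compose.

```python
def warp(signal, shift):
    # Circular shift stands in for depth-and-pose image warping.
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]

def estimate_residual(warped, target, search=3):
    # Brute-force residual shift minimizing the sum of absolute differences.
    return min(range(-search, search + 1),
               key=lambda s: sum(abs(a - b)
                                 for a, b in zip(warp(warped, s), target)))

def refine_pose(img1, img2, coarse_shift):
    intermediate = warp(img1, coarse_shift)        # synthesized intermediate view
    residual = estimate_residual(intermediate, img2)
    return coarse_shift + residual                 # refined pose estimate
```

The point of the intermediate view is that the residual motion left to estimate is much smaller than the full motion, which is what makes layer-by-layer refinement effective.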