Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image
Occlusion-aware instance-sensitive segmentation is a complex task generally
split into region-based segmentations by approximating each instance as its
bounding box. We address the showcase scenario of dense homogeneous layouts,
in which this approximation does not hold. In this scenario, outlining
unoccluded instances by decoding a deep encoder becomes difficult, due to the
translation invariance of convolutional layers and the lack of complexity in
the decoder.
We therefore propose a multicameral design composed of subtask-specific
lightweight decoder and encoder-decoder units, coupled in cascade to encourage
subtask-specific feature reuse and enforce a learning path within the decoding
process. Furthermore, the state-of-the-art datasets for occlusion-aware
instance segmentation contain real images with few instances and occlusions
mostly due to objects occluding the background, unlike dense object layouts. We
thus also introduce a synthetic dataset of dense homogeneous object layouts,
namely Mikado, which extensibly contains more instances and inter-instance
occlusions per image than these public datasets. Our extensive experiments on
Mikado and public datasets show that ordinal multiscale units within the
decoding process prove more effective than state-of-the-art design patterns for
capturing position-sensitive representations. We also show that Mikado is
plausible with respect to real-world problems, in the sense that it enables the
learning of performance-enhancing representations transferable to real images,
while drastically reducing the need for hand-made annotations for fine-tuning.
The proposed dataset will be made publicly available.
Comment: International Journal of Computer Vision, Springer Verlag, 2020, Special Issue on Deep Learning for Robotic Vision
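The translation invariance of convolutional layers mentioned in the abstract can be seen in a minimal NumPy sketch (an illustration of the general property, not of the paper's architecture): a convolution responds identically to a feature wherever it appears, so its output alone cannot encode position.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' correlation, enough to illustrate the point."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# The same feature (an impulse) placed at two different positions.
img_a = np.zeros((10, 10)); img_a[2, 2] = 1.0
img_b = np.zeros((10, 10)); img_b[5, 6] = 1.0
kernel = np.arange(9, dtype=float).reshape(3, 3)

out_a = conv2d_valid(img_a, kernel)
out_b = conv2d_valid(img_b, kernel)

# The response pattern is identical, merely shifted along with the input:
# convolution is translation-equivariant, so position-sensitive outputs
# must come from elsewhere in the network.
assert np.allclose(np.roll(out_a, (3, 4), axis=(0, 1)), out_b)
```

This is why the paper argues that outlining specific (unoccluded) instances requires extra machinery in the decoding process rather than convolutional feature extraction alone.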
Disparity Refinement based on Depth Image Layers Separation for Stereo Matching Algorithms
This paper presents a method to improve raw disparity maps in the disparity refinement stage of stereo matching algorithms. The proposed algorithm uses the disparity map produced by a stereo matching algorithm with a basic sum-of-absolute-differences (SAD) similarity metric as its initial disparity output. The similarity metric matches pixel points between the left and right images under a fixed-window (FW) search. With this approach, the raw disparity map is not smooth and contains errors, particularly at depth discontinuities, and it fails in uniform areas and on repetitive patterns. The initial disparity map is used to identify the layers of the disparity map by adapting the Depth Image Layers Separation (DILS) algorithm, which separates depth layers based on the disparity range. Each disparity map is distributed along the disparity range and can be divided into several layers. Each layer is then mapped to the segmented reference image to refine the disparity map. This method, called Depth Layer Refinement (DLR), uses the disparity depth layers to refine the disparity map.
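As a rough sketch (not the authors' implementation), fixed-window SAD matching and DILS-style layer separation can be written in NumPy as follows; the window size, border handling, and equal-width layer binning are assumptions for illustration.

```python
import numpy as np

def sad_disparity(left, right, max_disp, win=1):
    """Fixed-window (FW) block matching: for every pixel of the left
    image, pick the horizontal shift d with the lowest sum of absolute
    differences (SAD) over a (2*win+1)^2 window."""
    left, right = left.astype(float), right.astype(float)
    h, w = left.shape
    k = 2 * win + 1
    Lp = np.pad(left, win, mode="edge")
    best = np.zeros((h, w), dtype=int)
    best_cost = np.full((h, w), np.inf)
    for d in range(max_disp + 1):
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, :w - d] if d else right
        shifted[:, :d] = right[:, :1]          # crude border fill
        ad = np.abs(Lp - np.pad(shifted, win, mode="edge"))
        # Windowed sums via an integral image (summed-area table).
        ii = np.pad(ad.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        sad = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
        better = sad < best_cost
        best[better], best_cost[better] = d, sad[better]
    return best

def depth_layers(disp, n_layers):
    """DILS-style separation: split the disparity range into equal bins
    and label each pixel with the layer it falls in."""
    edges = np.linspace(disp.min(), disp.max(), n_layers + 1)
    return np.minimum(np.digitize(disp, edges[1:-1]), n_layers - 1)
```

For example, with a right image that is the left image shifted by two pixels, `sad_disparity` recovers a disparity of 2 away from the borders, and `depth_layers` then quantizes the map into a small number of depth layers that can each be refined against a segmented reference image.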
Segment-based stereo matching algorithm with rectification for single-lens bi-prism stereovision system
Ph.D. (Doctor of Philosophy)
Depth scene estimation from images captured with a plenoptic camera
Undergraduate monograph — Université de Bordeaux, ENSEIRB-MATMECA, Universidade de Brasília, 2013. A plenoptic camera, also known as a light field camera, is a device that employs a microlens array placed between the main lens and the camera sensor to capture the 4D light field information of a scene. This light field lets us know the position and angle of incidence of the light rays captured by the camera, and it can be used to improve solutions to computer graphics and computer vision problems. With a sampled light field acquired from a plenoptic camera, several low-resolution views of the scene are available from which to infer depth. Unlike traditional multiview stereo, these views are captured by the same sensor, implying that they are acquired with the same camera parameters. The views are also in perfect epipolar geometry. However, other problems arise with this configuration. The camera sensor uses a Bayer color filter, and demosaicing the RAW data implies cross-talk between views, creating image artifacts. The rendering of the views modifies the color pattern, adding complexity to demosaicing. The resolution of the views is another problem: as the angular and spatial positions of the light rays are sampled by the same sensor, there is a trade-off between view resolution and the number of available views. For the Lytro camera, for example, the views are rendered at about 0.12 megapixels, implying aliasing for most scenes.
This work presents: an approach to render the views from the RAW image captured by the camera; a method of disparity estimation adapted to plenoptic cameras that enables estimation even without demosaicing; a new concept for representing disparity information in the multiview stereo case; and a reconstruction and demosaicing scheme using the disparity information and the pixels of neighbouring views.
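The angular/spatial trade-off described above can be illustrated with a simplified NumPy sketch of sub-aperture view extraction. It assumes an idealized sensor with axis-aligned microlenses and no Bayer mosaic — not the rendering technique of this work, which handles the real lenslet geometry and color pattern.

```python
import numpy as np

def extract_views(raw, n_u, n_v):
    """Rearrange an idealized raw lenslet image into sub-aperture views.
    Each microlens is assumed to cover an axis-aligned n_v x n_u block of
    sensor pixels, so pixel (v, u) under every microlens, gathered across
    the microlens grid, forms one low-resolution view."""
    H, W = raw.shape
    s, t = H // n_v, W // n_u                    # spatial (view) resolution
    lenslets = raw[:s * n_v, :t * n_u].reshape(s, n_v, t, n_u)
    return lenslets.transpose(1, 3, 0, 2)        # shape (n_v, n_u, s, t)

# The trade-off in numbers: a 700x700 sensor with 7x7-pixel microlenses
# yields 49 views of only 100x100 pixels each.
raw = np.arange(700 * 700, dtype=float).reshape(700, 700)
views = extract_views(raw, 7, 7)
assert views.shape == (7, 7, 100, 100)
```

Sampling more angular positions per microlens gives more views but lowers the resolution of each one, which is why the low per-view resolution (about 0.12 megapixels on the Lytro) leads to aliasing.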