45 research outputs found

    Image Based View Synthesis

    Get PDF
    This dissertation deals with the image-based approach to synthesize a virtual scene using sparse images or a video sequence without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of un-calibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by these several static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis and video compression. In this dissertation, I have contributed to several sub-problems related to image based view synthesis. Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can approximately be described by multiple planar regions, I have developed a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, correctly detect the occlusion pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions using correspondences in two frames are determined, and the seed regions are expanded and outliers are rejected employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, the occlusion order constraints on multiple frames are explored, which guarantee that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then the correct layer segmentation is obtained by using a graph cuts algorithm, and the occlusions between the overlapping layers are explicitly determined. Several experimental results are demonstrated to show that our approach is effective and robust. Recovering the geometrical transformations among images of a scene is a prerequisite step for image-based view synthesis. I have developed a wide baseline matching algorithm to identify the correspondences between two un-calibrated images, and to further determine the geometric relationship between images, such as epipolar geometry or projective transformation. In our approach, a set of salient features, edge-corners, are detected to provide robust and consistent matching primitives. Then, based on the Singular Value Decomposition (SVD) of an affine matrix, we effectively quantize the search space into two independent subspaces for rotation angle and scaling factor, and then we use a two-stage affine matching algorithm to obtain robust matches between these two frames. The experimental results on a number of wide baseline images strongly demonstrate that our matching method outperforms the state-of-art algorithms even under the significant camera motion, illumination variation, occlusion, and self-similarity. Given the wide baseline matches among images I have developed a novel method for Dynamic view morphing. Dynamic view morphing deals with the scenes containing moving objects in presence of camera motion. The objects can be rigid or non-rigid, each of them can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images without any knowledge about 3D. The procedure consists of three steps: segmentation, morphing and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping by the least distortion method. I have successfully generalized the dynamic scene synthesis problem from the simple scene with only rotation to the dynamic scene containing non-rigid objects. My method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans. Finally, I have also developed a novel algorithm for tri-view morphing. This is an efficient image-based method to navigate a scene based on only three wide-baseline un-calibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied in these three images. Next, employing a trinocular-stereo algorithm and barycentric blending technique, we generate an arbitrary novel view to navigate the scene in a 2D space. Furthermore, after self-calibration of the cameras, a 3D model can also be correctly augmented into this virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several interesting applications: 4D video synthesis, automatic target recognition, multi-view morphing

    Robust 3D Surface Reconstruction from Light Fields

    Get PDF
    Light field data captures the intensity, as well as the direction of rays in 3D space, allowing to retrieve not only the 3D geometry information, but also the reflectance properties of the acquired scene. The main focus of this thesis is precise 3D geometry reconstruction from light fields, especially on scenes with specular objects. A new semi-global approach for 3D reconstruction from linear light fields is proposed. This method combines a modified version of the Progressive Probabilistic Hough Transform with local slope estimates to extract ori- entations, and consequently depth information, in epipolar plane images (EPIs). The resulting reconstructions achieve a higher accuracy than local methods, with a more precise localization of object boundaries, as well as preservation of fine details. In the second part of the thesis the proposed approach is extended to cir- cular light fields in order to determine the full 360° view of target objects. Additionally, circular light fields allow retrieving depth even from datasets acquired with telecentric lenses, a task which is not possible using a linearly moving camera. Experimental results on synthetic and real datasets demon- strate the quality and the robustness of the proposed algorithm, which pro- vides precise reconstructions even with highly specular objects. The quality of the final reconstruction opens up many possible application scenarios, such as precise 3D reconstruction for defect detection in industrial optical inspection, object scanning for heritage preservation, as well as depth segmentation for the movie industry

    Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework

    Get PDF
    The focus of this research is on building 3D representations of real world scenes and objects using different imaging sensors. Primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences including visual and thermal IR images that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-sets registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However it proved also useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. Such formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we also introduced a new local feature descriptor that was derived from visual saliency principles and which enhanced significantly the performance of the registration algorithm. The resulting technique was subjected to a thorough experimental analysis that highlighted its strength and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data, that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications

    Crowd detection and counting using a static and dynamic platform: state of the art

    Get PDF
    Automated object detection and crowd density estimation are popular and important area in visual surveillance research. The last decades witnessed many significant research in this field however, it is still a challenging problem for automatic visual surveillance. The ever increase in research of the field of crowd dynamics and crowd motion necessitates a detailed and updated survey of different techniques and trends in this field. This paper presents a survey on crowd detection and crowd density estimation from moving platform and surveys the different methods employed for this purpose. This review category and delineates several detections and counting estimation methods that have been applied for the examination of scenes from static and moving platforms

    Light field reconstruction from multi-view images

    Get PDF
    Kang Han studied recovering the 3D world from multi-view images. He proposed several algorithms to deal with occlusions in depth estimation and effective representations in view rendering. the proposed algorithms can be used for many innovative applications based on machine intelligence, such as autonomous driving and Metaverse

    Light Fields Reconstructing Geometry and Reflectance Properties

    Get PDF
    Computer vision plays an important role in the progress of automation and digitalization of our society. One of the key challenges is the creation of accurate 3D representations of our environment. The rich information in light fields can enable highly accurate depth estimates, but requires the development of new algorithms. Especially specular reflections pose a challenge for many reconstruction algorithms. This is due to the violation of the brightness consistency assumption, which only holds for Lambertian surfaces. Most surfaces are to some extent specular and an appropriate handling is central to avoid erroneous depth maps. In this thesis we explore the potential of using specular highlights to determine the orientation of surfaces. To this end, we examine epipolar images in light field set ups. In light field data, reflectance properties can be characterized by intensity variations in the epipolar plane space. This space is analysed and compared to the expected reflectance, which is modelled using the render equation with different bidirectional reflection distribution functions. This approach allows us to infer highly accurate surface normals and depth estimates. Furthermore, it reveals material properties encoded in the reflectance by inspecting the intensity profile. Our results demonstrate the potential to increase the accuracy of the depth maps. Multiple cameras in a light field set up let us retrieve additional material properties encoded in the reflectance

    Detecção de eventos complexos em vídeos baseada em ritmos visuais

    Get PDF
    Orientador: Hélio PedriniDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O reconhecimento de eventos complexos em vídeos possui várias aplicações práticas relevantes, alavancadas pela grande disponibilidade de câmeras digitais instaladas em aeroportos, estações de ônibus e trens, centros de compras, estádios, hospitais, escolas, prédios, estradas, entre vários outros locais. Avanços na tecnologia digital têm aumentado as capacidades dos sistemas em reconhecer eventos em vídeos por meio do desenvolvimento de dispositivos com alta resolução, dimensões físicas pequenas e altas taxas de amostragem. Muitos trabalhos disponíveis na literatura têm explorado o tema a partir de diferentes pontos de vista. Este trabalho apresenta e avalia uma metodologia para extrair características dos ritmos visuais no contexto de detecção de eventos em vídeos. Um ritmo visual pode ser visto com a projeção de um vídeo em uma imagem, tal que a tarefa de análise de vídeos é reduzida a um problema de análise de imagens, beneficiando-se de seu baixo custo de processamento em termos de tempo e complexidade. Para demonstrar o potencial do ritmo visual na análise de vídeos complexos, três problemas da área de visão computacional são selecionados: detecção de eventos anômalos, classificação de ações humanas e reconhecimento de gestos. No primeiro problema, um modelo e? aprendido com situações de normalidade a partir dos rastros deixados pelas pessoas ao andar, enquanto padro?es representativos das ações são extraídos nos outros dois problemas. Nossa hipo?tese e? de que vídeos similares produzem padro?es semelhantes, tal que o problema de classificação de ações pode ser reduzido a uma tarefa de classificação de imagens. Experimentos realizados em bases públicas de dados demonstram que o método proposto produz resultados promissores com baixo custo de processamento, tornando-o possível aplicar em tempo real. Embora os padro?es dos ritmos visuais sejam extrai?dos como histograma de gradientes, algumas tentativas para adicionar características do fluxo o?tico são discutidas, além de estratégias para obter ritmos visuais alternativosAbstract: The recognition of complex events in videos has currently several important applications, particularly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings, roads, among others. Moreover, advances in digital technology have enhanced the capabilities for detection of video events through the development of devices with high resolution, small physical size, and high sampling rates. Many works available in the literature have explored the subject from different perspectives. This work presents and evaluates a methodology for extracting a feature descriptor from visual rhythms of video sequences in order to address the video event detection problem. A visual rhythm can be seen as the projection of a video onto an image, such that the video analysis task can be reduced into an image analysis problem, benefiting from its low processing cost in terms of time and complexity. To demonstrate the potential of the visual rhythm in the analysis of complex videos, three computer vision problems are selected in this work: abnormal event detection, human action classification, and gesture recognition. The former problem learns a normalcy model from the traces that people leave when they walk, whereas the other two problems extract representative patterns from actions. Our hypothesis is that similar videos produce similar patterns, therefore, the action classification problem is reduced into an image classification task. Experiments conducted on well-known public datasets demonstrate that the method produces promising results at high processing rates, making it possible to work in real time. Even though the visual rhythm features are mainly extracted as histogram of gradients, some attempts for adding optical flow features are discussed, as well as strategies for obtaining alternative visual rhythmsMestradoCiência da ComputaçãoMestre em Ciência da Computação1570507, 1406910, 1374943CAPE

    Digital video moving object segmentation using tensor voting: A non-causal, accurate approach

    Get PDF
    Motion based video segmentation is important in many video processing applications such as MPEG4. This thesis presents an exhaustive, non-causal method to estimate boundaries between moving objects in a video clip. It make use of tensor voting principles. The tensor voting is adapted to allow image structure to manifest in the tangential plane of the saliency map. The technique allows direct estimation of motion vectors from second-order tensor analysis. The tensors make maximal and direct use of the available information by encoding it into the dimensionality of the tensor. The tensor voting methodology introduces a non-symmetrical voting kernel to allow a measure of voting skewness to be inferred. Skewness is found in the third-order tensor in the direction of the tangential first eigenvector. This new concept is introduced as the Tensor Skewness Map or TS map. The TS map gives further information about whether an object is occluding or disoccluding another object. The information can be used to infer the layering order of the moving objects in the video clip. Matched filtering and detection are applied to reduce the TS map into occluding and disoccluding detections. The technique is computationally exhaustive, but may find use in off-line video object segmentation processes. The use of commercial-off-the-shelf Graphic Processor Units is demonstrated to scale well to the tensor voting framework, providing the computational speed improvement required to make the framework realisable on a larger scale and to handle tensor dimensionalities higher than before
    corecore