
    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D using mature technologies, but there is a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually rely on a camera array, which suffers from a tedious setup and calibration process as well as a lack of portability, limiting its application to lab experiments. In this thesis, I produce 3D content using a single camera, making the process as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, to capture highly detailed object surfaces, I designed and built a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze optical flow to provide additional qualitative depth constraints, and the automatically extracted depth information is presented in the user interface to assist the user's labeling work. In summary, this thesis develops new algorithms to produce 3D content from a single camera: given depth maps, they build high-fidelity 3D models of dynamic and deformable objects; otherwise, they turn monocular video clips into stereoscopic video.
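    As an aside, the inverse-square fall-off that LFS exploits can be illustrated in a few lines. The sketch below shows only the core geometric relation, not the thesis's actual pipeline (which also handles albedo, normals and dynamic scenes); it assumes a point light moved a known distance along the viewing direction between the two exposures, and the function and variable names are illustrative.

        import numpy as np

        def lfs_depth(i_near, i_far, baseline, eps=1e-6):
            """Per-pixel depth from two images lit by a point source at two
            known distances along the viewing direction (light fall-off idea).

            With inverse-square fall-off, i_near / i_far = ((z + baseline) / z)**2,
            which solves to z = baseline / (sqrt(ratio) - 1).
            """
            ratio = np.clip(i_near / np.maximum(i_far, eps), 1.0 + eps, None)
            return baseline / (np.sqrt(ratio) - 1.0)

        # Toy check: a surface point at depth 2.0 (arbitrary units)
        z_true, baseline, albedo = 2.0, 0.1, 0.8
        i_near = albedo / z_true ** 2                 # light at the camera
        i_far = albedo / (z_true + baseline) ** 2     # light pulled back by `baseline`
        print(lfs_depth(np.array([i_near]), np.array([i_far]), baseline))  # ~[2.0]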

    Modelling the human perception of shape-from-shading

    Shading conveys information about 3-D shape, and the process of recovering this information is called shape-from-shading (SFS). This thesis divides the process of human SFS into two functional sub-units (luminance disambiguation and shape computation) and studies them individually. Based on the results of a series of psychophysical experiments, it is proposed that the interaction between first- and second-order channels plays an important role in disambiguating luminance. Building on this idea, two versions of a biologically plausible model are developed to explain the human performance observed here and elsewhere. An algorithm sharing the same idea is also developed as a solution to the problem of intrinsic image decomposition in the field of image processing. With regard to the shape computation unit, a link between luminance variations and estimated surface normals is identified by testing participants on simple gratings with several different luminance profiles. This methodology is unconventional but can be justified in the light of past studies of human SFS. Finally, a computational algorithm for SFS with two distinct operating modes is proposed. This algorithm is broadly consistent with the known psychophysics of human SFS.
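    For readers unfamiliar with SFS, the forward model that the process inverts is the Lambertian image irradiance equation, I = albedo * max(0, n . l). The sketch below renders shading from a height field under this standard model; it is background only, not the biologically plausible model or the two-mode algorithm proposed in the thesis, and the names are illustrative.

        import numpy as np

        def lambertian_shading(height, light_dir, albedo=1.0):
            """Shading image of a Lambertian height field z(x, y) under a
            distant point light (the forward model that SFS inverts)."""
            zy, zx = np.gradient(height)                        # surface slopes
            normals = np.dstack([-zx, -zy, np.ones_like(height)])
            normals /= np.linalg.norm(normals, axis=2, keepdims=True)
            l = np.asarray(light_dir, dtype=float)
            l /= np.linalg.norm(l)
            return albedo * np.clip(normals @ l, 0.0, None)     # I = albedo * max(0, n.l)

        # Toy example: a sinusoidal grating lit obliquely from the left
        x = np.linspace(0.0, 4.0 * np.pi, 256)
        grating = 0.2 * np.sin(x)[None, :].repeat(64, axis=0)
        image = lambertian_shading(grating, light_dir=[-1.0, 0.0, 1.0])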

    Hand eye coordination in surgery

    The coordination of the hand in response to visual target selection has long been regarded as an essential quality in a range of professional activities. This quality has thus far eluded objective scientific measurement and is usually subsumed in the overall performance of the individual. Parallels can be drawn to surgery, especially Minimally Invasive Surgery (MIS), where the physical constraints imposed by the arrangement of the instruments and the visualisation methods demand unprecedented coordination skills. With the current paradigm shift towards early specialisation in surgical training and shortened, focused training time, the selection process should identify trainees with the highest potential in specific skills. Although significant effort has been made in the objective assessment of surgical skills, it is currently only possible to measure surgeons' abilities at the time of assessment; it has been particularly difficult to quantify specific details of hand-eye coordination and to assess innate aptitude for future skills development. The purpose of this thesis is to examine hand-eye coordination in laboratory-based simulations, with a particular emphasis on details that are important to MIS. To understand the challenges of visuomotor coordination, movement trajectory errors are used to provide insight into the brain's innate coordinate mapping. In MIS, novel spatial transformations, due to a combination of distorted endoscopic image projections and the "fulcrum" effect of the instruments, accentuate movement generation errors. Obvious differences in the quality of movement trajectories have been observed between novices and experts in MIS; however, these are difficult to measure quantitatively. A Hidden Markov Model (HMM) is used in this thesis to reveal the characteristic movement details underlying a particular MIS manoeuvre and how such features are exaggerated by the introduction of rotation of the endoscopic camera. The proposed method demonstrates the feasibility of measuring movement trajectory quality with machine learning techniques, without prior arbitrary classification of expertise, and experimental results highlight these changes in novice laparoscopic surgeons even after a short period of training. How the intricate relationship between the hands and the eyes changes when learning a skilled visuomotor task has been studied previously: reactive eye movement, where visual input is used primarily as a feedback mechanism for error correction, implies difficulties in hand-eye coordination, whereas once the brain adapts to the new coordinate map, eye movements become predictive of the action generated. Measuring this spatiotemporal relationship is introduced as a measure of hand-eye coordination in MIS, by comparing the Target Distance Function (TDF) between the eye fixation and the instrument tip position on the laparoscopic screen. Further validation of this concept using high-fidelity experimental tasks is presented, where higher cognitive influence and multiple target selection increase the complexity of the data analysis. To this end, Granger-causality is presented as a measure of the predictability of the instrument movement from the eye fixation pattern, and Partial Directed Coherence (PDC), a frequency-domain variation of Granger-causality, is used for the first time to measure hand-eye coordination. Experimental results are used to establish the strengths and potential pitfalls of the technique.

    To further improve the accuracy of this measurement, a modified Jensen-Shannon Divergence (JSD) measure is developed to enhance the signal matching algorithm and the trajectory segmentation. The proposed framework incorporates filtering of high-frequency noise, which represents non-purposeful hand and eye movements. The accuracy of the technique is demonstrated by quantitative measurement of multiple laparoscopic tasks performed by expert and novice surgeons. Experimental results supporting visual search behavioural theory are presented, as this underpins the target selection process immediately prior to visuomotor action generation. The effects of specialisation and experience on visual search patterns are also examined. Finally, pilot results from functional brain imaging are presented, in which activation of the Posterior Parietal Cortex (PPC) is measured using optical spectroscopy techniques. The PPC has been shown to be involved in computing the coordinate transformations between the visual and motor systems, which opens up possibilities for exciting future studies of hand-eye coordination.
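    As a rough illustration of the reactive-versus-predictive distinction, the sketch below computes a target-distance signal for the gaze and for the instrument tip and estimates their lead/lag by cross-correlation. Reading the TDF as the distance of each trajectory to a fixed on-screen target is an assumption on my part, and the lag estimate is a simplified stand-in for the Granger-causality and PDC analyses described above; names and data are hypothetical.

        import numpy as np

        def target_distance(traj, target):
            """Distance of a 2-D screen trajectory (T, 2) to a fixed target (2,)."""
            return np.linalg.norm(traj - np.asarray(target), axis=1)

        def lead_lag(eye_tdf, tool_tdf, max_lag=30):
            """Lag (in samples) at which the eye signal best matches the tool signal.
            Positive values mean the gaze leads the instrument (predictive eye
            movement); negative values mean it trails (reactive eye movement)."""
            best_lag, best_corr = 0, -np.inf
            for k in range(-max_lag, max_lag + 1):
                if k >= 0:
                    a, b = eye_tdf[:len(eye_tdf) - k], tool_tdf[k:]
                else:
                    a, b = eye_tdf[-k:], tool_tdf[:len(tool_tdf) + k]
                c = np.corrcoef(a, b)[0, 1]
                if c > best_corr:
                    best_lag, best_corr = k, c
            return best_lag

        # Toy data: the gaze heads for the target and the tool repeats the
        # same path 15 samples later, so the expected lead/lag is +15.
        rng = np.random.default_rng(0)
        T, lag_true, target = 300, 15, np.array([0.5, 0.5])
        eye = np.linspace([0.9, 0.1], [0.5, 0.5], T) + 0.01 * rng.standard_normal((T, 2))
        tool = np.empty_like(eye)
        tool[lag_true:] = eye[:-lag_true]
        tool[:lag_true] = eye[0]
        print(lead_lag(target_distance(eye, target), target_distance(tool, target)))  # 15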

    Learning geometric and lighting priors from natural images

    Understanding images is of crucial importance for a plethora of tasks, from digital compositing to image relighting to the 3D reconstruction of objects. These tasks allow visual artists to create masterpieces, or help operators make safe decisions based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, of which generally only one is reasonable. To resolve these ambiguities, reasoning about the visual and semantic context of a scene is usually left to an artist or an expert, who relies on experience to carry out the work, because the scene generally has to be reasoned about globally in order to obtain plausible and pleasing results. Could this experience be modelled from visual data, partly or fully automating such tasks? That is the topic of this thesis: modeling priors with deep learning in order to solve typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction from photometric cues, 2) outdoor illumination estimation from a single image, and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation for opacity of deep learning algorithms, we present studies of the visual cues captured by our methods.
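    A small side note on the camera calibration axis: single-image calibration methods often parameterise the camera by its field of view, which relates to focal length in pixels through the pinhole relation f = (H / 2) / tan(vfov / 2). The conversion below is a generic utility under that assumption, not code from the thesis.

        import numpy as np

        def fov_to_focal(vfov_deg, image_height):
            """Focal length in pixels of a pinhole camera with the given
            vertical field of view (degrees) and image height (pixels)."""
            return (image_height / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)

        def focal_to_fov(focal_px, image_height):
            """Inverse conversion: vertical field of view in degrees."""
            return np.degrees(2.0 * np.arctan((image_height / 2.0) / focal_px))

        print(fov_to_focal(60.0, 480))    # ~415.7 px
        print(focal_to_fov(415.7, 480))   # ~60.0 deg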