
    Algorithms for trajectory integration in multiple views

    This thesis addresses the problem of deriving a coherent and accurate localization of moving objects from partial visual information when data are generated by cameras placed at different view angles with respect to the scene. The framework is built around applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a geometry-based solution exploits the relationships between corresponding feature points across views and improves accuracy in object localization. Then, we improve the estimation of object locations with geometric transformations that account for lens distortions. Additionally, we study the integration of the partial visual information generated by each individual sensor and its combination into one single frame of observation that considers object association and data fusion. Our approach is fully image-based, relies only on 2D constructs and does not require any complex computation in 3D space. We exploit the continuity and coherence in objects' motion when crossing cameras' fields of view. Additionally, we work under the assumption of a planar ground plane and a wide baseline (i.e. cameras' viewpoints are far apart). The main contributions are: i) the development of a framework for distributed visual sensing that accounts for inaccuracies in the geometry of multiple views; ii) the reduction of trajectory mapping errors using a statistical homography estimation; iii) the integration of a polynomial method for correcting inaccuracies caused by the cameras' lens distortion; iv) a global trajectory reconstruction algorithm that associates and integrates fragments of trajectories generated by each camera.
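
    The mapping step behind contributions i) and ii) can be pictured with a small Python sketch: tracked image positions from one camera are transferred to a reference view through a planar homography estimated from point correspondences. The correspondences, the use of OpenCV's RANSAC estimator and the sample trajectory below are illustrative assumptions, not the thesis' statistical estimator.

        # Minimal sketch: transferring a tracked trajectory from camera A's image
        # plane into a reference view via a planar (ground-plane) homography.
        import numpy as np
        import cv2

        # Hypothetical correspondences: image points of the same ground-plane
        # locations seen in camera A and in the reference camera (N x 2 arrays).
        pts_cam_a = np.array([[100, 420], [340, 410], [520, 300],
                              [150, 250], [430, 220], [600, 450]], dtype=np.float32)
        pts_ref   = np.array([[ 80, 460], [310, 430], [560, 330],
                              [120, 280], [470, 240], [630, 470]], dtype=np.float32)

        # RANSAC-based homography estimation; the thesis refines this statistically,
        # which is not reproduced here.
        H, _ = cv2.findHomography(pts_cam_a, pts_ref, cv2.RANSAC, 3.0)

        def map_trajectory(traj_xy, H):
            """Map an (N, 2) trajectory from camera A's image plane to the reference view."""
            pts = traj_xy.reshape(-1, 1, 2).astype(np.float32)
            return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

        traj_a = np.array([[110, 400], [180, 380], [260, 350]], dtype=np.float32)
        print(map_trajectory(traj_a, H))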

    Light field image processing: an overview

    Light field imaging has emerged as a technology that allows capturing richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene integrating the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
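
    As a concrete illustration of the post-capture refocusing capability mentioned above, the sketch below performs classic shift-and-sum refocusing over the sub-aperture images of a 4D light field; the (u, v, y, x) array layout, the integer-pixel shifts and the focus parameter are simplifying assumptions for the example, not a specific method from the survey.

        # Shift-and-sum synthetic refocusing of a 4D light field (sketch).
        import numpy as np

        def refocus(lightfield, alpha):
            """lightfield: float array of shape (U, V, H, W) holding grayscale
            sub-aperture images; alpha: focus parameter (0 keeps the original
            focal plane, larger magnitudes focus nearer or farther)."""
            U, V, H, W = lightfield.shape
            uc, vc = (U - 1) / 2.0, (V - 1) / 2.0
            out = np.zeros((H, W), dtype=np.float64)
            for u in range(U):
                for v in range(V):
                    # Integer pixel shift proportional to the view's offset from the
                    # central sub-aperture; sub-pixel interpolation is omitted here.
                    dy = int(round(alpha * (u - uc)))
                    dx = int(round(alpha * (v - vc)))
                    out += np.roll(lightfield[u, v], shift=(dy, dx), axis=(0, 1))
            return out / (U * V)

        # Usage with a random stand-in light field:
        lf = np.random.rand(5, 5, 64, 64)
        image_refocused = refocus(lf, alpha=1.5)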

    Robust Stabilised Visual Tracker for Vehicle Tracking

    Visual tracking is normally performed on a stabilised video. If the input video to the tracking algorithm is itself unstable, incorrect motion vectors cause serious drift in tracking; video stabilisation is therefore a prerequisite for tracking. A novel algorithm is developed that handles video stabilisation and target tracking simultaneously. Target templates from the immediately preceding frame are stored in positive and negative repositories, followed by affine mapping. The optimised affine parameters are then used to stabilise the video. The target of interest in the next frame is approximated as a linear combination of previous target templates. A proposed modified L1 minimisation method is used to solve for the sparse representation of the target in the template subspace. The occlusion problem is mitigated using the inherent energy of the coefficients. Accurate tracking results have been obtained on destabilised videos.
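
    The sparse-representation step can be illustrated generically: a candidate patch is expressed as a sparse linear combination of the stored templates by solving an L1-regularised least-squares problem. The solver below is a plain iterative soft-thresholding (ISTA) sketch with made-up data, not the paper's modified L1 minimisation.

        # Sparse coding of a target candidate over a template dictionary via ISTA.
        import numpy as np

        def ista_l1(T, y, lam=0.01, n_iter=200):
            """Solve min_x 0.5*||T x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
            L = np.linalg.norm(T, 2) ** 2          # Lipschitz constant of the gradient
            x = np.zeros(T.shape[1])
            for _ in range(n_iter):
                grad = T.T @ (T @ x - y)
                x = x - grad / L
                x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
            return x

        # 32x32 candidate patch flattened; 20 stored templates as dictionary columns.
        rng = np.random.default_rng(0)
        T = rng.standard_normal((1024, 20))
        y = T[:, 3] + 0.05 * rng.standard_normal(1024)   # candidate close to template 3
        coeffs = ista_l1(T, y)
        print(np.argmax(np.abs(coeffs)))                  # expected: 3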

    Viewpoint-Free Photography for Virtual Reality

    Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a standing challenge. In this thesis, we investigate algorithms to enable viewpoint-free photography for virtual reality (VR) from casual capture, i.e., from footage easily captured with consumer cameras. We build on an extensive body of work in image-based rendering (IBR). Given images of an object or scene, IBR methods aim to predict the appearance of an image taken from a novel perspective. Most IBR methods focus on full or near-interpolation, where the output viewpoints either lie directly between captured images, or nearby. These methods are not suitable for VR, where the user has significant range of motion and can look in all directions. Thus, it is essential to create viewpoint-free photos with a wide field-of-view and sufficient positional freedom to cover the range of motion a user might experience in VR. We focus on two VR experiences: 1) Seated VR experiences, where the user can lean in different directions. This simplifies the problem, as the scene is only observed from a small range of viewpoints. Thus, we focus on easy capture, showing how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos, and also how to speed up processing so users can see the final result on-site. 2) Room-scale VR experiences, where the user can explore vastly different perspectives. This is challenging: More input footage is needed, maintaining real-time display rates becomes difficult, view-dependent appearance and object backsides need to be modelled, all while preventing noticeable mistakes. We address these challenges by: (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing. Overall, we provide evidence that viewpoint-free photography is feasible from casual capture. We thoroughly compare with the state-of-the-art, showing that our methods achieve both a numerical improvement and a clear increase in visual quality for both seated and room-scale VR experiences
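
    The basic image-based rendering operation underlying all of the above can be sketched in a few lines: pixels of a source photo with per-pixel depth are back-projected to 3D and re-projected into a nearby novel viewpoint. The pinhole model, nearest-pixel splatting and the absence of a z-buffer are simplifications for illustration; the thesis' methods add refined per-view geometry, tiled real-time rendering and learned compositing on top of this idea.

        # Forward-warping a photo with depth into a novel viewpoint (sketch).
        import numpy as np

        def reproject(image, depth, K, R, t):
            """image: (H, W, 3); depth: (H, W); K: (3, 3) intrinsics;
            R: (3, 3) rotation and t: (3,) translation of the novel view."""
            H, W = depth.shape
            ys, xs = np.mgrid[0:H, 0:W]
            pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x N
            rays = np.linalg.inv(K) @ pix                 # back-project pixels to rays
            pts = rays * depth.reshape(1, -1)             # 3D points in the source camera
            proj = K @ (R @ pts + t[:, None])             # project into the novel view
            z = proj[2]
            ok = z > 1e-6                                 # keep points in front of the camera
            u = np.round(proj[0, ok] / z[ok]).astype(int)
            v = np.round(proj[1, ok] / z[ok]).astype(int)
            inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
            out = np.zeros_like(image)
            out[v[inb], u[inb]] = image.reshape(-1, 3)[ok][inb]  # nearest-pixel splat
            return out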

    Developing object detection, tracking and image mosaicing algorithms for visual surveillance

    Visual surveillance systems have become increasingly important over the last decades due to the proliferation of cameras. These systems have been widely used in scientific, commercial and end-user applications, where they can store, extract and infer huge amounts of information automatically without human help. In this thesis, we focus on developing object detection, tracking and image mosaicing algorithms for a visual surveillance system. First, we review some real-time object detection algorithms that exploit motion cues and enhance one of them that is suitable for use in dynamic scenes. This algorithm adopts a nonparametric probabilistic model over the whole image and exploits pixel adjacencies to detect foreground regions even under small baseline motion. We then develop a multiple object tracking algorithm that uses this algorithm as its detection step. The algorithm analyzes multiple object interactions in a probabilistic framework using virtual shells to track objects through severe occlusions. The final part of the thesis is devoted to an image mosaicing algorithm that stitches ordered images to create a large and visually attractive mosaic from long image sequences. The proposed mosaicing method avoids nonlinear optimization techniques and is capable of real-time operation on large datasets. Experimental results show that the developed algorithms work successfully in dynamic and cluttered environments with real-time performance.
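
    The nonparametric detection idea can be sketched as a per-pixel kernel density test against a history of recent frames, where low background likelihood marks a pixel as foreground. The Gaussian kernel, bandwidth and threshold below are illustrative choices, not the thesis' exact model (which additionally exploits pixel adjacencies).

        # Kernel-density (nonparametric) per-pixel foreground test (sketch).
        import numpy as np

        def foreground_mask(frame, history, bandwidth=15.0, threshold=1e-3):
            """frame: (H, W) grayscale; history: (N, H, W) recent background samples.
            Returns a boolean mask that is True where the pixel is likely foreground."""
            diff = frame[None, :, :] - history                       # N x H x W
            kernel = np.exp(-0.5 * (diff / bandwidth) ** 2)          # Gaussian kernels
            density = kernel.mean(axis=0) / (bandwidth * np.sqrt(2 * np.pi))
            return density < threshold                               # low likelihood -> foreground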

    Fehlerkaschierte Bildbasierte Darstellungsverfahren

    Creating photo-realistic images has been one of the major goals in computer graphics since its early days. Instead of modeling the complexity of nature with standard modeling tools, image-based approaches aim at exploiting real-world footage directly, as it is photo-realistic by definition. A drawback of these approaches has always been that the composition or combination of different sources is a non-trivial task, often resulting in annoying visible artifacts. In this thesis we focus on different techniques to diminish visible artifacts when combining multiple images in a common image domain. The results are either novel images, when dealing with the composition task of multiple images, or novel video sequences rendered in real-time, when dealing with video footage from multiple cameras.

    3D Hand reconstruction from monocular camera with model-based priors

    As virtual and augmented reality (VR/AR) technology gains popularity, facilitating intuitive digital interactions in 3D is of crucial importance. Tools such as VR controllers exist, but such devices support only a limited range of interactions, mapped onto complex sequences of button presses that can be intimidating to learn. In contrast, users already have an instinctive understanding of manual interactions in the real world, which is readily transferable to the virtual world. This makes hands the ideal mode of interaction for downstream applications such as robotic teleoperation, sign-language translation, and computer-aided design. Existing hand-tracking systems come with several inconvenient limitations. Wearable solutions such as gloves and markers unnaturally limit the range of articulation. Multi-camera systems are not trivial to calibrate and have specialized hardware requirements which make them cumbersome to use. Given these drawbacks, recent research tends to focus on monocular inputs, as these do not constrain articulation and suitable devices are pervasive in everyday life. 3D reconstruction in this setting is severely under-constrained, however, due to occlusions and depth ambiguities. The majority of state-of-the-art works rely on a learning framework to resolve these ambiguities statistically; as a result they have several limitations in common. For example, they require a vast amount of annotated 3D data that is labor-intensive to obtain and prone to systematic error. Additionally, traits that are hard to quantify with annotations - the details of individual hand appearance - are difficult to reconstruct in such a framework. Existing methods also make the simplifying assumption that only a single hand is present in the scene. Two-hand interactions introduce additional challenges, however, in the form of inter-hand occlusion, left-right confusion, and collision constraints, that single-hand methods cannot address. To tackle the aforementioned shortcomings of previous methods, this thesis advances the state of the art through the novel use of model-based priors to incorporate hand-specific knowledge. In particular, this thesis presents a training method that reduces the amount of annotations required and is robust to systemic biases; it presents the first tracking method that addresses the challenging two-hand-interaction scenario using monocular RGB video, and also the first probabilistic method to model image ambiguity for two-hand interactions. Additionally, this thesis also contributes the first parametric hand texture model with example applications in hand personalization.
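
    The model-based fitting idea at the heart of this line of work can be sketched as optimising a low-dimensional parametric hand model so that its projected 3D joints agree with detected 2D keypoints. In the Python sketch below, hand_joints is only a placeholder for a real articulated model such as MANO, and the camera, keypoints and optimiser settings are illustrative assumptions rather than the thesis' actual pipeline.

        # Fitting placeholder hand-model parameters to 2D keypoints (sketch).
        import numpy as np
        from scipy.optimize import least_squares

        RNG = np.random.default_rng(0)
        BASIS = RNG.standard_normal((6, 63)) * 0.01      # fixed linear "articulation" basis

        def hand_joints(params):
            """Placeholder articulated model: maps a 6-D pose vector to 21 3-D joints."""
            rest = np.linspace(0.0, 1.0, 63).reshape(21, 3)    # rest-pose joints
            return rest + (params @ BASIS).reshape(21, 3)

        def project(points3d, focal=500.0, center=(320.0, 240.0)):
            """Simple pinhole projection of (N, 3) points to (N, 2) pixels."""
            z = points3d[:, 2:3] + 2.0                   # keep points in front of the camera
            return focal * points3d[:, :2] / z + np.array(center)

        def residuals(params, keypoints2d):
            return (project(hand_joints(params)) - keypoints2d).ravel()

        # Fit against synthetic "detected" 2D keypoints generated from a known pose.
        true_pose = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
        target = project(hand_joints(true_pose))
        fit = least_squares(residuals, x0=np.zeros(6), args=(target,))
        print(fit.x)                                      # should be close to true_pose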