
    Human detection in surveillance videos and its applications - a review

    Detecting human beings accurately in a visual surveillance system is crucial for diverse application areas, including abnormal event detection, human gait characterization, congestion analysis, person identification, gender classification and fall detection for elderly people. The first step of the detection process is to detect an object which is in motion. Object detection can be performed using background subtraction, optical flow or spatio-temporal filtering techniques. Once detected, a moving object can be classified as a human being using shape-based, texture-based or motion-based features. This paper presents a comprehensive review, with comparisons, of available techniques for detecting human beings in surveillance videos. The characteristics of a few benchmark datasets, as well as future research directions in human detection, are also discussed.
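    To make the first step concrete, below is a minimal sketch of motion-based object detection via background subtraction, using OpenCV's MOG2 model as one representative of the techniques the review compares; the input file name and the thresholds are illustrative assumptions, not taken from the paper.

        # Sketch: detect moving objects by background subtraction (MOG2),
        # then extract candidate regions for the later human/non-human
        # classification stage. Parameters are illustrative.
        import cv2

        cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical input
        subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            mask = subtractor.apply(frame)          # non-zero where motion is detected
            kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # suppress noise
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            for c in contours:
                if cv2.contourArea(c) > 500:        # ignore small blobs
                    x, y, w, h = cv2.boundingRect(c)
                    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.imshow("moving objects", frame)
            if cv2.waitKey(30) == 27:               # Esc quits
                break
        cap.release()

    The boxed regions would then be passed to a shape-, texture- or motion-based classifier to decide whether each moving object is a human.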

    Visual tracking of small animals in cluttered natural environments using a freely moving camera

    Image-based tracking of animals in their natural habitats can provide rich behavioural data, but is very challenging due to complex and dynamic background and target appearances. We present an effective method to recover the positions of terrestrial animals in cluttered environments from video sequences filmed with a freely moving monocular camera. The method uses residual motion cues to detect the targets, and is thus robust to different lighting conditions and requires no a-priori appearance model of the animal or environment. Detection is globally optimised by formulating the inference problem on a factor graph. This handles ambiguities such as occlusions and intersections, and provides automatic initialisation. Furthermore, the formulation allows seamless integration of occasional user input for the most difficult situations, so that the effect of a few manual position estimates is smoothly distributed over long sequences. Testing our system against a benchmark dataset featuring small targets in natural scenes, we obtain 96% accuracy for fully automated tracking. We also demonstrate reliable tracking on a new dataset that includes different targets (insects, vertebrates or artificial objects) in a variety of environments (desert, jungle, meadows, urban) using different imaging devices (day/night vision cameras, smartphones) and modalities (stationary, hand-held, drone-operated).
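    For intuition, the global optimisation step can be pictured on a chain-shaped factor graph with unary detection scores and pairwise motion-smoothness factors; on such a chain, max-product inference reduces to Viterbi-style dynamic programming. The toy sketch below illustrates that reduction only and is not the authors' implementation (their factor graphs also handle occlusions, intersections and user input).

        # Toy sketch: pick the globally best track through per-frame
        # candidate detections on a chain factor graph (Viterbi/max-product).
        import numpy as np

        def best_track(candidates, scores, sigma=20.0):
            # candidates: list over frames of (k_t, 2) arrays of positions
            # scores:     list over frames of (k_t,) log-domain detection scores
            T = len(candidates)
            best, back = [scores[0].astype(float)], []
            for t in range(1, T):
                # Pairwise factor: penalise large frame-to-frame jumps.
                d = np.linalg.norm(candidates[t][:, None, :] - candidates[t - 1][None, :, :], axis=2)
                total = best[-1][None, :] - d**2 / (2 * sigma**2)   # (k_t, k_{t-1})
                back.append(np.argmax(total, axis=1))
                best.append(scores[t] + np.max(total, axis=1))
            # Backtrack the jointly optimal assignment.
            path = [int(np.argmax(best[-1]))]
            for t in range(T - 2, -1, -1):
                path.append(int(back[t][path[-1]]))
            return path[::-1]                       # candidate index per frame

    Because the optimum is global, a single reliable detection (or manual annotation) late in a sequence can correct ambiguous choices many frames earlier, which is the property the paper exploits to spread sparse user input over long sequences.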

    Going beyond semantic image segmentation, towards holistic scene understanding, with associative hierarchical random fields

    In this thesis we exploit the generality and expressive power of the Associative Hierarchical Random Field (AHRF) graphical model to take its use beyond semantic image segmentation into object classes, towards a framework for holistic scene understanding. We provide a working definition of the holistic approach to scene understanding, which allows the integration of existing, disparate applications into a unifying ensemble. We believe that modelling such an ensemble as an AHRF is both a principled and a pragmatic solution. We present a hierarchy that shows several methods for fusing applications together with the AHRF graphical model. Each of its three layers (feature, potential and energy) subsumes its predecessor in generality, and together they give rise to many options for integration. With applications to street scenes, we demonstrate an implementation of each layer. The first-layer application joins appearance and geometric features. For the second layer we implement a 'things and stuff' conjunction, using higher-order AHRF potentials for object detectors, with the goal of answering the classic questions: What? Where? and How many? A holistic approach to recognition-and-reconstruction is realised within the third layer by linking energy-based formulations of both applications. Each application is evaluated qualitatively and quantitatively; in all cases our holistic approach improves over baseline methods.
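    As a point of reference, associative hierarchical random fields combine unary, pairwise and higher-order (clique) potentials into a single energy to be minimised; a generic sketch of that family, not the thesis's exact potentials, is

        E(x) = \sum_{i \in \mathcal{V}} \psi_i(x_i)
             + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
             + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)

    where the associative clique potentials \psi_c act over hierarchical segments (e.g. superpixels) and reward label agreement within a clique. The feature, potential and energy layers described above then correspond, respectively, to enriching the features feeding the potentials, adding new potentials (such as detector responses), and coupling whole energies (such as recognition and reconstruction).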

    Inverse rendering for scene reconstruction in general environments

    Demand for high-quality 3D content has been exploding recently, owing to advances in 3D displays and 3D printing. However, due to insufficient 3D content, the potential of 3D display and printing technology has not been realized to its full extent. Techniques for capturing the real world, which can generate 3D models from captured images or videos, are a hot research topic in computer graphics and computer vision. Despite significant progress, many methods are still highly constrained and require many prerequisites to succeed. Marker-less performance capture is one such dynamic scene reconstruction technique that is still confined to studio environments. The requirements involved, such as a multi-view camera setup, specially engineered lighting or green-screen backgrounds, prevent these methods from being widely used by the film industry or by ordinary consumers. In the area of scene reconstruction from images or videos, this thesis proposes new techniques that succeed in general environments, even using as few as two cameras. Contributions are made in terms of reducing the constraints of marker-less performance capture on lighting, background and the required number of cameras. The primary theoretical contribution lies in the investigation of light transport mechanisms for high-quality 3D reconstruction in general environments. Several steps are taken to approach the goal of scene reconstruction in general environments. First, the concept of employing inverse rendering for scene reconstruction is demonstrated on static scenes, where a high-quality multi-view 3D reconstruction method under general unknown illumination is developed. Then, this concept is extended to dynamic scene reconstruction from multi-view video, where detailed 3D models of dynamic scenes can be captured under general and even varying lighting, and in front of a general scene background without a green screen. Finally, efforts are made to reduce the number of cameras employed: new performance capture methods using as few as two cameras are proposed to capture high-quality 3D geometry in general environments, even outdoors.
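    As a schematic of the inverse-rendering idea, many such methods assume a Lambertian surface lit by low-frequency (spherical-harmonics) illumination and minimise the discrepancy between observed and rendered intensities over geometry, albedo and lighting. The sketch below shows that residual under first-order spherical harmonics; this is a common model in the area, not necessarily the thesis's exact formulation.

        # Sketch: photometric residual of a Lambertian image-formation
        # model with first-order spherical-harmonics (SH) lighting.
        import numpy as np

        def sh_basis(normals):
            # First-order real SH basis evaluated at unit normals (N, 3).
            x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
            return np.stack([np.ones_like(x), x, y, z], axis=1)    # (N, 4)

        def photometric_residual(intensity, albedo, normals, light):
            # intensity: (N,) observed pixel values; light: (4,) SH coefficients.
            shading = sh_basis(normals) @ light                    # per-pixel shading
            return intensity - albedo * shading                    # drive this to zero

        def estimate_lighting(intensity, albedo, normals):
            # With geometry and albedo fixed, lighting follows by linear
            # least squares -- one alternating step of inverse rendering.
            A = albedo[:, None] * sh_basis(normals)
            return np.linalg.lstsq(A, intensity, rcond=None)[0]

    Alternating such lighting solves with geometry refinement is one standard way to make reconstruction work under general, unknown illumination.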

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture the 3D world and display it in 2D, using mature technologies. However, there is a strong desire to record and re-explore the 3D world in 3D. Current approaches to this goal usually rely on a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making it as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze optical flow to provide additional qualitative depth constraints, and the automatically extracted depth information is presented in the user interface to assist the user's labeling work. In summary, this thesis develops new algorithms to produce 3D content from a single camera: given depth maps, they build high-fidelity 3D models of dynamic and deformable objects; otherwise, they convert video clips into stereoscopic video.
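    The light fall-off stereo idea can be sketched from the inverse-square law: a point light's contribution decays as 1/d^2, so two exposures with the light at known distances along the same ray determine depth. The fragment below shows that idealised computation under assumptions of my own (an ideal point light and known baseline); real systems must additionally handle calibration, noise and shadows.

        # Sketch: depth from light fall-off under an ideal point light.
        # i_near, i_far: intensities with the light at distance d and at
        # d + baseline; then i_near / i_far = ((d + baseline) / d)^2.
        import numpy as np

        def lfs_depth(i_near, i_far, baseline):
            ratio = np.sqrt(i_near / np.maximum(i_far, 1e-6))  # (d + baseline) / d
            ratio = np.maximum(ratio, 1.0 + 1e-6)              # guard the division
            return baseline / (ratio - 1.0)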