155,062 research outputs found

    Real-Time Occlusion Handling in Augmented Reality Based on an Object Tracking Approach

    Get PDF
    To produce a realistic augmentation in Augmented Reality, the correct relative positions of real objects and virtual objects are very important. In this paper, we propose a novel real-time occlusion handling method based on an object tracking approach. Our method is divided into three steps: selection of the occluding object, object tracking and occlusion handling. The user selects the occluding object using an interactive segmentation method. The contour of the selected object is then tracked in the subsequent frames in real-time. In the occlusion handling step, all the pixels on the tracked object are redrawn on the unprocessed augmented image to produce a new synthesized image in which the relative position between the real and virtual object is correct. The proposed method has several advantages. First, it is robust and stable, since it remains effective when the camera is moved through large changes of viewing angles and volumes or when the object and the background have similar colors. Second, it is fast, since the real object can be tracked in real-time. Last, a smoothing technique provides seamless merging between the augmented and virtual object. Several experiments are provided to validate the performance of the proposed method

    움직이는 물체 검출 및 추적을 위한 생체 모방 모델

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 최진영.In this thesis, we propose bio-mimetic models for motion detection and visual tracking to overcome the limitations of existing methods in actual environments. The models are inspired from the theory that there are four different forms of visual memory for human visual perception when representing a scenevisible persistence, informational persistence, visual short-term memory (VSTM), and visual long-term memory (VLTM). We view our problem as a problem of modeling and representing an observed scene with temporary short-term models (TSTM) and conservative long-term models (CLTM). We study on building efficient and effective models for TSTM and CLTM, and utilizing them together to obtain robust detection and tracking results under occlusions, clumsy initializations, background clutters, drifting, and non-rigid deformations encountered in actual environments. First, we propose an efficient representation of TSTM to be used for moving object detection on non-stationary cameras, which runs within 5.8 milliseconds (ms) on a PC, and real-time on mobile devices. To achieve real-time capability with robust performance, our method models the background through the proposed dual-mode kernel model (DMKM) and compensates the motion of the camera by mixing neighboring models. Modeling through DMKM prevents the background model from being contaminated by foreground pixels, while still allowing the model to be able to adapt to changes of the background. Mixing neighboring models reduces the errors arising from motion compensation and their influences are further reduced by keeping the age of the model. Also, to decrease computation load, the proposed method applies one DMKM to multiple pixels without performance degradation. Experimental results show the computational lightness and the real-time capability of our method on a smart phone with robust detection performances. Second, by using the concept from both TSTM and CLTM, a new visual tracking method using the novel tri-model is proposed. The proposed method aims to solve the problems of occlusions, background clutters, and drifting simultaneously with the new tri-model. The proposed tri-model is composed of three models, where each model learns the target object, the background, and other non-target moving objects online. The proposed scheme performs tracking by finding the best explanation of the scene with the three learned models. By utilizing the information in the background and the foreground models as well as the target object model, our method obtains robust results under occlusions and background clutters. Also, the target object model is updated in a conservative way to prevent drifting. Furthermore, our method is not restricted to bounding-boxes when representing the target object, and is able to give pixel-wise tracking results. Third, we go beyond pixel-wise modeling and propose a local feature based tracking model using both TSTM and CLTM to track objects in case of uncertain initializations and severe occlusions. To track objects accurately in such situations, the proposed scheme uses ``motion saliency'' and ``descriptor saliency'' of local features and performs tracking based on generalized Hough transform (GHT). The proposed motion saliency of a local feature utilizes instantaneous velocity of features to form TSTM and emphasizes features having distinctive motions, compared to the motions coming from local features which are not from the object. The descriptor saliency models local features as CLTM and emphasizes features which are likely to be of the object in terms of its feature descriptors. Through these saliencies, the proposed method tries to ``learn and find'' the target object rather than looking for what was given at initialization, becoming robust to initialization problems. Also, our tracking result is obtained by combining the results of each local features of the target and the surroundings, thus being robust against severe occlusions as well. The proposed method is compared against eight other methods, with nine image sequences, and hundred random initializations. The experimental results show that our method outperforms all other compared methods. Fourth and last, we focus on building robust CLTM with local patches and their neighboring structures. The proposed method is based on sequential Bayesian inference and focuses on solving both the problem of tracking under partial occlusions and the problem of non-rigid object tracking in real-time on desktop personal computers (PC). The proposed scheme is mainly composed of two parts: (1) modeling the target object using elastic structure of local patches for robust performanceand (2) efficient hierarchical diffusion method to perform the tracking process in real-time. The elastic structure of local patches allows the proposed scheme to handle partial occlusions and non-rigid deformations through the relationship among neighboring patches. The proposed hierarchical diffusion generates samples from the region where the posterior is concentrated to reduce computation time. The method is extensively tested on a number of challenging image sequences with occlusion and non-rigid deformation. The experimental results show the real-time capability and the robustness of the proposed scheme under various situations.1 Introduction 1.1 Background and Research Issues 1.1.1 Issues in Motion Detection 1.1.2 Issues in Object Tracking 1.2 The Human Visual Memory 1.2.1 Sensory Memory 1.2.2 Visual Short-Term Memory 1.2.3 Visual Long-Term Memory 1.3 Bio-mimetic Framework for Detection and Tracking 1.4 Contents of the Research 2 Detection by Pixel-wise Dual-Mode Kernel Model 2.1 Proposed Method 2.1.1 Approximated Gaussian Kernel Model 2.1.2 Dual-Mode Kernel Model (DMKM) 2.1.3 Motion Compensation by Mixing Models 2.1.4 Detection of Foreground Pixels 2.2 Experimental Results 2.2.1 Runtime Comparison 2.2.2 Qualitative Comparison 2.2.3 Quantitative Comparison 2.2.4 Effects of Dual-Mode Kernel Model 2.2.5 Effects of Motion Compensation 2.2.6 Mobile Results 2.3 Remarks and Discussion 3 Tracking by Pixel-wise Tri-Model Representation 3.1 Tri-Model Framework 3.1.1 Overall Scheme 3.1.2 Advantages 3.1.3 Practical Approximation 3.2 Tracking with the Tri-Model 3.2.1 Likelihood of the Tri-Model 3.2.2 Likelihood Maximization 3.2.3 Estimating Pixel-Wise Labels 3.3 Learning the Tri-Model 3.3.1 Target Model 3.3.2 Background Model 3.3.3 Foreground Model 3.4 Experimental Results 3.4.1 Experimental Settings 3.4.2 Tracking Accuracy: Bounding Box 3.4.3 Tracking Accuracy: Pixel-Wise 3.5 Remarks and Discussion 4 Tracking by Feature-point-wise Saliency Model 4.1 Proposed Method 4.1.1 Tracking based on GHT 4.1.2 Descriptor Saliency and Feature DB Update 4.1.3 Motion Saliency 4.2 Experimental Results 4.2.1 Tracking with Inaccurate Initializations 4.2.2 Tracking Under Occlusions 4.3 Remarks and Discussion 5 Tracking by Patch-wise Elastic Structure Model 5.1 Tracking with Elastic Structure of Local Patches 5.1.1 Sequential Bayesian Inference Framework 5.1.2 Elastic Structure of Local Patches 5.1.3 Modeling a Single Patch 5.1.4 Modeling the Relationship between Patches 5.1.5 Model Update 5.1.6 Hierarchical Diffusion 5.1.7 Summary of the Proposed Method 5.2 Experiments 5.2.1 Parameter Effects 5.2.2 Performance Evaluation 5.2.3 Discussion on Translation, Rotation, Illumination Changes 5.2.4 Discussion on Partial Occlusions 5.2.5 Discussion on Non-Rigid Deformations 5.2.6 Discussion on Additional Cases 5.2.7 Summary of Tracking Results 5.2.8 Effectiveness of Hierarchical Diffusion 5.2.9 Limitations 5.3 Remarks and Discussion 6 Concluding Remarks and Future Works Bibliography Abstract in KoreanDocto

    Video analytics for security systems

    Get PDF
    This study has been conducted to develop robust event detection and object tracking algorithms that can be implemented in real time video surveillance applications. The aim of the research has been to produce an automated video surveillance system that is able to detect and report potential security risks with minimum human intervention. Since the algorithms are designed to be implemented in real-life scenarios, they must be able to cope with strong illumination changes and occlusions. The thesis is divided into two major sections. The first section deals with event detection and edge based tracking while the second section describes colour measurement methods developed to track objects in crowded environments. The event detection methods presented in the thesis mainly focus on detection and tracking of objects that become stationary in the scene. Objects such as baggage left in public places or vehicles parked illegally can cause a serious security threat. A new pixel based classification technique has been developed to detect objects of this type in cluttered scenes. Once detected, edge based object descriptors are obtained and stored as templates for tracking purposes. The consistency of these descriptors is examined using an adaptive edge orientation based technique. Objects are tracked and alarm events are generated if the objects are found to be stationary in the scene after a certain period of time. To evaluate the full capabilities of the pixel based classification and adaptive edge orientation based tracking methods, the model is tested using several hours of real-life video surveillance scenarios recorded at different locations and time of day from our own and publically available databases (i-LIDS, PETS, MIT, ViSOR). The performance results demonstrate that the combination of pixel based classification and adaptive edge orientation based tracking gave over 95% success rate. The results obtained also yield better detection and tracking results when compared with the other available state of the art methods. In the second part of the thesis, colour based techniques are used to track objects in crowded video sequences in circumstances of severe occlusion. A novel Adaptive Sample Count Particle Filter (ASCPF) technique is presented that improves the performance of the standard Sample Importance Resampling Particle Filter by up to 80% in terms of computational cost. An appropriate particle range is obtained for each object and the concept of adaptive samples is introduced to keep the computational cost down. The objective is to keep the number of particles to a minimum and only to increase them up to the maximum, as and when required. Variable standard deviation values for state vector elements have been exploited to cope with heavy occlusion. The technique has been tested on different video surveillance scenarios with variable object motion, strong occlusion and change in object scale. Experimental results show that the proposed method not only tracks the object with comparable accuracy to existing particle filter techniques but is up to five times faster. Tracking objects in a multi camera environment is discussed in the final part of the thesis. The ASCPF technique is deployed within a multi-camera environment to track objects across different camera views. Such environments can pose difficult challenges such as changes in object scale and colour features as the objects move from one camera view to another. Variable standard deviation values of the ASCPF have been utilized in order to cope with sudden colour and scale changes. As the object moves from one scene to another, the number of particles, together with the spread value, is increased to a maximum to reduce any effects of scale and colour change. Promising results are obtained when the ASCPF technique is tested on live feeds from four different camera views. It was found that not only did the ASCPF method result in the successful tracking of the moving object across different views but also maintained the real time frame rate due to its reduced computational cost thus indicating that the method is a potential practical solution for multi camera tracking applications

    Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor

    No full text
    We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, common for virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence of clutter and occlusions. The localized hand position, together with the corresponding input depth value, is used to generate a normalized cropped image that is fed into a second CNN to regress relative 3D hand joint locations in real time. For added accuracy, robustness and temporal stability, we refine the pose estimates using a kinematic pose tracking energy. To train the CNNs, we introduce a new photorealistic dataset that uses a merged reality approach to capture and synthesize large amounts of annotated data of natural hand interaction in cluttered scenes. Through quantitative and qualitative evaluation, we show that our method is robust to self-occlusion and occlusions by objects, particularly in moving egocentric perspectives

    Visual Localization and Mapping in Dynamic and Changing Environments

    Full text link
    The real-world deployment of fully autonomous mobile robots depends on a robust SLAM (Simultaneous Localization and Mapping) system, capable of handling dynamic environments, where objects are moving in front of the robot, and changing environments, where objects are moved or replaced after the robot has already mapped the scene. This paper presents Changing-SLAM, a method for robust Visual SLAM in both dynamic and changing environments. This is achieved by using a Bayesian filter combined with a long-term data association algorithm. Also, it employs an efficient algorithm for dynamic keypoints filtering based on object detection that correctly identify features inside the bounding box that are not dynamic, preventing a depletion of features that could cause lost tracks. Furthermore, a new dataset was developed with RGB-D data especially designed for the evaluation of changing environments on an object level, called PUC-USP dataset. Six sequences were created using a mobile robot, an RGB-D camera and a motion capture system. The sequences were designed to capture different scenarios that could lead to a tracking failure or a map corruption. To the best of our knowledge, Changing-SLAM is the first Visual SLAM system that is robust to both dynamic and changing environments, not assuming a given camera pose or a known map, being also able to operate in real time. The proposed method was evaluated using benchmark datasets and compared with other state-of-the-art methods, proving to be highly accurate.Comment: 14 pages, 13 figure

    Robust and real-time hand detection and tracking in monocular video

    Get PDF
    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation, is video based gesture recognition, hand detection and hand tracking. Gesture based interaction allows the user to directly interact with the computer in a natural manner by exploring a virtual reality using nothing but his own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore can not be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template based approach, we probabilistically model the spatial relations between different hand parts, and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed-up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while providing a reduction in computational complexity with a factor 1 000. Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in other domains than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm, is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow-field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased on several national and international events and tradeshows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator

    Automatic vehicle detection and tracking in aerial video

    Get PDF
    This thesis is concerned with the challenging tasks of automatic and real-time vehicle detection and tracking from aerial video. The aim of this thesis is to build an automatic system that can accurately localise any vehicles that appear in aerial video frames and track the target vehicles with trackers. Vehicle detection and tracking have many applications and this has been an active area of research during recent years; however, it is still a challenge to deal with certain realistic environments. This thesis develops vehicle detection and tracking algorithms which enhance the robustness of detection and tracking beyond the existing approaches. The basis of the vehicle detection system proposed in this thesis has different object categorisation approaches, with colour and texture features in both point and area template forms. The thesis also proposes a novel Self-Learning Tracking and Detection approach, which is an extension to the existing Tracking Learning Detection (TLD) algorithm. There are a number of challenges in vehicle detection and tracking. The most difficult challenge of detection is distinguishing and clustering the target vehicle from the background objects and noises. Under certain conditions, the images captured from Unmanned Aerial Vehicles (UAVs) are also blurred; for example, turbulence may make the vehicle shake during flight. This thesis tackles these challenges by applying integrated multiple feature descriptors for real-time processing. In this thesis, three vehicle detection approaches are proposed: the HSV-GLCM feature approach, the ISM-SIFT feature approach and the FAST-HoG approach. The general vehicle detection approaches used have highly flexible implicit shape representations. They are based on training samples in both positive and negative sets and use updated classifiers to distinguish the targets. It has been found that the detection results attained by using HSV-GLCM texture features can be affected by blurring problems; the proposed detection algorithms can further segment the edges of the vehicles from the background. Using the point descriptor feature can solve the blurring problem, however, the large amount of information contained in point descriptors can lead to processing times that are too long for real-time applications. So the FAST-HoG approach combining the point feature and the shape feature is proposed. This new approach is able to speed up the process that attains the real-time performance. Finally, a detection approach using HoG with the FAST feature is also proposed. The HoG approach is widely used in object recognition, as it has a strong ability to represent the shape vector of the object. However, the original HoG feature is sensitive to the orientation of the target; this method improves the algorithm by inserting the direction vectors of the targets. For the tracking process, a novel tracking approach was proposed, an extension of the TLD algorithm, in order to track multiple targets. The extended approach upgrades the original system, which can only track a single target, which must be selected before the detection and tracking process. The greatest challenge to vehicle tracking is long-term tracking. The target object can change its appearance during the process and illumination and scale changes can also occur. The original TLD feature assumed that tracking can make errors during the tracking process, and the accumulation of these errors could cause tracking failure, so the original TLD proposed using a learning approach in between the tracking and the detection by adding a pair of inspectors (positive and negative) to constantly estimate errors. This thesis extends the TLD approach with a new detection method in order to achieve multiple-target tracking. A Forward and Backward Tracking approach has been proposed to eliminate tracking errors and other problems such as occlusion. The main purpose of the proposed tracking system is to learn the features of the targets during tracking and re-train the detection classifier for further processes. This thesis puts particular emphasis on vehicle detection and tracking in different extreme scenarios such as crowed highway vehicle detection, blurred images and changes in the appearance of the targets. Compared with currently existing detection and tracking approaches, the proposed approaches demonstrate a robust increase in accuracy in each scenario
    corecore