239 research outputs found

    Collision-free state estimation

    Get PDF
    In state estimation, we often want the maximum likelihood estimate of the current state. For the commonly used joint multivariate Gaussian distribution over the state space, this estimate can be found efficiently using a Kalman filter. However, in complex environments the state space is often highly constrained: objects within a refrigerator, for example, cannot interpenetrate each other or the refrigerator walls. The multivariate Gaussian is unconstrained over the state space and cannot incorporate these constraints; in particular, the state estimate returned by the unconstrained distribution may itself be infeasible. Instead, we solve a related constrained optimization problem to find a good feasible state estimate. We illustrate this by estimating collision-free configurations for objects resting stably on a 2-D surface, and demonstrate its utility in a real robot perception domain.
    National Science Foundation (U.S.) (Grant 019868); United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-09-1-1051); United States. Air Force Office of Scientific Research (Grant AOARD-104135)
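    As a minimal sketch of the idea (not the paper's implementation), the snippet below contrasts the unconstrained Gaussian MAP estimate, which is simply the mean, with a feasible estimate obtained by minimizing the Mahalanobis distance to the mean subject to a non-penetration constraint. The two-object 1-D shelf, the object widths and the covariance are invented for illustration.

```python
# Sketch: constrained MAP estimation for two objects on a 1-D shelf.
# The unconstrained MAP of a Gaussian is its mean, which may place the
# objects in collision; instead we minimize the Mahalanobis distance to
# the mean subject to a non-penetration constraint. All numbers are
# illustrative, not from the paper.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.40, 0.45])          # posterior means of object centres (m)
cov = np.array([[0.01, 0.0],
                [0.0,  0.01]])       # posterior covariance
prec = np.linalg.inv(cov)
widths = np.array([0.10, 0.10])      # object widths (m)

def neg_log_likelihood(x):
    d = x - mu
    return 0.5 * d @ prec @ d

# Feasibility: the gap between centres must exceed the sum of half-widths
# (assuming object 0 lies to the left of object 1).
def clearance(x):
    return (x[1] - x[0]) - 0.5 * (widths[0] + widths[1])

res = minimize(neg_log_likelihood, x0=mu,
               constraints=[{"type": "ineq", "fun": clearance}])
print("unconstrained MAP:", mu, "-> in collision:", clearance(mu) < 0)
print("constrained  MAP:", res.x)
```

    With the numbers above, the unconstrained mean places the objects in collision, and the constrained optimum moves them apart just enough to restore a feasible, high-likelihood configuration.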

    Visual articulated tracking in cluttered environments

    Get PDF
    This thesis is concerned with the state estimation of an articulated robotic manipulator during interaction with its environment. Traditionally, robot state estimation has relied on proprioceptive sensors as the single source of information about the internal state. In this thesis, we are motivated to shift the focus from proprioceptive to exteroceptive sensing, which is capable of representing a holistic interpretation of the entire manipulation scene. When visually observing grasping tasks, the tracked manipulator is subject to visual distractions caused by the background, the manipulated object and occlusions from other objects present in the environment. The aim of this thesis is to investigate and develop methods for the robust visual state estimation of articulated kinematic chains in cluttered environments that suffer from partial occlusions. To make these methods widely applicable to a variety of kinematic setups and unseen environments, we intentionally refrain from using prior information about the internal state of the articulated kinematic chain, and we do not explicitly model visual distractions such as the background and manipulated objects in the environment. We approach this problem with model-fitting methods, in which an articulated model is associated with the observed data using discriminative information. We explore model-fitting objectives that are robust to occlusions and unseen environments, methods to generate synthetic training data for data-driven discriminative methods, and robust optimisers to minimise the tracking objective. This thesis contributes (1) an automatic colour and depth image synthesis pipeline for data-driven learning that does not depend on a real articulated robot; (2) a training strategy for discriminative model-fitting objectives with an implicit representation of objects; (3) a tracking objective that is able to track occluded parts of a kinematic chain; and finally (4) a robust multi-hypothesis optimiser. These contributions are evaluated on two robotic platforms in different environments and with different manipulated and occluding objects. We demonstrate that our image synthesis pipeline generalises well to colour and depth observations of the real robot without requiring real ground-truth labelled images. While this synthesis approach introduces a visual simulation-to-reality gap, the combination of our robust tracking objective and optimiser enables stable tracking of an occluded end-effector during manipulation tasks.
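    The thesis's objectives and optimisers are more elaborate than can be shown here, but the following sketch illustrates the general model-fitting recipe it builds on: align an articulated model to observations by minimizing a robust residual, so that corrupted or occluded measurements do not dominate the fit. The planar two-link arm, the Huber loss and all numbers are assumptions for illustration only.

```python
# Sketch: robust model fitting for a planar 2-link arm. Joint angles are
# estimated by aligning keypoints predicted by forward kinematics to
# observed keypoints under a Huber loss, so a badly corrupted
# (e.g. occluded) observation does not dominate the fit.
import numpy as np
from scipy.optimize import least_squares

L1, L2 = 0.3, 0.25                     # link lengths (m), assumed known

def forward_kinematics(q):
    """Return elbow and end-effector positions for joint angles q."""
    elbow = np.array([L1 * np.cos(q[0]), L1 * np.sin(q[0])])
    ee = elbow + np.array([L2 * np.cos(q[0] + q[1]), L2 * np.sin(q[0] + q[1])])
    return np.concatenate([elbow, ee])

true_q = np.array([0.6, -0.4])
obs = forward_kinematics(true_q) + 0.005 * np.random.randn(4)
obs[2:] += np.array([0.08, -0.08])     # simulate an occluded / corrupted end-effector

def residuals(q):
    return forward_kinematics(q) - obs

fit = least_squares(residuals, x0=np.zeros(2), loss="huber", f_scale=0.01)
print("estimated joint angles:", fit.x, "(true:", true_q, ")")
```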

    MonoSLAM: Real-time single camera SLAM

    No full text
    Published version

    3D VISUAL TRACKING USING A SINGLE CAMERA

    Get PDF
    automated surveillance and motion-based recognition. 3D tracking addresses the localization of a moving target in 3D space. Therefore, 3D tracking requires 3D measurements of the moving object, which cannot be obtained from 2D cameras. Existing 3D tracking systems use multiple cameras for computing the depth of field, and they are only used in research laboratories. Millions of surveillance cameras are installed worldwide and all of them capture 2D images. Therefore, 3D tracking cannot be performed with these cameras unless multiple cameras are installed at each location in order to compute the depth. This means installing millions of new cameras, which is not a feasible solution. This work introduces a novel depth estimation method from a single 2D image using triangulation. This method computes the absolute depth of field for any object in the scene with high accuracy and short computational time. The developed method is used to perform 3D visual tracking with a single camera by providing the depth of field and ground coordinates of the moving object for each frame accurately and efficiently. This technique can therefore help in transforming existing 2D tracking and 2D video analytics into 3D without incurring additional costs, making video surveillance more efficient and broadening its use in everyday life. The proposed methodology uses a background subtraction process to detect a moving object in the image. Then, the newly developed depth estimation method is used to compute the 3D measurement of the moving target. Finally, the unscented Kalman filter is used to track the moving object given the 3D measurement obtained by the triangulation method. This system has been tested and validated on several video sequences and shows good performance in terms of accuracy and computational complexity.
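    The exact triangulation geometry is not spelled out in the abstract; a common way to recover absolute depth from a single calibrated camera is to intersect the ray through the object's ground-contact pixel with a known ground plane. The sketch below assumes a level camera at a known height, a pinhole model, and invented calibration values.

```python
# Sketch: depth of an object's ground-contact point from a single calibrated
# camera, assuming a flat ground plane, a level camera at known height H, and
# a pinhole model. This is one common single-view "triangulation"; the
# thesis's exact geometry may differ. All numbers are illustrative.
import numpy as np

fx = fy = 800.0            # focal length in pixels (assumed calibration)
cx, cy = 320.0, 240.0      # principal point
H = 2.5                    # camera height above the ground (m)

def ground_coordinates(u, v):
    """Map the image pixel (u, v) of a ground-contact point to (X, Z) on the floor."""
    if v <= cy:
        raise ValueError("point must lie below the horizon for a level camera")
    Z = fy * H / (v - cy)          # depth along the optical axis
    X = (u - cx) * Z / fx          # lateral offset
    return X, Z

# Example: the lowest pixel of a detected moving blob.
print(ground_coordinates(u=400.0, v=300.0))   # -> roughly (3.3 m, 33.3 m)
```

    The resulting (X, Z) ground coordinates are exactly the kind of per-frame 3D measurement that can then be fed to an unscented Kalman filter for tracking.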

    Robust and real-time hand detection and tracking in monocular video

    Get PDF
    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to interact directly with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation is video-based gesture recognition, hand detection and hand tracking. Gesture-based interaction allows the user to interact directly with the computer in a natural manner, exploring a virtual reality using nothing but their own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template-based approach, we probabilistically model the spatial relations between different hand parts and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state of the art in object detection while reducing computational complexity by a factor of 1,000.
    Furthermore, we showed that our algorithm can also be used to detect other articulated objects, such as persons or animals, and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in domains other than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state of the art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in the literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements and is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture-based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating image display has been showcased at several national and international events and trade shows. The research described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator.
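    As a toy illustration of the part-based idea described above (not the dissertation's actual model), the snippet below fuses detections of individual hand parts into a centroid estimate using learned part-to-centroid offsets. The offsets, uncertainties and detections are invented, and a real spatial model would also account for scale and rotation.

```python
# Sketch of the part-to-centroid idea: each detected hand part votes for the
# hand centroid using a learned mean offset and an isotropic uncertainty,
# and the votes are fused by precision weighting.
import numpy as np

# learned offsets from part to hand centroid, in pixels: (mean_offset, sigma)
part_model = {
    "thumb_tip":  (np.array([ 40.0,  55.0]), 12.0),
    "index_tip":  (np.array([  5.0,  80.0]), 10.0),
    "pinky_tip":  (np.array([-45.0,  60.0]), 14.0),
}

# hypothetical detections: part name -> image position
detections = {
    "index_tip": np.array([210.0, 120.0]),
    "pinky_tip": np.array([160.0, 145.0]),
}

def estimate_centroid(detections, part_model):
    votes, weights = [], []
    for name, pos in detections.items():
        offset, sigma = part_model[name]
        votes.append(pos + offset)            # each part votes for a centroid
        weights.append(1.0 / sigma**2)        # more certain parts count more
    votes, weights = np.array(votes), np.array(weights)
    return (votes * weights[:, None]).sum(axis=0) / weights.sum()

print("estimated hand centroid:", estimate_centroid(detections, part_model))
```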

    An Insect-Inspired Target Tracking Mechanism for Autonomous Vehicles

    Get PDF
    Target tracking is a complicated task from an engineering perspective, especially where targets are small and seen against complex natural environments. Due to the high demand for robust target tracking algorithms, a great deal of research has focused on this area. However, most engineering solutions developed for this purpose are often unreliable in real-world conditions or too computationally expensive to be used in real-time applications. While engineering methods try to solve the problem of target detection and tracking using high-resolution input images, fast processors and typically computationally expensive methods, a quick glance at nature provides evidence that practical real-world solutions for target tracking exist. Many animals track targets for predation, territorial or mating purposes, and with millions of years of evolution behind them, it seems reasonable to assume that these solutions are highly efficient. For instance, despite their low-resolution compound eyes and tiny brains, many flying insects have evolved superb abilities to track targets in visual clutter, even in the presence of other distracting stimuli such as swarms of prey and conspecifics. The accessibility of the dragonfly for stable electrophysiological recordings makes this insect an ideal and tractable model system for investigating the neuronal correlates of complex tasks such as target pursuit. Studies on dragonflies identified and characterized a set of neurons likely to mediate target detection and pursuit, referred to as ‘small target motion detector’ (STMD) neurons. These neurons are selective for tiny targets, are velocity-tuned and contrast-sensitive, and respond robustly to targets even against background motion. They exhibit several higher-order properties which can contribute to the dragonfly’s ability to robustly pursue prey with over a 97% success rate. These include the recent electrophysiological observations of response ‘facilitation’ (a slow build-up of response to targets that move on long, continuous trajectories) and ‘selective attention’, a competitive mechanism that selects one target from alternatives. In this thesis, I adopted a bio-inspired approach to develop a solution to the problem of target tracking and pursuit. Directly inspired by recent physiological breakthroughs in understanding the insect brain, I developed a closed-loop target tracking system that uses an active saccadic gaze fixation strategy inspired by insect pursuit. First, I tested this model in virtual-world simulations using MATLAB/Simulink. The results of these simulations show robust performance of this insect-inspired model, achieving high prey-capture success even with complex background clutter, low contrast and high relative speed of the pursued prey. Additionally, these results show that inclusion of facilitation not only substantially improves success for even short-duration pursuits, it also enhances the ability to ‘attend’ to one target in the presence of distracters. This insect-inspired system has a relatively simple image processing strategy compared to state-of-the-art trackers developed recently for computer vision applications. Traditional machine vision approaches incorporate elaborations to handle challenges and non-idealities in natural environments, such as local flicker and illumination changes, and non-smooth and non-linear target trajectories.
    The question therefore arises as to whether this insect-inspired tracker can match their performance when given similar challenges. I investigated this question by testing both the efficacy and efficiency of the insect-inspired model in open loop, using a widely used set of videos recorded under natural conditions. I directly compared the performance of this model with several state-of-the-art engineering algorithms using the same hardware, software environment and stimuli. The insect-inspired model exhibits robust performance in tracking small moving targets even in very challenging natural scenarios, outperforming the best of the engineered approaches. Furthermore, it operates more efficiently than the other approaches, in some cases dramatically so. The computer vision literature traditionally tests target tracking algorithms only in open loop. However, one of the main purposes for developing these algorithms is implementation in real-time robotic applications. Therefore, it is still unclear how these algorithms might perform in closed-loop real-world applications, where the inclusion of sensors and actuators on a physical robot results in additional latency which can affect the stability of the feedback process. Additionally, studies show that animals interact with the target by changing eye or body movements, which then modulate the visual inputs underlying the detection and selection task (via closed-loop feedback). This active vision system may be key to how the simple insect brain exploits visual information for complex tasks such as target tracking. Therefore, I implemented the insect-inspired model, along with insect active vision, on a robotic platform. I tested this robotic implementation in both indoor and outdoor environments against different challenges which exist in real-world conditions, such as vibration, illumination variation, and distracting stimuli. The experimental results show that the robotic implementation is capable of handling these challenges and robustly pursuing a target even in highly challenging scenarios.
    Thesis (Ph.D.) -- University of Adelaide, School of Mechanical Engineering, 201
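    A rough sketch of the facilitation mechanism described above: a leaky-integrator gain map is deposited along the target's recent trajectory and multiplies the raw detector response, so that persistent, smoothly moving targets out-compete one-off distracters. The grid size, decay rate and gain are illustrative choices, not values from the thesis.

```python
# Sketch of response "facilitation": a leaky-integrator gain map boosts the
# detector output near the target's recent trajectory, so targets on long,
# continuous paths build up a stronger response than transient clutter.
import numpy as np

H, W = 48, 64
facilitation = np.zeros((H, W))
DECAY, GAIN, SIGMA = 0.9, 2.0, 3.0
yy, xx = np.mgrid[0:H, 0:W]

def update_facilitation(last_detection):
    """Decay the map, then deposit a Gaussian bump around the last detection."""
    global facilitation
    facilitation *= DECAY
    if last_detection is not None:
        r, c = last_detection
        facilitation += GAIN * np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * SIGMA**2))

def facilitated_response(raw_response):
    """Modulate the raw small-target detector output by the facilitation map."""
    return raw_response * (1.0 + facilitation)

# Toy usage: a weak but persistent response along a smooth trajectory wins
# over an equally strong one-off distracter after a few frames.
rng = np.random.default_rng(0)
last = None
for t in range(10):
    raw = 0.05 * rng.random((H, W))
    raw[20, 10 + 2 * t] = 0.5                 # target moving smoothly to the right
    raw[35, 50] = 0.5 if t == 9 else 0.0      # distracter flashing once
    resp = facilitated_response(raw)
    last = np.unravel_index(np.argmax(resp), resp.shape)
    update_facilitation(last)
print("selected location on final frame:", last)   # near the smooth trajectory
```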

    Semantic Robot Programming for Taskable Goal-Directed Manipulation

    Full text link
    Autonomous robots have the potential to assist people to be more productive in factories, homes, hospitals, and similar environments. Unlike traditional industrial robots that are pre-programmed for particular tasks in controlled environments, modern autonomous robots should be able to perform arbitrary user-desired tasks. Thus, it is beneficial to provide pathways that enable users to program an arbitrary robot to perform an arbitrary task in an arbitrary world. Advances in robot Programming by Demonstration (PbD) have made it possible for end-users to program robot behavior for performing desired tasks through demonstrations. However, it still remains a challenge for users to program robot behavior in a generalizable, performant, scalable, and intuitive manner. In this dissertation, we address the problem of robot programming by demonstration in a declarative manner by introducing the concept of Semantic Robot Programming (SRP). In SRP, we focus on addressing the following challenges for robot PbD: 1) generalization across robots, tasks, and worlds, 2) robustness under partial observations of cluttered scenes, 3) efficiency in task performance as the workspace scales up, and 4) feasible and intuitive modalities of interaction for end-users to demonstrate tasks to robots. Through SRP, our objective is to enable an end-user to intuitively program a mobile manipulator by providing a workspace demonstration of the desired goal scene. We use a scene graph to semantically represent conditions on the current and goal states of the world. To estimate the scene graph given raw sensor observations, we bring together discriminative object detection and generative state estimation for the inference of object classes and poses. The proposed scene estimation method outperformed the state of the art in cluttered scenes. With SRP, we successfully enabled users to program a Fetch robot to set up a kitchen tray on a cluttered tabletop in 10 different start and goal settings. In order to scale SRP up from the tabletop to large scenes, we propose Contextual-Temporal Mapping (CT-Map) for the semantic mapping of large-scale scenes given streaming sensor observations. We model the semantic mapping problem via a Conditional Random Field (CRF), which accounts for spatial dependencies between objects. Over time, object poses and inter-object spatial relations can vary due to human activities. To deal with such dynamics, CT-Map maintains the belief over object classes and poses across an observed environment. We present CT-Map semantically mapping cluttered rooms with robustness to perceptual ambiguities, demonstrating higher accuracy on object detection and 6 DoF pose estimation compared to a state-of-the-art neural-network-based object detector and commonly adopted 3D registration methods. Towards SRP at the building scale, we explore notions of Generalized Object Permanence (GOP) for robots to search for objects efficiently. We state the GOP problem as the prediction of where an object can be located when it is not being directly observed by a robot. We model object permanence via a factor graph inference model, with factors representing long-term memory, short-term memory, and common-sense knowledge over inter-object spatial relations. We propose the Semantic Linking Maps (SLiM) model to maintain the belief over object locations while accounting for object permanence through a CRF.
    Based on the belief maintained by SLiM, we present a hybrid object search strategy that enables the Fetch robot to actively search for objects on a large scale, with a higher search success rate and less search time compared to state-of-the-art search methods.
    PhD, Electrical and Computer Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155073/1/zengzhen_1.pd
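    To make the scene-graph idea concrete, the sketch below represents a goal scene as objects plus symbolic spatial relations and checks whether an estimated current scene satisfies them. The relation predicate, poses and tolerances are simplified inventions; SRP infers such graphs from full 6-DoF scene estimates rather than hand-written rules.

```python
# Sketch: a scene graph as a set of objects plus symbolic spatial relations,
# and a check of whether an estimated current scene satisfies a demonstrated
# goal graph. Predicates and poses are simplified inventions.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    x: float
    y: float
    z: float

def on(a: SceneObject, b: SceneObject, tol: float = 0.02) -> bool:
    """Crude 'a is on b' test: a sits just above b at roughly the same (x, y)."""
    return abs(a.x - b.x) < 0.1 and abs(a.y - b.y) < 0.1 and 0 < a.z - b.z < 0.1 + tol

goal_relations = [("mug", "on", "tray"), ("spoon", "on", "tray")]

current = {
    "tray":  SceneObject("tray", 0.50, 0.00, 0.75),
    "mug":   SceneObject("mug",  0.52, 0.03, 0.83),
    "spoon": SceneObject("spoon", 0.20, 0.30, 0.75),
}

def satisfied(relations, scene):
    checks = {"on": on}
    return [(a, rel, b, checks[rel](scene[a], scene[b])) for a, rel, b in relations]

for a, rel, b, ok in satisfied(goal_relations, current):
    print(f"{a} {rel} {b}: {'satisfied' if ok else 'not yet satisfied'}")
```

    In this toy scene the mug already satisfies its goal relation while the spoon does not, which is exactly the kind of unmet condition a goal-directed planner would then act on.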

    A Methodology for Extracting Human Bodies from Still Images

    Get PDF
    Monitoring and surveillance of humans is one of the most prominent applications today, and it is expected to be part of many future aspects of our lives, for safety reasons, assisted living and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject and propose a maturity metric to evaluate them. Image segmentation is one of the most popular image processing algorithms found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity at local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
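    One common bottom-up cue in pipelines of this kind is a skin-colour test in chrominance space; the sketch below uses the classic Chai and Ngan box thresholds on the Cr/Cb channels, which are standard literature values rather than the dissertation's own model.

```python
# Sketch of a skin-colour cue: threshold the Cr/Cb chrominance channels of a
# BT.601 YCbCr conversion. Thresholds are the commonly cited Chai & Ngan
# values, used here only for illustration.
import numpy as np

def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Return a boolean mask of skin-coloured pixels for an HxWx3 uint8 RGB image."""
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

# Toy usage on a synthetic image: one skin-like patch on a blue background.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[...] = (30, 30, 200)           # background
img[1:3, 1:3] = (220, 160, 130)    # skin-like colour
print(skin_mask(img).astype(int))
```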

    Adaptive visual sampling

    Get PDF
    PhD
    Various visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown at a high level to be driven by various information- and resource-related factors, such as the limited capacity of the human cognitive system, the quality of information gathered, its relevance in context and the associated efficiency of recovering it. At a lower level, we interpret many computer vision tasks as rooted in similar notions of contextually relevant, dynamic sampling strategies which are geared towards the filtering of pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of the object models used for such filtering, a requirement complicated by real-world conditions such as dynamic lighting that inconveniently and frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned, context-dependent strategies that satisfactorily filter out samples that are irrelevant or that blunt the potency of models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach. Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness. The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour objects; (b) a constructive algorithm that uses cross-validation to automatically determine the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of the object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively-adaptive mechanism to enable colour models to cope with changing conditions and permit more robust tracking. Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in the features selected at each stage, as well as more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures: discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM), which maintains a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA).
    Finally, we present an adaptive spatial and featural sampling strategy that extends established Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach, such as limited spatial support, restricted sample sets, and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP-type models to be constructed, which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM), which is shown to be more powerful than established methods used for binary feature selection such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost.
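    As a small illustration of parts (a) and (d) of the first framework described above, the sketch below fits Gaussian mixtures to object and background pixel colours and keeps the samples whose posterior favours the object. The component counts, prior and synthetic colours are invented; the thesis selects the number of mixture components by cross-validation and adapts the models over time.

```python
# Sketch: Gaussian mixture colour models for object and background, with a
# simple Bayesian posterior used to filter pixel samples by discrimination.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
object_px = np.vstack([rng.normal([200, 60, 60], 15, (300, 3)),    # reddish parts
                       rng.normal([230, 210, 80], 15, (300, 3))])  # yellowish parts
background_px = rng.normal([90, 120, 90], 40, (600, 3))

obj_gmm = GaussianMixture(n_components=2, random_state=0).fit(object_px)
bg_gmm = GaussianMixture(n_components=2, random_state=0).fit(background_px)

def object_posterior(pixels, prior_obj=0.5):
    """Posterior probability that each RGB sample belongs to the object."""
    log_obj = obj_gmm.score_samples(pixels) + np.log(prior_obj)
    log_bg = bg_gmm.score_samples(pixels) + np.log(1 - prior_obj)
    return 1.0 / (1.0 + np.exp(log_bg - log_obj))

samples = np.array([[205.0, 65.0, 55.0], [95.0, 125.0, 85.0]])
print(object_posterior(samples))    # high for the first sample, low for the second
```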