
    Simultaneous Association and Localization for Multi-Camera Multi-Target Tracking

    Doctoral dissertation (Ph.D.), Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, August 2017. Advisor: 최진영 (Jin Young Choi).

    In this dissertation, we propose two approaches for three-dimensional (3D) localization and tracking of multiple targets using images from multiple cameras with overlapping views. The main challenge is to solve the 3D position estimation problem and the trajectory assignment problem simultaneously, whereas most existing methods solve the two problems independently. Compared with single-camera multi-target tracking, solving both problems jointly is considerably more complicated because the relationships between cameras must also be taken into account. To tackle this challenge, we present two approaches: a mixed multidimensional assignment approach and a variational inference approach. In the mixed multidimensional assignment approach, we formulate data association and 3D trajectory estimation as a mixed optimization problem over discrete and continuous variables, and we develop an efficient scheme that alternates between the two coupled problems with a reasonable computational load, making the large solution space tractable. In this formulation, we design a new cost function that describes the 3D physical properties of each target. In the variational inference approach, we establish a maximum a posteriori (MAP) problem over trajectory assignments and 3D positions for given detections from multiple cameras. To find a solution, we develop an expectation-maximization scheme in which the probability distributions are designed as a Boltzmann distribution over seven terms induced from the multi-camera tracking setting.

    Table of contents:
    1 Introduction
      1.1 Background & Challenges
      1.2 Related Works
      1.3 Problem Statements & Contributions
    2 Mixed Multidimensional Assignment Approach
      2.1 Problem Formulation
        2.1.1 Problem Statements
        2.1.2 Cost Design
      2.2 Optimization
        2.2.1 Spatio-temporal Data Association
        2.2.2 3D Trajectory Estimation
        2.2.3 Initialization
      2.3 Application: Real-time 3D localizing and tracking system
        2.3.1 System overview
        2.3.2 Detection
        2.3.3 Tracking
      2.4 Appendix
        2.4.1 Derivation of equation (2.35)
    3 Variational Inference Approach
      3.1 Problem Formulation
        3.1.1 Notations
        3.1.2 MAP formulation
      3.2 Optimization
        3.2.1 Posterior distribution
        3.2.2 V-EM algorithm
      3.3 Appendix
        3.3.1 Derivation of equation (3.12)
        3.3.2 Derivation of equations (3.27)-(3.32)
        3.3.3 Deriving optimal mean and covariance matrix (3.33)-(3.35)
        3.3.4 Definition of A and b in (3.22)
    4 Experiments
      4.1 Datasets
        4.1.1 PETS 2009
        4.1.2 PSN-University
      4.2 Evaluation Metrics
      4.3 Results and Discussion
        4.3.1 Mixed Multidimensional Assignment Approach
        4.3.2 Variational Inference Approach
        4.3.3 Comparisons of Two Approaches
    5 Conclusion
      5.1 Concluding Remarks
      5.2 Future Work
    Abstract (In Korean)
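    The abstract above alternates between a discrete problem (trajectory assignment) and a continuous one (3D position estimation). Below is a minimal sketch of that general pattern, not the dissertation's actual formulation: it assumes known camera projection matrices and uses reprojection distance with Hungarian assignment as a stand-in for the author's purpose-built cost over 3D physical properties. All names and parameters are illustrative.

```python
# Minimal alternating scheme: (a) assign 2D detections to 3D targets per
# camera, (b) re-triangulate each target from its assigned detections.
import numpy as np
from scipy.optimize import linear_sum_assignment

def project(P, X):
    """Project a 3D point X (3,) through a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, pts2d):
    """DLT least-squares triangulation from one 2D point per camera."""
    A = []
    for P, (u, v) in zip(Ps, pts2d):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

def alternate(Ps, detections, X_init, n_iters=10):
    """Alternate association (discrete) and 3D estimation (continuous).

    detections[c] is an (N_c, 2) array of 2D detections in camera c;
    X_init is an (M, 3) array of initial 3D target positions.
    """
    X = X_init.copy()
    for _ in range(n_iters):
        # (a) per-camera assignment, cost = reprojection distance
        assoc = []
        for c, P in enumerate(Ps):
            cost = np.array([[np.linalg.norm(project(P, x) - d)
                              for d in detections[c]] for x in X])
            rows, cols = linear_sum_assignment(cost)
            assoc.append(dict(zip(rows, cols)))
        # (b) re-triangulate each target from its assigned detections
        for m in range(len(X)):
            cams = [c for c in range(len(Ps)) if m in assoc[c]]
            pts = [detections[c][assoc[c][m]] for c in cams]
            if len(pts) >= 2:
                X[m] = triangulate([Ps[c] for c in cams], pts)
    return X
```

    Each sweep first freezes the 3D estimates to solve the per-camera assignments, then freezes the assignments to re-estimate the 3D positions, mirroring the coupled structure described in the abstract.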

    Three hypothesis algorithm with occlusion reasoning for multiple people tracking

    This work proposes a detection-based tracking algorithm able to locate and keep the identity of multiple people, who may be occluded, in uncontrolled stationary environments. Our algorithm builds a tracking graph that models spatio-temporal relationships among attributes of interacting people in order to predict and resolve partial and total occlusions. When a total occlusion occurs, the algorithm generates hypotheses about the location of the occluded person covering three cases: (a) the person keeps the same direction and speed, (b) the person follows the direction and speed of the occluder, and (c) the person remains motionless during the occlusion. By analyzing the graph, our algorithm can detect trajectories produced by false alarms and estimate the location of missing or occluded people. Our algorithm performs acceptably under complex conditions such as partial visibility of individuals entering or leaving the scene, continuous interactions and occlusions among people, wrong or missing person detections, and variation in a person's appearance due to illumination changes and background-clutter distractors. Our algorithm was evaluated on test sequences in the field of intelligent surveillance, achieving an overall precision of 93%. Results show that our tracking algorithm outperforms even trajectory-based state-of-the-art algorithms.
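    As a quick illustration of the three hypotheses enumerated above, the sketch below propagates an occluded person's last known state under each assumption. The state representation and function name are my own, not the paper's.

```python
# Propagate an occluded person's position through dt frames under the
# three motion assumptions (a), (b), (c) listed in the abstract.
import numpy as np

def occlusion_hypotheses(person_pos, person_vel, occluder_vel, dt):
    """Return candidate locations for a totally occluded person:
    (a) keeps the person's own direction and speed,
    (b) adopts the occluder's direction and speed,
    (c) assumes the person stayed still during the occlusion."""
    return {
        "own_motion":      person_pos + dt * person_vel,
        "occluder_motion": person_pos + dt * occluder_vel,
        "motionless":      person_pos.copy(),
    }

# Example: a person at (10, 5) moving right, occluded for 8 frames by
# someone moving diagonally.
hyps = occlusion_hypotheses(np.array([10.0, 5.0]),
                            np.array([1.5, 0.0]),
                            np.array([1.0, 1.0]), dt=8)
for name, pos in hyps.items():
    print(name, pos)
```

    Each candidate would then be scored against the tracking graph once the person reappears; that scoring step is not reproduced here.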

    Automatic Camera Network Localization using Object Image Tracks


    3D Robotic Sensing of People: Human Perception, Representation and Activity Recognition

    The robots are coming. Their presence will eventually bridge the digital-physical divide and dramatically impact human life by taking over tasks where our current society has shortcomings (e.g., search and rescue, elderly care, and child education). Human-centered robotics (HCR) is a vision to address how robots can coexist with humans and help people live safer, simpler and more independent lives. As humans, we have a remarkable ability to perceive the world around us, perceive people, and interpret their behaviors. Endowing robots with these critical capabilities in highly dynamic human social environments is a significant but very challenging problem in practical human-centered robotics applications. This research focuses on robotic sensing of people, that is, how robots can perceive and represent humans and understand their behaviors, primarily through 3D robotic vision. In this dissertation, I begin with a broad perspective on human-centered robotics by discussing its real-world applications and significant challenges. Then, I introduce a real-time perception system, based on the concept of Depth of Interest, to detect and track multiple individuals using a color-depth camera installed on moving robotic platforms. In addition, I discuss human representation approaches based on local spatio-temporal features, including the new “CoDe4D” features that incorporate both color and depth information, a new “SOD” descriptor to efficiently quantize 3D visual features, and the novel AdHuC features, which are capable of representing the activities of multiple individuals. Several new algorithms to recognize human activities are also discussed, including the RG-PLSA model, which allows us to discover activity patterns without supervision; the MC-HCRF model, which can explicitly investigate certainty in latent temporal patterns; and the FuzzySR model, which is used to segment continuous data into events and probabilistically recognize human activities. Cognition models based on recognition results are also implemented for decision making, allowing robotic systems to react to human activities. Finally, I conclude with a discussion of future directions that will accelerate the upcoming technological revolution of human-centered robotics.
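    The abstract does not spell out how Depth of Interest works, so the following is only a loose, heavily hedged illustration of the general idea as stated: attending to a band of depths in a color-depth frame and keeping connected regions in that band as person candidates. The function name, band width, and area threshold are all assumptions, not the dissertation's method.

```python
# Keep connected regions whose depth lies near an expected target depth.
import numpy as np
from scipy import ndimage

def depth_of_interest_candidates(depth, target_depth, band=0.4, min_area=50):
    """Return (x1, y1, x2, y2) boxes of connected regions whose depth is
    within +/- band metres of target_depth in a 2D depth image."""
    mask = np.abs(depth - target_depth) < band
    labels, n = ndimage.label(mask)
    boxes = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if len(xs) >= min_area:  # discard tiny, likely-noise regions
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```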

    Object Association Across Multiple Moving Cameras In Planar Scenes

    In this dissertation, we address the problem of object detection and object association across multiple cameras over large areas that are well modeled by planes. We present a unifying probabilistic framework that captures the underlying geometry of planar scenes, and present algorithms to estimate geometric relationships between different cameras, which are subsequently used for co-operative association of objects. We first present a local object detection scheme that has three fundamental innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic scene behavior, nominal misalignments, and motion due to parallax. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, complex dependencies between the domain (location) and range (color) are directly modeled, and the background is represented as a single probability density. Second, temporal persistence is introduced as a detection criterion. Unlike previous approaches that detect objects by building adaptive models of the background alone, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition for detecting interesting objects, and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the method is performed and presented on a diverse set of data. We then address the problem of associating objects across multiple cameras in planar scenes. Since the cameras may be moving, there is a possibility of both spatial and temporal non-overlap in their fields of view. We first address the case where spatial and temporal overlap can be assumed. Since the cameras are moving and often widely separated, direct appearance-based or proximity-based constraints cannot be used. Instead, we exploit geometric constraints on the relationship between the motion of each object across cameras to test multiple correspondence hypotheses, without assuming any prior calibration information. Here, there are three contributions. First, we present a statistically and geometrically meaningful means of evaluating a hypothesized correspondence between multiple objects in multiple cameras. Second, since multiple cameras exist, ensuring coherency in association, i.e., that transitive closure is maintained across more than two cameras, is an essential requirement. To ensure such coherency, we pose the problem of associating objects across cameras as a k-dimensional matching problem and use an approximation to find the association. We show that, under appropriate conditions, re-entering objects can also be re-associated with their original labels. Third, we show that as a result of associating objects across the cameras, a concurrent visualization of multiple aerial video streams is possible. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models.
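    As a rough illustration of the joint domain-range background model described above, the sketch below scores a pixel's (x, y, r, g, b) sample against stored background samples with a product-Gaussian kernel and labels the pixel foreground when the likelihood is low. The bandwidths, threshold, and toy data are assumptions; the dissertation additionally models the foreground and decides labels in a MAP-MRF framework, which is not reproduced here.

```python
# Nonparametric background likelihood over the joint domain-range space
# (x, y, r, g, b): spatially proximal samples share evidence.
import numpy as np

def kde_likelihood(sample, bg_samples, bandwidth):
    """Average Gaussian-kernel density of one (x, y, r, g, b) sample
    under the stored background samples (diagonal bandwidth)."""
    diff = (bg_samples - sample) / bandwidth
    k = np.exp(-0.5 * np.sum(diff**2, axis=1))
    norm = np.prod(bandwidth) * (2 * np.pi) ** (len(bandwidth) / 2)
    return k.mean() / norm

def classify_pixel(sample, bg_samples, bandwidth, threshold=1e-8):
    """Label a pixel foreground when its joint domain-range background
    likelihood falls below the threshold."""
    return kde_likelihood(sample, bg_samples, bandwidth) < threshold

# Toy example: location in pixels, colour in [0, 255].
bg = np.array([[10, 10, 200, 200, 200],
               [11, 10, 198, 201, 199],
               [10, 11, 202, 199, 200]], dtype=float)
bw = np.array([2.0, 2.0, 15.0, 15.0, 15.0])  # (x, y, r, g, b) bandwidths
print(classify_pixel(np.array([10, 10, 60, 60, 60.]), bg, bw))     # True
print(classify_pixel(np.array([10, 10, 200, 200, 200.]), bg, bw))  # False
```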
Finally, we present a unifying framework for object association across multiple cameras and for estimating inter-camera homographies between (spatially and temporally) overlapping and non-overlapping cameras, whether they are moving or non-moving. By making use of explicit polynomial models for the kinematics of objects, we present algorithms to estimate inter-frame homographies. Under an appropriate measurement noise model, an EM algorithm is applied for the maximum likelihood estimation of the inter-camera homographies and kinematic parameters. Rather than fitting curves locally (in each camera) and matching them across views, we present an approach that simultaneously refines the estimates of inter-camera homographies and curve coefficients globally. We demonstrate the efficacy of the approach on a number of real sequences taken from aerial cameras, and report quantitative performance during simulations.
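    A hedged sketch of the geometric machinery this paragraph builds on: fitting an inter-camera homography from hypothesized trajectory correspondences by the standard DLT, and scoring the hypothesis by its transfer error. The EM refinement of homographies and polynomial kinematic models described above is not reproduced; function names and the toy data are assumptions.

```python
# DLT homography fit plus a transfer-error score for a correspondence
# hypothesis between two cameras' views of a plane.
import numpy as np

def fit_homography(src, dst):
    """DLT estimate of H (up to scale) with dst ~ H @ src, for (N, 2)
    point arrays and N >= 4 correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def transfer_error(H, src, dst):
    """Mean Euclidean error after mapping src through H."""
    proj = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    return float(np.linalg.norm(proj - dst, axis=1).mean())

# A correspondence hypothesis is plausible when the homography it
# induces has low transfer error.
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], dtype=float)
H_true = np.array([[1.2, 0.1, 3.0], [0.0, 0.9, -1.0], [0.001, 0.0, 1.0]])
p = np.hstack([src, np.ones((5, 1))]) @ H_true.T
dst = p[:, :2] / p[:, 2:3]
print(transfer_error(fit_homography(src, dst), src, dst))  # ~0
```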

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behavior analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, it is often the case that the algorithm needs to operate under the following challenging conditions: drastic lighting changes, object shape complexity, moving cameras, low frame capture rates, and low resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles.

    The first problem addresses passenger detection and tracking for public transport buses, investigating the challenges of changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model into a weighted Bayesian framework to detect passengers. To deal with the problem of tracking multiple targets, we employ the Reversible Jump Markov Chain Monte Carlo tracking algorithm. Using an SVM classifier, appearance transformation models capture changes in the appearance of the foreground objects across two consecutive frames under low frame rate conditions.

    In the second problem, we present a system for pedestrian detection in scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data. In the first stage, SIFT homography is applied to cluster frames in terms of their structural similarity, and the second stage further clusters these aligned frames according to consistency in illumination, producing clusters of images that are consistent in viewpoint and lighting. A kernel density estimation (KDE) technique over colour and gradient is then used to construct background models for each image cluster, which are further used to detect candidate foreground pixels. Finally, pedestrians are detected using a hierarchical template matching approach.

    In addition to the second problem, we present three direct pedestrian detection methods that extend HOG (Histogram of Oriented Gradient) techniques (Dalal and Triggs, 2005) and provide a comparative evaluation of these approaches. The three approaches are: (a) a new histogram feature, formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; (b) the codebook-based HOG feature with a branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and (c) the codebook-based HOGB approach.

    In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times, and the 3D background modelling identifies inconsistent scene structures as foreground objects. In the 2D approach, foreground objects are detected using a spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations. The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.
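    To make the HOGB construction in (a) above concrete, here is a simplified sketch of one cell's histogram, assuming each pixel votes into its quantised-orientation bin with a weight that mixes gradient magnitude and a filter-bank response. The mixing weight, bin count, and the stand-in for the elongated Gaussian filter bank are my assumptions, not the thesis's exact recipe.

```python
# Simplified HOGB-style cell histogram: per-pixel orientation votes
# weighted by a blend of gradient magnitude and filter-bank response.
import numpy as np

def hogb_cell_histogram(cell, bank_responses, alpha=0.5, n_bins=9):
    """Orientation histogram for one grayscale cell; each pixel votes
    into its unsigned-orientation bin with weight
    alpha * gradient_magnitude + (1 - alpha) * filter_bank_response."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    weight = alpha * mag + (1 - alpha) * bank_responses
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), weight.ravel())
    return hist / (np.linalg.norm(hist) + 1e-9)      # L2 normalisation

# Toy usage: 8x8 cell with a vertical edge; one oriented-filter response
# stands in for the full Gaussian filter bank.
cell = np.zeros((8, 8)); cell[:, 4:] = 255.0
bank = np.abs(np.gradient(cell)[1])                  # placeholder response
print(hogb_cell_histogram(cell, bank).round(3))
```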

    AN ADAPTIVE MULTIPLE-OBJECT TRACKING ARCHITECTURE FOR LONG-DURATION VIDEOS WITH VARIABLE TARGET DENSITY

    Multiple-Object Tracking (MOT) methods are used to detect targets in individual video frames, e.g., vehicles, people, and other objects, and then record each unique target’s path over time. Current state-of-the-art approaches are extremely complex because most rely on extracting and comparing visual features at every frame to track each object. These approaches are geared toward high-difficulty tracking scenarios, e.g., crowded airports, and require expensive dedicated hardware, e.g., Graphics Processing Units. In hardware-constrained applications, researchers are turning to older, less complex MOT methods, which reveals a serious scalability issue within the state-of-the-art. Crowded environments are a niche application for MOT, i.e., there are far more residential areas than there are airports. Given that complex approaches are not required for low-difficulty tracking scenarios, i.e., video showing mainly isolated targets, there is an opportunity to utilize more efficient MOT methods for these environments. Nevertheless, little recent research has focused on developing more efficient MOT methods. This thesis describes a novel MOT method, ClusterTracker, that is built to handle variable-difficulty tracking environments an order of magnitude faster than the state-of-the-art. It achieves this by avoiding visual features and using quadratic-complexity algorithms instead of the cubic-complexity algorithms found in other trackers. ClusterTracker performs spatial clustering on object detections from short frame sequences, treats clusters as tracklets, and then connects successive tracklets with high bounding-box overlap to form tracks. With recorded video, parallel processing can be applied to several steps of ClusterTracker. This thesis evaluates ClusterTracker’s baseline performance on several benchmark datasets, describes its intended operating environments, and identifies its weaknesses. Subsequent modifications patch these weaknesses while also addressing the scalability concerns of more complex MOT methods. The modified architecture uses clustering feedback to separate isolated targets from non-isolated targets, re-processing the latter with a more complex MOT method. Results show ClusterTracker is uniquely suited to such an approach and allows complex MOT methods to be applied to the challenging tracking situations for which they are intended.
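    A minimal sketch of the two ClusterTracker stages described above, under stated assumptions: DBSCAN stands in for whichever spatial clustering the thesis uses, detections are [x1, y1, x2, y2] boxes pooled over a short frame window, and the linking is a greedy IoU match. Parameters and names are illustrative, not the thesis's implementation.

```python
# Stage 1: cluster detections from a short frame window into tracklets.
# Stage 2: connect successive tracklets with high bounding-box overlap.
import numpy as np
from sklearn.cluster import DBSCAN

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def tracklets_from_window(boxes, eps=40.0, min_samples=2):
    """Cluster an (N, 4) array of detections on box centres; each cluster
    becomes a tracklet represented by its enclosing box."""
    centres = np.column_stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                               (boxes[:, 1] + boxes[:, 3]) / 2])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centres)
    return [boxes[labels == k].min(axis=0)[[0, 1]].tolist()
            + boxes[labels == k].max(axis=0)[[2, 3]].tolist()
            for k in set(labels) if k != -1]          # -1 = DBSCAN noise

def link(prev_tracklets, next_tracklets, min_iou=0.3):
    """Greedily connect successive tracklets with high box overlap."""
    links = []
    for i, a in enumerate(prev_tracklets):
        scores = [iou(a, b) for b in next_tracklets]
        if scores and max(scores) >= min_iou:
            links.append((i, int(np.argmax(scores))))
    return links
```

    Because neither stage touches pixel data, the per-window cost is dominated by the clustering itself, which is consistent with the efficiency argument made in the abstract.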