9 research outputs found

    View Registration Using Interesting Segments of Planar Trajectories

    We introduce a method for recovering the spatial and temporal alignment between two or more views of objects moving over a ground plane. Existing approaches either assume that the streams are globally synchronized, so that only the spatial alignment needs to be solved, or that the temporal misalignment is small enough for exhaustive search to be performed. In contrast, our approach can recover both the spatial and temporal alignment. We compute for each trajectory a number of interesting segments, and we use their descriptions to form putative matches between trajectories. Each pair of corresponding interesting segments induces a temporal alignment and defines an interval of common support across two views of an object that is used to recover the spatial alignment. Interesting segments and their descriptors are defined using algebraic projective invariants measured along the trajectories. Similarity between interesting segments is computed by taking into account the statistics of such invariants. Candidate alignment parameters are verified by checking the consistency, in terms of the symmetric transfer error, of all the putative pairs of corresponding interesting segments. Experiments are conducted with two different sets of data, one with two views of an outdoor scene featuring moving people and cars, and one with four views of a laboratory sequence featuring moving radio-controlled cars.
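The abstract does not specify which projective invariants are used; a classic example of the idea is the cross-ratio of four collinear points, which is unchanged by any projective transformation of the line and can therefore serve as a viewpoint-independent descriptor of points sampled along a trajectory. The sketch below (function names are illustrative, not from the paper) demonstrates this invariance numerically:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their 1-D coordinates.

    Invariant under any projective transformation of the line, which is
    why such quantities can act as view-independent trajectory descriptors.
    """
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_map(x, a=2.0, b=1.0, c=0.5, d=3.0):
    """A 1-D projective (Moebius) transformation x -> (a*x + b)/(c*x + d)."""
    return (a * x + b) / (c * x + d)

# The cross-ratio computed before and after the projective map agrees.
pts = (0.0, 1.0, 3.0, 4.0)
r_before = cross_ratio(*pts)
r_after = cross_ratio(*(projective_map(x) for x in pts))
```

Two such invariants measured along a planar trajectory in two views must agree wherever the views observe the same world points, which is what makes them usable for forming putative segment matches.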

    Learning Higher-order Transition Models in Medium-scale Camera Networks

    We present a Bayesian framework for learning higher-order transition models in video surveillance networks. Such higher-order models describe object movement between cameras in the network and have greater predictive power for multi-camera tracking than camera adjacency alone. These models also provide inherent resilience to camera failure, filling in gaps left by single or even multiple non-adjacent camera failures. Our approach to estimating higher-order transition models relies on the accurate assignment of camera observations to the underlying trajectories of objects moving through the network. We address this data association problem by gathering the observations and evaluating alternative partitions of the observation set into individual object trajectories. Searching the complete partition space is intractable, so an incremental approach is taken, iteratively adding observations and pruning unlikely partitions. Partition likelihood is determined by the evaluation of a probabilistic graphical model. When the algorithm has considered all observations, the most likely (MAP) partition is taken as the true object trajectories. From these recovered trajectories, the higher-order statistics we seek can be derived and employed for tracking. The partitioning algorithm we present is parallel in nature and can be readily extended to distributed computation in medium-scale smart camera networks.
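The incremental add-and-prune search over partitions can be pictured as a beam search: each hypothesis is a partition of the observations seen so far into trajectories, each new observation either extends an existing trajectory or starts a new one, and only the most likely hypotheses are retained. The toy sketch below (the transition table, scores, and beam width are invented for illustration; the paper's likelihood comes from a probabilistic graphical model) returns the MAP partition of camera observations `(time, camera)`:

```python
import math

# Hypothetical inter-camera transition probabilities; anything else is unlikely.
TRANS = {('A', 'B'): 0.9, ('B', 'C'): 0.9, ('C', 'B'): 0.9}
DEFAULT_P, NEW_TRACK_P = 0.01, 0.1

def extend(partitions, obs, beam):
    """Extend every partition hypothesis with one observation, keep the best."""
    out = []
    for tracks, logp in partitions:
        for i in range(len(tracks) + 1):
            new_tracks = [t[:] for t in tracks]
            if i < len(tracks):                      # append to an existing track
                prev_cam = tracks[i][-1][1]
                p = TRANS.get((prev_cam, obs[1]), DEFAULT_P)
                new_tracks[i].append(obs)
            else:                                    # start a new track
                p = NEW_TRACK_P
                new_tracks.append([obs])
            out.append((new_tracks, logp + math.log(p)))
    out.sort(key=lambda x: -x[1])                    # prune unlikely partitions
    return out[:beam]

def map_partition(observations, beam=10):
    """Incrementally build the MAP partition of observations into trajectories."""
    partitions = [([], 0.0)]
    for obs in sorted(observations):                 # process in time order
        partitions = extend(partitions, obs, beam)
    return partitions[0]
```

For two objects moving A→B and C→B, the highest-scoring partition recovers two trajectories of two observations each; higher-order transition statistics would then be read off the recovered tracks.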

    SCOOP: A Real-Time Sparsity Driven People Localization Algorithm

    Detecting and tracking people in scenes monitored by cameras is an important step in many application scenarios such as surveillance, urban planning, or behavioral studies, to name a few. The amount of data produced by camera feeds is so large that it is also vital that these steps be performed with the utmost computational efficiency, often even in real time. We propose SCOOP, a novel algorithm that reliably detects pedestrians in camera feeds, using only the output of a simple background removal technique. SCOOP can handle a single or many video feeds. At the heart of our technique there is a sparse model for binary motion detection maps that we solve with a novel greedy algorithm based on set covering. We study the convergence and performance of the algorithm under various degradation models such as noisy observations and crowded environments, and we provide mathematical and experimental evidence of both its efficiency and robustness using standard datasets. This clearly shows that SCOOP is a viable alternative to existing state-of-the-art people detection algorithms, with the marked advantage of real-time computation.
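The "greedy algorithm based on set covering" can be illustrated with the textbook greedy set-cover step: each candidate ground location has a silhouette footprint (a set of pixels it would explain), and the algorithm repeatedly picks the location explaining the most still-unexplained foreground pixels. This sketch is a generic greedy cover, not SCOOP itself; the data structures and the `min_gain` stopping rule are assumptions for illustration:

```python
def greedy_cover(foreground, atoms, min_gain=1):
    """Greedy set cover of foreground pixels by candidate silhouettes.

    foreground -- set of 'on' pixel coordinates from background removal
    atoms      -- dict: candidate location -> set of pixels its silhouette covers
    Returns the list of chosen locations (detected people), greedily picking
    at each step the location that covers the most uncovered pixels.
    """
    uncovered = set(foreground)
    chosen = []
    while uncovered:
        loc, gain = max(((l, len(s & uncovered)) for l, s in atoms.items()),
                        key=lambda pair: pair[1])
        if gain < min_gain:        # remaining pixels are unexplainable noise
            break
        chosen.append(loc)
        uncovered -= atoms[loc]
    return chosen
```

Greedy set cover runs in time linear in the total footprint size per iteration, which is what makes this family of methods attractive for real-time use.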

    Sparsity Driven People Localization with a Heterogeneous Network of Cameras

    In this paper, we study the problem of localizing a dense set of people with a network of heterogeneous cameras. We recast the problem as a linear inverse problem. The proposed framework is generic to any scene, scalable in the number of cameras, and versatile with respect to their geometry, e.g. planar or omnidirectional. It relies on deducing an occupancy vector, i.e. the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e. one made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. This constitutes a linearization of the problem, where non-linearities, such as occlusions, are treated as additional noise on the observed silhouettes. Mathematically, we express the final inverse problem as either a Basis Pursuit DeNoise or a Lasso convex optimization program. The sparsity measure is reinforced by iteratively re-weighting the ℓ1-norm of the occupancy vector to better approximate its ℓ0 "norm", and a new kind of "repulsive" sparsity is used to further adapt the Lasso procedure to the occupancy reconstruction. Practically, an adaptive sampling process is proposed to reduce the computational cost and monitor a large occupancy area. Qualitative and quantitative results are presented on a basketball game. The proposed algorithm successfully detects people occluding each other given severely degraded extracted features, while outperforming state-of-the-art people localization techniques.
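The Lasso formulation minimizes 0.5·‖Dx − y‖² + λ‖x‖₁, where D is the silhouette dictionary, y the stacked binary foreground observations, and x the sparse occupancy vector. A standard solver for this program is iterative soft-thresholding (ISTA); the minimal pure-Python sketch below (a generic ISTA, without the paper's re-weighting or repulsive-sparsity refinements) recovers a sparse occupancy vector from a toy dictionary:

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista_lasso(D, y, lam, steps=2000, lr=0.2):
    """Minimise 0.5*||D x - y||^2 + lam*||x||_1 by ISTA.

    D is a list of rows (the silhouette dictionary), y the stacked
    observations.  lr must be below 1/L, with L the largest eigenvalue
    of D^T D, for the iteration to converge.
    """
    m, n = len(D), len(D[0])
    x = [0.0] * n
    for _ in range(steps):
        r = [sum(D[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]  # residual
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]          # gradient D^T r
        x = [soft(x[j] - lr * g[j], lr * lam) for j in range(n)]               # prox step
    return x
```

With two candidate locations whose silhouettes overlap in one camera, the ℓ1 penalty correctly suppresses the unoccupied location; the paper's iterative re-weighting would then sharpen this further toward an ℓ0-like solution.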

    Statistical dependence estimation for object interaction and matching

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 97-103). This dissertation shows how statistical dependence estimation underlies two key problems in visual surveillance and wide-area tracking. The first problem is to detect and describe interactions between moving objects. The goal is to measure the influence objects exert on one another. The second problem is to match objects between non-overlapping cameras. There, the goal is to pair the departures in one camera with the arrivals in a different camera so that the resulting distribution of relationships best models the data. Both problems have become important for scaling up surveillance systems to larger areas and expanding the monitoring to more interesting behaviors. We show how statistical dependence estimation generalizes previous work and may have applications in other areas. The two problems represent different applications of our thesis that statistical dependence estimation underlies the learning of the structure of probabilistic models. First, we analyze the relationship between Bayesian, information-theoretic, and classical statistical methods for statistical dependence estimation. Then, we apply these ideas to formulate object interaction in terms of dependency structure model selection. We describe experiments on simulated and real interaction data to validate our approach. Second, we formulate the matching problem in terms of maximizing statistical dependence. This allows us to generalize previous work on matching, and we show improved results on simulated and real data for non-overlapping cameras. We also prove an intractability result on exact maximally dependent matching. By Kinh Tieu.
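The matching problem can be pictured as searching for the pairing of departures to arrivals that makes the matched pairs look most statistically dependent. The toy sketch below uses Pearson correlation as a stand-in dependence measure (the thesis uses richer measures) and exhaustive search over permutations, which is only feasible for tiny instances; the intractability result mentioned above is precisely why exact maximally dependent matching does not scale:

```python
from itertools import permutations

def corr(xs, ys):
    """Pearson correlation, here a toy proxy for statistical dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def best_matching(departures, arrivals):
    """Pair each departure with one arrival, maximizing dependence.

    departures, arrivals -- equal-length lists of scalar features
    (e.g. appearance scores).  Returns, for departure i, the index of
    its matched arrival.
    """
    best = max(permutations(range(len(arrivals))),
               key=lambda p: corr(departures, [arrivals[i] for i in p]))
    return list(best)
```

When arrival features are noisy copies of the departure features, the dependence-maximizing matching recovers the true correspondence.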

    Tracking interacting targets in multi-modal sensors

    Ph.D. thesis. Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing, and activity recognition. Factors such as occlusions, illumination changes, and the limited field of view of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities such as audio and video for multi-target, multi-modal tracking. Particularly, this thesis presents contributions to four related research topics, namely, pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using a Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable on low signal-to-noise ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking in the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets, and through participation in evaluation campaigns.

    Visual attention models for far-field scene analysis

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 141-146). The amount of information available to an intelligent monitoring system is simply too vast to process in its entirety. One way to address this issue is by developing attentive mechanisms that recognize parts of the input as more interesting than others. We apply this concept to the domain of far-field activity analysis by addressing the problem of determining where to look in a scene in order to capture interesting activity in progress. We pose the problem of attention as an unsupervised learning problem, in which the task is to learn from long-term observation a model of the usual pattern of activity. Such a statistical scene model then makes it possible to detect and attend to examples of unusual activity. We present two data-driven scene modeling approaches. In the first, we model the pattern of individual observations (instances) of moving objects at each scene location as a mixture of Gaussians. In the second approach, we model the pattern of sequences of observations -- tracks -- by grouping them into clusters. We employ a similarity measure that combines comparisons of multiple attributes -- such as size, position, and velocity -- in a principled manner so that only tracks that are spatially similar and have similar attributes at spatially corresponding points are grouped together. We group the tracks using spectral clustering and represent the scene model as a mixture of Gaussians in the spectral embedding space. New examples of activity can be efficiently classified by projection into the embedding space.
We demonstrate clustering and unusual activity detection results on a week of activity in the scene (about 40,000 moving object tracks) and show that human perceptual judgments of unusual activity are well-correlated with the statistical model. The human validation suggests that the track-based anomaly detection framework would perform well as a classifier for unusual events. To our knowledge, our work is the first to evaluate a statistical scene modeling and anomaly detection framework against human judgments. By Tomáš Ižo.
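The first modeling approach, a per-location statistical model of observations, can be sketched with a drastically simplified variant: a single Gaussian per scene cell over one attribute (speed), flagging observations by their negative log-likelihood. The class and attribute choice below are illustrative assumptions, not the thesis's full mixture-of-Gaussians model:

```python
import math
from collections import defaultdict

class SceneModel:
    """Per-cell Gaussian model of observed speeds for anomaly scoring."""

    def __init__(self):
        # cell -> [count, sum of speeds, sum of squared speeds]
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])

    def observe(self, cell, speed):
        """Accumulate one observation during long-term learning."""
        s = self.stats[cell]
        s[0] += 1
        s[1] += speed
        s[2] += speed * speed

    def nll(self, cell, speed):
        """Negative log-likelihood; higher means more unusual."""
        n, s, s2 = self.stats[cell]
        if n < 2:
            return float('inf')  # unseen cell: maximally unusual
        mu = s / n
        var = max(s2 / n - mu * mu, 1e-6)
        return 0.5 * math.log(2 * math.pi * var) + (speed - mu) ** 2 / (2 * var)
```

An attentive system would direct resources toward observations whose score exceeds a learned threshold, exactly the "detect and attend to unusual activity" loop described above.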

    Modeling and Optimizing the Coverage of Multi-Camera Systems

    This thesis approaches the problem of modeling a multi-camera system's performance from system and task parameters by describing the relationship in terms of coverage. This interface allows a substantial separation of the two concerns: the ability of the system to obtain data from the space of possible stimuli, according to task requirements, and the description of the set of stimuli required for the task. The conjecture is that for any particular system, it is in principle possible to develop such a model with ideal prediction of performance. Accordingly, a generalized structure and tool set is built around the core mathematical definitions of task-oriented coverage, without tying it to any particular model. A family of problems related to coverage in the context of multi-camera systems is identified and described. A comprehensive survey of the state of the art in approaching such problems concludes that by coupling the representation of coverage to narrow problem cases and applications, and by attempting to simplify the models to fit optimization techniques, both the generality and the fidelity of the models are reduced. It is noted that models exhibiting practical levels of fidelity are well beyond the point where only metaheuristic optimization techniques are applicable. Armed with these observations and a promising set of ideas from surveyed sources, a new high-fidelity model for multi-camera vision based on the general coverage framework is presented. This model is intended to be more general in scope than previous work, and despite the complexity introduced by the multiple criteria required for fidelity, it conforms to the framework and is thus tractable for certain optimization approaches. Furthermore, it is readily extended to different types of vision systems. This thesis substantiates all of these claims. The model's fidelity and generality are validated and compared to some of the more advanced models from the literature. 
Three of the aforementioned coverage problems are then approached in application cases using the model. In one case, a bistatic variant of the sensing modality is used, requiring a modification of the model; the compatibility of this modification, both conceptually and mathematically, illustrates the generality of the framework.
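A minimal instance of task-oriented coverage can be sketched as a predicate over camera pose and task points: a point is covered if it lies within a camera's range and angular field of view, and the coverage of a task is the fraction of its points seen by at least k cameras. The geometry and parameters below are a simplified 2-D illustration, not the thesis's high-fidelity multi-criteria model:

```python
import math

def covers(cam, pt, fov=math.radians(60), rng=10.0):
    """True if camera cam = (x, y, heading) sees 2-D point pt.

    A point is covered when it is within sensing range and within half
    the angular field of view of the camera's heading.
    """
    dx, dy = pt[0] - cam[0], pt[1] - cam[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > rng:
        return d == 0  # a point at the camera itself counts as seen
    # Smallest signed angle between the bearing to pt and the heading.
    ang = abs((math.atan2(dy, dx) - cam[2] + math.pi) % (2 * math.pi) - math.pi)
    return ang <= fov / 2

def coverage(cams, task_points, k=1):
    """Fraction of task points seen by at least k cameras."""
    return sum(1 for p in task_points
               if sum(covers(c, p) for c in cams) >= k) / len(task_points)
```

Optimizing camera placement then amounts to maximizing `coverage` over camera poses, and the separation of concerns described above corresponds to keeping `covers` (system ability) independent of `task_points` (task requirements).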