914 research outputs found

    Tracking interacting targets in multi-modal sensors

    Get PDF
    PhDObject tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and limited field of observance of the sensor make tracking a challenging task. To overcome these challenges the focus of this thesis is on using multiple modalities such as audio and video for multi-target, multi-modal tracking. Particularly, this thesis presents contributions to four related research topics, namely, pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigate filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using a Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable on low signal-to-noise ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-beforedetect Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking in the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside the cameras’ fields of view. The efficiency of the proposed approaches are demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real world detections, tracking and event recognition datasets and through participation in evaluation campaigns

    Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking

    Get PDF
    This thesis presents novel techniques for computer vision based full-DOF human hand motion estimation. Our main contributions are: A robust skin color estimation approach; A novel resolution-independent and memory efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; A set of new segmentation-based similarity measures; A new class of similarity measures that work for nearly arbitrary input modalities; A novel edge-based similarity measure that avoids any problematic thresholding or discretizations and can be computed very efficiently in Fourier space; A template hierarchy to minimize the number of similarity computations needed for finding the most likely hand pose observed; And finally, a novel image space search method, which we naturally combine with our hierarchy. Consequently, matching can efficiently be formulated as a simultaneous template tree traversal and function maximization

    Recent Developments in Video Surveillance

    Get PDF
    With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of videos for subjects or events of interest? Shouldn’t the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued

    Visual attention models for far-field scene analysis

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 141-146).The amount of information available to an intelligent monitoring system is simply too vast to process in its entirety. One way to address this issue is by developing attentive mechanisms that recognize parts of the input as more interesting than others. We apply this concept to the domain of far-field activity analysis by addressing the problem of determining where to look in a scene in order to capture interesting activity in progress. We pose the problem of attention as an unsupervised learning problem, in which the task is to learn from long-term observation a model of the usual pattern of activity. Such a statistical scene model then makes it possible to detect and attend to examples of unusual activity. We present two data-driven scene modeling approaches. In the first, we model the pattern of individual observations (instances) of moving objects at each scene location as a mixture of Gaussians. In the second approach, we model the pattern of sequences of observations -- tracks -- by grouping them into clusters.We employ a similarity measure that combines comparisons of multiple attributes -- such as size, position, and velocity -- in a principled manner so that only tracks that are spatially similar and have similar attributes at spatially corresponding points are grouped together. We group the tracks using spectral clustering and represent the scene model as a mixture of Gaussians in the spectral embedding space. New examples of activity can be efficiently classified by projection into the embedding space. We demonstrate clustering and unusual activity detection results on a week of activity in the scene (about 40,000 moving object tracks) and show that human perceptual judgments of unusual activity are well-correlated with the statistical model. The human validation suggests that the track-based anomaly detection framework would perform well as a classifier for unusual events. To our knowledge, our work is the first to evaluate a statistical scene modeling and anomaly detection framework against human judgments.by Tomáš Ižo.Ph.D

    Video event detection and visual data pro cessing for multimedia applications

    Get PDF
    Cette thèse (i) décrit une procédure automatique pour estimer la condition d'arrêt des méthodes de déconvolution itératives basées sur un critère d'orthogonalité du signal estimé et de son gradient à une itération donnée; (ii) présente une méthode qui décompose l'image en une partie géométrique (ou "cartoon") et une partie "texture" en utilisation une estimation de paramètre et une condition d'arrêt basées sur la diffusion anisotropique avec orthogonalité, en utilisant le fait que ces deux composantes. "cartoon" et "texture", doivent être indépendantes; (iii) décrit une méthode pour extraire d'une séquence vidéo obtenue à partir de caméra portable les objets de premier plan en mouvement. Cette méthode augmente la compensation de mouvement de la caméra par une nouvelle estimation basée noyau de la fonction de probabilité de densité des pixels d'arrière-plan. Les méthodes présentées ont été testées et comparées aux algorithmes de l'état de l'art.This dissertation (i) describes an automatic procedure for estimating the stopping condition of non-regularized iterative deconvolution methods based on an orthogonality criterion of the estimated signal and its gradient at a given iteration; (ii) presents a decomposition method that splits the image into geometric (or cartoon) and texture parts using anisotropic diffusion with orthogonality based parameter estimation and stopping condition, utilizing the theory that the cartoon and the texture components of an image should be independent of each other; (iii) describes a method for moving foreground object extraction in sequences taken by wearable camera, with strong motion, where the camera motion compensated frame differencing is enhanced with a novel kernel-based estimation of the probability density function of the background pixels. The presented methods have been thoroughly tested and compared to other similar algorithms from the state-of-the-art.BORDEAUX1-Bib.electronique (335229901) / SudocSudocFranceF

    Grounding semantics in robots for Visual Question Answering

    Get PDF
    In this thesis I describe an operational implementation of an object detection and description system that incorporates in an end-to-end Visual Question Answering system and evaluated it on two visual question answering datasets for compositional language and elementary visual reasoning
    • …
    corecore