12 research outputs found

    Multi-camera Tracklet association and fusion using ensemble of visual and geometric cues

    Data association and fusion are pivotal for object tracking in a multi-camera network. We present a novel framework for solving online multi-object tracking in a partially overlapping multi-camera network by modelling tracklet association as a combinatorial optimization problem hypothesized on an ensemble of cues such as appearance, motion and geometry information. Our method learns a discriminant weight as a measure of the consistency and discriminancy of feature patterns to make ensemble feature selection and combination between local and global tracking information. Our approach contributes uniquely in the way tracklet selection, association and fusion are done. Once multi-view correspondences are established using planar homography, the Dynamic Time Warping algorithm is used to select the tracklets for which similarity has to be calculated, i.e. overlapping tracklets and subtracklets. Then trajectory similarities are computed for these selected tracklets and subtracklets using an ensemble of appearance and motion cues weighted by an online-learnt discriminative function. Next, we tackle the association problem by building a k-partite graph and association rules to match all the pairwise tracklets. Finally, the trajectories associated by the Hungarian algorithm are fused. Fusion is based on calculated individual tracklet reliability criteria. Experimental results demonstrate that our system achieves performance that significantly improves on the state of the art on PETS 2009
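The abstract above names Dynamic Time Warping as the tool for comparing tracklets but gives no detail of how it is applied. As a rough illustration only, the following sketch shows textbook DTW between two sequences of 2-D points; the function name `dtw_distance` and the use of plain Euclidean point distance are assumptions, not the authors' formulation.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two sequences of 2-D points.

    Hypothetical helper for illustration; the paper's own cost function
    and constraints are not described in the abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


track_a = [(0, 0), (1, 0), (2, 0)]
track_b = [(0, 0), (0, 0), (1, 0), (2, 0)]  # same path, sampled unevenly
track_c = [(0, 1), (1, 1), (2, 1)]          # parallel path, offset by 1

print(dtw_distance(track_a, track_b))
print(dtw_distance(track_a, track_c))
```

Because DTW allows one sequence to dwell while the other advances, two tracklets that follow the same path at different sampling rates receive a low cost, which is the property that makes it a plausible fit for comparing overlapping tracklets.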

    Non-myopic information theoretic sensor management of a single pan–tilt–zoom camera for multiple object detection and tracking

    Detailed derivation of an information theoretic framework for real PTZ management. Introduction and implementation of a non-myopic strategy. Large experimental validation, with synthetic and realistic datasets. Working demonstration of myopic strategy on an off-the-shelf PTZ camera. Automatic multiple object tracking with a single pan-tilt-zoom (PTZ) camera is a hard task, with few approaches in the literature, most of them proposing simplistic scenarios. In this paper, we present a novel PTZ camera management framework in which, at each time step, the next camera pose (pan, tilt, focal length) is chosen to support multiple object tracking. The policy can be myopic or non-myopic: the former analyzes exclusively the current frame to decide the next camera pose, while the latter takes into account plausible future target displacements and camera poses through a multiple look-ahead optimization. In both cases, occlusions, a variable number of subjects and genuine pedestrian detectors are taken into account, for the first time in the literature. Convincing comparative results on synthetic data, realistic simulations and real trials validate our proposal, showing that non-myopic strategies are particularly suited to PTZ camera management

    Activity understanding and unusual event detection in surveillance videos

    PhD. Computer scientists have made ceaseless efforts to replicate cognitive video understanding abilities of human brains onto autonomous vision systems. As video surveillance cameras become ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection in surveillance videos. Nevertheless, video content analysis in public scenes remains a formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded scenes and poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve robust detection of unusual events, which are rare, ambiguous, and easily confused with noise. This thesis proposes solutions for resolving ambiguous visual observations and overcoming unreliability of conventional activity analysis methods by exploiting multi-camera visual context and human feedback. The thesis first demonstrates the importance of learning visual context for establishing reliable reasoning on observed activity in a camera network. In the proposed approach, a new Cross Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time delayed pairwise correlations of regional activities observed within and across multiple camera views. This thesis shows that learning time delayed pairwise activity correlations offers valuable contextual information for (1) spatial and temporal topology inference of a camera network, (2) robust person re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in contrast to conventional methods, the proposed approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring severe inter-object occlusions. Second, to detect global unusual events across multiple disjoint cameras, this thesis extends visual context learning from pairwise relationships to global time delayed dependency between regional activities. 
Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is proposed to model the multi-camera activities and their dependencies. Subtle global unusual events are detected and localised using the model as context-incoherent patterns across multiple camera views. In the model, different nodes represent activities in different decomposed regions from different camera views, and the directed links between nodes encode time delayed dependencies between activities observed within and across camera views. In order to learn optimised time delayed dependencies in a TD-PGM, a novel two-stage structure learning approach is formulated by combining both constraint-based and scored-searching based structure learning methods. Third, to cope with visual context changes over time, this two-stage structure learning approach is extended to permit tractable incremental update of both TD-PGM parameters and its structure. As opposed to most existing studies that assume a static model once learned, the proposed incremental learning allows a model to adapt itself to reflect the changes in the current visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly, the incremental structure learning is achieved without either exhaustive search in a large graph structure space or storing all past observations in memory, making the proposed solution memory and time efficient. Fourth, an active learning approach is presented to incorporate human feedback for on-line unusual event detection. Contrary to most existing unsupervised methods that perform passive mining for unusual events, the proposed approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e. it decides on the fly whether to request a label for each unlabelled sample observed in sequence. 
It adaptively selects between two active learning criteria, namely a likelihood criterion and an uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of classification boundary. The effectiveness of the proposed approaches is validated using videos captured from busy public scenes such as underground stations and traffic intersections

    Patterns of motion in non-overlapping networks using vehicle tracking data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 121-131). We present a systematic framework to learn motion patterns based on vehicle tracking data captured by multiple non-overlapping uncalibrated cameras. We assume that the tracks from individual cameras are available. We define the key problems related to the multi-camera surveillance system and present solutions to these problems: learning the topology of the network, constructing tracking correspondences between different views, learning the activity clusters over global views and finally detecting abnormal events. First, we present a weighted cross correlation model to learn the topology of the network without solving correspondence in the first place. We use estimates of normalized color and apparent size to measure similarity of object appearance between different views. This information is used to temporally correlate observations, allowing us to infer possible links between disjoint views, and to estimate the associated transition time. Based on the learned cross correlation coefficient, the network topology can be fully recovered. Then, we present a MAP framework to match two objects along their tracks from non-overlapping camera views and discuss how the learned topology can reduce the correspondence search space dramatically. We propose to learn the color transformation in lαβ space to compensate for the varying illumination conditions across different views, and learn the inter-camera time transition and the shape/size transformation between different views. After we model the correspondence probability for observations captured by different sources/sinks, we adopt a probabilistic framework to use this correspondence probability in a principled manner. 
Tracks are assigned by estimating the correspondences which maximize the posterior probabilities (MAP) using the Hungarian algorithm. After establishing the correspondence, we have a set of stitched trajectories, in which elements from each camera can be combined with observations in multiple subsequent cameras generated by the same object. Finally, we show how to learn the activity clusters and detect abnormal activities using the mixture of unigram model with the stitched trajectories as input. We adopt a bag-of-words representation, and present a Bayesian probabilistic approach in which trajectories are represented by a mixture model. This model can classify trajectories into different activity clusters, and gives representations of both new trajectories and abnormal trajectories. by Chaowei Niu. Ph.D.
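The topology-learning step described above correlates departures from one view with arrivals in another to find a likely transition time. The sketch below is a simplified, unweighted version of that idea: it counts events per time step and picks the lag with the highest cross-correlation. The thesis additionally weights the correlation by appearance similarity (color and size), which is omitted here; the function names and the toy event counts are hypothetical.

```python
def cross_correlation(departures, arrivals, max_lag):
    """Return {lag: score}, where score = sum_t departures[t] * arrivals[t + lag].

    Unweighted simplification for illustration; an appearance-similarity
    weight per event pair would be needed to match the abstract's description.
    """
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = sum(
            departures[t] * arrivals[t + lag]
            for t in range(len(departures))
            if t + lag < len(arrivals)
        )
    return scores


def estimate_transition_time(departures, arrivals, max_lag):
    """Pick the lag with the strongest correlation as the transition time."""
    scores = cross_correlation(departures, arrivals, max_lag)
    return max(scores, key=scores.get)


# Toy data: vehicles leave camera A at t = 0, 3, 7 and
# appear in camera B three time steps later.
departures_a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
arrivals_b = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

lag_scores = cross_correlation(departures_a, arrivals_b, max_lag=10)
best_lag = estimate_transition_time(departures_a, arrivals_b, max_lag=10)
print(best_lag, lag_scores[best_lag])
```

A pronounced peak at one lag suggests a link between the two views with that transit time, while a flat correlation suggests the views are not directly connected.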

    Distributed consensus in multi-robot systems with visual perception

    The idea of teams of robots acting autonomously and cooperatively is closer every day to becoming a reality. Multi-robot systems can execute highly complex tasks more robustly and in less time than a single robot working alone. On the other hand, coordinating a team of robots introduces complications that the engineers who design these systems must face. Achieving a consistent perception of the environment across all robots is one of the most important requirements of any cooperative task, which implies that the observations of each robot in the team must be transmitted to all the other members. When two or more robots hold common information about the environment, the team must reach a consensus using all the available information. This must be done taking into account the limitations of each robot, bearing in mind that not all robots can communicate with one another. With this goal, we address the task of designing distributed algorithms that allow a team of robots to reach a consensus about the information perceived by all its members. Specifically, we focus on solving this problem when the robots use vision as the sensor for perceiving the environment. Conventional cameras are very useful for tasks such as navigation and map building, which are essential in robotics, thanks to the large amount of information contained in each image. However, using these sensors in a distributed setting introduces many additional complications that must be addressed to achieve the proposed goal. In this thesis we present an in-depth study of distributed consensus algorithms and how they can be used by a team of robots equipped with conventional cameras, solving the most important issues related to the use of these sensors. 
In the first part of the thesis we focus on finding global correspondences between the observations of all the robots. In this way, the robots are able to detect which observations must be combined to compute the consensus. We also deal with the problem of robustness and the distributed detection of outliers during the consensus computation. To counteract the increase in the size of the messages exchanged by the robots in the previous stages, we use the properties of Chebyshev polynomials, reducing the number of iterations required to reach consensus. In the second part of the thesis, we turn our attention to the problems of building a map and controlling the motion of the team of robots. We present solutions for reaching a consensus in these scenarios using well-known computer vision techniques. The use of structure from motion algorithms allows us to avoid restrictions such as requiring the robots to observe one another directly during control, or the need to specify a common reference frame. Additionally, our algorithms behave robustly when the camera calibration is unknown. Finally, the proposals are evaluated using a data set of an urban environment and real robots with non-holonomic motion constraints. All the algorithms presented in this thesis have been designed to run in a distributed fashion. In the thesis we prove theoretically the main properties of the proposed algorithms and evaluate their quality with simulated data and real images. In summary, the main contributions of this thesis are: • A set of distributed algorithms that allow a team of robots equipped with conventional cameras to reach a consensus about the information they perceive. 
In particular, we propose three distributed algorithms aimed at solving the problems of finding global correspondences between the information of all the robots, detecting and discarding spurious information, and reducing the number of times the robots have to communicate with each other before reaching consensus. • The combination of distributed consensus and structure from motion techniques in control and perception tasks. An algorithm has been designed to build a topological map cooperatively, using planes as map features and homography constraints to relate the robots' observations. A distributed control law using epipolar geometry has also been proposed, with the goal of making the team of robots reach a common orientation without the need to observe one another directly
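To make the notion of distributed consensus concrete, here is a minimal sketch of the standard linear averaging iteration, in which each robot repeatedly nudges its estimate toward those of its direct neighbours only. This is the plain textbook scheme, not the Chebyshev-accelerated algorithm the thesis proposes; the communication graph, step size and initial values are made-up assumptions.

```python
def consensus_step(values, neighbors, step):
    """One synchronous update: x_i <- x_i + step * sum_j (x_j - x_i), j in N_i."""
    return [
        x + step * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def run_consensus(values, neighbors, step, iterations):
    """Repeat the local update; no robot ever sees the whole network."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values


# Hypothetical team of four robots connected in a line: 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = [1.0, 3.0, 5.0, 11.0]  # each robot's local measurement

# The step must stay below 1 / (maximum degree) = 0.5 for this simple rule.
final = run_consensus(initial, neighbors, step=0.3, iterations=200)
print(final)
```

On a connected, undirected graph this update preserves the sum of the estimates, so every robot converges to the average of the initial measurements. The number of iterations needed grows with the network diameter, which is the cost that acceleration schemes such as the Chebyshev-polynomial approach mentioned in the abstract aim to reduce.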

    Learning motion patterns using hierarchical Bayesian models

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 163-179). In far-field visual surveillance, one of the key tasks is to monitor activities in the scene. Through learning motion patterns of objects, computers can help people understand typical activities, detect abnormal activities, and learn the models of semantically meaningful scene structures, such as paths commonly taken by objects. In medical imaging, some issues similar to learning motion patterns arise. Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) is one of the first methods to visualize and quantify the organization of white matter in the brain in vivo. Using methods of tractography segmentation, one can connect local diffusion measurements to create global fiber trajectories, which can then be clustered into anatomically meaningful bundles. This is similar to clustering trajectories of objects in visual surveillance. In this thesis, we develop several unsupervised frameworks to learn motion patterns from complicated and large scale data sets using hierarchical Bayesian models. We explore their applications to activity analysis in far-field visual surveillance and tractography segmentation in medical imaging. Many existing activity analysis approaches in visual surveillance are ad hoc, relying on predefined rules or simple probabilistic models, which prohibits them from modeling complicated activities. Our hierarchical Bayesian models can structure dependency among a large number of variables to model complicated activities. Various constraints and knowledge can be nicely added into a Bayesian framework as priors. When the number of clusters is not well defined in advance, our nonparametric Bayesian models can learn it driven by data with Dirichlet Process priors. 
In this work, several hierarchical Bayesian models are proposed considering different types of scenes and different settings of cameras. If the scenes are crowded, it is difficult to track objects because of frequent occlusions and difficult to separate different types of co-occurring activities. We jointly model simple activities and complicated global behaviors at different hierarchical levels directly from moving pixels without tracking objects. If the scene is sparse and there is only a single camera view, we first track objects and then cluster trajectories into different activity categories. Meanwhile, we learn the models of paths commonly taken by objects. Under the Bayesian framework, using the models of activities learned from historical data as priors, the models of activities can be dynamically updated over time. When multiple camera views are used to monitor a large area, by adding a smoothness constraint as a prior, our hierarchical Bayesian model clusters trajectories in multiple camera views without tracking objects across camera views. The topology of multiple camera views is assumed to be unknown and arbitrary. In tractography segmentation, our approach can cluster much larger scale data sets than existing approaches and automatically learn the number of bundles from data. We demonstrate the effectiveness of our approaches on multiple visual surveillance and medical imaging data sets. by Xiaogang Wang. Ph.D.

    Trajectory association across multiple airborne cameras

    No full text
    A camera mounted on an aerial vehicle provides an excellent means to monitor large areas of a scene. Utilizing several such cameras on different aerial vehicles allows further flexibility in terms of increased visual scope and in the pursuit of multiple targets. In this paper, we address the problem of associating trajectories across multiple moving airborne cameras. We exploit geometric constraints on the relationship between the motion of each object across cameras without assuming any prior calibration information. Since multiple cameras exist, ensuring coherency in association is an essential requirement, e.g., that transitive closure is maintained between more than two cameras. To ensure such coherency, we pose the problem of maximizing the likelihood function as a k-dimensional matching and use an approximation to find the optimal assignment of association. Using the proposed error function, canonical trajectories of each object and optimal estimates of intercamera transformations (in a maximum likelihood sense) are computed. Finally, we show that, as a result of associating trajectories across the cameras, under special conditions, trajectories interrupted due to occlusion or missing detections can be repaired. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models, and, through simulation, quantitative performance is also reported

    Trajectory Association Across Multiple Airborne Cameras

    No full text
    A camera mounted on an aerial vehicle provides an excellent means for monitoring large areas of a scene. Utilizing several such cameras on different aerial vehicles allows further flexibility, in terms of increased visual scope and in the pursuit of multiple targets. In this paper, we address the problem of associating objects across multiple airborne cameras. Since the cameras are moving and often widely separated, direct appearance-based or proximity-based constraints cannot be used. Instead, we exploit geometric constraints on the relationship between the motion of each object across cameras, to test multiple association hypotheses, without assuming any prior calibration information. Given our scene model, we propose a likelihood function for evaluating a hypothesized association between observations in multiple cameras that is geometrically motivated. Since multiple cameras exist, ensuring coherency in association is an essential requirement, e.g. that transitive closure is maintained between more than two cameras. To ensure such coherency we pose the problem of maximizing the likelihood function as a k-dimensional matching and use an approximation to find the optimal assignment of association. Using the proposed error function, canonical trajectories of each object and optimal estimates of inter-camera transformations (in a maximum likelihood sense) are computed. Finally, we show that as a result of associating objects across the cameras, a concurrent visualization of multiple aerial video streams is possible and that, under special conditions, trajectories interrupted due to occlusion or missing detections can be repaired. Results are shown on a number of real and controlled scenarios with multiple objects observed by multiple cameras, validating our qualitative models, and through simulation quantitative performance is also reported. © 2008 IEEE
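The coherency requirement mentioned above (transitive closure across more than two cameras) can be illustrated with a small check: if pairwise associations are merged into groups, no group may contain two different tracks from the same camera. The sketch below tests this with a union-find structure. It only detects incoherent pairwise matchings; it is not the k-dimensional matching the paper uses to avoid them, and the `(camera, track)` encoding is an assumption.

```python
def find(parent, x):
    """Return the group representative of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_coherent(pairs):
    """Check that pairwise associations are transitively consistent.

    pairs: iterable of ((camera, track), (camera, track)) matches.
    Returns False if merging the matches puts two tracks from the
    same camera into one group, i.e. one object seen twice in a view.
    """
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(parent, a)] = find(parent, b)
    seen = set()
    for node in parent:
        key = (find(parent, node), node[0])  # (group, camera)
        if key in seen:
            return False
        seen.add(key)
    return True


coherent = [((1, "a"), (2, "x")), ((2, "x"), (3, "p")), ((1, "a"), (3, "p"))]
conflicting = [((1, "a"), (2, "x")), ((2, "x"), (3, "p")), ((1, "b"), (3, "p"))]

print(is_coherent(coherent))
print(is_coherent(conflicting))
```

In the second example, camera 3's track `p` is matched to `a` through camera 2 and to `b` directly, so independent pairwise matching yields a contradiction. Solving the association jointly over all cameras, as the abstract describes, rules out such outcomes by construction.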