1,003 research outputs found

    Bayesian-based techniques for tracking multiple humans in an enclosed environment

    Get PDF
    This thesis deals with the problem of online visual tracking of multiple humans in an enclosed environment. The focus is to develop techniques to deal with the challenges of varying number of targets, inter-target occlusions and interactions when every target gives rise to multiple measurements (pixels) in every video frame. This thesis contains three different contributions to the research in multi-target tracking. Firstly, a multiple target tracking algorithm is proposed which focuses on mitigating the inter-target occlusion problem during complex interactions. This is achieved with the help of a particle filter, multiple video cues and a new interaction model. A Markov chain Monte Carlo particle filter (MCMC-PF) is used along with a new interaction model which helps in modeling interactions of multiple targets. This helps to overcome tracking failures due to occlusions. A new weighted Markov chain Monte Carlo (WMCMC) sampling technique is also proposed which assists in achieving a reduced tracking error. Although effective, to accommodate multiple measurements (pixels) produced by every target, this technique aggregates measurements into features which results in information loss. In the second contribution, a novel variational Bayesian clustering-based multi-target tracking framework is proposed which can associate multiple measurements to every target without aggregating them into features. It copes with complex inter-target occlusions by maintaining the identity of targets during their close physical interactions and handles efficiently a time-varying number of targets. The proposed multi-target tracking framework consists of background subtraction, clustering, data association and particle filtering. A variational Bayesian clustering technique groups the extracted foreground measurements while an improved feature based joint probabilistic data association filter (JPDAF) is developed to associate clusters of measurements to every target. The data association information is used within the particle filter to track multiple targets. The clustering results are further utilised to estimate the number of targets. The proposed technique improves the tracking accuracy. However, the proposed features based JPDAF technique results in an exponential growth of computational complexity of the overall framework with increase in number of targets. In the final work, a novel data association technique for multi-target tracking is proposed which more efficiently assigns multiple measurements to every target, with a reduced computational complexity. A belief propagation (BP) based cluster to target association method is proposed which exploits the inter-cluster dependency information. Both location and features of clusters are used to re-identify the targets when they emerge from occlusions. The proposed techniques are evaluated on benchmark data sets and their performance is compared with state-of-the-art techniques by using, quantitative and global performance measures

    Audio‐Visual Speaker Tracking

    Get PDF
    Target motion tracking found its application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation system, driving assistance, monitoring prohibited area, medical science, robotics, action and expression recognition, individual speaker discrimination in multi‐speaker environments and video conferencing in the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to the widespread advances of devices and technologies and the necessity for seamless solutions in real‐time tracking and localization of speakers. However, speaker tracking is a challenging task in real‐life scenarios as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcome these issues is to use multi‐modal information, as it conveys complementary information about the state of the speakers compared to single‐modal tracking. To use multi‐modal information, several approaches have been proposed which can be classified into two categories, namely deterministic and stochastic. This chapter aims at providing multimedia researchers with a state‐of‐the‐art overview of tracking methods, which are used for combining multiple modalities to accomplish various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field

    Multi-Target Tracking and Occlusion Handling with Learned Variational Bayesian Clusters and a Social Force Model

    Get PDF
    This paper considers the problem of tracking multiple humans in video. A solution is proposed which is able to deal with the challenges of a varying number of targets, occlusions and interactions when every target gives rise to multiple measurements. The developed novel framework comprises a variational Bayesian clustering approach combined with a social force model. It can handle a time-varying number of targets and can cope with complex inter-target occlusions by maintaining the identities of targets during their close physical interactions. It performs measurement to-target association by automatically detecting the measurement relevance. A variational Bayesian clustering technique clusters the measurements and thereby provides an elegant solution to the measurement origin uncertainty problem. A particle filter is developed in which the clustering algorithm and the social force model enhance the prediction step. The performance of the proposed framework is evaluated over several sequences from publicly available data sets: AV16.3, CAVIAR and PETS2006, which demonstrate that the proposed framework successfully initializes and tracks a variable number of human targets in the presence of complex occlusions. The proposed algorithm is compared with state-of-the-art techniques due to Khan et al., Laet et al. and Czyz et al. and is shown to achieve improved tracking performance

    BEYOND MULTI-TARGET TRACKING: STATISTICAL PATTERN ANALYSIS OF PEOPLE AND GROUPS

    Get PDF
    Ogni giorno milioni e milioni di videocamere monitorano la vita quotidiana delle persone, registrando e collezionando una grande quantit\ue0 di dati. Questi dati possono essere molto utili per scopi di video-sorveglianza: dalla rilevazione di comportamenti anomali all'analisi del traffico urbano nelle strade. Tuttavia i dati collezionati vengono usati raramente, in quanto non \ue8 pensabile che un operatore umano riesca a esaminare manualmente e prestare attenzione a una tale quantit\ue0 di dati simultaneamente. Per questo motivo, negli ultimi anni si \ue8 verificato un incremento della richiesta di strumenti per l'analisi automatica di dati acquisiti da sistemi di video-sorveglianza in modo da estrarre informazione di pi\uf9 alto livello (per esempio, John, Sam e Anne stanno camminando in gruppo al parco giochi vicino alla stazione) a partire dai dati a disposizione che sono solitamente a basso livello e ridondati (per esempio, una sequenza di immagini). L'obiettivo principale di questa tesi \ue8 quello di proporre soluzioni e algoritmi automatici che permettono di estrarre informazione ad alto livello da una zona di interesse che viene monitorata da telecamere. Cos\uec i dati sono rappresentati in modo da essere facilmente interpretabili e analizzabili da qualsiasi persona. In particolare, questo lavoro \ue8 focalizzato sull'analisi di persone e i loro comportamenti sociali collettivi. Il titolo della tesi, beyond multi-target tracking, evidenzia lo scopo del lavoro: tutti i metodi proposti in questa tesi che si andranno ad analizzare hanno come comune denominatore il target tracking. Inoltre andremo oltre le tecniche standard per arrivare a una rappresentazione del dato a pi\uf9 alto livello. Per prima cosa, analizzeremo il problema del target tracking in quanto \ue8 alle basi di questo lavoro. In pratica, target tracking significa stimare la posizione di ogni oggetto di interesse in un immagine e la sua traiettoria nel tempo. Analizzeremo il problema da due prospettive complementari: 1) il punto di vista ingegneristico, dove l'obiettivo \ue8 quello di creare algoritmi che ottengono i risultati migliori per il problema in esame. 2) Il punto di vista della neuroscienza: motivati dalle teorie che cercano di spiegare il funzionamento del sistema percettivo umano, proporremo in modello attenzionale per tracking e il riconoscimento di oggetti e persone. Il secondo problema che andremo a esplorare sar\ue0 l'estensione del tracking alla situazione dove pi\uf9 telecamere sono disponibili. L'obiettivo \ue8 quello di mantenere un identificatore univoco per ogni persona nell'intera rete di telecamere. In altre parole, si vuole riconoscere gli individui che vengono monitorati in posizioni e telecamere diverse considerando un database di candidati. Tale problema \ue8 chiamato in letteratura re-indetificazione di persone. In questa tesi, proporremo un modello standard di come affrontare il problema. In questo modello, presenteremo dei nuovi descrittori di aspetto degli individui, in quanto giocano un ruolo importante allo scopo di ottenere i risultati migliori. Infine raggiungeremo il livello pi\uf9 alto di rappresentazione dei dati che viene affrontato in questa tesi, che \ue8 l'analisi di interazioni sociali tra persone. In particolare, ci focalizzeremo in un tipo specifico di interazione: il raggruppamento di persone. Proporremo dei metodi di visione computazionale che sfruttano nozioni di psicologia sociale per rilevare gruppi di persone. Inoltre, analizzeremo due modelli probabilistici che affrontano il problema di tracking (congiunto) di gruppi e individui.Every day millions and millions of surveillance cameras monitor the world, recording and collecting huge amount of data. The collected data can be extremely useful: from the behavior analysis to prevent unpleasant events, to the analysis of the traffic. However, these valuable data is seldom used, because of the amount of information that the human operator has to manually attend and examine. It would be like looking for a needle in the haystack. The automatic analysis of data is becoming mandatory for extracting summarized high-level information (e.g., John, Sam and Anne are walking together in group at the playground near the station) from the available redundant low-level data (e.g., an image sequence). The main goal of this thesis is to propose solutions and automatic algorithms that perform high-level analysis of a camera-monitored environment. In this way, the data are summarized in a high-level representation for a better understanding. In particular, this work is focused on the analysis of moving people and their collective behaviors. The title of the thesis, beyond multi-target tracking, mirrors the purpose of the work: we will propose methods that have the target tracking as common denominator, and go beyond the standard techniques in order to provide a high-level description of the data. First, we investigate the target tracking problem as it is the basis of all the next work. Target tracking estimates the position of each target in the image and its trajectory over time. We analyze the problem from two complementary perspectives: 1) the engineering point of view, where we deal with problem in order to obtain the best results in terms of accuracy and performance. 2) The neuroscience point of view, where we propose an attentional model for tracking and recognition of objects and people, motivated by theories of the human perceptual system. Second, target tracking is extended to the camera network case, where the goal is to keep a unique identifier for each person in the whole network, i.e., to perform person re-identification. The goal is to recognize individuals in diverse locations over different non-overlapping camera views or also the same camera, considering a large set of candidates. In this context, we propose a pipeline and appearance-based descriptors that enable us to define in a proper way the problem and to reach the-state-of-the-art results. Finally, the higher level of description investigated in this thesis is the analysis (discovery and tracking) of social interaction between people. In particular, we focus on finding small groups of people. We introduce methods that embed notions of social psychology into computer vision algorithms. Then, we extend the detection of social interaction over time, proposing novel probabilistic models that deal with (joint) individual-group tracking

    Fuzzy region assignment for visual tracking

    Get PDF
    In this work we propose a new approach based on fuzzy concepts and heuristic reasoning to deal with the visual data association problem in real time, considering the particular conditions of the visual data segmented from images, and the integration of higher-level information in the tracking process such as trajectory smoothness, consistency of information, and protection against predictable interactions such as overlap/occlusion, etc. The objects' features are estimated from the segmented images using a Bayesian formulation, and the regions assigned to update the tracks are computed through a fuzzy system to integrate all the information. The algorithm is scalable, requiring linear computing resources with respect to the complexity of scenarios, and shows competitive performance with respect to other classical methods in which the number of evaluated alternatives grows exponentially with the number of objects.Research supported by projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB and CAM MADRINET S-0505/TIC/0255.publicad

    Object Tracking: Appearance Modeling And Feature Learning

    Get PDF
    Object tracking in real scenes is an important problem in computer vision due to increasing usage of tracking systems day in and day out in various applications such as surveillance, security, monitoring and robotic vision. Object tracking is the process of locating objects of interest in every frame of video frames. Many systems have been proposed to address the tracking problem where the major challenges come from handling appearance variation during tracking caused by changing scale, pose, rotation, illumination and occlusion. In this dissertation, we address these challenges by introducing several novel tracking techniques. First, we developed a multiple object tracking system that deals specially with occlusion issues. The system depends on our improved KLT tracker for accurate and robust tracking during partial occlusion. In full occlusion, we applied a Kalman filter to predict the object\u27s new location and connect the trajectory parts. Many tracking methods depend on a rectangle or an ellipse mask to segment and track objects. Typically, using a larger or smaller mask will lead to loss of tracked objects. Second, we present an object tracking system (SegTrack) that deals with partial and full occlusions by employing improved segmentation methods: mixture of Gaussians and a silhouette segmentation algorithm. For re-identification, one or more feature vectors for each tracked object are used after target reappearing. Third, we propose a novel Bayesian Hierarchical Appearance Model (BHAM) for robust object tracking. Our idea is to model the appearance of a target as combination of multiple appearance models, each covering the target appearance changes under a certain situation (e.g. view angle). In addition, we built an object tracking system by integrating BHAM with background subtraction and the KLT tracker for static camera videos. For moving camera videos, we applied BHAM to cluster negative and positive target instances. As tracking accuracy depends mainly on finding good discriminative features to estimate the target location, finally, we propose to learn good features for generic object tracking using online convolutional neural networks (OCNN). In order to learn discriminative and stable features for tracking, we propose a novel object function to train OCNN by penalizing the feature variations in consecutive frames, and the tracker is built by integrating OCNN with a color-based multi-appearance model. Our experimental results on real-world videos show that our tracking systems have superior performance when compared with several state-of-the-art trackers. In the feature, we plan to apply the Bayesian Hierarchical Appearance Model (BHAM) for multiple objects tracking

    Efficient Min-cost Flow Tracking with Bounded Memory and Computation

    Get PDF
    This thesis is a contribution to solving multi-target tracking in an optimal fashion for real-time demanding computer vision applications. We introduce a challenging benchmark, recorded with our autonomous driving platform AnnieWAY. Three main challenges of tracking are addressed: Solving the data association (min-cost flow) problem faster than standard solvers, extending this approach to an online setting, and making it real-time capable by a tight approximation of the optimal solution

    A probabilistic integrated object recognition and tracking framework for video sequences

    Get PDF
    Recognition and tracking of multiple objects in video sequences is one of the main challenges in computer vision that currently deserves a lot of attention from researchers. Almost all the reported approaches are very application-dependent and there is a lack of a general methodology for dynamic object recognition and tracking that can be instantiated in particular cases. In this thesis, the work is oriented towards the definition and development of such a methodology which integrates object recognition and tracking from a general perspective using a probabilistic framework called PIORT (probabilistic integrated object recognition and tracking framework). It include some modules for which a variety of techniques and methods can be applied. Some of them are well-known but other methods have been designed, implemented and tested during the development of this thesis.The first step in the proposed framework is a static recognition module that provides class probabilities for each pixel of the image from a set of local features. These probabilities are updated dynamically and supplied to a tracking decision module capable of handling full and partial occlusions. The two specific methods presented use RGB colour features and differ in the classifier implemented: one is a Bayesian method based on maximum likelihood and the other one is based on a neural network. The experimental results obtained have shown that, on one hand, the neural net based approach performs similarly and sometimes better than the Bayesian approach when they are integrated within the tracking framework. And on the other hand, our PIORT methods have achieved better results when compared to other published tracking methods. All these methods have been tested experimentally in several test video sequences taken with still and moving cameras and including full and partial occlusions of the tracked object in indoor and outdoor scenarios in a variety of cases with different levels of task complexity. This allowed the evaluation of the general methodology and the alternative methods that compose these modules.A Probabilistic Integrated Object Recognition and Tracking Framework for Video SequencesEl reconocimiento y seguimiento de múltiples objetos en secuencias de vídeo es uno de los principales desafíos en visión por ordenador que actualmente merece mucha atención de los investigadores. Casi todos los enfoques reportados son muy dependientes de la aplicación y hay carencia de una metodología general para el reconocimiento y seguimiento dinámico de objetos, que pueda ser instanciada en casos particulares. En esta tesis, el trabajo esta orientado hacia la definición y desarrollo de tal metodología, la cual integra reconocimiento y seguimiento de objetos desde una perspectiva general usando un marco probabilístico de trabajo llamado PIORT (Probabilistic Integrated Object Recognition and Tracking). Este incluye algunos módulos para los que se puede aplicar una variedad de técnicas y métodos. Algunos de ellos son bien conocidos, pero otros métodos han sido diseñados, implementados y probados durante el desarrollo de esta tesis.El primer paso en el marco de trabajo propuesto es un módulo estático de reconocimiento que provee probabilidades de clase para cada píxel de la imagen desde un conjunto de características locales. Estas probabilidades son actualizadas dinámicamente y suministradas a un modulo decisión de seguimiento capaz de manejar oclusiones parciales o totales. Se presenta dos métodos específicos usando características de color RGB pero diferentes en la implementación del clasificador: uno es un método Bayesiano basado en la máxima verosimilitud y el otro método está basado en una red neuronal. Los resultados experimentales obtenidos han mostrado que, por una parte, el enfoque basado en la red neuronal funciona similarmente y algunas veces mejor que el enfoque bayesiano cuando son integrados dentro del marco probabilístico de seguimiento. Por otra parte, nuestro método PIORT ha alcanzado mejores resultados comparando con otros métodos de seguimiento publicados. Todos estos métodos han sido probados experimentalmente en varias secuencias de vídeo tomadas con cámaras fijas y móviles incluyendo oclusiones parciales y totales del objeto a seguir, en ambientes interiores y exteriores, en diferentes tareas y niveles de complejidad. Esto ha permitido evaluar tanto la metodología general como los métodos alternativos que componen sus módulos
    corecore