
    EACOFT: an energy-aware correlation filter for visual tracking.

    Correlation filter based trackers, owing to their computation in the frequency domain, can locate targets efficiently and at a relatively fast speed. This characteristic, however, also limits their generalization in some specific scenarios. Two main aspects possibly explain why they still fail to achieve performance superior to state-of-the-art (SOTA) trackers. The first is that when tracking objects whose energy is lower than that of the background, the tracker may drift or even lose the target. The second is that biased samples may inevitably be selected for model training, which can easily lead to inaccurate tracking. To tackle these shortcomings, a novel energy-aware correlation filter (EACOFT) based tracking method is proposed. In our approach, the energy between the foreground and the background is adaptively balanced, which ensures that the target of interest always has higher energy than its background. The quality of each sample is also evaluated in real time, which ensures that the samples used for template training are always helpful for tracking. In addition, we propose an optimal bottom-up and top-down combined strategy for template training, which plays an important role in improving both the effectiveness and robustness of tracking. As a result, our approach achieves a great improvement over the baseline tracker, especially under background clutter and fast motion challenges. Extensive experiments on multiple tracking benchmarks demonstrate the superior performance of our proposed method in comparison to a number of SOTA trackers.
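    As background, the frequency-domain localisation step that makes correlation filter trackers fast can be sketched in a few lines of Python. This is a minimal, classical MOSSE-style formulation, not the energy-aware EACOFT method itself:

        import numpy as np

        def train_filter(patch, desired_response, reg=1e-3):
            """MOSSE-style closed-form filter in the frequency domain:
            H* = (G * conj(F)) / (F * conj(F) + reg)."""
            F = np.fft.fft2(patch)
            G = np.fft.fft2(desired_response)  # e.g. a Gaussian peak centred on the target
            return (G * np.conj(F)) / (F * np.conj(F) + reg)

        def locate_target(search_patch, H_conj):
            """Correlate the search patch with the filter; the peak of the
            response map is the most likely target position."""
            response = np.real(np.fft.ifft2(np.fft.fft2(search_patch) * H_conj))
            return np.unravel_index(np.argmax(response), response.shape)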

    Object Tracking in Video with Part-Based Tracking by Feature Sampling

    Visual tracking of arbitrary objects is an active research topic in computer vision, with applications across multiple disciplines including video surveillance, activity analysis, robot vision, and human-computer interfaces. Despite the great progress made in object tracking in recent years, it remains a challenge to design trackers that can deal with difficult tracking scenarios, such as camera motion, object motion change, occlusion, illumination changes, and object deformation. A promising way of tackling these types of problems is to use a part-based method: one which models and tracks small regions of the object and estimates the location of the object based on the tracked parts' positions. These approaches typically model parts of objects with histograms of various hand-crafted features extracted from the region in which the part is located. However, it is unclear how such relatively homogeneous regions should be represented to form an effective part-based tracker. In this thesis we present a part-based tracker that includes a model for object parts designed to empirically characterise the underlying colour distribution of an image region, representing it by pairs of randomly selected colour features and counts of how many pixels are similar to each feature. This novel feature representation is used to find probable locations for the part in future frames via a Bhattacharyya distance based metric, which is modified to prefer higher-quality matches. Sets of candidate patch locations are generated by randomly generating non-shearing affine transformations of the part's previous locations and locally optimising the most likely sets of parts to allow for small intra-frame object deformations. We also present a study of model initialisation in online, model-free tracking and evaluate several techniques for selecting the regions of an image, given a target bounding box, that are most likely to contain an object. The strengths and limitations of the combined tracker are evaluated on the VOT2016 and VOT2018 datasets using their evaluation protocol, which also allows an extensive evaluation of parameter robustness. The presented tracker is ranked first among part-based trackers on the VOT2018 dataset and is particularly robust to changes in object and camera motion, as well as object size changes.
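    As a rough illustration of the Bhattacharyya distance based matching mentioned above, here is a short Python sketch. It assumes plain normalised histograms rather than the thesis's sampled colour-feature representation, and omits the modification that prefers higher-quality matches:

        import numpy as np

        def bhattacharyya_distance(p, q, eps=1e-12):
            """Bhattacharyya distance between two normalised histograms:
            d = sqrt(1 - sum_i sqrt(p_i * q_i))."""
            p = p / (p.sum() + eps)
            q = q / (q.sum() + eps)
            bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
            return np.sqrt(max(0.0, 1.0 - bc))   # guard against rounding past 1

        def best_candidate(part_model, candidate_histograms):
            """Pick the candidate region whose histogram is closest to the
            part's model under the Bhattacharyya distance."""
            dists = [bhattacharyya_distance(part_model, c) for c in candidate_histograms]
            best = int(np.argmin(dists))
            return best, dists[best]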

    Robust Modular Feature-Based Terrain-Aided Visual Navigation and Mapping

    The visual feature-based Terrain-Aided Navigation (TAN) system presented in this thesis addresses the problem of constraining the inertial drift introduced into the location estimate of Unmanned Aerial Vehicles (UAVs) in GPS-denied environments. The presented TAN system utilises salient visual features representing semantic, human-interpretable objects (roads, forest and water boundaries) from onboard aerial imagery and associates them to a database of reference features created a priori by applying the same feature detection algorithms to satellite imagery. Correlation of the detected features with the reference features via a series of robust data association steps allows a localisation solution to be achieved with a finite absolute error bound defined by the certainty of the reference dataset. The feature-based Visual Navigation System (VNS) presented in this thesis was originally developed for a navigation application using simulated multi-year satellite image datasets. The extension of the system into the mapping domain, in turn, was based on real (not simulated) flight data and imagery. The mapping study demonstrated the full potential of the system as a versatile tool for enhancing the accuracy of information derived from aerial imagery. Not only have visual features such as road networks, shorelines and water bodies been used to obtain a position 'fix'; they have also been used in reverse for accurate mapping of vehicles detected on the roads into an inertial space with improved precision. Combined correction of geo-coding errors and improved aircraft localisation formed a robust solution for the defence mapping application. A system of the proposed design would provide a complete independent navigation solution to an autonomous UAV and additionally give it object tracking capability.
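    As a simplified stand-in for the "series of robust data association steps" described above, a greedy nearest-neighbour association with a gating threshold might look as follows in Python; the gate value and 2-D feature positions are illustrative assumptions:

        import numpy as np

        def associate_features(detected, reference, gate=25.0):
            """Greedily match detected feature positions (N x 2) to reference
            map features (M x 2), rejecting matches beyond a gating distance
            (in map units). Returns (detected_idx, reference_idx) pairs that
            can serve as constraints for a localisation fix."""
            matches, used = [], set()
            reference = np.asarray(reference, dtype=float)
            for i, d in enumerate(np.asarray(detected, dtype=float)):
                dists = np.linalg.norm(reference - d, axis=1)
                j = int(np.argmin(dists))
                if dists[j] <= gate and j not in used:
                    matches.append((i, j))
                    used.add(j)
            return matches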

    Incremental Learning from Low-labelled Stream Data in Open-Set Video Face Recognition

    Deep Learning approaches have brought solutions with impressive performance to general classification problems where a wealth of annotated data is provided for training. In contrast, less progress has been made in the continual learning of a set of non-stationary classes, mainly when applied to unsupervised problems with streaming data. Here, we propose a novel incremental learning approach which combines a deep features encoder with Open-Set Dynamic Ensembles of SVM to tackle the problem of identifying individuals of interest (IoI) from streaming face data. Starting from a simple weak classifier trained on a few video frames, our method can use unsupervised operational data to enhance recognition. Our approach adapts to new patterns, avoids catastrophic forgetting, and partially heals itself from mis-adaptation. Besides, to better comply with real-world conditions, the system was designed to operate in an open-set setting. Results show a benefit of up to a 15% F1-score increase with respect to non-adaptive state-of-the-art methods. This work has received financial support from the Spanish government (project PID2020-119367RB-I00); from the Xunta de Galicia, Consellaría de Cultura, Educación e Ordenación Universitaria (accreditations 2019-2022 ED431G-2019/04 and ED431G 2019/01, and reference competitive groups 2021-2024 ED431C 2021/48 and ED431C 2021/30); and from the European Regional Development Fund (ERDF). Eric López-López has received financial support from the Xunta de Galicia and the European Union (European Social Fund, ESF).
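    The kind of open-set SVM ensemble described above can be sketched in Python over precomputed deep face embeddings. The class below is an illustrative assumption, not the authors' implementation; the threshold and fixed-size pool policy are placeholders:

        import numpy as np
        from sklearn.svm import LinearSVC

        class OpenSetSVMEnsemble:
            """Pool of binary SVMs (IoI vs. rest) over face embeddings. A face
            is accepted only if some member is confident enough; otherwise it
            is rejected as unknown (open-set behaviour)."""

            def __init__(self, threshold=0.0, max_members=10):
                self.members = []           # fitted LinearSVC models
                self.threshold = threshold  # open-set acceptance margin
                self.max_members = max_members

            def score(self, embedding):
                """Highest decision value over the ensemble (-inf when empty)."""
                if not self.members:
                    return float("-inf")
                x = embedding.reshape(1, -1)
                return max(m.decision_function(x)[0] for m in self.members)

            def is_ioi(self, embedding):
                return self.score(embedding) >= self.threshold

            def adapt(self, X_stream, y_pseudo):
                """Fit a new member on confidently self-labelled stream frames,
                dropping the oldest when full to bound mis-adaptation."""
                if len(self.members) == self.max_members:
                    self.members.pop(0)
                self.members.append(LinearSVC().fit(X_stream, y_pseudo))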

    MonoSLAM: Real-time single camera SLAM


    Robust Face Tracking in Video Sequences

    This work presents a detailed analysis and discussion of a novel face tracking system that utilizes multiple appearance models along with a tracking-by-detection framework, and that can aid a video-based face recognition system by providing face locations of specific individuals (regions of interest, ROIs) for every frame. A face recognition system can utilize the ROIs provided by the face tracker to accumulate evidence of a person being present in a video, in order to identify a person of interest already enrolled in the face recognition system. The primary task of a face tracker is to find the location of a face present in an image by utilizing its location information from the previous frame. The search is done by finding the region that maximizes the possibility of a face being present in the frame, comparing each region with a face appearance model. However, during this search, several external factors inhibit the performance of a face tracker. These external factors are termed tracking nuisances, and usually appear in the form of illumination variation, background clutter, motion blur, partial occlusion, etc. Thus, the main challenge for a face tracker is to find the best region in spite of frequent appearance changes of the face during the tracking process. Since it is not possible to control these nuisances, robust face appearance models are designed and developed such that they are less affected by the nuisances and can still track a face successfully in such scenarios. Although a single face appearance model can be used for tracking a face, it cannot tackle all the tracking nuisances. Hence, the proposed method utilizes multiple face appearance models, so that different models can facilitate tracking in the presence of different nuisances. In addition, the proposed method combines the tracking-by-detection methodology by employing a face detector that outputs a bounding box for every frame. The face detector thus aids the face tracker in tackling tracking nuisances, and also aids re-initialization of the tracker after tracking drift. The precision of the tracker can be further improved by generating face candidates around the tracking output and choosing the best among them. Thus, in the proposed method, face tracking is formulated as selecting the face candidate that maximizes the similarity across all the appearance models.
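    The final formulation, selecting the face candidate that maximizes similarity across all appearance models, can be sketched in Python as follows; the similarity interface and the product fusion rule are illustrative assumptions:

        import numpy as np

        def select_face(candidates, appearance_models):
            """Return the candidate ROI that maximises the combined similarity
            over all appearance models (product fusion: every model must agree)."""
            best_roi, best_score = None, -np.inf
            for roi in candidates:
                # each model is assumed to return a similarity in [0, 1]
                score = np.prod([m.similarity(roi) for m in appearance_models])
                if score > best_score:
                    best_roi, best_score = roi, score
            return best_roi, best_score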

    Monocular SLAM for deformable scenarios.

    The problem of localizing the position of a sensor in an uncertain map that is estimated simultaneously is known as Simultaneous Localization and Mapping (SLAM). It is a challenging problem, comparable to the chicken-and-egg paradigm: to locate the sensor we need to know the map, but to build the map we need the sensor's position. When a visual sensor such as a camera is used, it is called Visual SLAM or VSLAM. Visual sensors for SLAM divide into those that provide depth information (e.g. RGB-D cameras or stereo rigs) and those that do not (e.g. monocular cameras or event cameras). In this thesis we have focused our research on SLAM with monocular cameras. Due to the lack of depth perception, monocular SLAM is intrinsically harder than SLAM with depth sensors. State-of-the-art monocular VSLAM systems have normally assumed that the scene remains rigid throughout the sequence, a feasible assumption for industrial and urban environments. The rigidity assumption provides sufficient constraints on the problem and allows a reliable map to be reconstructed after processing several images. In recent years, interest in SLAM has reached medical settings, where SLAM algorithms could help guide the surgeon or localize the position of a robot. However, unlike industrial or urban scenarios, in intracorporeal sequences everything can eventually deform, so the rigidity assumption becomes invalid in practice, and by extension so do monocular SLAM algorithms. Our goal is therefore to push the limits of SLAM algorithms and conceive the first monocular SLAM system capable of coping with scene deformation.

    Current SLAM systems compute the camera position and the map structure in two concurrent threads: localization and mapping. Localization processes each image to locate the sensor continuously, while mapping builds the map of the scene. We adopt this structure and conceive both deformable localization and deformable mapping, now able to recover the scene even under deformation.

    Our first contribution is deformable localization, which uses the map structure to recover the camera pose from a single image. Simultaneously, as the map deforms along the sequence, it also recovers the deformation of the map for each frame. We propose two families of deformable localization. In the first deformable localization algorithm, we assume that all points are embedded in a surface called the template, and we recover the deformation of the surface thanks to a global deformation model that estimates the most likely deformation of the object. With our second deformable localization algorithm, we show that it is possible to recover the map deformation without a global deformation model, representing the map as individual surfels. Our experimental results showed that, by recovering the deformation of the map, both methods outperform rigid methods in both robustness and accuracy.

    Our second contribution is deformable mapping. It is the back-end of the SLAM algorithm: it processes a batch of images to recover the map structure for all of them and grows the map by assembling its partial observations. Deformable localization and deformable mapping run in parallel and together assemble the first deformable monocular SLAM system: DefSLAM. An extended evaluation of our method showed, on both laboratory-controlled and medical sequences, that it successfully processes sequences where current monocular SLAM systems fail.

    Our third contribution is two methods to exploit photometric information in deformable monocular SLAM. On the one hand, SD-DefSLAM leverages semi-direct matching to obtain much more reliable matches of map points in new images; as a consequence, it proved more robust and stable on medical sequences. On the other hand, we propose a Direct and Sparse Deformable Localization method in which a direct photometric error is used to track the deformation of a map modeled as a set of disconnected 3D surfels. It can recover the deformation of multiple disconnected surfaces, non-isometric deformations, or surfaces with changing topology.
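    In the spirit of the direct photometric error mentioned above, a residual over disconnected 3D surfels can be sketched in Python as follows; the project callback is an assumed camera-projection helper and bounds checks are omitted:

        import numpy as np

        def photometric_residuals(surfels, template_intensities, image, project):
            """For each 3D surfel, compare the grey level stored at template
            time with the image intensity at the surfel's current projection.
            A solver would adjust the surfel poses to minimise these residuals."""
            residuals = []
            for s, i_ref in zip(surfels, template_intensities):
                u, v = project(s)  # pixel coordinates of the surfel in this frame
                i_cur = image[int(round(v)), int(round(u))]
                residuals.append(float(i_cur) - float(i_ref))
            return np.asarray(residuals)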

    Visual Tracking: An Experimental Survey

    There is a large variety of trackers that have been proposed in the literature during the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem, and therefore it remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
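    The survival-curve evaluation can be sketched in a few lines of Python: for each frame index t, compute the fraction of videos in which the tracker has not yet failed. The per-video first-failure frames are assumed given (np.inf if the tracker never fails):

        import numpy as np

        def survival_curve(failure_frames, horizon):
            """S(t) = fraction of videos still tracked correctly at frame t."""
            failures = np.asarray(failure_frames, dtype=float)
            return np.array([(failures > t).mean() for t in range(horizon)])

        # e.g. survival_curve([50, 120, np.inf], 200) decays as trackers fail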