22 research outputs found

    Discriminative Scale Space Tracking

    Full text link
    Accurate scale estimation of a target is a challenging research problem in visual object tracking. Most state-of-the-art methods employ an exhaustive scale search to estimate the target size. The exhaustive search strategy is computationally expensive and struggles when encountered with large scale variations. This paper investigates the problem of accurate and robust scale estimation in a tracking-by-detection framework. We propose a novel scale adaptive tracking approach by learning separate discriminative correlation filters for translation and scale estimation. The explicit scale filter is learned online using the target appearance sampled at a set of different scales. Contrary to standard approaches, our method directly learns the appearance change induced by variations in the target scale. Additionally, we investigate strategies to reduce the computational cost of our approach. Extensive experiments are performed on the OTB and the VOT2014 datasets. Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5% in average overlap precision on the OTB dataset. Additionally, our method is computationally efficient, operating at a 50% higher frame rate compared to the exhaustive scale search. Our method obtains the top rank in performance by outperforming 19 state-of-the-art trackers on OTB and 37 state-of-the-art trackers on VOT2014.Comment: To appear in TPAMI. This is the journal extension of the VOT2014-winning DSST tracking metho

    Проектирование основной гидросистемы запуска газотурбинной установки

    Get PDF
    Robust and accurate visual tracking is one of the most challenging computer vision problems. Due to the inherent lack of training data, a robust approach for constructing a target appearance model is crucial. Recently, discriminatively learned correlation filters (DCF) have been successfully applied to address this problem for tracking. These methods utilize a periodic assumption of the training samples to efficiently learn a classifier on all patches in the target neighborhood. However, the periodic assumption also introduces unwanted boundary effects, which severely degrade the quality of the tracking model. We propose Spatially Regularized Discriminative Correlation Filters (SRDCF) for tracking. A spatial regularization component is introduced in the learning to penalize correlation filter coefficients depending on their spatial location. Our SRDCF formulation allows the correlation filters to be learned on a significantly larger set of negative training samples, without corrupting the positive samples. We further propose an optimization strategy, based on the iterative Gauss-Seidel method, for efficient online learning of our SRDCF. Experiments are performed on four benchmark datasets: OTB-2013, ALOV++, OTB-2015, and VOT2014. Our approach achieves state-of-the-art results on all four datasets. On OTB-2013 and OTB-2015, we obtain an absolute gain of 8.0% and 8.2% respectively, in mean overlap precision, compared to the best existing trackers

    The Ninth Visual Object Tracking VOT2021 Challenge Results

    Get PDF
    acceptedVersionPeer reviewe

    Learning visual perception for autonomous systems

    No full text
    In the last decade, developments in hardware, sensors and software have made it possible to create increasingly autonomous systems. These systems can be as simple as limited driver assistance software lane-following in cars, or limited collision warning systems for otherwise manually piloted drones. On the other end of the spectrum exist fully autonomous cars, boats or helicopters. With increasing abilities to function autonomously, the demands to operate with minimal human supervision in unstructured environments increase accordingly. Common to most, if not all, autonomous systems is that they require an accurate model of the surrounding world. While there is currently a large number of possible sensors useful to create such models available, cameras are one of the most versatile. From a sensing perspective cameras have several advantages over other sensors in that they require no external infrastructure, are relatively cheap and can be used to extract such information as the relative positions of other objects, their movements over time, create accurate maps and locate the autonomous system within these maps. Using cameras to produce a model of the surroundings require solving a number of technical problems. Often these problems have a basis in recognizing that an object or region of interest is the same over time or in novel viewpoints. In visual tracking this type of recognition is required to follow an object of interest through a sequence of images. In geometric problems it is often a requirement to recognize corresponding image regions in order to perform 3D reconstruction or localization.  The first set of contributions in this thesis is related to the improvement of a class of on-line learned visual object trackers based on discriminative correlation filters. In visual tracking estimation of the objects size is important for reliable tracking, the first contribution in this part of the thesis investigates this problem. The performance of discriminative correlation filters is highly dependent on what feature representation is used by the filter. The second tracking contribution investigates the performance impact of different features derived from a deep neural network. A second set of contributions relate to the evaluation of visual object trackers. The first of these are the visual object tracking challenge. This challenge is a yearly comparison of state-of-the art visual tracking algorithms. A second contribution is an investigation into the possible issues when using bounding-box representations for ground-truth data. In real world settings tracking typically occur over longer time sequences than is common in benchmarking datasets. In such settings it is common that the model updates of many tracking algorithms cause the tracker to fail silently. For this reason it is important to have an estimate of the trackers performance even in cases when no ground-truth annotations exist. The first of the final three contributions investigates this problem in a robotics setting, by fusing information from a pre-trained object detector in a state-estimation framework. An additional contribution describes how to dynamically re-weight the data used for the appearance model of a tracker. A final contribution investigates how to obtain an estimate of how certain detections are in a setting where geometrical limitations can be imposed on the search region. The proposed solution learns to accurately predict stereo disparities along with accurate assessments of each predictions certainty.De senaste årens allt snabbare utveckling av beräkningshårdvara, sensorer och mjukvarutekniker har gjort det möjligt att skapa allt mer autonoma system. Sådana kan variera i autonomigrad från ett antisladdsystem för en i övrigt manuellt kontrollerad bil, till system för kollisionsundvikning i en manuellt kontrollerad drönare, till en helt autonom bil eller annan farkost. Med en ökande förmåga att arbeta självständigt utan mänsklig övervakning ökar också bredden på möjliga situationer som systemen förväntas hantera.  Gemensamt för många, om inte alla, autonoma system är att de behöver en korrekt och updaterad bild av sin omgivning för att kunna agera på ett intelligent sätt. En lång rad av sensorer som gör detta möjligt finns tillgängliga, där kameror är en av de mest mångsidiga. Jämfört med andra typer av sensorer har kameror en rad fördelar, som att de är relativt billiga, passiva, och kan användas utan krav på extern infrastruktur. Det visuella data som kameror genererar kan användas för att följa externa objekt, bestämma positionen för kameran själv, eller beräkna avstånd.  Att framgångsrikt utnyttja möjligheterna i denna information kräver dock att en lång rad tekniska problem hanteras. Många av dessa problem är grundar sig i att kunna känna igen att två bildregioner från olika tidpunkter eller betraktningsvinklar avbildar samma sak.  Ett typexempel på ett sådant problem är det visuella följningsproblemet. I det visuella följningsproblemet är målet att bestämma ett objekts position och storlek för alla bilder i en sekvens av bilder. I allmänhet är objektets utseende inte känt av algoritmen, utan en utseendemodell måste skapas succesivt med hjälp av maskininlärning.  Problem som liknar detta förekommer inom många andra områden av datorseende, speciellt inom geometri. Inom många geometriska problem krävs det till exempel att man finner korresponderande punkter i ett flertal bilder.  Den första samlingen av bidrag i denna avhandling behandlar det visuella följningsproblemet. De föreslagna metoderna är baserade på en adaptiv utseendemodell kallad diskriminativa korrelationsfilter. I det första bidraget till sådana metoder utökas ramverket till att skatta ett objekts storlek såväl som position. Ett andra bidrag undersöker hur korrelationsfilterbaserade metoder kan utökas till att även utnyttja visuella särdrag som har framställt med hjälp av maskininlärning.  En andra samling med bidrag behandlar utvärdering av metoder för visuell följning. Dels inom den årligt förekommande tävlingen visual object tracking challenge. Ett andra bidrag till utvärderingsmetodig inom visuell följning syftar till att unvdika fallgropar som lätt uppkommer då metoder anpassas allt för väl för måtten som används för att utvärdera dem.  En tredje samling med bidrag relaterar till olika sätt att hantera situationer då inlärningsprocessen i de tidigare beskrivna följningsmetoderna introducerar felaktiga data i modellen. Detta görs i ett första bidrag i ett robotiksystem för följning av människor i en ostrukturerad miljö. Ett andra bidrag är baserat på dynamisk omviktning av tidigare samlad data för att dynamiskt vikta ned datapunkter som inte representerar det följda objektet väl. I ett sista bidrag undersöks hur en prediktions osäkerhet kan skattas samtidigt som prediktionen själv

    Förbättring av korrelationsfilter för visuell följning

    No full text
    Generic visual tracking is one of the classical problems in computer vision. In this problem, no prior knowledge of the target is available aside from a bounding box in the initial frame of the sequence. The generic visual tracking is a difficult task due to a number of factors such as momentary occlusions, target rotations, changes in target illumination and variations in the target size. In recent years, discriminative correlation filter (DCF) based trackers have shown promising results for visual tracking. These DCF based methods use the Fourier transform to efficiently calculate detection and model updates, allowing significantly higher frame rates than competing methods. However, existing DCF based methods only estimate translation of the object while ignoring changes in size.This thesis investigates the problem of accurately estimating the scale variations within a DCF based framework. A novel scale estimation method is proposed by explicitly constructing translation and scale filters. The proposed scale estimation technique is robust and significantly improve the tracking performance, while operating at real-time. In addition, a comprehensive evaluation of feature representations in a DCF framework is performed. Experiments are performed on the benchmark OTB-2015 dataset, as well as the VOT 2014 dataset. The proposed methods are shown to significantly improve the performance of existing DCF based trackers.Allmän visuell följning är ett klassiskt problem inom datorseende. I den vanliga formuleringen antas ingen förkunskap om objektet som skall följas, utöver en initial rektangel i en videosekvens första bild.Detta är ett mycket svårt problem att lösa allmänt på grund av occlusioner, rotationer, belysningsförändringar och variationer i objektets uppfattde storlek. På senare år har följningsmetoder baserade på diskriminativea korrelationsfilter gett lovande resultat inom området. Dessa metoder är baserade på att med hjälp av Fourertransformen effektivt beräkna detektioner och modellupdateringar, samtidigt som de har mycket bra prestanda och klarar av många hundra bilder per sekund. De nuvarande metoderna uppskattar dock bara translationen hos det följda objektet, medans skalförändringar ignoreras. Detta examensarbete utvärderar ett antal metoder för att göra skaluppskattningar inom ett korrelationsfilterramverk. En innovativ metod baserad på att konstruera separata skal och translationsfilter. Den föreslagna metoden är robust och har signifikant bättre följningsprestanda, samtidigt som den kan användas i realtid. Det utförs också en utvärdering av olika särdragsrepresentationer på två stora benchmarking dataset för följning

    Predicting Disparity Distributions

    No full text
    We investigate a novel deep-learning-based approach to estimate uncertainty in stereo disparity prediction networks. Current state-of-the-art methods often formulate disparity prediction as a regression problem with a single scalar output in each pixel. This can be problematic in practical applications as in many cases there might not exist a single well defined disparity, for example in cases of occlusions or at depth-boundaries. While current neural-network-based disparity estimation approaches  obtain good performance on benchmarks, the disparity prediction is treated as a black box at inference time. In this paper we show that by formulating the learning problem as a regression with a distribution target, we obtain a robust estimate of the uncertainty in each pixel, while maintaining the performance of the original method. The proposed method is evaluated both on a large-scale standard benchmark, as well on our own data. We also show that the uncertainty estimate significantly improves by maximizing the uncertainty in those pixels that have no well defined disparity during learning

    Countering bias in tracking evaluations

    No full text
    Recent years have witnessed a significant leap in visual object tracking performance mainly due to powerfulfeatures, sophisticated learning methods and the introduction of benchmark datasets. Despite this significantimprovement, the evaluation of state-of-the-art object trackers still relies on the classical intersection overunion (IoU) score. In this work, we argue that the object tracking evaluations based on classical IoU score aresub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of largetarget objects and favors over-estimated target prediction sizes. As our second contribution, we propose a newscore that is unbiased with respect to target prediction size. We systematically evaluate our proposed approachon benchmark tracking data with variations in relative target size. Our empirical results clearly suggest thatthe proposed score is unbiased in general

    An Overview of the Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge

    No full text
    The Thermal Infrared Visual Object Tracking (VOT-TIR2015) Challenge was organized in conjunction with ICCV2015. It was the first benchmark on short-term,single-target tracking in thermal infrared (TIR) sequences. The challenge aimed at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. It was based on the VOT2013 Challenge, but introduced the following novelties: (i) the utilization of the LTIR (Linköping TIR) dataset, (ii) adaption of the VOT2013 attributes to thermal data, (iii) a similar evaluation to that of VOT2015. This paper provides an overview of the VOT-TIR2015 Challenge as well as the results of the 24 participating trackers

    An Overview of the Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge

    No full text
    The Thermal Infrared Visual Object Tracking (VOT-TIR2015) Challenge was organized in conjunction with ICCV2015. It was the first benchmark on short-term,single-target tracking in thermal infrared (TIR) sequences. The challenge aimed at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. It was based on the VOT2013 Challenge, but introduced the following novelties: (i) the utilization of the LTIR (Linköping TIR) dataset, (ii) adaption of the VOT2013 attributes to thermal data, (iii) a similar evaluation to that of VOT2015. This paper provides an overview of the VOT-TIR2015 Challenge as well as the results of the 24 participating trackers

    Accurate Scale Estimation for Robust Visual Tracking

    No full text
    Robust scale estimation is a challenging problem in visual object tracking. Most existing methods fail to handle large scale variations in complex image sequences. This paper presents a novel approach for robust scale estimation in a tracking-by-detection framework. The proposed approach works by learning discriminative correlation filters based on a scale pyramid representation. We learn separate filters for translation and scale estimation, and show that this improves the performance compared to an exhaustive scale search. Our scale estimation approach is generic as it can be incorporated into any tracking method with no inherent scale estimation. Experiments are performed on 28 benchmark sequences with significant scale variations. Our results show that the proposed approach significantly improves the performance by 18.8 % in median distance precision compared to our baseline. Finally, we provide both quantitative and qualitative comparison of our approach with state-of-the-art trackers in literature. The proposed method is shown to outperform the best existing tracker by 16.6 % in median distance precision, while operating at real-time
    corecore