Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what is, to the best of our knowledge, the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle.
Comment: 16 pages including appendix; published at CVPR 2017
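Since the pipeline's key move is regressing to a learnt shape embedding and then decoding it back to a mask, a small sketch may help. Below is a minimal, hypothetical PyTorch illustration of that idea; the layer sizes, the 50-D embedding and the ShapeDecoder class are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    EMBED_DIM = 50  # assumed embedding size, for illustration only

    class ShapeDecoder(nn.Module):
        """Decoder half of a (denoising) auto-encoder trained on object masks."""
        def __init__(self, embed_dim=EMBED_DIM, mask_size=64):
            super().__init__()
            self.mask_size = mask_size
            self.net = nn.Sequential(
                nn.Linear(embed_dim, 256), nn.ReLU(),
                nn.Linear(256, mask_size * mask_size), nn.Sigmoid(),
            )

        def forward(self, z):
            # z: (B, embed_dim) shape embeddings -> (B, H, W) soft masks
            return self.net(z).view(-1, self.mask_size, self.mask_size)

    # A detection head would regress, per box: coordinates, class scores and z.
    # At test time the (frozen) decoder reconstructs the mask from z:
    decoder = ShapeDecoder()
    z = torch.randn(1, EMBED_DIM)   # embedding predicted by the detector
    mask = decoder(z) > 0.5         # thresholded binary shape mask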
Staple: Complementary Learners for Real-Time Tracking
Correlation Filter-based trackers have recently achieved excellent
performance, showing great robustness to challenging situations exhibiting
motion blur and illumination changes. However, since the model that they learn
depends strongly on the spatial layout of the tracked object, they are
notoriously sensitive to deformation. Models based on colour statistics have
complementary traits: they cope well with variation in shape, but suffer when
illumination is not consistent throughout a sequence. Moreover, colour
distributions alone can be insufficiently discriminative. In this paper, we
show that a simple tracker combining complementary cues in a ridge regression
framework can run at over 80 FPS and outperform not only all entries in
the popular VOT14 competition, but also recent and far more sophisticated
trackers according to multiple benchmarks.
Comment: To appear in CVPR 2016
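The fusion step itself is simple enough to sketch: the tracker forms a convex combination of the two dense response maps and takes the peak as the new target location. A minimal numpy sketch, in which fuse_responses and the weight 0.3 are illustrative placeholders rather than the paper's tuned values:

    import numpy as np

    def fuse_responses(template_response, colour_response, alpha=0.3):
        """Convex combination of the two dense response maps."""
        return (1.0 - alpha) * template_response + alpha * colour_response

    rng = np.random.default_rng(0)
    tmpl = rng.random((60, 80))   # stand-in for a correlation-filter response
    col = rng.random((60, 80))    # stand-in for a colour-histogram score map
    merged = fuse_responses(tmpl, col)
    dy, dx = np.unravel_index(merged.argmax(), merged.shape)  # new target centre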
RSGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems
Stereo depth estimation is used for many computer vision applications. Though
many popular methods strive solely for depth quality, for real-time mobile
applications (e.g. prosthetic glasses or micro-UAVs), speed and power
efficiency are equally, if not more, important. Many real-world systems rely on
Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but
power efficiency is hard to achieve with conventional hardware, making the use
of embedded devices such as FPGAs attractive for low-power applications.
However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so
most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA
context, the accuracy of SGM has been improved by More Global Matching (MGM),
which also helps tackle the streaking artifacts that afflict SGM. In this
paper, we propose a novel, resource-efficient method that is inspired by MGM's
techniques for improving depth quality, but which can be implemented to run in
real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI
and Middlebury), we show that in comparison to other real-time capable stereo
approaches, we can achieve a state-of-the-art balance between accuracy, power
efficiency and speed, making our approach highly desirable for use in real-time
systems with limited power.
Comment: Accepted at FPT 2018 for oral presentation; 8 pages, 6 figures, 4 tables
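For context, the SGM cost aggregation that such systems build on follows a simple per-path recurrence; the sketch below shows one left-to-right pass in numpy, with placeholder penalties P1 and P2. The full algorithm sums several such directional passes, and MGM mixes information from more than one previous pixel per path. This is textbook SGM, not the paper's FPGA implementation:

    import numpy as np

    def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
        """cost: (H, W, D) matching-cost volume; returns one path's aggregation."""
        cost = np.asarray(cost, dtype=np.float64)
        H, W, D = cost.shape
        L = np.empty_like(cost)
        L[:, 0] = cost[:, 0]
        for x in range(1, W):
            prev = L[:, x - 1]                          # (H, D) previous pixel
            prev_min = prev.min(axis=1, keepdims=True)  # (H, 1)
            # costs of keeping d, changing it by +/-1, or jumping arbitrarily
            d_minus = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D]
            d_plus = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]
            best = np.minimum(np.minimum(prev, prev_min + P2),
                              np.minimum(d_minus, d_plus) + P1)
            L[:, x] = cost[:, x] + best - prev_min      # subtraction keeps L bounded
        return L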
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF leads to most of the simplicity and
efficiency that can be achieved with GPU implementations of these systems.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features
and made numerous enhancements to the low-level code that significantly
improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach.
Comment: This article largely supersedes arXiv:1410.0925 (it describes version 3 of the InfiniTAM framework).
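The TSDF representation at the heart of such systems updates each voxel with a weighted running average of the incoming truncated signed-distance observations. A minimal sketch of that per-voxel fusion step (the function name and weight cap are illustrative, not InfiniTAM's actual code):

    def fuse_voxel(tsdf, weight, sdf_obs, max_weight=100.0):
        """Fold one truncated signed-distance observation into a voxel."""
        new_tsdf = (tsdf * weight + sdf_obs) / (weight + 1.0)
        new_weight = min(weight + 1.0, max_weight)  # cap to stay responsive
        return new_tsdf, new_weight

    # e.g. a voxel currently at 0.2 with weight 4, observing -0.1:
    print(fuse_voxel(0.2, 4.0, -0.1))  # -> roughly (0.14, 5.0)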
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance that our original state-of-the-art method achieved on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.
Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship.
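To make the cascade idea concrete, the sketch below shows the fallback logic in Python; the relocaliser interface, the scoring function and the threshold are hypothetical placeholders, not the actual implementation:

    def relocalise(frame, relocalisers, score_fn, good_enough=0.85):
        """relocalisers: ordered fast -> slow; score_fn rates a pose in [0, 1]."""
        best_pose, best_score = None, -1.0
        for reloc in relocalisers:
            pose = reloc.estimate_pose(frame)  # e.g. forest + RANSAC internally
            score = score_fn(frame, pose)      # geometric agreement check
            if score > best_score:
                best_pose, best_score = pose, score
            if best_score >= good_enough:      # early out: no need to go slower
                break
        return best_pose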
Calibrating Deep Neural Networks using Focal Loss
Miscalibration -- a mismatch between a model's confidence and its correctness
-- of Deep Neural Networks (DNNs) makes their predictions hard to rely on.
Ideally, we want networks to be accurate, calibrated and confident. We show
that, as opposed to the standard cross-entropy loss, focal loss (Lin et al.,
2017) allows us to learn models that are already very well calibrated. When
combined with temperature scaling, whilst preserving accuracy, it yields
state-of-the-art calibrated models. We provide a thorough analysis of the
factors causing miscalibration, and use the insights we glean from this to
justify the empirically excellent performance of focal loss. To facilitate the
use of focal loss in practice, we also provide a principled approach to
automatically select the hyperparameter involved in the loss function. We
perform extensive experiments on a variety of computer vision and NLP datasets,
and with a wide variety of network architectures, and show that our approach
achieves state-of-the-art accuracy and calibration in almost all cases.
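For reference, the focal loss of Lin et al. (2017) is FL(p_t) = -(1 - p_t)^gamma * log(p_t), which down-weights confidently classified examples relative to cross-entropy. A minimal PyTorch sketch (gamma = 3 is only a placeholder; the paper proposes a principled way to choose it):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=3.0):
        """logits: (N, C); targets: (N,) class indices."""
        log_p = F.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log p_t
        pt = log_pt.exp()
        return (-((1.0 - pt) ** gamma) * log_pt).mean()

    loss = focal_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)))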
Imagining the impossible before breakfast: the relation between creativity, dissociation, and sleep
Dissociative symptoms have been related to higher rapid eye movement (REM) sleep density; REM sleep is a phase during which hyperassociativity may occur. This may enhance artistic creativity during the day. To test this hypothesis, we conducted a creative photo contest to explore the relation between dissociation, sleep, and creativity. During the contest, participants (N = 72) took one photo per day for five consecutive days, based on specific daily themes (consisting of single words) and the instruction to take as creative a photo as possible each day. Furthermore, they completed daily measures of state dissociation and a short sleep diary. The photos and their captions were ranked by two professional photographers and two clinical psychologists based on creativity, originality, bizarreness, and quality. We expected that highly dissociative participants would rank higher in the contest than low-dissociative participants, and that the most original photos would be taken on days when the participants scored highest on acute dissociation. We found that acute dissociation predicted a higher ranking on creativity. Poorer sleep quality and fewer hours of sleep predicted more bizarreness in the photos and captions. None of the trait measures could predict creativity. In sum, acute dissociation was related to enhanced creativity. These findings contribute to our understanding of dissociative symptomatology.