17 research outputs found
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Class-agnostic counting methods enumerate objects of an arbitrary class,
providing tremendous utility in many fields. Prior works have limited
usefulness as they require either a set of examples of the type to be counted
or that the image contains only a single type of object. A significant factor
in these shortcomings is the lack of a dataset to properly address counting in
settings with more than one kind of object present. To address these issues, we
propose the first Multi-class, Class-Agnostic Counting dataset (MCAC) and A
Blind Counter (ABC123), a method that can count multiple types of objects
simultaneously without using examples of the type during training or inference.
ABC123 introduces a new paradigm where instead of requiring exemplars to guide
the enumeration, examples are found after the counting stage to help a user
understand the generated outputs. We show that ABC123 outperforms contemporary
methods on MCAC without the requirement of human-in-the-loop annotations. We
also show that this performance transfers to FSC-147, the standard
class-agnostic counting dataset.
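As an illustration of the exemplar-free, "count first, explain later" paradigm described above, a minimal sketch is given below (Python/NumPy; the function name, shapes and the density-map formulation are assumptions for illustration, not the authors' code): one density map would be predicted per discovered object type, counts come from integrating each map, and exemplar patches are selected afterwards from the highest-density locations to help a user interpret the output.

    import numpy as np

    def counts_and_exemplars(density_maps, image, patch=32, top_k=3):
        # density_maps: (K, H, W) non-negative array, one map per discovered type.
        # image: (H, W, ...) array from which exemplar patches are cropped post hoc.
        results = []
        for dmap in density_maps:
            count = float(dmap.sum())  # predicted count for this type
            # pick the top-k density peaks as post-hoc exemplars for the user
            flat = np.argsort(dmap, axis=None)[::-1][:top_k]
            ys, xs = np.unravel_index(flat, dmap.shape)
            exemplars = [image[max(0, y - patch // 2): y + patch // 2,
                               max(0, x - patch // 2): x + patch // 2]
                         for y, x in zip(ys, xs)]
            results.append((count, exemplars))
        return results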
DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific Clustering Using Side Information
We present a novel approach, in which we learn to cluster data directly from
side information, in the form of a small set of pairwise examples. Unlike
previous methods, with or without side information, we do not need to know the
number of clusters, their centers or any kind of distance metric for
similarity. Our method is able to divide the same data points in various ways,
dependent on the needs of a specific task defined by the side information. In
contrast, other work generally finds only the intrinsic, most obvious,
clusters. Inspired by the mean shift algorithm, we implement our new clustering
approach using a custom iterative neural network to create Differentiable Mean
Shift (DMS), a state-of-the-art, dataset-agnostic clustering method. We found
that it was possible to train a strong cluster definition without enforcing a
constraint that each cluster must be presented during training. DMS outperforms
current methods in both the intrinsic and non-intrinsic dataset tasks.
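For intuition, a classical mean shift update with a fixed Gaussian kernel can be written as below; in DMS the hand-crafted kernel would be replaced by a learned, task-specific similarity trained from the pairwise side information. This is a sketch for illustration only, not the published implementation.

    import numpy as np

    def mean_shift_step(points, bandwidth=1.0):
        # One update: move every point towards the kernel-weighted mean of all points.
        diffs = points[:, None, :] - points[None, :, :]      # (N, N, D)
        sq_dists = (diffs ** 2).sum(axis=-1)                 # (N, N)
        weights = np.exp(-sq_dists / (2 * bandwidth ** 2))   # Gaussian kernel weights
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ points                              # (N, D) shifted points

    def cluster(points, iters=50, bandwidth=1.0):
        # Points belonging to the same cluster collapse towards a shared mode.
        for _ in range(iters):
            points = mean_shift_step(points, bandwidth)
        return points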
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene. Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and
Julien Valentin assert joint first authorship.
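The cascade logic itself can be summarised in a few lines; the sketch below is an illustrative Python outline only (the relocaliser and scoring callables are placeholders, not the published implementation): cheaper relocalisers are tried first, the surviving pose hypotheses are scored geometrically, and slower stages are used only when the best score is not good enough.

    def relocalise(frame, relocalisers, score_hypothesis, threshold):
        # relocalisers: callables ordered from fast-but-coarse to slow-but-accurate,
        # each returning a list of candidate camera poses for the frame.
        for relocaliser in relocalisers:
            hypotheses = relocaliser(frame)
            if not hypotheses:
                continue
            # (i) score the final few hypotheses geometrically rather than
            # accepting the first one without question, and keep the best
            best = max(hypotheses, key=lambda pose: score_hypothesis(pose, frame))
            if score_hypothesis(best, frame) >= threshold:
                return best
            # (ii) otherwise fall back to the next, slower stage of the cascade
        return None  # relocalisation failed at every stage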
Cos R-CNN for online few-shot object detection
We propose Cos R-CNN, a simple exemplar-based R-CNN formulation designed for online few-shot object detection. That is, it is able to localise and classify novel object categories in images with few examples and without fine-tuning. Cos R-CNN frames detection as a learning-to-compare task: unseen classes are represented as exemplar images, and objects are detected based on their similarity to these exemplars. The cosine-based classification head allows for dynamic adaptation of the classification parameters to the exemplar embedding, and encourages the clustering of similar classes in embedding space without the need for manual tuning of distance-metric hyperparameters. This simple formulation achieves the best results on the recently proposed 5-way ImageNet few-shot detection benchmark, outperforming prior results in the online 1/5/10-shot scenarios by more than 8/3/1%, as well as performing up to 20% better in online 20-way few-shot VOC across all shots on novel classes.
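The cosine-based classification head is the key component; a minimal PyTorch sketch is given below (the function name, scale value and tensor shapes are assumptions for illustration, not the released code): box embeddings are scored against exemplar embeddings by scaled cosine similarity, so new classes are handled by swapping exemplars rather than fine-tuning weights.

    import torch
    import torch.nn.functional as F

    def cosine_head(roi_embeddings, exemplar_embeddings, scale=20.0):
        # roi_embeddings: (N, D) embeddings of candidate boxes.
        # exemplar_embeddings: (C, D), one row per (possibly unseen) class exemplar.
        rois = F.normalize(roi_embeddings, dim=1)
        exemplars = F.normalize(exemplar_embeddings, dim=1)
        logits = scale * rois @ exemplars.t()  # (N, C) scaled cosine similarities
        return logits  # fed to the detection loss; new classes only need new exemplars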
Semi-weakly-supervised neural network training for medical image registration
For training registration networks, weak supervision from segmented
corresponding regions-of-interest (ROIs) has proven effective for (a)
supplementing unsupervised methods, and (b) being used independently in
registration tasks in which unsupervised losses are unavailable or ineffective.
This correspondence-informing supervision entails annotation costs that
require significant specialised effort. This paper describes a
semi-weakly-supervised registration pipeline that improves the model
performance, when only a small corresponding-ROI-labelled dataset is available,
by exploiting unlabelled image pairs. We examine two types of augmentation
methods by perturbation on network weights and image resampling, such that
consistency-based unsupervised losses can be applied on unlabelled data. The
novel WarpDDF and RegCut approaches are proposed to allow commutative
perturbation between an image pair and the predicted spatial transformation
(i.e. respective input and output of registration networks), distinct from
existing perturbation methods for classification or segmentation. Experiments
using 589 male pelvic MR images, labelled with eight anatomical ROIs, show the
improvement in registration performance and the ablated contributions from the
individual strategies. Furthermore, this study attempts to construct one of the
first computational atlases for pelvic structures, enabled by registering
inter-subject MRs, and quantifies the significant differences due to the
proposed semi-weak supervision with a discussion on the potential clinical use
of example atlas-derived statistics.
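The consistency idea on unlabelled pairs can be sketched as follows (hypothetical PyTorch code; the perturbation operators stand in for, but do not reproduce, the proposed WarpDDF and RegCut): the transformation predicted from a perturbed image pair should agree with the perturbed version of the transformation predicted from the clean pair.

    import torch

    def consistency_loss(network, fixed, moving, perturb_images, perturb_ddf):
        # network(fixed, moving) returns a dense displacement field (DDF).
        ddf = network(fixed, moving)                        # prediction on the clean pair
        fixed_p, moving_p = perturb_images(fixed, moving)   # perturbed inputs
        ddf_p = network(fixed_p, moving_p)                  # prediction on the perturbed pair
        # the two routes should commute: perturb-then-predict vs predict-then-perturb
        return torch.mean((ddf_p - perturb_ddf(ddf)) ** 2)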
Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration
The appeal of few-shot learning in medical image analysis lies in its efficient use
of the support image data, which are labelled to classify or segment new
classes, a task that otherwise requires substantially more training images and expert
annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively adapted to clinically interesting
structures that are absent in training, using only a few labelled images from a different
institute. First, to compensate for the widely recognised spatial variability between institutions in episodic adaptation of novel classes, a novel spatial registration mechanism
is integrated into prototypical learning, consisting of a segmentation head and a spatial alignment module. Second, to assist the training with observed imperfect alignment, a
support mask conditioning module is proposed to further utilise the annotation available
from the support images. Extensive experiments are presented in an application of segmenting eight anatomical structures important for interventional planning, using a data
set of 589 pelvic T2-weighted MR images, acquired at seven institutes. The results
demonstrate the efficacy of each of the 3D formulation, the spatial registration, and the
support mask conditioning, all of which made positive contributions independently or
collectively. Compared with the previously proposed 2D alternatives, the few-shot segmentation performance was improved with statistical significance, regardless of whether
the support data come from the same or different institutes.
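For readers unfamiliar with prototypical segmentation, the core computation can be sketched as below (hypothetical PyTorch code with assumed tensor shapes; the spatial registration and support mask conditioning modules described above are omitted): class prototypes are averaged from support features under the support masks, and each query voxel is assigned to its most similar prototype.

    import torch
    import torch.nn.functional as F

    def prototype(support_feat, support_mask):
        # support_feat: (C, D, H, W) features; support_mask: (1, D, H, W) binary ROI mask.
        masked = support_feat * support_mask
        return masked.sum(dim=(1, 2, 3)) / support_mask.sum().clamp(min=1)  # (C,)

    def segment(query_feat, prototypes, temperature=10.0):
        # query_feat: (C, D, H, W); prototypes: (K, C), one row per class incl. background.
        q = F.normalize(query_feat, dim=0)
        p = F.normalize(prototypes, dim=1)
        logits = temperature * torch.einsum('kc,cdhw->kdhw', p, q)
        return logits.argmax(dim=0)  # (D, H, W) predicted class per voxel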