ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Class-agnostic counting methods enumerate objects of an arbitrary class,
providing tremendous utility in many fields. Prior works have limited
usefulness as they require either a set of examples of the type to be counted
or that the image contains only a single type of object. A significant factor
in these shortcomings is the lack of a dataset to properly address counting in
settings with more than one kind of object present. To address these issues, we
propose the first Multi-class, Class-Agnostic Counting dataset (MCAC) and A
Blind Counter (ABC123), a method that can count multiple types of objects
simultaneously without using examples of the type during training or inference.
ABC123 introduces a new paradigm where instead of requiring exemplars to guide
the enumeration, examples are found after the counting stage to help a user
understand the generated outputs. We show that ABC123 outperforms contemporary
methods on MCAC without requiring human-in-the-loop annotations. We
also show that this performance transfers to FSC-147, the standard
class-agnostic counting dataset.
DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific Clustering Using Side Information
We present a novel approach, in which we learn to cluster data directly from
side information, in the form of a small set of pairwise examples. Unlike
previous methods, with or without side information, we do not need to know the
number of clusters, their centers or any kind of distance metric for
similarity. Our method is able to divide the same data points in various ways
depending on the needs of a specific task, as defined by the side information.
In contrast, other work generally finds only the intrinsic, most obvious
clusters. Inspired by the mean shift algorithm, we implement our new clustering
approach using a custom iterative neural network to create Differentiable Mean
Shift (DMS), a state-of-the-art, dataset-agnostic clustering method. We found
that it was possible to train a strong cluster definition without enforcing a
constraint that each cluster must be presented during training. DMS outperforms
current methods on both the intrinsic and non-intrinsic dataset tasks.
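For readers unfamiliar with the mean shift algorithm that inspired DMS, the following is a minimal sketch of classic, non-learned mean shift with a Gaussian kernel, in plain Python. It is not DMS itself: there is no neural network and no side information, and the function names and the `bandwidth` parameter are illustrative assumptions. It does show the property the abstract relies on: the number of clusters is never specified, because points that converge to the same mode are simply grouped together.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def mean_shift(points: Sequence[Point], bandwidth: float = 1.0,
               iterations: int = 50) -> List[List[float]]:
    """Classic mean shift (illustrative, not DMS): move each estimate to the
    Gaussian-weighted mean of the fixed data points."""
    modes = [list(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            # Gaussian kernel weight of every data point relative to the current estimate.
            weights = [math.exp(-math.dist(m, x) ** 2 / (2 * bandwidth ** 2))
                       for x in points]
            total = sum(weights)
            shifted.append([sum(w * x[d] for w, x in zip(weights, points)) / total
                            for d in range(len(m))])
        modes = shifted
    return modes

def group_modes(modes: Sequence[Point],
                tol: float = 1e-2) -> Tuple[List[int], List[Point]]:
    """Label points by the mode they converged to; the cluster count emerges from the data."""
    labels: List[int] = []
    centres: List[Point] = []
    for m in modes:
        for index, centre in enumerate(centres):
            if math.dist(m, centre) < tol:
                labels.append(index)
                break
        else:
            # No existing centre is close enough: this mode starts a new cluster.
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres
```

With two well-separated groups of three points each and `bandwidth=0.5`, `group_modes(mean_shift(points, bandwidth=0.5))` should return two clusters. Note that the result still depends on a hand-chosen bandwidth and a fixed Euclidean distance, which are exactly the kind of assumptions DMS aims to replace with a task-specific notion of similarity learned from side information.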
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance that our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.
Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin
assert joint first authorship.
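The control flow of points (i) and (ii) above can be illustrated with a small sketch. Everything in it is an assumption for illustration: relocalisers are modelled as hypothetical callables that return a few candidate poses for a frame, the geometric scoring is a caller-supplied function, and `threshold` stands in for the kind of parameter that point (iii) would tune. It is not the authors' implementation and contains no forest or pose-estimation code.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Pose = Tuple[float, ...]                    # hypothetical stand-in for a 6-DoF camera pose
Relocaliser = Callable[[Any], List[Pose]]   # hypothetical: frame -> a few candidate poses
Scorer = Callable[[Any, Pose], float]       # hypothetical geometric score; higher is better

def run_cascade(frame: Any, stages: Sequence[Relocaliser], score: Scorer,
                threshold: float) -> Optional[Tuple[Pose, float, int]]:
    """Run relocalisers fastest-first, falling back to slower ones only when needed.

    Returns (pose, score, stage_index) from the first stage whose best hypothesis
    reaches `threshold`; otherwise the best-scoring result seen across all stages,
    or None if no stage produced a hypothesis.
    """
    best: Optional[Tuple[Pose, float, int]] = None
    for index, stage in enumerate(stages):
        hypotheses = stage(frame)
        if not hypotheses:
            continue
        # Score the final few hypotheses and keep the most promising one,
        # rather than accepting a single hypothesis without question.
        scored = [(score(frame, pose), pose) for pose in hypotheses]
        pose_score, pose = max(scored, key=lambda item: item[0])
        if best is None or pose_score > best[1]:
            best = (pose, pose_score, index)
        if pose_score >= threshold:
            return pose, pose_score, index  # good enough: skip the slower stages
    return best
```

A higher `threshold` sends more frames to the slower, more accurate stages; a lower one favours speed. Choosing that trade-off is the kind of tuning the abstract refers to.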
Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration
What makes few-shot learning attractive in medical image analysis is its
efficient use of support image data, which are labelled to classify or segment
new classes, a task that otherwise requires substantially more training images
and expert annotations. This work describes a fully 3D prototypical few-shot
segmentation algorithm, such that the trained networks can be effectively
adapted to clinically interesting structures that are absent in training, using
only a few labelled images from a different institute. First, to compensate for
the widely recognised spatial variability between institutions in episodic
adaptation of novel classes, a novel spatial registration mechanism is
integrated into prototypical learning, consisting of a segmentation head and a
spatial alignment module. Second, to assist training when the observed
alignment is imperfect, a support mask conditioning module is proposed to
further utilise the annotation available from the support images. Extensive
experiments are presented in an application of segmenting eight anatomical
structures important for interventional planning, using a dataset of 589
pelvic T2-weighted MR images acquired at seven institutes. The results
demonstrate the efficacy of each of the 3D formulation, the spatial
registration, and the support mask conditioning, all of which made positive
contributions independently or collectively. Compared with the previously
proposed 2D alternatives, the few-shot segmentation performance was improved
with statistical significance, regardless of whether the support data come
from the same or different institutes.
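The core prototypical step can be sketched as follows, assuming per-voxel feature vectors have already been produced by some encoder (not shown). A prototype is the masked average of the support features, and each query voxel takes the label of the most similar prototype under cosine similarity. This is a generic prototypical-segmentation sketch with hypothetical names; the spatial registration and support mask conditioning modules described above are deliberately omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def masked_average(features: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Prototype: mean feature vector over the voxels where the mask is non-zero."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no voxels")
    return [sum(values) / len(selected) for values in zip(*selected)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def segment(query: Sequence[Vector], prototypes: Dict[str, Vector]) -> List[str]:
    """Label each query voxel with the name of its most similar prototype."""
    return [max(prototypes, key=lambda name: cosine(f, prototypes[name]))
            for f in query]
```

In use, one prototype would be built per structure from the labelled support image (plus a background prototype from the inverted mask) and passed to `segment` with the query features. Because prototypes are computed from support data at inference time, a structure absent in training can be segmented without retraining.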
Approximating continuous convolutions for deep network compression
We present ApproxConv, a novel method for compressing the layers of a convolutional neural network. Reframing conventional discrete convolution as continuous convolution of parametrised functions over space, we use functional approximations to capture the essential structures of CNN filters with fewer parameters than conventional operations. Our method can reduce the size of trained CNN layers while requiring only a small amount of fine-tuning. We show that our method is able to compress existing deep network models by half whilst losing only 1.86% accuracy. Further, we demonstrate that our method is compatible with other compression methods, such as quantisation, allowing for further reductions in model size.
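To make the idea of treating a filter as a continuous function concrete, here is a minimal sketch that replaces the 25 weights of a 5x5 kernel with 6 coefficients of a low-order polynomial in (x, y), fitted by least squares. The polynomial basis and every name here are assumptions chosen for illustration; the abstract does not say which parametrised functions ApproxConv uses. Because the 1-D basis 1, t, t^2 - 2 is orthogonal on the tap positions {-2, ..., 2}, the 2-D products are orthogonal on the 5x5 grid, so each least-squares coefficient reduces to a single projection.

```python
from itertools import product
from typing import Dict, List, Tuple

Kernel = List[List[float]]
Coeffs = Dict[Tuple[int, int], float]

# Hypothetical basis: 1-D polynomials orthogonal on the grid {-2, ..., 2}.
BASIS_1D = (lambda t: 1.0, lambda t: t, lambda t: t * t - 2.0)
GRID = (-2, -1, 0, 1, 2)  # 5x5 kernel tap positions, centred on the origin
# Keep only terms of total degree <= 2: 6 coefficients instead of 25 weights.
TERMS = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]

def fit(kernel: Kernel) -> Coeffs:
    """Least-squares fit of kernel[row][col] as sum c_ij * P_i(x) * P_j(y)."""
    coeffs: Coeffs = {}
    for i, j in TERMS:
        num = den = 0.0
        for (row, y), (col, x) in product(enumerate(GRID), enumerate(GRID)):
            phi = BASIS_1D[i](x) * BASIS_1D[j](y)
            num += kernel[row][col] * phi
            den += phi * phi
        coeffs[(i, j)] = num / den  # orthogonal basis: projection = least squares
    return coeffs

def evaluate(coeffs: Coeffs, x: float, y: float) -> float:
    """The fitted filter is a continuous function and can be sampled anywhere."""
    return sum(c * BASIS_1D[i](x) * BASIS_1D[j](y) for (i, j), c in coeffs.items())

def resample(coeffs: Coeffs) -> Kernel:
    """Rebuild a discrete 5x5 kernel from the continuous approximation."""
    return [[evaluate(coeffs, x, y) for x in GRID] for y in GRID]
```

A kernel that is already a quadratic in (x, y) is reproduced exactly by `resample(fit(kernel))`; any other kernel is replaced by its closest such quadratic in the least-squares sense, which is where the parameter saving, and the accuracy loss that fine-tuning would then recover, comes from.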