1,002 research outputs found
Automatic annotation for weakly supervised learning of detectors
PhDObject detection in images and action detection in videos are among the most widely studied
computer vision problems, with applications in consumer photography, surveillance, and automatic
media tagging. Typically, these standard detectors are fully supervised, that is they require
a large body of training data where the locations of the objects/actions in images/videos have
been manually annotated. With the emergence of digital media, and the rise of high-speed internet,
raw images and video are available for little to no cost. However, the manual annotation
of object and action locations remains tedious, slow, and expensive. As a result there has been
a great interest in training detectors with weak supervision where only the presence or absence
of object/action in image/video is needed, not the location. This thesis presents approaches for
weakly supervised learning of object/action detectors with a focus on automatically annotating
object and action locations in images/videos using only binary weak labels indicating the presence
or absence of object/action in images/videos.
First, a framework for weakly supervised learning of object detectors in images is presented.
In the proposed approach, a variation of multiple instance learning (MIL) technique for automatically
annotating object locations in weakly labelled data is presented which, unlike existing
approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial
annotation is then used to start an iterative process in which standard object detectors are used to
refine the location annotation. Finally, to ensure that the iterative training of detectors do not drift
from the object of interest, a scheme for detecting model drift is also presented. Furthermore,
unlike most other methods, our weakly supervised approach is evaluated on data without manual
pose (object orientation) annotation.
Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues,
is carried out. From the analysis, a new method based on negative mining (NegMine) is presented
for the initial annotation of both object and action data. The NegMine based approach is a
much simpler formulation using only inter-class measure and requires no complex combinatorial
optimisation but can still meet or outperform existing approaches including the previously pre3
sented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing
approaches to boost their performance.
Finally, the thesis will take a step back and look at the use of generic object detectors as prior
knowledge in weakly supervised learning of object detectors. These generic object detectors are
typically based on sampling saliency maps that indicate if a pixel belongs to the background
or foreground. A new approach to generating saliency maps is presented that, unlike existing
approaches, looks beyond the current image of interest and into images similar to the current
image. We show that our generic object proposal method can be used by itself to annotate the
weakly labelled object data with surprisingly high accuracy
Automated detection of brain abnormalities in neonatal hypoxia ischemic injury from MR images.
We compared the efficacy of three automated brain injury detection methods, namely symmetry-integrated region growing (SIRG), hierarchical region splitting (HRS) and modified watershed segmentation (MWS) in human and animal magnetic resonance imaging (MRI) datasets for the detection of hypoxic ischemic injuries (HIIs). Diffusion weighted imaging (DWI, 1.5T) data from neonatal arterial ischemic stroke (AIS) patients, as well as T2-weighted imaging (T2WI, 11.7T, 4.7T) at seven different time-points (1, 4, 7, 10, 17, 24 and 31 days post HII) in rat-pup model of hypoxic ischemic injury were used to assess the temporal efficacy of our computational approaches. Sensitivity, specificity, and similarity were used as performance metrics based on manual ('gold standard') injury detection to quantify comparisons. When compared to the manual gold standard, automated injury location results from SIRG performed the best in 62% of the data, while 29% for HRS and 9% for MWS. Injury severity detection revealed that SIRG performed the best in 67% cases while 33% for HRS. Prior information is required by HRS and MWS, but not by SIRG. However, SIRG is sensitive to parameter-tuning, while HRS and MWS are not. Among these methods, SIRG performs the best in detecting lesion volumes; HRS is the most robust, while MWS lags behind in both respects
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained due to the lack of essential contents, i.e., stereoscopic videos. To alleviate such content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordinal, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications
Going beyond semantic image segmentation, towards holistic scene understanding, with associative hierarchical random fields
In this thesis we exploit the generality and expressive power of the Associative Hierarchical
Random Field (AHRF) graphical model to take its use beyond that of semantic image segmentation,
into object-classes, towards a framework for holistic scene understanding. We provide a
working definition for the holistic approach to scene understanding, which allows for the integration
of existing, disparate, applications into an unifying ensemble. We believe that modelling
such an ensemble as an AHRF is both a principled and pragmatic solution. We present a hierarchy
that shows several methods for fusing applications together with the AHRF graphical model.
Each of the three; feature, potential and energy, layers subsumes its predecessor in generality
and together give rise to many options for integration. With applications on street scenes we
demonstrate an implementation of each layer. The first layer application joins appearance and
geometric features. For our second layer we implement a things and stuff co-junction using
higher order AHRF potentials for object detectors, with the goal of answering the classic questions:
What? Where? and How many? A holistic approach to recognition-and-reconstruction
is realised within our third layer by linking two energy based formulations of both applications.
Each application is evaluated qualitatively and quantitatively. In all cases our holistic approach
shows improvement over baseline methods
Thermo-visual feature fusion for object tracking using multiple spatiogram trackers
In this paper, we propose a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. This is achieved without the exponential increase in storage and processing that other multimodal tracking approaches suffer from. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features. We derive a mean-shift type algorithm for the framework that allows efficient object tracking with very low computational overhead. We especially target the fusion of thermal infrared and visible spectrum features as the most useful features for automated surveillance applications. Results are shown on multimodal video sequences clearly illustrating the benefits of combining multiple features using our framework
Inner and Inter Label Propagation: Salient Object Detection in the Wild
In this paper, we propose a novel label propagation based method for saliency
detection. A key observation is that saliency in an image can be estimated by
propagating the labels extracted from the most certain background and object
regions. For most natural images, some boundary superpixels serve as the
background labels and the saliency of other superpixels are determined by
ranking their similarities to the boundary labels based on an inner propagation
scheme. For images of complex scenes, we further deploy a 3-cue-center-biased
objectness measure to pick out and propagate foreground labels. A
co-transduction algorithm is devised to fuse both boundary and objectness
labels based on an inter propagation scheme. The compactness criterion decides
whether the incorporation of objectness labels is necessary, thus greatly
enhancing computational efficiency. Results on five benchmark datasets with
pixel-wise accurate annotations show that the proposed method achieves superior
performance compared with the newest state-of-the-arts in terms of different
evaluation metrics.Comment: The full version of the TIP 2015 publicatio
What makes for effective detection proposals?
Current top performing object detectors employ detection proposals to guide
the search for objects, thereby avoiding exhaustive sliding window search
across images. Despite the popularity and widespread use of detection
proposals, it is unclear which trade-offs are made when using them during
object detection. We provide an in-depth analysis of twelve proposal methods
along with four baselines regarding proposal repeatability, ground truth
annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM,
R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object
detection improving proposal localisation accuracy is as important as improving
recall. We introduce a novel metric, the average recall (AR), which rewards
both high recall and good localisation and correlates surprisingly well with
detection performance. Our findings show common strengths and weaknesses of
existing methods, and provide insights and metrics for selecting and tuning
proposal methods.Comment: TPAMI final version, duplicate proposals removed in experiment
Activity profiling for minimally invasive surgery
Imperial Users onl
- …