1,002 research outputs found

    Automatic annotation for weakly supervised learning of detectors

    Get PDF
    PhDObject detection in images and action detection in videos are among the most widely studied computer vision problems, with applications in consumer photography, surveillance, and automatic media tagging. Typically, these standard detectors are fully supervised, that is they require a large body of training data where the locations of the objects/actions in images/videos have been manually annotated. With the emergence of digital media, and the rise of high-speed internet, raw images and video are available for little to no cost. However, the manual annotation of object and action locations remains tedious, slow, and expensive. As a result there has been a great interest in training detectors with weak supervision where only the presence or absence of object/action in image/video is needed, not the location. This thesis presents approaches for weakly supervised learning of object/action detectors with a focus on automatically annotating object and action locations in images/videos using only binary weak labels indicating the presence or absence of object/action in images/videos. First, a framework for weakly supervised learning of object detectors in images is presented. In the proposed approach, a variation of multiple instance learning (MIL) technique for automatically annotating object locations in weakly labelled data is presented which, unlike existing approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial annotation is then used to start an iterative process in which standard object detectors are used to refine the location annotation. Finally, to ensure that the iterative training of detectors do not drift from the object of interest, a scheme for detecting model drift is also presented. Furthermore, unlike most other methods, our weakly supervised approach is evaluated on data without manual pose (object orientation) annotation. Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues, is carried out. From the analysis, a new method based on negative mining (NegMine) is presented for the initial annotation of both object and action data. The NegMine based approach is a much simpler formulation using only inter-class measure and requires no complex combinatorial optimisation but can still meet or outperform existing approaches including the previously pre3 sented inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing approaches to boost their performance. Finally, the thesis will take a step back and look at the use of generic object detectors as prior knowledge in weakly supervised learning of object detectors. These generic object detectors are typically based on sampling saliency maps that indicate if a pixel belongs to the background or foreground. A new approach to generating saliency maps is presented that, unlike existing approaches, looks beyond the current image of interest and into images similar to the current image. We show that our generic object proposal method can be used by itself to annotate the weakly labelled object data with surprisingly high accuracy

    Automated detection of brain abnormalities in neonatal hypoxia ischemic injury from MR images.

    Get PDF
    We compared the efficacy of three automated brain injury detection methods, namely symmetry-integrated region growing (SIRG), hierarchical region splitting (HRS) and modified watershed segmentation (MWS) in human and animal magnetic resonance imaging (MRI) datasets for the detection of hypoxic ischemic injuries (HIIs). Diffusion weighted imaging (DWI, 1.5T) data from neonatal arterial ischemic stroke (AIS) patients, as well as T2-weighted imaging (T2WI, 11.7T, 4.7T) at seven different time-points (1, 4, 7, 10, 17, 24 and 31 days post HII) in rat-pup model of hypoxic ischemic injury were used to assess the temporal efficacy of our computational approaches. Sensitivity, specificity, and similarity were used as performance metrics based on manual ('gold standard') injury detection to quantify comparisons. When compared to the manual gold standard, automated injury location results from SIRG performed the best in 62% of the data, while 29% for HRS and 9% for MWS. Injury severity detection revealed that SIRG performed the best in 67% cases while 33% for HRS. Prior information is required by HRS and MWS, but not by SIRG. However, SIRG is sensitive to parameter-tuning, while HRS and MWS are not. Among these methods, SIRG performs the best in detecting lesion volumes; HRS is the most robust, while MWS lags behind in both respects

    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Get PDF
    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained due to the lack of essential contents, i.e., stereoscopic videos. To alleviate such content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordinal, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications

    Going beyond semantic image segmentation, towards holistic scene understanding, with associative hierarchical random fields

    Get PDF
    In this thesis we exploit the generality and expressive power of the Associative Hierarchical Random Field (AHRF) graphical model to take its use beyond that of semantic image segmentation, into object-classes, towards a framework for holistic scene understanding. We provide a working definition for the holistic approach to scene understanding, which allows for the integration of existing, disparate, applications into an unifying ensemble. We believe that modelling such an ensemble as an AHRF is both a principled and pragmatic solution. We present a hierarchy that shows several methods for fusing applications together with the AHRF graphical model. Each of the three; feature, potential and energy, layers subsumes its predecessor in generality and together give rise to many options for integration. With applications on street scenes we demonstrate an implementation of each layer. The first layer application joins appearance and geometric features. For our second layer we implement a things and stuff co-junction using higher order AHRF potentials for object detectors, with the goal of answering the classic questions: What? Where? and How many? A holistic approach to recognition-and-reconstruction is realised within our third layer by linking two energy based formulations of both applications. Each application is evaluated qualitatively and quantitatively. In all cases our holistic approach shows improvement over baseline methods

    Thermo-visual feature fusion for object tracking using multiple spatiogram trackers

    Get PDF
    In this paper, we propose a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. This is achieved without the exponential increase in storage and processing that other multimodal tracking approaches suffer from. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features. We derive a mean-shift type algorithm for the framework that allows efficient object tracking with very low computational overhead. We especially target the fusion of thermal infrared and visible spectrum features as the most useful features for automated surveillance applications. Results are shown on multimodal video sequences clearly illustrating the benefits of combining multiple features using our framework

    Inner and Inter Label Propagation: Salient Object Detection in the Wild

    Full text link
    In this paper, we propose a novel label propagation based method for saliency detection. A key observation is that saliency in an image can be estimated by propagating the labels extracted from the most certain background and object regions. For most natural images, some boundary superpixels serve as the background labels and the saliency of other superpixels are determined by ranking their similarities to the boundary labels based on an inner propagation scheme. For images of complex scenes, we further deploy a 3-cue-center-biased objectness measure to pick out and propagate foreground labels. A co-transduction algorithm is devised to fuse both boundary and objectness labels based on an inter propagation scheme. The compactness criterion decides whether the incorporation of objectness labels is necessary, thus greatly enhancing computational efficiency. Results on five benchmark datasets with pixel-wise accurate annotations show that the proposed method achieves superior performance compared with the newest state-of-the-arts in terms of different evaluation metrics.Comment: The full version of the TIP 2015 publicatio

    What makes for effective detection proposals?

    Full text link
    Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM, R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object detection improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detection performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.Comment: TPAMI final version, duplicate proposals removed in experiment
    corecore