237,825 research outputs found

    Segmentation Driven Object Detection with Fisher Vectors

    Get PDF
    International audienceWe present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results

    SemanticLoop: loop closure with 3D semantic graph matching

    Full text link
    Loop closure can effectively correct the accumulated error in robot localization, which plays a critical role in the long-term navigation of the robot. Traditional appearance-based methods rely on local features and are prone to failure in ambiguous environments. On the other hand, object recognition can infer objects' category, pose, and extent. These objects can serve as stable semantic landmarks for viewpoint-independent and non-ambiguous loop closure. However, there is a critical object-level data association problem due to the lack of efficient and robust algorithms. We introduce a novel object-level data association algorithm, which incorporates IoU, instance-level embedding, and detection uncertainty, formulated as a linear assignment problem. Then, we model the objects as TSDF volumes and represent the environment as a 3D graph with semantics and topology. Next, we propose a graph matching-based loop detection based on the reconstructed 3D semantic graphs and correct the accumulated error by aligning the matched objects. Finally, we refine the object poses and camera trajectory in an object-level pose graph optimization. Experimental results show that the proposed object-level data association method significantly outperforms the commonly used nearest-neighbor method in accuracy. Our graph matching-based loop closure is more robust to environmental appearance changes than existing appearance-based methods

    Subspace discovery for video anomaly detection

    Get PDF
    PhDIn automated video surveillance anomaly detection is a challenging task. We address this task as a novelty detection problem where pattern description is limited and labelling information is available only for a small sample of normal instances. Classification under these conditions is prone to over-fitting. The contribution of this work is to propose a novel video abnormality detection method that does not need object detection and tracking. The method is based on subspace learning to discover a subspace where abnormality detection is easier to perform, without the need of detailed annotation and description of these patterns. The problem is formulated as one-class classification utilising a low dimensional subspace, where a novelty classifier is used to learn normal actions automatically and then to detect abnormal actions from low-level features extracted from a region of interest. The subspace is discovered (using both labelled and unlabelled data) by a locality preserving graph-based algorithm that utilises the Graph Laplacian of a specially designed parameter-less nearest neighbour graph. The methodology compares favourably with alternative subspace learning algorithms (both linear and non-linear) and direct one-class classification schemes commonly used for off-line abnormality detection in synthetic and real data. Based on these findings, the framework is extended to on-line abnormality detection in video sequences, utilising multiple independent detectors deployed over the image frame to learn the local normal patterns and infer abnormality for the complete scene. The method is compared with an alternative linear method to establish advantages and limitations in on-line abnormality detection scenarios. Analysis shows that the alternative approach is better suited for cases where the subspace learning is restricted on the labelled samples, while in the presence of additional unlabelled data the proposed approach using graph-based subspace learning is more appropriate

    Grid Loss: Detecting Occluded Faces

    Full text link
    Detection of partially occluded objects is a challenging computer vision problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of the detection window are occluded, since not every sub-part of the window is discriminative on its own. To address this issue, we propose a novel loss layer for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a convolution layer independently rather than over the whole feature map. This results in parts being more discriminative on their own, enabling the detector to recover if the detection window is partially occluded. By mapping our loss layer back to a regular fully connected layer, no additional computational cost is incurred at runtime compared to standard CNNs. We demonstrate our method for face detection on several public face detection benchmarks and show that our method outperforms regular CNNs, is suitable for realtime applications and achieves state-of-the-art performance.Comment: accepted to ECCV 201

    Occlusion Coherence: Detecting and Localizing Occluded Faces

    Full text link
    The presence of occluders significantly impacts object recognition accuracy. However, occlusion is typically treated as an unstructured source of noise and explicit models for occluders have lagged behind those for object appearance and shape. In this paper we describe a hierarchical deformable part model for face detection and landmark localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. We test the model on several benchmarks for landmark localization and detection including challenging new data sets featuring significant occlusion. We find that the addition of an explicit occlusion model yields a detection system that outperforms existing approaches for occluded instances while maintaining competitive accuracy in detection and landmark localization for unoccluded instances

    Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection

    Full text link
    Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the best saliency map suitable for the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with combined saliency and further refined through a multi-scale superpixel-averaging of saliency map. We evaluate the performance of the proposed weakly supervised topdown saliency and achieve comparable performance with fully supervised approaches. Experiments are carried out on seven challenging datasets and quantitative results are compared with 40 closely related approaches across 4 different applications.Comment: 14 pages, 7 figure
    • …
    corecore