
    Fast Human Detection in Videos using Joint Appearance and Foreground Learning from Covariances of Image Feature Subsets

    We present a fast method to detect humans in stationary surveillance videos. Traditional approaches exploit background subtraction as an attentive filter, applying still-image detectors only on foreground regions. This ignores the fact that foreground observations themselves contain human shape information that can be used for detection. To address this issue, we propose a method that learns the correlation between appearance and foreground information. It is based on a cascade of LogitBoost classifiers that uses covariance matrices computed from appearance and foreground features as object descriptors. We account for the fact that covariance matrices lie in a Riemannian space, introduce several novelties (such as exploiting only covariance sub-matrices) to reduce the induced computational load, and add an image rectification scheme to remove the slant of people in images captured by wide-angle cameras. Evaluation on a large set of videos shows that our approach outperforms the attentive-filter paradigm while processing 5 to 20 frames/sec. In addition, on the INRIA human (static image) benchmark database, our sub-matrix approach performs better than the full covariance case while reducing the computational cost by more than one order of magnitude.
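The region covariance descriptor underlying this approach can be sketched in a few lines of numpy. The feature set below ([x, y, I, |Ix|, |Iy|] per pixel) is illustrative only; the paper's actual appearance and foreground features differ, and the sub-matrix shown is just one possible principal sub-block:

```python
import numpy as np

def region_covariance(image):
    """Covariance descriptor of a grayscale image region.

    Per-pixel feature vector: [x, y, I, |Ix|, |Iy|] -- an
    illustrative subset, not the paper's exact feature set.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    Iy, Ix = np.gradient(image.astype(float))   # gradients along rows, cols
    feats = np.stack([xs.ravel(), ys.ravel(), image.ravel(),
                      np.abs(Ix).ravel(), np.abs(Iy).ravel()], axis=0)
    return np.cov(feats)   # 5x5 symmetric positive semi-definite matrix

C = region_covariance(np.random.rand(16, 8))
# A sub-matrix descriptor is a principal sub-block of the full
# covariance, e.g. dropping the pixel-coordinate features:
C_sub = C[2:, 2:]          # 3x3 covariance over [I, |Ix|, |Iy|] only
```

Because a d'-dimensional sub-matrix descriptor lives in a much smaller space than the full d x d covariance, distance computations on it are correspondingly cheaper, which is the source of the reported speed-up.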

    Semantic-Aware Image Analysis

    Extracting and utilizing high-level semantic information from images is one of the important goals of computer vision. The ultimate objective of image analysis is to understand each pixel of an image with regard to high-level semantics, e.g. the objects, the stuff, and their spatial, functional, and semantic relations. In recent years, thanks to large labeled datasets and deep learning, great progress has been made on image analysis problems such as image classification, object detection, and object pose estimation. In this work, we explore several aspects of semantic-aware image analysis. First, we explore semantic segmentation of man-made scenes using fully connected conditional random fields, which can model long-range connections within images of man-made scenes and make use of contextual information about scene structures. Second, we introduce a semantic smoothing method that exploits semantic information to accomplish semantic structure-preserving image smoothing. Semantic segmentation has achieved significant progress recently and has been widely used in many computer vision tasks; we observe that high-level semantic image labeling information naturally provides a meaningful structure prior for image smoothing. Third, we present a deep object co-segmentation approach for segmenting common objects of the same class within a pair of images. To address this task, we propose a CNN-based Siamese encoder-decoder architecture: the encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally the decoder generates the output foreground masks for each image. Finally, we propose an approach to localize common objects from novel object categories in a set of images. We solve this problem using a new common component activation map, in which we treat the class-specific activation maps as components in order to discover the common components in the image set. Our experiments show that this approach generalizes to novel object categories.
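The mutual correlation layer mentioned above can be illustrated with a minimal numpy sketch: for every spatial location of one feature map, compute the cosine similarity with every location of the other. The function name and tensor layout (channels-first, single pair of maps) are assumptions for illustration, not the thesis's exact implementation:

```python
import numpy as np

def mutual_correlation(fa, fb):
    """Cosine-similarity correlation volume between two feature maps.

    fa, fb: feature maps of shape (C, H, W). Returns an (H*W, H, W)
    volume: channel k holds the similarity of location k of fa with
    every location of fb. Swap the arguments for the symmetric volume.
    """
    C, H, W = fa.shape
    a = fa.reshape(C, -1)                               # C x HW
    b = fb.reshape(C, -1)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-8)
    corr = a.T @ b                                      # HW x HW cosines
    return corr.reshape(H * W, H, W)

corr = mutual_correlation(np.random.rand(4, 3, 5), np.random.rand(4, 3, 5))
```

Locations depicting the same object class yield high responses in this volume, which is what lets the decoder separate common foreground from background.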

    Re-identification by Covariance Descriptors

    This chapter addresses the problem of appearance matching while employing the covariance descriptor. We tackle the extremely challenging case in which the same non-rigid object has to be matched across disjoint camera views. Covariance statistics averaged over a Riemannian manifold are fundamental for designing appearance models invariant to camera changes. We discuss different ways of extracting an object's appearance by incorporating various training strategies. Appearance matching is enhanced either by discriminative analysis using images from a single camera or by selecting distinctive features in a covariance metric space using data from two cameras. By selecting only the essential features for a specific class of objects (e.g. humans), without defining an a priori feature vector for extracting the covariance, we remove redundancy from the covariance descriptor and ensure low computational cost. By using a feature selection technique instead of learning on a manifold, we avoid the over-fitting problem. The proposed models have been successfully applied to the person re-identification task, in which a human appearance has to be matched across non-overlapping cameras. We carry out detailed experiments on the suggested strategies, demonstrating their pros and cons w.r.t. recognition rate and suitability to video analytics systems.
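Matching covariance descriptors requires a distance that respects the Riemannian geometry of SPD matrices. A standard choice is the affine-invariant metric, sketched below; this is one common metric on the covariance manifold and is shown for illustration, not necessarily the exact metric used in the chapter:

```python
import numpy as np

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices.

    d(A, B) = sqrt(sum_i log^2 lambda_i), where lambda_i are the
    generalized eigenvalues of (A, B).
    """
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    M = Linv @ A @ Linv.T          # congruence; same eigvals as B^-1 A
    lam = np.linalg.eigvalsh(M)    # real and positive for SPD inputs
    return np.sqrt(np.sum(np.log(lam) ** 2))

A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.eye(2)
d = airm_distance(A, B)            # sqrt(log(2)^2 + log(3)^2)
```

The metric is symmetric and invariant to joint congruence transforms of both matrices, which is why descriptors compared this way tolerate affine changes of the underlying features, e.g. illumination changes between cameras.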

    Factored Shapes and Appearances for Parts-based Object Understanding

