2,357 research outputs found

    Re-Identification with Consistent Attentive Siamese Networks

    Full text link
    We propose a new deep architecture for person re-identification (re-id). While re-id has seen much recent progress, spatial localization and view-invariant representation learning for robust cross-view matching remain key, unsolved problems. We address these questions by means of a new attention-driven Siamese learning architecture, called the Consistent Attentive Siamese Network. Our key innovations compared to existing, competing methods include (a) a flexible framework design that produces attention with only identity labels as supervision, (b) explicit mechanisms to enforce attention consistency among images of the same person, and (c) a new Siamese framework that integrates attention and attention consistency, producing principled supervisory signals as well as the first mechanism that can explain the reasoning behind the Siamese framework's predictions. We conduct extensive evaluations on the CUHK03-NP, DukeMTMC-ReID, and Market-1501 datasets and report competitive performance.Comment: 10 pages, 8 figures, 3 tables, to appear in CVPR 201

    Adaptive Affinity Fields for Semantic Segmentation

    Full text link
    Semantic segmentation has made much progress with increasingly powerful pixel-wise classifiers and incorporating structural priors via Conditional Random Fields (CRF) or Generative Adversarial Networks (GAN). We propose a simpler alternative that learns to verify the spatial structure of segmentation during training only. Unlike existing approaches that enforce semantic labels on individual pixels and match labels between neighbouring pixels, we propose the concept of Adaptive Affinity Fields (AAF) to capture and match the semantic relations between neighbouring pixels in the label space. We use adversarial learning to select the optimal affinity field size for each semantic category. It is formulated as a minimax problem, optimizing our segmentation neural network in a best worst-case learning scenario. AAF is versatile for representing structures as a collection of pixel-centric relations, easier to train than GAN and more efficient than CRF without run-time inference. Our extensive evaluations on PASCAL VOC 2012, Cityscapes, and GTA5 datasets demonstrate its above-par segmentation performance and robust generalization across domains.Comment: To appear in European Conference on Computer Vision (ECCV) 201

    Video Salient Object Detection Using Spatiotemporal Deep Features

    Full text link
    This paper presents a method for detecting salient objects in videos where temporal information in addition to spatial information is fully taken into account. Following recent reports on the advantage of deep features over conventional hand-crafted features, we propose a new set of SpatioTemporal Deep (STD) features that utilize local and global contexts over frames. We also propose new SpatioTemporal Conditional Random Field (STCRF) to compute saliency from STD features. STCRF is our extension of CRF to the temporal domain and describes the relationships among neighboring regions both in a frame and over frames. STCRF leads to temporally consistent saliency maps over frames, contributing to the accurate detection of salient objects' boundaries and noise reduction during detection. Our proposed method first segments an input video into multiple scales and then computes a saliency map at each scale level using STD features with STCRF. The final saliency map is computed by fusing saliency maps at different scale levels. Our experiments, using publicly available benchmark datasets, confirm that the proposed method significantly outperforms state-of-the-art methods. We also applied our saliency computation to the video object segmentation task, showing that our method outperforms existing video object segmentation methods.Comment: accepted at TI

    Ordered or Orderless: A Revisit for Video based Person Re-Identification

    Full text link
    Is recurrent network really necessary for learning a good visual representation for video based person re-identification (VPRe-id)? In this paper, we first show that the common practice of employing recurrent neural networks (RNNs) to aggregate temporal spatial features may not be optimal. Specifically, with a diagnostic analysis, we show that the recurrent structure may not be effective to learn temporal dependencies than what we expected and implicitly yields an orderless representation. Based on this observation, we then present a simple yet surprisingly powerful approach for VPRe-id, where we treat VPRe-id as an efficient orderless ensemble of image based person re-identification problem. More specifically, we divide videos into individual images and re-identify person with ensemble of image based rankers. Under the i.i.d. assumption, we provide an error bound that sheds light upon how could we improve VPRe-id. Our work also presents a promising way to bridge the gap between video and image based person re-identification. Comprehensive experimental evaluations demonstrate that the proposed solution achieves state-of-the-art performances on multiple widely used datasets (iLIDS-VID, PRID 2011, and MARS).Comment: Under Minor Revision in IEEE TPAM

    Frame-wise Motion and Appearance for Real-time Multiple Object Tracking

    Full text link
    The main challenge of Multiple Object Tracking (MOT) is the efficiency in associating indefinite number of objects between video frames. Standard motion estimators used in tracking, e.g., Long Short Term Memory (LSTM), only deal with single object, while Re-IDentification (Re-ID) based approaches exhaustively compare object appearances. Both approaches are computationally costly when they are scaled to a large number of objects, making it very difficult for real-time MOT. To address these problems, we propose a highly efficient Deep Neural Network (DNN) that simultaneously models association among indefinite number of objects. The inference computation of the DNN does not increase with the number of objects. Our approach, Frame-wise Motion and Appearance (FMA), computes the Frame-wise Motion Fields (FMF) between two frames, which leads to very fast and reliable matching among a large number of object bounding boxes. As auxiliary information is used to fix uncertain matches, Frame-wise Appearance Features (FAF) are learned in parallel with FMFs. Extensive experiments on the MOT17 benchmark show that our method achieved real-time MOT with competitive results as the state-of-the-art approaches.Comment: 13 pages, 4 figures, 4 table

    cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

    Full text link
    The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading the ALL 602 conference papers presented at the CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we are proposing "DeepSurvey" as a mechanism embodying the entire process from the reading through all the papers, the generation of ideas, and to the writing of paper.Comment: Survey Pape

    Learning Context Graph for Person Search

    Full text link
    Person re-identification has achieved great progress with deep convolutional neural networks. However, most previous methods focus on learning individual appearance feature embedding, and it is hard for the models to handle difficult situations with different illumination, large pose variance and occlusion. In this work, we take a step further and consider employing context information for person search. For a probe-gallery pair, we first propose a contextual instance expansion module, which employs a relative attention module to search and filter useful context information in the scene. We also build a graph learning framework to effectively employ context pairs to update target similarity. These two modules are built on top of a joint detection and instance feature learning framework, which improves the discriminativeness of the learned features. The proposed framework achieves state-of-the-art performance on two widely used person search datasets.Comment: To appear in CVPR 201

    Probabilistic Semantic Retrieval for Surveillance Videos with Activity Graphs

    Full text link
    We present a novel framework for finding complex activities matching user-described queries in cluttered surveillance videos. The wide diversity of queries coupled with unavailability of annotated activity data limits our ability to train activity models. To bridge the semantic gap we propose to let users describe an activity as a semantic graph with object attributes and inter-object relationships associated with nodes and edges, respectively. We learn node/edge-level visual predictors during training and, at test-time, propose to retrieve activity by identifying likely locations that match the semantic graph. We formulate a novel CRF based probabilistic activity localization objective that accounts for mis-detections, mis-classifications and track-losses, and outputs a likelihood score for a candidate grounded location of the query in the video. We seek groundings that maximize overall precision and recall. To handle the combinatorial search over all high-probability groundings, we propose a highest precision subgraph matching algorithm. Our method outperforms existing retrieval methods on benchmarked datasets.Comment: 1520-9210 (c) 2018 IEEE. This paper has been accepted by IEEE Transactions on Multimedia. Print ISSN: 1520-9210. Online ISSN: 1941-0077. Preprint link is https://ieeexplore.ieee.org/document/8438958

    An Introduction to Person Re-identification with Generative Adversarial Networks

    Full text link
    Person re-identification is a basic subject in the field of computer vision. The traditional methods have several limitations in solving the problems of person illumination like occlusion, pose variation and feature variation under complex background. Fortunately, deep learning paradigm opens new ways of the person re-identification research and becomes a hot spot in this field. Generative Adversarial Nets (GANs) in the past few years attracted lots of attention in solving these problems. This paper reviews the GAN based methods for person re-identification focuses on the related papers about different GAN based frameworks and discusses their advantages and disadvantages. Finally, it proposes the direction of future research, especially the prospect of person re-identification methods based on GANs

    Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training

    Full text link
    Most existing Re-IDentification (Re-ID) methods are highly dependent on precise bounding boxes that enable images to be aligned with each other. However, due to the challenging practical scenarios, current detection models often produce inaccurate bounding boxes, which inevitably degenerate the performance of existing Re-ID algorithms. In this paper, we propose a novel coarse-to-fine pyramid model to relax the need of bounding boxes, which not only incorporates local and global information, but also integrates the gradual cues between them. The pyramid model is able to match at different scales and then search for the correct image of the same identity, even when the image pairs are not aligned. In addition, in order to learn discriminative identity representation, we explore a dynamic training scheme to seamlessly unify two losses and extract appropriate shared information between them. Experimental results clearly demonstrate that the proposed method achieves the state-of-the-art results on three datasets. Especially, our approach exceeds the current best method by 9.5% on the most challenging CUHK03 dataset.Comment: Accepted by 2019 Conference on Computer Vision and Pattern Recognitio
    corecore