
    Deep Visual Feature Learning for Vehicle Detection, Recognition and Re-identification

    Along with the ever-increasing number of motor vehicles in modern transportation systems, intelligent video surveillance and management, an important field of artificial intelligence, is becoming ever more necessary. Vehicle-related problems are being widely explored and applied in practice. Among the various techniques, computer vision and machine learning algorithms have been the most popular, since a vast amount of video/image surveillance data is nowadays available for research. In this thesis, vision-based approaches for vehicle detection, recognition, and re-identification are extensively investigated, and several novel methods are proposed that overcome weaknesses of previous works and achieve compelling performance.

    Deep visual feature learning has been widely researched in the past five years and has made huge progress in many applications, including image classification, image retrieval, object detection, image segmentation, and image generation. Compared with traditional machine learning methods, which consist of hand-crafted feature extraction and shallow model learning, deep neural networks can learn hierarchical feature representations from low-level to high-level features and thereby achieve more robust recognition. For some specific tasks, researchers prefer to embed feature learning and classification/regression into end-to-end models, which benefits both accuracy and efficiency. In this thesis, deep models are the main tool used to study the research problems.

    Vehicle detection is the most fundamental task in intelligent video surveillance, but it faces many challenges such as severe illumination and viewpoint variations, occlusions, and multi-scale problems. Moreover, learning vehicles' diverse attributes is also an interesting and valuable problem. To address these tasks and their difficulties, a fast framework of Detection and Annotation for Vehicles (DAVE) is presented, which effectively combines vehicle detection and attribute annotation. DAVE consists of two convolutional neural networks (CNNs): a fast vehicle proposal network (FVPN) for extracting vehicle-like objects, and an attributes learning network (ALN) that verifies each proposal and infers each vehicle's pose, color, and type simultaneously. These two nets are jointly optimized so that the abundant latent knowledge learned by the ALN can be exploited to guide FVPN training. Once the model is trained, it achieves efficient vehicle detection and annotation on real-world traffic surveillance data.

    The second research problem of the thesis focuses on vehicle re-identification (re-ID). Vehicle re-ID aims to identify a target vehicle across different cameras with non-overlapping views. It has received far less attention in the computer vision community than the prevalent person re-ID problem. Possible reasons for this slow progress are the lack of appropriate research data and the special 3D structure of a vehicle. Previous works have generally focused on some specific views (e.g. the front), but these methods are less effective in realistic scenarios where vehicles appear at arbitrary viewpoints to cameras.
    In this thesis, I focus on the uncertainty of vehicle viewpoint in re-ID and propose four different approaches to the multi-view vehicle re-ID problem: (1) A Spatially Concatenated ConvNet (SCCN) in an encoder-decoder architecture is proposed to learn transformations across different viewpoints of a vehicle and then spatially concatenate all the feature maps, fusing them into a multi-view feature representation. (2) A Cross-View Generative Adversarial Network (XVGAN) is designed to take an input image's features as a conditional embedding and effectively infer cross-view images. The features of the inferred and original images are combined to learn distance metrics for re-ID. (3) The advantages of a bi-directional Long Short-Term Memory (LSTM) loop for modeling transformations across the continuous view variation of a vehicle are investigated. (4) A Viewpoint-aware Attentive Multi-view Inference (VAMI) model is proposed, which adopts a viewpoint-aware attention model to select core regions at different viewpoints and then performs multi-view feature inference with an adversarial training architecture.
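    As a rough illustration of the DAVE design described above, the sketch below pairs a shallow proposal branch (FVPN) with a deeper attribute branch (ALN) and sums their losses for joint optimization. The layer sizes, attribute head dimensions, and loss weights are illustrative assumptions, not the thesis' actual configuration.

```python
# Hypothetical sketch of DAVE's two-branch design: a fast vehicle proposal
# network (FVPN) scores vehicle-like regions, and an attributes learning
# network (ALN) verifies each proposal and predicts pose/color/type.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class FVPN(nn.Module):
    """Shallow fully-convolutional net producing a vehicle-likelihood map."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(64, 1, 1)  # per-location "vehicle-ness"

    def forward(self, x):
        return torch.sigmoid(self.score(self.features(x)))

class ALN(nn.Module):
    """Deeper net verifying proposals and inferring pose, color, and type."""
    def __init__(self, n_pose=8, n_color=10, n_type=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.verify = nn.Linear(64, 2)   # vehicle vs. background
        self.pose = nn.Linear(64, n_pose)
        self.color = nn.Linear(64, n_color)
        self.vtype = nn.Linear(64, n_type)

    def forward(self, crop):
        f = self.backbone(crop)
        return self.verify(f), self.pose(f), self.color(f), self.vtype(f)

# Joint optimization: the ALN's richer supervision also guides FVPN training,
# e.g. by combining both branches' losses in a single backward pass.
def joint_loss(fvpn_loss, aln_losses, weights=(1.0, 1.0, 0.5, 0.5, 0.5)):
    terms = [fvpn_loss] + list(aln_losses)
    return sum(w * t for w, t in zip(weights, terms))
```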

    Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification

    Object re-identification (re-id) aims to identify a specific object across time or camera views, with person re-id and vehicle re-id as the most widely studied applications. Re-id is challenging because of variations in viewpoints, (human) poses, and occlusions. Multiple shots of the same object can cover diverse viewpoints/poses and thus provide more comprehensive information. In this paper, we propose exploiting the multi-shots of the same identity to guide the feature learning of each individual image. Specifically, we design an Uncertainty-aware Multi-shot Teacher-Student (UMTS) Network. It consists of a teacher network (T-net) that learns comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input. In particular, we take into account the data-dependent heteroscedastic uncertainty for effectively transferring knowledge from the T-net to the S-net. To the best of our knowledge, we are the first to make use of multi-shots of an object in a teacher-student learning manner to effectively boost single-image-based re-id. We validate the effectiveness of our approach on popular vehicle re-id and person re-id datasets. At inference time, the S-net alone significantly outperforms the baselines and achieves state-of-the-art performance.
    Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
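    The data-dependent heteroscedastic uncertainty mentioned above can be realized, for example, with a Kendall-and-Gal-style weighting in which the student predicts a per-sample log-variance that modulates the feature-mimicry loss. The following is a minimal sketch under that assumption; the head design and exact loss form are not taken from the paper.

```python
# Sketch of uncertainty-weighted feature distillation in the spirit of UMTS:
# the student predicts a per-sample (heteroscedastic) uncertainty that
# down-weights the mimicry loss on hard samples. Illustrative only.
import torch
import torch.nn as nn

class DistillHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.embed = nn.Linear(dim, dim)   # student feature projection
        self.log_var = nn.Linear(dim, 1)   # predicted per-sample log-variance

    def forward(self, student_feat):
        return self.embed(student_feat), self.log_var(student_feat)

def uncertainty_distill_loss(student_feat, teacher_feat, log_var):
    # Heteroscedastic weighting: L = exp(-log_var) * ||f_S - f_T||^2 + log_var.
    # Uncertain samples contribute less to the mimicry term but pay a
    # regularization penalty through the +log_var term.
    sq_err = (student_feat - teacher_feat.detach()).pow(2).sum(dim=1, keepdim=True)
    return (torch.exp(-log_var) * sq_err + log_var).mean()
```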

    A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

    Tracking humans who are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and may vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose change is mostly attributed to the subject's interaction with the surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the causal-effect relations between an object's visibility fluent and its activities, and develop a probabilistic graph model to jointly reason about visibility fluent changes (e.g., from visible to invisible) and track humans in videos. We formulate this joint task as an iterative search for a feasible causal graph structure that enables fast search algorithms, e.g., dynamic programming. We apply the proposed method to challenging video sequences to evaluate its ability to estimate visibility fluent changes of subjects and to track subjects of interest over time. Results with comparisons demonstrate that our method outperforms alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.
    Comment: accepted by CVPR 2018
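    To make the dynamic-programming remark concrete, the following is a minimal Viterbi-style sketch that selects a per-frame sequence of visibility fluent states (e.g., visible vs. occluded) maximizing per-frame evidence plus transition plausibility. The two-state fluent set and score functions are assumptions for illustration; the paper's C-AOG inference is considerably richer.

```python
# Illustrative DP over visibility fluent states: choose the state sequence
# that maximizes log-evidence per frame plus transition plausibility.
from typing import List

def best_fluent_sequence(unary: List[List[float]],
                         transition: List[List[float]]) -> List[int]:
    """Viterbi-style DP. unary[t][s]: log-evidence for state s at frame t;
    transition[p][s]: log-plausibility of moving from state p to state s."""
    n_frames, n_states = len(unary), len(unary[0])
    score = [unary[0][:]]   # best score of any path ending in each state
    back = []               # backpointers for path recovery
    for t in range(1, n_frames):
        row, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[-1][p] + transition[p][s])
            row.append(score[-1][best_prev] + transition[best_prev][s]
                       + unary[t][s])
            ptr.append(best_prev)
        score.append(row)
        back.append(ptr)
    # Backtrack the optimal fluent sequence from the best final state.
    seq = [max(range(n_states), key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return seq[::-1]
```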