Scalable Deep Learning Architecture Design.
PhD Thesis
The past decade has witnessed rapid development in deep learning research, which has enabled
remarkable progress on a wide spectrum of computer vision tasks, such as object recognition,
segmentation, and detection. One generic mechanism for deep learning on computer vision is
to design optimal deep neural architectures for given tasks, so as to learn compact, rich and expressive
features for data collected by artificial visual sensors. Nonetheless, deep artificial neural
architecture design for computer vision tasks remains challenging due to the inherent visual task
complexity and uncertainty. One cannot guarantee that a specific network designed for one task
will work well for new tasks, especially when scalability is considered (the
model size, learning capacity and efficiency, and domain adaptation to new data). Unfortunately,
there are no theoretical principles for guiding deep neural architecture design, which forces
researchers to rely on their own expertise and experience in an ad hoc manner. This thesis investigates
approaches to designing deep neural architectures for several tasks by considering the underlying
task characteristics for more efficient and powerful deep models. More specifically, this thesis
develops new methods for addressing four different problems as follows:
Chapter 3 The first problem is harmonious attention network design for scalable person re-identification
(re-id). Existing deep learning methods for person re-id rely heavily
on large and computationally expensive convolutional neural networks. They
are therefore not scalable to large-scale re-id deployment scenarios that require processing
large amounts of surveillance video data, due to lengthy inference with high computing
costs. In this chapter, we address this limitation by jointly learning re-id attention selection.
Specifically, we formulate a novel Harmonious Attention Network (HAN) framework to jointly
learn soft pixel attention and hard regional attention alongside simultaneous deep feature representation
learning, particularly enabling more discriminative re-id matching by efficient networks
with more scalable inference. Extensive evaluations validate the cost-effectiveness superiority of
the proposed HAN approach for person re-id against a wide variety of state-of-the-art methods
on large benchmark datasets.
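The two attention types combined in HAN can be illustrated with a minimal NumPy sketch. All shapes, weights, and region coordinates below are hypothetical placeholders for what HAN learns end-to-end; this is an illustration of the soft/hard distinction, not the thesis implementation:

```python
import numpy as np

def soft_pixel_attention(feat, w):
    # Soft (pixel-level) attention: a sigmoid saliency map reweights
    # every spatial position of the feature map continuously.
    # feat: (C, H, W) feature map; w: (C,) channel-mixing weights.
    saliency = 1.0 / (1.0 + np.exp(-np.tensordot(w, feat, axes=1)))  # (H, W)
    return feat * saliency[None, :, :]

def hard_regional_attention(feat, top, left, h, w):
    # Hard (region-level) attention: select a discrete sub-region;
    # fixed coordinates here stand in for learned region parameters.
    return feat[:, top:top + h, left:left + w]

feat = np.random.rand(8, 16, 16)                      # toy feature map
soft = soft_pixel_attention(feat, np.ones(8) * 0.1)   # reweighted map
hard = hard_regional_attention(soft, 4, 4, 8, 8)      # cropped region
```

The key contrast: soft attention keeps the full spatial extent but rescales it, while hard attention discards everything outside the selected region, which is what enables the cheaper downstream computation.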
Chapter 4 The second problem is hierarchical distillation network design for scalable person
search. Existing person search methods typically focus on improving person detection accuracy.
This ignores model inference efficiency, which is nonetheless fundamental to
real-world applications. In this chapter, we address this limitation by investigating the scalability
problem of person search involving both model accuracy and inference efficiency simultaneously.
Specifically, we formulate a Hierarchical Distillation Learning (HDL) approach. With
HDL, we aim to comprehensively distil the knowledge of a strong teacher model into a
lightweight student model with weaker learning capability. To facilitate
the HDL process, we design a simple and powerful teacher model for joint learning of person
detection and person re-identification matching in unconstrained scene images. Extensive experiments
show the modelling advantages and cost-effectiveness superiority of HDL over the
state-of-the-art person search methods on large person search benchmarks.
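The teacher-to-student transfer at the core of HDL can be sketched with a standard temperature-softened distillation loss (in the style of Hinton et al.). The temperature value and toy logits are illustrative assumptions; HDL itself distils at several hierarchy levels rather than only at the logits:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T gives softer distributions.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then penalise the
    # KL divergence from the teacher's soft targets to the student's
    # predictions; the T**2 factor keeps gradient scale comparable.
    p = softmax(teacher_logits, T)          # teacher soft targets
    q = softmax(student_logits, T)          # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

s = np.array([[2.0, 0.5, -1.0]])            # toy student logits
t = np.array([[3.0, 0.2, -2.0]])            # toy teacher logits
loss = distillation_loss(s, t)
```

The soft targets carry more information per example than one-hot labels (relative class similarities), which is what lets a weak student approach a strong teacher's behaviour.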
Chapter 5 The third problem is neural graph embedding for scalable neural architecture
search. Existing neural architecture search (NAS) methods often operate in discrete or continuous
spaces directly, which ignores the graphical topology knowledge of neural networks. This
leads to suboptimal search performance and efficiency, given that neural networks are essentially
directed acyclic graphs (DAGs). In this chapter, we address this limitation by introducing a novel
idea of neural graph embedding (NGE). Specifically, we represent the building block (i.e. the
cell) of neural networks with a neural DAG, and learn it by leveraging a Graph Convolutional
Network to propagate and model the intrinsic topology information of network architectures.
This results in a generic neural network representation integrable with different existing NAS
frameworks. Extensive experiments show the superiority of NGE over the state-of-the-art methods
on image classification and semantic segmentation.
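The propagation step a Graph Convolutional Network applies to a cell's DAG can be sketched as follows. The 4-node toy graph, feature dimensions, and weights are assumptions for illustration; NGE learns the node embeddings and weights jointly with the search:

```python
import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution step: add self-loops, symmetrically
    # normalise the adjacency, then propagate node features through
    # the graph topology and a learned linear map, with ReLU.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# A toy 4-node cell DAG: node 0 feeds nodes 1 and 2, which feed node 3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
H = np.random.rand(4, 6)        # per-node operation embeddings
W = np.random.rand(6, 6)        # learnable layer weights
emb = gcn_layer(A, H, W)        # topology-aware node embeddings
```

Because the adjacency matrix enters the update directly, each node's embedding mixes in its neighbours' features, which is how the intrinsic wiring of the cell is encoded rather than ignored.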
Chapter 6 The last problem is scalable neural operator search. Existing neural architecture
search (NAS) methods explore a limited feature-transformation-only search space, ignoring other
advanced feature operations such as feature self-calibration by attention and dynamic convolutions.
This prevents NAS algorithms from discovering better network architectures.
We address this limitation by additionally exploiting feature self-calibration operations, resulting
in a heterogeneous search space. To overcome the challenges of operation heterogeneity and
significantly larger search space, we formulate a neural operator search (NOS) method. NOS
presents a novel heterogeneous residual block for integrating the heterogeneous operations in a
unified structure, and an attention guided search strategy for facilitating the search process over a
vast space. Extensive experiments show that NOS can search novel cell architectures with highly
competitive performance on the CIFAR and ImageNet benchmarks.
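Mixing heterogeneous candidate operations in one block can be sketched with a DARTS-style continuous relaxation: a softmax over architecture parameters weights each candidate, including a self-calibration op alongside plain transforms. The three candidate ops and the weighting scheme are simplified assumptions, not the NOS block or its attention-guided search itself:

```python
import numpy as np

def mixed_heterogeneous_op(x, alphas):
    # Candidate operations of two kinds: plain feature transforms and
    # a self-calibration op that rescales the input by a gated signal.
    ops = [
        lambda v: v,                                      # identity
        lambda v: np.maximum(v, 0.0),                     # ReLU transform
        lambda v: v * (1.0 / (1.0 + np.exp(-v.mean()))),  # self-calibration
    ]
    # Softmax over architecture parameters weights each candidate op;
    # search then optimises alphas alongside the network weights.
    e = np.exp(alphas - alphas.max())
    w = e / e.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))

x = np.array([-1.0, 0.5, 2.0])
y = mixed_heterogeneous_op(x, np.array([0.2, 0.5, 0.3]))
```

The heterogeneity is visible in the ops list: the self-calibration entry modulates its input by a function of the input itself, unlike the fixed pointwise transforms, which is exactly the kind of operation a transformation-only search space cannot express.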
Chapter 7 presents concluding remarks and discusses potential areas for future research and
extensions.