Scalable Deep Learning Architecture Design.
PhD Thesis
The past decade has witnessed rapid development in deep learning research, which has enabled
remarkable progress on a wide spectrum of computer vision tasks, such as object recognition,
segmentation, and detection. One generic mechanism for deep learning on computer vision is
to design optimal deep neural architectures for given tasks, so as to learn compact, rich and expressive
features for data collected by artificial visual sensors. Nonetheless, deep artificial neural
architecture design for computer vision tasks remains challenging due to the inherent visual task
complexity and uncertainty. One cannot guarantee that a specific network designed for one task
will work well for new tasks, especially when scalability is considered (the
model size, learning capacity and efficiency, and domain adaptation to new data). Unfortunately,
there are no theoretical principles for guiding deep neural architecture design, which forces
researchers to rely on their own expertise and experience in an ad hoc manner. This thesis investigates
approaches to designing deep neural architectures for several tasks by considering the underlying
task characteristics for more efficient and powerful deep models. More specifically, this thesis
develops new methods for addressing four different problems as follows:
Chapter 3 The first problem is harmonious attention network design for scalable person re-identification
(re-id). Existing deep learning methods for person re-id rely heavily
on large and computationally expensive convolutional neural networks. They
are therefore not scalable to large-scale re-id deployment scenarios that require processing
large amounts of surveillance video data, due to lengthy inference with high computing
costs. In this chapter, we address this limitation by jointly learning re-id attention selection.
Specifically, we formulate a novel Harmonious Attention Network (HAN) framework to jointly
learn soft pixel attention and hard regional attention alongside simultaneous deep feature representation
learning, particularly enabling more discriminative re-id matching by efficient networks
with more scalable inference. Extensive evaluations validate the cost-effectiveness superiority of
the proposed HAN approach for person re-id against a wide variety of state-of-the-art methods
on large benchmark datasets.
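The two attention types combined in HAN can be illustrated with a minimal NumPy sketch. All shapes, weights, and region coordinates below are hypothetical placeholders for what HAN learns end-to-end; this is an illustration of the soft/hard distinction, not the thesis implementation:

```python
import numpy as np

def soft_pixel_attention(feat, w):
    # Soft (pixel-level) attention: a sigmoid saliency map reweights
    # every spatial position of the feature map continuously.
    # feat: (C, H, W) feature map; w: (C,) channel-mixing weights.
    saliency = 1.0 / (1.0 + np.exp(-np.tensordot(w, feat, axes=1)))  # (H, W)
    return feat * saliency[None, :, :]

def hard_regional_attention(feat, top, left, h, w):
    # Hard (region-level) attention: select a discrete sub-region;
    # fixed coordinates here stand in for learned region parameters.
    return feat[:, top:top + h, left:left + w]

feat = np.random.rand(8, 16, 16)                      # toy feature map
soft = soft_pixel_attention(feat, np.ones(8) * 0.1)   # reweighted map
hard = hard_regional_attention(soft, 4, 4, 8, 8)      # cropped region
```

The key contrast: soft attention keeps the full spatial extent but rescales it, while hard attention discards everything outside the selected region, which is what enables the cheaper downstream computation.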
Chapter 4 The second problem is hierarchical distillation network design for scalable person
search. Existing person search methods typically focus on improving person detection accuracy.
This ignores model inference efficiency, which is nonetheless fundamental to
real-world applications. In this chapter, we address this limitation by investigating the scalability
problem of person search involving both model accuracy and inference efficiency simultaneously.
Specifically, we formulate a Hierarchical Distillation Learning (HDL) approach. With
HDL, we aim to comprehensively distil the knowledge of a strong teacher model into a
lightweight student model with weaker learning capability. To facilitate
the HDL process, we design a simple and powerful teacher model for joint learning of person
detection and person re-identification matching in unconstrained scene images. Extensive experiments
show the modelling advantages and cost-effectiveness superiority of HDL over the
state-of-the-art person search methods on large person search benchmarks.
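The teacher-to-student transfer at the core of HDL can be sketched with a standard temperature-softened distillation loss (in the style of Hinton et al.). The temperature value and toy logits are illustrative assumptions; HDL itself distils at several hierarchy levels rather than only at the logits:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T gives softer distributions.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T, then penalise the
    # KL divergence from the teacher's soft targets to the student's
    # predictions; the T**2 factor keeps gradient scale comparable.
    p = softmax(teacher_logits, T)          # teacher soft targets
    q = softmax(student_logits, T)          # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

s = np.array([[2.0, 0.5, -1.0]])            # toy student logits
t = np.array([[3.0, 0.2, -2.0]])            # toy teacher logits
loss = distillation_loss(s, t)
```

The soft targets carry more information per example than one-hot labels (relative class similarities), which is what lets a weak student approach a strong teacher's behaviour.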
Chapter 5 The third problem is neural graph embedding for scalable neural architecture
search. Existing neural architecture search (NAS) methods often operate in discrete or continuous
spaces directly, which ignores the graphical topology knowledge of neural networks. This
leads to suboptimal search performance and efficiency, given that neural networks are essentially
directed acyclic graphs (DAGs). In this chapter, we address this limitation by introducing a novel
idea of neural graph embedding (NGE). Specifically, we represent the building block (i.e. the
cell) of neural networks with a neural DAG, and learn it by leveraging a Graph Convolutional
Network to propagate and model the intrinsic topology information of network architectures.
This results in a generic neural network representation integrable with different existing NAS
frameworks. Extensive experiments show the superiority of NGE over the state-of-the-art methods
on image classification and semantic segmentation.
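The propagation step a Graph Convolutional Network applies to a cell's DAG can be sketched as follows. The 4-node toy graph, feature dimensions, and weights are assumptions for illustration; NGE learns the node embeddings and weights jointly with the search:

```python
import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution step: add self-loops, symmetrically
    # normalise the adjacency, then propagate node features through
    # the graph topology and a learned linear map, with ReLU.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# A toy 4-node cell DAG: node 0 feeds nodes 1 and 2, which feed node 3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
H = np.random.rand(4, 6)        # per-node operation embeddings
W = np.random.rand(6, 6)        # learnable layer weights
emb = gcn_layer(A, H, W)        # topology-aware node embeddings
```

Because the adjacency matrix enters the update directly, each node's embedding mixes in its neighbours' features, which is how the intrinsic wiring of the cell is encoded rather than ignored.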
Chapter 6 The last problem is scalable neural operator search. Existing neural architecture
search (NAS) methods explore a limited feature-transformation-only search space, ignoring other
advanced feature operations such as feature self-calibration by attention and dynamic convolutions.
This prevents NAS algorithms from discovering better network architectures.
We address this limitation by additionally exploiting feature self-calibration operations, resulting
in a heterogeneous search space. To overcome the challenges of operation heterogeneity and
significantly larger search space, we formulate a neural operator search (NOS) method. NOS
presents a novel heterogeneous residual block for integrating the heterogeneous operations in a
unified structure, and an attention guided search strategy for facilitating the search process over a
vast space. Extensive experiments show that NOS can search novel cell architectures with highly
competitive performance on the CIFAR and ImageNet benchmarks.
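Mixing heterogeneous candidate operations in one block can be sketched with a DARTS-style continuous relaxation: a softmax over architecture parameters weights each candidate, including a self-calibration op alongside plain transforms. The three candidate ops and the weighting scheme are simplified assumptions, not the NOS block or its attention-guided search itself:

```python
import numpy as np

def mixed_heterogeneous_op(x, alphas):
    # Candidate operations of two kinds: plain feature transforms and
    # a self-calibration op that rescales the input by a gated signal.
    ops = [
        lambda v: v,                                      # identity
        lambda v: np.maximum(v, 0.0),                     # ReLU transform
        lambda v: v * (1.0 / (1.0 + np.exp(-v.mean()))),  # self-calibration
    ]
    # Softmax over architecture parameters weights each candidate op;
    # search then optimises alphas alongside the network weights.
    e = np.exp(alphas - alphas.max())
    w = e / e.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))

x = np.array([-1.0, 0.5, 2.0])
y = mixed_heterogeneous_op(x, np.array([0.2, 0.5, 0.3]))
```

The heterogeneity is visible in the ops list: the self-calibration entry modulates its input by a function of the input itself, unlike the fixed pointwise transforms, which is exactly the kind of operation a transformation-only search space cannot express.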
Chapter 7 presents concluding remarks and discusses potential areas for future research and
extensions.