ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training
We design a series of serial and parallel proximal point (gradient) ADMMs for
the fully connected residual networks (FCResNets) training problem by
introducing auxiliary variables. Convergence of the proximal point version is
proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework, and we
can ensure a locally R-linear or sublinear convergence rate depending on the
different ranges of the Kurdyka-Lojasiewicz (KL) exponent, in which a necessary
auxiliary function is constructed to realize our goal. Moreover, the advantages
of the parallel implementation in terms of lower time complexity and less
(per-node) memory consumption are analyzed theoretically. To the best of our
knowledge, this is the first work to theoretically analyze the convergence,
convergence rate, time complexity and (per-node) runtime memory requirement of
ADMM applied to the FCResNets training problem. Experiments are reported to
show the high speed, better performance, robustness and potential in deep
network training tasks. Finally, we present the advantage and potential of our
parallel training in large-scale problems.
C-SURE: Shrinkage Estimator and Prototype Classifier for Complex-Valued Deep Learning
The James-Stein (JS) shrinkage estimator is a biased estimator that captures
the mean of Gaussian random vectors. While it has a desirable statistical
property of dominance over the maximum likelihood estimator (MLE) in terms of
mean squared error (MSE), not much progress has been made on extending the
estimator onto manifold-valued data.
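
To make the dominance claim concrete, here is a minimal sketch of the classical real-valued positive-part James-Stein estimator, assuming a single observation with known isotropic noise variance. It is not the C-SURE estimator on the complex-valued manifold; the function names `james_stein` and `compare_mse` are hypothetical and used only for illustration.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of the mean from one
    observation x ~ N(theta, sigma2 * I). Shrinks x toward the origin."""
    x = np.asarray(x, dtype=float)
    p = x.size
    if p < 3:
        raise ValueError("James-Stein dominance requires dimension >= 3")
    norm2 = float(x @ x)
    if norm2 == 0.0:
        return x.copy()
    # Shrinkage factor, clipped at zero (the positive-part variant).
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return shrink * x

def compare_mse(theta, trials=2000, seed=0):
    """Monte Carlo estimate of the squared-error risk of the MLE (x itself)
    and of the James-Stein estimate, for unit-variance noise."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    x = theta + rng.standard_normal((trials, theta.size))
    mle_err = np.mean(np.sum((x - theta) ** 2, axis=1))
    js = np.array([james_stein(row) for row in x])
    js_err = np.mean(np.sum((js - theta) ** 2, axis=1))
    return mle_err, js_err
```

Calling `compare_mse(np.zeros(10))` returns the two empirical risks. The gap is largest when the true mean lies near the shrinkage target and narrows as the mean moves away from it.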
We propose C-SURE, a novel Stein's unbiased risk estimate (SURE) of the JS
estimator on the manifold of complex-valued data with a theoretically proven
optimum over MLE. Adapting the architecture of the complex-valued SurReal
classifier, we further incorporate C-SURE into a prototype convolutional neural
network (CNN) classifier. We compare C-SURE with SurReal and a real-valued
baseline on complex-valued MSTAR and RadioML datasets.
C-SURE is more accurate and robust than SurReal, and the shrinkage estimator
is always better than MLE for the same prototype classifier. Like SurReal,
C-SURE is much smaller, outperforming the real-valued baseline on MSTAR
(RadioML) with less than 1 percent (3 percent) of the baseline size.
Comment: Submitted to CVPR PBVS workshop
Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect
Recently, research interest in person re-identification (ReID) has
gradually turned to video-based methods, which acquire a person representation
by aggregating frame features of an entire video. However, existing video-based
ReID methods do not consider the semantic difference brought by the outputs of
different network stages, which potentially compromises the information
richness of the person features. Furthermore, traditional methods ignore
important relationships among frames, which causes information redundancy in
fusion along the time axis. To address these issues, we propose a novel general
temporal fusion framework to aggregate frame features on both semantic aspect
and time aspect. As for the semantic aspect, a multi-stage fusion network is
explored to fuse richer frame features at multiple semantic levels, which can
effectively reduce the information loss caused by the traditional single-stage
fusion. For the time aspect, the existing intra-frame attention method is
improved by adding a novel inter-frame attention module, which effectively
reduces the information redundancy in temporal fusion by taking the
relationship among frames into consideration. The experimental results show
that our approach can effectively improve the video-based re-identification
accuracy, achieving state-of-the-art performance.
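
The idea of reducing temporal redundancy through inter-frame relationships can be illustrated with a rough sketch. This is not the authors' attention module: the scoring rule (a dot product with a query vector, minus the mean cosine similarity to the other frames) and the names `fuse_frames` and `query` are assumptions made purely for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def fuse_frames(features, query):
    """Fuse per-frame features of shape (T, D) into one (D,) vector.

    Assumed scoring rule (hypothetical):
      intra-frame score = dot product of each frame with `query`
      inter-frame term  = mean cosine similarity to the other frames,
                          subtracted so that redundant frames get less weight
    """
    features = np.asarray(features, dtype=float)
    query = np.asarray(query, dtype=float)
    num_frames = features.shape[0]

    intra = features @ query

    # Cosine similarity between every pair of frames.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    sim = normed @ normed.T
    # Exclude each frame's similarity with itself from the average.
    redundancy = (sim.sum(axis=1) - np.diag(sim)) / max(num_frames - 1, 1)

    weights = softmax(intra - redundancy)
    return weights @ features, weights
```

With two identical frames and one distinct frame, the distinct frame receives the largest weight under this rule, which is the redundancy-reducing behaviour the abstract describes in general terms.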