4 research outputs found

    Amata: An Annealing Mechanism for Adversarial Training Acceleration

    Despite their empirical success in various domains, deep neural networks have been shown to be vulnerable to maliciously perturbed input data that can greatly degrade their performance; such perturbations are known as adversarial attacks. To counter adversarial attacks, adversarial training, formulated as a form of robust optimization, has been demonstrated to be effective. However, adversarial training incurs substantial computational overhead compared with standard training. To reduce this cost, we propose an annealing mechanism, Amata, that lowers the overhead associated with adversarial training. Amata is provably convergent, well motivated from the perspective of optimal control theory, and can be combined with existing acceleration methods to further enhance performance. On standard datasets, Amata achieves similar or better robustness with roughly 1/3 to 1/2 of the computational time required by traditional methods. In addition, Amata can be incorporated into other adversarial training acceleration algorithms (e.g., YOPO, Free, Fast, and ATTA), leading to further reductions in computational time on large-scale problems. Comment: accepted by AAA
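
    The core idea the abstract describes, annealing the cost of the inner adversarial (robust-optimization) step during training, can be illustrated with a short sketch. This is not the authors' code: the linear step-count schedule, the PGD attack parameters, and the model/loader names below are assumptions for the example; the actual Amata schedule is derived from optimal control and jointly adapts the attack steps and step size.

    # Illustrative sketch: adversarial training whose inner PGD loop is annealed,
    # using few attack steps early in training and more later, so the total attack
    # cost stays well below running full-strength PGD throughout.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps, step_size, n_steps):
        """Projected gradient ascent on the loss within an L-infinity ball of radius eps."""
        x_adv = x + eps * (2 * torch.rand_like(x) - 1)            # random start
        for _ in range(n_steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = x_adv + step_size * grad.sign()               # ascend the loss
            x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)     # project back into the ball
            x_adv = x_adv.clamp(0.0, 1.0)
        return x_adv.detach()

    def annealed_adversarial_training(model, loader, optimizer, n_epochs,
                                      eps=8 / 255, min_steps=2, max_steps=10):
        # Assumes model and the batches from loader already live on the same device.
        for epoch in range(n_epochs):
            # Anneal the number of inner maximization steps over epochs
            # (a simplified linear schedule, chosen here for illustration).
            n_steps = min_steps + (max_steps - min_steps) * epoch // max(n_epochs - 1, 1)
            step_size = 2.5 * eps / n_steps
            for x, y in loader:
                x_adv = pgd_attack(model, x, y, eps, step_size, n_steps)
                optimizer.zero_grad()
                F.cross_entropy(model(x_adv), y).backward()
                optimizer.step()

    Because early epochs use cheap, weak attacks and only later epochs pay for strong ones, the average per-batch cost is a fraction of fixed full-step adversarial training, which is the kind of saving the abstract reports.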

    A Study on Deep Learning: Training, Models and Applications

    In the past few years, deep learning has become a very important research field that has attracted considerable research interest. Owing to the development of computational hardware such as high-performance GPUs, training deep models, such as fully-connected deep neural networks (DNNs) and convolutional neural networks (CNNs), from scratch has become practical, and using well-trained deep models to tackle real-world, large-scale problems has also become possible. This dissertation focuses on three important problems in deep learning, i.e., training algorithms, computational models, and applications, and provides several methods to improve the performance of different deep learning approaches. The first method is a DNN training algorithm called Annealed Gradient Descent (AGD). The dissertation presents a theoretical analysis of the convergence properties and learning speed of AGD to show its benefits. Experimental results show that AGD yields performance comparable to SGD while significantly expediting the training of DNNs on big data sets. Secondly, the dissertation proposes to apply a novel model, namely Hybrid Orthogonal Projection and Estimation (HOPE), to CNNs. HOPE can be viewed as a hybrid model that combines feature extraction with mixture models. Experimental results show that HOPE layers can significantly improve the performance of CNNs on image classification tasks. The third proposed method applies CNNs to image saliency detection. In this approach, a gradient descent method is used to iteratively modify the input images based on pixel-wise gradients in order to reduce a pre-defined cost function; SLIC superpixels and low-level saliency features are then applied to smooth and refine the saliency maps. Experimental results show that the proposed method can generate high-quality saliency maps. The last method also targets image saliency detection, but is based on the Generative Adversarial Network (GAN). Unlike a standard GAN, the proposed method uses fully supervised learning to train both the G-Network and the D-Network, and is therefore called the Supervised Adversarial Network (SAN). Moreover, SAN introduces a different G-Network and conv-comparison layers to further improve saliency performance. Experimental results show that the SAN model can also generate state-of-the-art saliency maps for complicated images.
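
    The third method's key mechanism, editing an image by pixel-wise gradient descent on a pre-defined cost and reading saliency from how much each pixel had to change, can be sketched as below. This is a minimal illustration under assumptions, not the dissertation's implementation: the cost function (the model's score for a target class), step size, iteration count, and normalization are illustrative choices, and the SLIC-superpixel smoothing mentioned in the abstract is omitted.

    # Sketch: derive a raw saliency map from accumulated pixel-wise gradient updates.
    import torch

    def gradient_descent_saliency(model, image, target_class, n_iters=20, step_size=0.05):
        """image: (1, C, H, W) tensor; returns an (H, W) saliency map scaled to [0, 1]."""
        x = image.clone()
        for _ in range(n_iters):
            x = x.detach().requires_grad_(True)
            # Pre-defined cost: the score the model assigns to the target class.
            cost = model(x)[0, target_class]
            grad, = torch.autograd.grad(cost, x)
            x = x - step_size * grad              # descend the cost by editing pixels
        # Pixels that changed the most while reducing the cost are treated as salient.
        change = (x.detach() - image).abs().sum(dim=1)[0]   # aggregate over channels
        change = change - change.min()
        return change / (change.max() + 1e-8)

    In the dissertation's pipeline, a raw map like this would then be smoothed and refined with SLIC superpixels and low-level saliency features before evaluation.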