20 research outputs found

    Gradient Descent Ascent for Min-Max Problems on Riemannian Manifolds

    Full text link
    In this paper, we study a class of useful non-convex minimax optimization problems on Riemannian manifolds and propose a class of Riemannian gradient descent ascent algorithms to solve these minimax problems. Specifically, we propose a new Riemannian gradient descent ascent (RGDA) algorithm for the \textbf{deterministic} minimax optimization. Moreover, we prove that the RGDA has a sample complexity of $O(\kappa^2\epsilon^{-2})$ for finding an $\epsilon$-stationary point of the nonconvex strongly-concave minimax problems, where $\kappa$ denotes the condition number. At the same time, we introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for the \textbf{stochastic} minimax optimization. In the theoretical analysis, we prove that the RSGDA can achieve a sample complexity of $O(\kappa^3\epsilon^{-4})$. To further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on the momentum-based variance-reduced technique of STORM. We prove that the MVR-RSGDA algorithm achieves a lower sample complexity of $\tilde{O}(\kappa^{3-\nu/2}\epsilon^{-3})$ for $\nu \geq 0$, which reaches the best known sample complexity for its Euclidean counterpart. Extensive experimental results on robust deep neural network training over the Stiefel manifold demonstrate the efficiency of our proposed algorithms.
    Comment: 32 pages. We have updated the theoretical results of our methods in this revision; e.g., our MVR-RSGDA algorithm achieves a lower sample complexity. arXiv admin note: text overlap with arXiv:2008.0817
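
    The retraction-based update at the core of RGDA is compact enough to sketch. Below is a minimal, illustrative Python sketch of deterministic gradient descent ascent on the Stiefel manifold (with Euclidean ascent in y), using a tangent-space projection and a polar retraction; the function names, step sizes, and retraction choice are our own assumptions, not necessarily the paper's.

    ```python
    import numpy as np

    def polar_retraction(X):
        """Map a matrix back onto the Stiefel manifold St(n, p) = {X : X^T X = I}
        via the polar decomposition (one common retraction choice)."""
        U, _, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ Vt

    def project_tangent(X, G):
        """Project a Euclidean gradient G onto the tangent space of St(n, p) at X."""
        sym = (X.T @ G + G.T @ X) / 2
        return G - X @ sym

    def rgda(grad_x, grad_y, X0, y0, eta_x=1e-2, eta_y=1e-2, iters=500):
        """Deterministic Riemannian gradient descent ascent: descend in X on the
        manifold, ascend in y in Euclidean space."""
        X, y = X0, y0
        for _ in range(iters):
            gx = project_tangent(X, grad_x(X, y))  # Riemannian gradient w.r.t. X
            X = polar_retraction(X - eta_x * gx)   # descent step + retraction
            y = y + eta_y * grad_y(X, y)           # ascent step in the concave variable
        return X, y
    ```

    The stochastic variants (RSGDA, MVR-RSGDA) would replace grad_x and grad_y with minibatch estimates, with MVR-RSGDA additionally maintaining a STORM-style momentum estimator of those gradients.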

    New Efficient Pruning Algorithms for Compressing and Accelerating Convolutional Neural Networks

    No full text
    Convolutional Neural Networks (CNNs) have continuously achieved state-of-the-art results in numerous machine-learning tasks. While their performance is impressive, the size of current models is also exploding. Motivated by efficient inference, many researchers have devoted themselves to reducing the storage and computational costs of state-of-the-art models. Channel pruning has emerged as a promising solution: it reduces the size of the model and achieves acceleration without any post-processing steps. Current channel pruning methods are either time-consuming (reinforcement learning, greedy search, etc.) or depend on fixed channel criteria, resulting in poor results. In this dissertation work, we propose new methods from the perspective of gradient-guided pruning. We formulate pruning as a constrained discrete optimization problem. Our discrete model compression work solves this constrained problem by using differentiable gates and propagating gradients through a straight-through estimator (sketched below). We further improve the results in network pruning via performance maximization by adding a performance prediction loss to the constrained optimization problem; the search for sub-networks is then directly guided by the accuracy of a sub-network, and this improved supervision leads to better pruning results. On top of these works, we improve our algorithms from two further perspectives. The first is to disentangle width and importance when searching for the optimal model architecture: to this end, we use an importance generation network and a width generation network to generate the importance and width for each layer. Another challenge in previous works is the large gap between the model before and after pruning. To mitigate this gap, we first learn a target sub-network during model training and then use this sub-network to guide the learning of the model weights through partial regularization. Building on the success of these static pruning methods, we further incorporate dynamic pruning to obtain a storage-efficient dynamic pruning method.
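
    As a concrete illustration of the differentiable-gate idea mentioned above, here is a minimal PyTorch sketch of a channel gate trained through a straight-through estimator; the module name, initialization, and 0.5 threshold are illustrative assumptions rather than the dissertation's exact formulation.

    ```python
    import torch
    import torch.nn as nn

    class STEGate(nn.Module):
        """Binary channel gate trained with a straight-through estimator (STE):
        the forward pass hard-thresholds a learnable score, while the backward
        pass lets gradients flow through the soft (sigmoid) relaxation."""
        def __init__(self, num_channels):
            super().__init__()
            # Positive init so all gates start open (sigmoid(1) > 0.5).
            self.scores = nn.Parameter(torch.ones(num_channels))

        def forward(self, x):
            soft = torch.sigmoid(self.scores)   # relaxed gate in (0, 1)
            hard = (soft > 0.5).float()         # discrete keep/prune decision
            gate = hard + soft - soft.detach()  # STE: hard forward, soft backward
            return x * gate.view(1, -1, 1, 1)   # gate each channel of an NCHW tensor
    ```

    In a full pipeline, a sparsity or FLOPs penalty on the gates (enforcing the pruning constraint) would be added to the training loss alongside the task loss.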

    EffConv: Efficient Learning of Kernel Sizes for Convolution Layers of CNNs

    No full text
    Determining the kernel sizes of a CNN model is a crucial and non-trivial design choice that significantly impacts its performance. The majority of kernel size design methods rely on complex heuristic tricks or leverage neural architecture search, which requires extreme computational resources. Thus, learning kernel sizes jointly with the model weights, e.g., by modeling kernels as a combination of basis functions, has been proposed as a workaround. However, previous methods cannot achieve satisfactory results or are inefficient for large-scale datasets. To fill this gap, we design a novel efficient kernel size learning method in which a size predictor model learns to predict optimal kernel sizes for a classifier given a desired number of parameters. It does so in collaboration with a kernel predictor model that predicts the weights of the kernels (given the kernel sizes predicted by the size predictor) to minimize the training objective, and both models are trained end-to-end. Our method needs only a small fraction of the training epochs of the original CNN to train these two models and find proper kernel sizes for it. Thus, it offers an efficient and effective solution for the kernel size learning problem. Our extensive experiments on MNIST, CIFAR-10, STL-10, and ImageNet-32 demonstrate that our method achieves the best training time vs. accuracy trade-off compared to previous kernel size learning methods and significantly outperforms them on challenging datasets such as STL-10 and ImageNet-32. Our implementations are available at https://github.com/Alii-Ganjj/EffConv
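
    To make the collaboration between the two predictors concrete, here is a minimal PyTorch sketch in which a size predictor maps a parameter budget to per-layer probabilities over candidate kernel sizes, and each convolution layer computes a differentiable soft mixture over those candidates. The class names, candidate set, and mixing scheme are illustrative assumptions, not necessarily EffConv's exact design.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SizePredictor(nn.Module):
        """Maps a desired parameter budget to per-layer probabilities over
        candidate kernel sizes."""
        def __init__(self, num_layers, num_candidates=4):
            super().__init__()
            self.num_candidates = num_candidates
            self.net = nn.Sequential(
                nn.Linear(1, 64), nn.ReLU(),
                nn.Linear(64, num_layers * num_candidates),
            )

        def forward(self, budget):
            logits = self.net(budget.view(1, 1))
            return F.softmax(logits.view(-1, self.num_candidates), dim=-1)

    class MixedKernelConv(nn.Module):
        """Convolution whose effective kernel size is a soft mixture over
        candidate sizes, keeping the size choice differentiable end-to-end."""
        def __init__(self, in_ch, out_ch, candidates=(1, 3, 5, 7)):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in candidates
            )

        def forward(self, x, size_probs):
            # Each candidate convolution is weighted by its predicted probability.
            return sum(p * conv(x) for p, conv in zip(size_probs, self.convs))
    ```

    At the end of training, a hard kernel size per layer can then be taken as the argmax of that layer's probabilities.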

    Video Recovery via Learning Variation and Consistency of Images

    No full text
    Matrix completion algorithms have been widely used to recover images with missing entries, and they have proven very effective. Recent works applied tensor completion models to video recovery, assuming that all video frames are homogeneous and correlated. However, real videos are made up of different episodes or scenes, i.e., they are heterogeneous. Therefore, a video recovery model that utilizes both video spatiotemporal consistency and variation is necessary. To solve this problem, we propose a new video recovery method, Sectional Trace Norm with Variation and Consistency Constraints (STN-VCC). In our model, capped $\ell_1$-norm regularization is utilized to learn the spatial-temporal consistency and variation between consecutive frames in video clips. Meanwhile, we introduce a new low-rank model that captures the low-rank structure in video frames with a better approximation of rank minimization than the traditional trace norm. We propose an efficient optimization algorithm and provide a proof of convergence in the paper. We evaluate the proposed method on several video recovery tasks, and the experimental results show that our new method consistently outperforms other related approaches.
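
    For context on the low-rank part of the model, here is a minimal NumPy sketch of the classic trace-norm baseline: recovering a single frame with missing entries by iterative singular value thresholding. STN-VCC replaces the plain trace norm with a tighter rank approximation and adds the variation/consistency terms, so the function names and parameters below illustrate only the baseline it builds on.

    ```python
    import numpy as np

    def svt(M, tau):
        """Singular value thresholding: the proximal operator of the trace norm."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def complete_frame(Y, mask, tau=5.0, iters=200):
        """Fill in missing pixels of frame Y (observed where mask is True) by
        alternating a low-rank shrinkage step with re-imposing the observations."""
        X = np.where(mask, Y, 0.0)
        for _ in range(iters):
            X = svt(X, tau)            # shrink toward a low-rank matrix
            X = np.where(mask, Y, X)   # keep the observed entries fixed
        return X
    ```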

    Discriminative Multi-instance Multitask Learning for 3D Action Recognition

    No full text