5 research outputs found

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Learning to spot analogies and differences within and across visual categories is arguably a powerful approach in machine learning and pattern recognition, directly inspired by human cognition. In this thesis, we investigate a variety of approaches which are primarily driven by correlation and tackle several computer vision applications

    Algorithmic heuristics in deep learning: regularization and robustness

    While deep learning continues to advance our technological world, its theoretical underpinnings are far from understood. In this thesis, we focus on regularization and robustness due to algorithmic heuristics that are often leveraged in state-of-the-art deep learning systems. In particular, we take steps towards a formal understanding of regularization due to dropout, a popular local-search heuristic in deep learning. We also present a theoretical study of adversarial training, an effective local-search heuristic for training models that are more robust against adversarial perturbations. The thesis is organized as follows. In Chapters 2 and 3, we focus on the explicit regularization due to dropout in shallow and deep linear networks. We show that dropout, as a learning rule, amounts to regularizing the objective with a data-dependent term, which includes products of the weights along certain cycles in the network graph. We then show that under certain conditions this regularizer boils down to a trace-norm penalty, which provides a rich inductive bias in matrix learning problems. In Chapter 4, we study the learning-theoretic implications of the explicit regularizer. In particular, focusing on the matrix completion problem, we provide precise ε-suboptimality results for the dropout rule. We also provide extensive empirical evidence establishing that even in this simple application, algorithmic heuristics such as dropout can dramatically boost the generalization performance of gradient-based optimization methods. We further provide generalization error guarantees for the dropout rule in two-layer neural networks with ReLU activation, along with extensive numerical evaluations verifying that the proposed theoretical bound is predictive of the observed generalization gap. In Chapter 5, we focus on the computational aspects of dropout. We provide precise iteration complexity rates for training two-layer ReLU neural networks with dropout, under certain distributional assumptions and over-parameterization requirements. We also show that dropout implicitly compresses the network: there exists a sub-network, i.e., one of the iterates of dropout training, that generalizes as well as any complete network. Finally, in Chapter 6, we switch gears towards adversarial training in two-layer neural networks with Leaky ReLU activation. We provide precise iteration complexity results for end-to-end adversarial training when the underlying distribution is separable. Our results include a convergence guarantee for the PGD attack, a popular local-search heuristic for finding adversarial perturbations, and a suboptimality guarantee in terms of the robust generalization error, both of which are the first of their kind. More importantly, our results hold for any width and initialization.
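
    To make the "dropout as explicit regularization" claim concrete, the sketch below (not the thesis' code) numerically checks that, for a two-layer linear network, the expected dropout objective equals the plain squared loss plus a data-dependent term coupling the second-layer weights with the empirical second moments of the hidden activations. The dimensions, dropout rate p, and all variable names are illustrative assumptions.

```python
# Illustrative sketch, not the thesis' implementation: Monte Carlo check that
# dropout on the hidden layer of a two-layer *linear* network is, in
# expectation, the plain squared loss plus a data-dependent regularizer.
import numpy as np

rng = np.random.default_rng(0)
n, d, h, k = 200, 10, 8, 5     # samples, input dim, hidden width, output dim (assumed)
p = 0.3                        # dropout probability (assumed)

X = rng.normal(size=(n, d))
U = rng.normal(size=(k, h))    # second-layer weights
V = rng.normal(size=(h, d))    # first-layer weights
Y = rng.normal(size=(n, k))    # arbitrary targets

def dropout_loss_mc(num_masks=20000):
    """Average squared loss over Bernoulli(1-p) masks on the hidden units."""
    Z = X @ V.T                                        # hidden activations, (n, h)
    total = 0.0
    for _ in range(num_masks):
        b = rng.binomial(1, 1 - p, size=h) / (1 - p)   # inverted-dropout scaling
        pred = (Z * b) @ U.T
        total += np.mean(np.sum((Y - pred) ** 2, axis=1))
    return total / num_masks

def closed_form():
    """Plain loss + (p/(1-p)) * sum_i ||u_i||^2 * mean_n (v_i^T x_n)^2."""
    Z = X @ V.T
    plain = np.mean(np.sum((Y - Z @ U.T) ** 2, axis=1))
    reg = (p / (1 - p)) * np.sum((U ** 2).sum(axis=0) * np.mean(Z ** 2, axis=0))
    return plain + reg

print(dropout_loss_mc())   # matches closed_form() up to Monte Carlo noise
print(closed_form())
```

    The data-dependent term here is the simplest instance of the regularizer described above; the trace-norm connection arises when such terms are minimized over all factorizations of the same product matrix.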
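
    Likewise, the adversarial-training setting of Chapter 6 can be sketched as a min-max loop: a PGD inner step searches for an l_inf-bounded perturbation, and an outer SGD step updates a two-layer Leaky ReLU network on the perturbed inputs. The PyTorch sketch below is illustrative only; the architecture, step sizes, perturbation budget eps, and synthetic separable data are assumptions, not the thesis' experiments.

```python
# Illustrative sketch: PGD adversarial training of a two-layer Leaky ReLU
# network on synthetic linearly separable data. Hyperparameters are assumed.
import torch

torch.manual_seed(0)
n, d, width, eps = 256, 20, 64, 0.1

# Linearly separable data: labels given by the sign of <w*, x>.
w_star = torch.randn(d)
X = torch.randn(n, d)
y = torch.sign(X @ w_star)

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.LeakyReLU(0.1),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
# Logistic loss log(1 + exp(-y * f(x))).
loss_fn = lambda out, y: torch.nn.functional.softplus(-y * out.squeeze(-1)).mean()

def pgd_attack(x, y, steps=10, alpha=0.02):
    """Projected gradient ascent on the loss inside an l_inf ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

for step in range(200):
    delta = pgd_attack(X, y)             # inner maximization (attack)
    opt.zero_grad()
    loss = loss_fn(model(X + delta), y)  # outer minimization on perturbed inputs
    loss.backward()
    opt.step()

delta = pgd_attack(X, y)
with torch.no_grad():
    robust_err = (torch.sign(model(X + delta).squeeze(-1)) != y).float().mean()
print(f"robust training error: {robust_err.item():.3f}")
```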