17 research outputs found

    Double Backpropagation with Applications to Robustness and Saliency Map Interpretability

    This thesis concerns double backpropagation, a phenomenon that arises when first-order optimization methods are applied to a neural network loss function that itself contains derivatives. The thesis explains its connection to robustness and saliency map interpretability.
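To make the idea concrete, here is a minimal sketch of a loss that contains a derivative. It is an illustration under stated assumptions, not the thesis's formulation: the scalar model `f(x) = w * x`, the squared-error loss, the input-gradient penalty weight `lam`, and all function names are hypothetical. The point is that the gradient with respect to `w` must differentiate through the input gradient, which is the second backward pass.

```python
def loss(w, x, y):
    """Squared error of the assumed scalar model f(x) = w * x."""
    return 0.5 * (w * x - y) ** 2


def input_grad(w, x, y):
    """d loss / d x, written out analytically (the first backward pass)."""
    return (w * x - y) * w


def penalized_loss(w, x, y, lam):
    """Loss plus a penalty on the squared input gradient (assumed form)."""
    return loss(w, x, y) + lam * input_grad(w, x, y) ** 2


def penalized_grad(w, x, y, lam):
    """d penalized_loss / d w.

    The penalty term requires differentiating input_grad itself with
    respect to w, i.e. backpropagating through a backward pass.
    """
    r = w * x - y                      # residual
    d_loss = r * x                     # ordinary gradient of the loss
    # d/dw of (r * w)^2 = 2 * (r * w) * d/dw (r * w) = 2 * (r * w) * (x * w + r)
    d_penalty = 2.0 * (r * w) * (x * w + r)
    return d_loss + lam * d_penalty


def finite_difference(w, x, y, lam, h=1e-6):
    """Central-difference check of the analytic gradient."""
    up = penalized_loss(w + h, x, y, lam)
    down = penalized_loss(w - h, x, y, lam)
    return (up - down) / (2.0 * h)
```

Comparing `penalized_grad(0.7, 1.5, 2.0, 0.1)` with `finite_difference(0.7, 1.5, 2.0, 0.1)` is a quick way to confirm that the second-order term has been accounted for.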

    Deep learning methods for solving linear inverse problems: Research directions and paradigms

    The linear inverse problem is fundamental to the development of various scientific areas. Numerous attempts have been made to solve different variants of the linear inverse problem in different applications. The rapid development of deep learning now provides a fresh perspective for solving the linear inverse problem, with various well-designed network architectures achieving state-of-the-art performance in many applications. In this paper, we present a comprehensive survey of recent progress in deep learning for solving various linear inverse problems. We review how deep learning methods are used to solve different linear inverse problems, and explore the structured neural network architectures that incorporate knowledge used in traditional methods. Furthermore, we identify open challenges and potential future directions along this research line.

    White-Box Transformers via Sparse Rate Reduction

    In this paper, we contend that the objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction. From this perspective, popular deep networks such as transformers can be naturally viewed as realizing iterative schemes to optimize this objective incrementally. Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens. This leads to a family of white-box transformer-like deep network architectures which are mathematically fully interpretable. Despite their simplicity, experiments show that these networks indeed learn to optimize the designed objective: they compress and sparsify representations of large-scale real-world vision datasets such as ImageNet, and achieve performance very close to thoroughly engineered transformers such as ViT. Code is at https://github.com/Ma-Lab-Berkeley/CRATE. Comment: 33 pages, 11 figures

    0/1 Deep Neural Networks via Block Coordinate Descent

    The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it counts 1 for positive variables and 0 for others, its intrinsic characteristics (e.g., discontinuity and no viable subgradient information) have impeded its development for several decades. Although there is an impressive body of work on designing DNNs with continuous activation functions that can be deemed surrogates of the step function, the step function still possesses some advantageous properties, such as complete robustness to outliers and the ability to attain the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function used as an activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descent (BCD) method. Moreover, we acquire closed-form solutions for the sub-problems of BCD as well as its convergence properties. Furthermore, we integrate $\ell_{2,0}$-regularization into the 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm performs well on classifying the MNIST, FashionMNIST, Cifar10, and Cifar100 datasets.
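As a small illustration of the activation described above, the following sketch runs a forward pass through a network whose hidden layers use the 0/1 step function. It does not implement the paper's BCD training method: the helper names (`step`, `dense`, `forward`) are hypothetical, and the XOR weights are set by hand purely for demonstration.

```python
def step(z):
    """0/1 activation: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def dense(weights, bias, inputs):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def forward(layers, x):
    """Apply the step activation after every layer except the last."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        h = dense(weights, bias, h)
        if i < len(layers) - 1:
            h = [step(z) for z in h]
    return h


# Hand-set (not trained) weights that compute XOR:
# hidden unit 1 fires for "x1 OR x2", hidden unit 2 for "x1 AND x2",
# and the linear output subtracts the second from the first.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),
    ([[1.0, -1.0]], [0.0]),
]
```

Because `step` is piecewise constant, small changes to the weights usually leave the output unchanged, which is why gradient-based training gives no useful signal here and an alternative such as the BCD scheme mentioned in the abstract is needed.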

    Outlier Robust Adversarial Training

    Supervised learning models are challenged both by the intrinsic complexities of training data, such as outliers and minority subpopulations, and by intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are each designed to handle one of these two challenges, to date no work has developed models that are simultaneously robust to low-quality training data and to potential adversarial attacks at inference time. For this reason, we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to the adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT. Comment: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023)

    Convex and non-convex optimization using centroid-encoding for visualization, classification, and feature selection

    Includes bibliographical references. 2022 Fall. Classification, visualization, and feature selection are the three essential tasks of machine learning. This Ph.D. dissertation presents convex and non-convex models suitable for these three tasks. We propose Centroid-Encoder (CE), an autoencoder-based supervised tool for visualizing complex datasets that are potentially large (e.g., SUSY, with 5 million samples) and high-dimensional (e.g., GSE73072 clinical challenge data). Unlike an autoencoder, which maps a point to itself, a centroid-encoder has a modified target, i.e., the class centroid in the ambient space. We present a detailed comparative analysis of the method using various data sets and state-of-the-art techniques. We have proposed a variation of the centroid-encoder, Bottleneck Centroid-Encoder (BCE), where additional constraints are imposed at the bottleneck layer to improve generalization performance in the reduced space. We further developed a sparse optimization problem for the non-linear mapping of the centroid-encoder, called Sparse Centroid-Encoder (SCE), to determine the set of discriminative features between two or more classes. The sparse model selects variables using the 1-norm applied to the input feature space. SCE extracts discriminative features from multi-modal data sets, i.e., data whose classes appear to have multiple clusters, by using several centers per class. This approach seems to have advantages over models which use a one-hot-encoding vector. We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set. CE and SCE are models based on neural network architectures and require the solution of non-convex optimization problems. Motivated by the CE algorithm, we have developed a convex optimization for the supervised dimensionality reduction technique called Centroid Component Retrieval (CCR). The CCR model optimizes a multi-objective cost by balancing two complementary terms. The first term pulls the samples of a class towards its centroid by minimizing a sample's distance from its class centroid in low-dimensional space. The second term pushes the classes apart by maximizing the scattering volume of the ellipsoid formed by the class centroids in embedded space. Although the design principle of CCR is similar to LDA, our experimental results show that CCR exhibits performance advantages over LDA, especially on high-dimensional data sets, e.g., Yale Faces, ORL, and COIL20. Finally, we present a linear formulation of Centroid-Encoder with orthogonality constraints, called Principal Centroid Component Analysis (PCCA). This formulation is similar to PCA, except that the class labels are used to formulate the objective, resulting in a form of supervised PCA. We show classification and visualization results obtained with this new linear tool.
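The modified target described above (each sample is mapped to its class centroid rather than to itself) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function names are hypothetical, the network is left out, and the loss is assumed to be a mean squared distance between the model outputs and the centroid targets.

```python
from collections import defaultdict


def class_centroids(samples, labels):
    """Mean of the samples of each class, in the ambient (input) space."""
    sums = {}
    counts = defaultdict(int)
    for x, c in zip(samples, labels):
        if c not in sums:
            sums[c] = [0.0] * len(x)
        sums[c] = [s + v for s, v in zip(sums[c], x)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_targets(samples, labels):
    """Training target for each sample: the centroid of its own class."""
    centroids = class_centroids(samples, labels)
    return [centroids[c] for c in labels]


def centroid_encoder_loss(outputs, samples, labels):
    """Mean squared distance between outputs and centroid targets (assumed form)."""
    targets = centroid_targets(samples, labels)
    total = sum(
        sum((o - t) ** 2 for o, t in zip(out, tgt))
        for out, tgt in zip(outputs, targets)
    )
    return total / len(samples)
```

For comparison, an ordinary autoencoder would use `samples` themselves as the targets; swapping in `centroid_targets(samples, labels)` is the only change this sketch makes to the objective.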

    Blind Hyperspectral Unmixing Using Autoencoders

    The subject of this thesis is blind hyperspectral unmixing using deep-learning-based autoencoders. Two methods based on autoencoders are proposed and analyzed. Both methods seek to exploit the spatial correlations in hyperspectral images to improve performance: one uses multitask learning to simultaneously unmix a neighbourhood of pixels, while the other uses a convolutional neural network autoencoder. This increases the consistency and robustness of the methods. In addition, a review of the various autoencoder methods in the literature is given, along with a detailed discussion of different types of autoencoders. The thesis concludes with a critical comparison of eleven different autoencoder-based methods. Ablation experiments are performed to answer the question of why autoencoders are so effective in blind hyperspectral unmixing, and an opinion is given on what the future of autoencoder unmixing holds. The main contributions of the thesis are as follows:
    - A new autoencoder method, MTLAEU, which directly exploits the spatial correlation of spectra in hyperspectral images to improve unmixing performance. The method uses multitask learning to unmix a neighbourhood of spectra at once.
    - A new method, CNNAEU, which uses a 2D convolutional neural network for both the encoder and the decoder and is the first published method to do so. The method is trained on image patches, so the spatial structure of the image being analysed is preserved throughout the method.
    - A comprehensive and detailed review of published autoencoder methods for hyperspectral unmixing. An introduction to autoencoders is given and the earliest types of autoencoders are presented. A clear overview is given of the main published unmixing methods based on autoencoders, together with a critical comparison of 11 different autoencoder methods.
    Funding: The Icelandic Research Fund under Grants 174075-05 and 207233-05