17 research outputs found
Double Backpropagation with Applications to Robustness and Saliency Map Interpretability
This thesis is concerned with works in connection to double backpropagation, which is a phenomenon that arises when first-order optimization methods are applied to a neural network's loss function, if this contains derivatives. Its connection to robustness and saliency map interpretability is explained
Deep learning methods for solving linear inverse problems: Research directions and paradigms
The linear inverse problem is fundamental to the development of various scientific areas. Innumerable attempts have been carried out to solve different variants of the linear inverse problem in different applications. Nowadays, the rapid development of deep learning provides a fresh perspective for solving the linear inverse problem, which has various well-designed network architectures results in state-of-the-art performance in many applications. In this paper, we present a comprehensive survey of the recent progress in the development of deep learning for solving various linear inverse problems. We review how deep learning methods are used in solving different linear inverse problems, and explore the structured neural network architectures that incorporate knowledge used in traditional methods. Furthermore, we identify open challenges and potential future directions along this research line
White-Box Transformers via Sparse Rate Reduction
In this paper, we contend that the objective of representation learning is to
compress and transform the distribution of the data, say sets of tokens,
towards a mixture of low-dimensional Gaussian distributions supported on
incoherent subspaces. The quality of the final representation can be measured
by a unified objective function called sparse rate reduction. From this
perspective, popular deep networks such as transformers can be naturally viewed
as realizing iterative schemes to optimize this objective incrementally.
Particularly, we show that the standard transformer block can be derived from
alternating optimization on complementary parts of this objective: the
multi-head self-attention operator can be viewed as a gradient descent step to
compress the token sets by minimizing their lossy coding rate, and the
subsequent multi-layer perceptron can be viewed as attempting to sparsify the
representation of the tokens. This leads to a family of white-box
transformer-like deep network architectures which are mathematically fully
interpretable. Despite their simplicity, experiments show that these networks
indeed learn to optimize the designed objective: they compress and sparsify
representations of large-scale real-world vision datasets such as ImageNet, and
achieve performance very close to thoroughly engineered transformers such as
ViT. Code is at \url{https://github.com/Ma-Lab-Berkeley/CRATE}.Comment: 33 pages, 11 figure
0/1 Deep Neural Networks via Block Coordinate Descent
The step function is one of the simplest and most natural activation
functions for deep neural networks (DNNs). As it counts 1 for positive
variables and 0 for others, its intrinsic characteristics (e.g., discontinuity
and no viable information of subgradients) impede its development for several
decades. Even if there is an impressive body of work on designing DNNs with
continuous activation functions that can be deemed as surrogates of the step
function, it is still in the possession of some advantageous properties, such
as complete robustness to outliers and being capable of attaining the best
learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we
aim to train DNNs with the step function used as an activation function (dubbed
as 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization
problem and then solve it by a block coordinate descend (BCD) method. Moreover,
we acquire closed-form solutions for sub-problems of BCD as well as its
convergence properties. Furthermore, we also integrate
-regularization into 0/1 DNN to accelerate the training process and
compress the network scale. As a result, the proposed algorithm has a high
performance on classifying MNIST and Fashion-MNIST datasets. As a result, the
proposed algorithm has a desirable performance on classifying MNIST,
FashionMNIST, Cifar10, and Cifar100 datasets
Outlier Robust Adversarial Training
Supervised learning models are challenged by the intrinsic complexities of
training data such as outliers and minority subpopulations and intentional
attacks at inference time with adversarial samples. While traditional robust
learning methods and the recent adversarial training approaches are designed to
handle each of the two challenges, to date, no work has been done to develop
models that are robust with regard to the low-quality training data and the
potential adversarial attack at inference time simultaneously. It is for this
reason that we introduce Outlier Robust Adversarial Training (ORAT) in this
work. ORAT is based on a bi-level optimization formulation of adversarial
training with a robust rank-based loss function. Theoretically, we show that
the learning objective of ORAT satisfies the -consistency in
binary classification, which establishes it as a proper surrogate to
adversarial 0/1 loss. Furthermore, we analyze its generalization ability and
provide uniform convergence rates in high probability. ORAT can be optimized
with a simple algorithm. Experimental evaluations on three benchmark datasets
demonstrate the effectiveness and robustness of ORAT in handling outliers and
adversarial attacks. Our code is available at
https://github.com/discovershu/ORAT.Comment: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023
Convex and non-convex optimization using centroid-encoding for visualization, classification, and feature selection
Includes bibliographical references.2022 Fall.Classification, visualization, and feature selection are the three essential tasks of machine learning. This Ph.D. dissertation presents convex and non-convex models suitable for these three tasks. We propose Centroid-Encoder (CE), an autoencoder-based supervised tool for visualizing complex and potentially large, e.g., SUSY with 5 million samples and high-dimensional datasets, e.g., GSE73072 clinical challenge data. Unlike an autoencoder, which maps a point to itself, a centroid-encoder has a modified target, i.e., the class centroid in the ambient space. We present a detailed comparative analysis of the method using various data sets and state-of-the-art techniques. We have proposed a variation of the centroid-encoder, Bottleneck Centroid-Encoder (BCE), where additional constraints are imposed at the bottleneck layer to improve generalization performance in the reduced space. We further developed a sparse optimization problem for the non-linear mapping of the centroid-encoder called Sparse Centroid-Encoder (SCE) to determine the set of discriminate features between two or more classes. The sparse model selects variables using the 1-norm applied to the input feature space. SCE extracts discriminative features from multi-modal data sets, i.e., data whose classes appear to have multiple clusters, by using several centers per class. This approach seems to have advantages over models which use a one-hot-encoding vector. We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set. CE and SCE are models based on neural network architectures and require the solution of non-convex optimization problems. Motivated by the CE algorithm, we have developed a convex optimization for the supervised dimensionality reduction technique called Centroid Component Retrieval (CCR). The CCR model optimizes a multi-objective cost by balancing two complementary terms. The first term pulls the samples of a class towards its centroid by minimizing a sample's distance from its class centroid in low dimensional space. The second term pushes the classes by maximizing the scattering volume of the ellipsoid formed by the class-centroids in embedded space. Although the design principle of CCR is similar to LDA, our experimental results show that CCR exhibits performance advantages over LDA, especially on high-dimensional data sets, e.g., Yale Faces, ORL, and COIL20. Finally, we present a linear formulation of Centroid-Encoder with orthogonality constraints, called Principal Centroid Component Analysis (PCCA). This formulation is similar to PCA, except the class labels are used to formulate the objective, resulting in the form of supervised PCA. We show the classification and visualization experiments results with this new linear tool
Blind Hyperspectral Unmixing Using Autoencoders
The subject of this thesis is blind hyperspectral unmixing using deep learning based autoencoders. Two methods based on autoencoders are proposed and analyzed. Both methods seek to exploit the spatial correlations in the hyperspectral images to improve the performance. One by using multitask learning to simultaneously unmix a neighbourhood of pixels while the other by using a convolutional neural network autoencoder. This increases the consistency and robustness of the methods. In addition, a review of the various autoencoder methods in the literature is given along with a detailed discussion of different types of autoencoders. The thesis concludes by a critical comparison of eleven different autoencoder based methods. Ablation experiments are performed to answer the question of why autoencoders are so effective in blind hyperspectral unmixing, and an opinion is given on what the future in autoencoder unmixing holds.Efni þessarar ritgerðar er aðgreining fjölrásamynda (e. blind hyperspectral unmixing)
með sjálfkóðurum (e. autoencoders) byggðum á djúpum lærdómi (e. deep learning).
Tvær aðferðir byggðar á sjálfkóðurum eru kynntar og rannsakaðar. Báðar aðferðirnar
leitast við að nýta sér rúmfræðilega fylgni rófa í fjölrásamyndum til að bæta árangur
aðgreiningar. Ein aðferð með að nýta sér fjölbeitingarlærdóm (e. multitask learning)
og hin með að nota sjálfkóðara útfærðan með földunartaugnaneti (e. convolutional
neural network). Hvortveggja bætir samkvæmni og hæfni fjölrásagreiningarinnar.
Ennfremur inniheldur ritgerðin yfirgripsmikið yfirlit yfir þær sjálfkóðaraaðferðir sem
hafa verið birtar ásamt greinargóðri umræðu um mismunandi gerðir sjálfkóðara og
útfærslur á þeim.
í lok ritgerðarinnar er svo að finna gagnrýninn samanburð á 11 mismunandi aðferðum byggðum á sjálfkóðurum. Brottnáms (e. ablation) tilraunir eru gerðar til að svara
spurningunni hvers vegna sjálfkóðarar eru svo árangursríkir í fjölrásagreiningu og stuttlega rætt um hvað framtíðin ber í skauti sér varðandi aðgreiningu fjölrásamynda með
sjálfkóðurum.
Megin framlag ritgerðarinnar er eftirfarandi:
- Ný sjálfkóðaraaðferð, MTLAEU, sem nýtir á beinan hátt rúmfræðilega fylgni rófa í
fjölrásamyndum til að bæta árangur aðgreiningar. Aðferðin notar fjölbeitingarlærdóm
til að aðgreina grennd af rófum í einu.
- Ný aðferð, CNNAEU, sem notar 2D földunartaugnanet fyrir bæði kóðara og afkóðara
og er fyrsta birta aðferðin til að gera það. Aðferðin er þjálfuð á myndbútum (e.patches)
og því er rúmfræðileg bygging myndarinnar sem greina á varðveitt í gegnum aðferðina.
- Yfirgripsmikil og ítarlegt fræðilegt yfirlit yfir birtar sjálfkóðaraaðferðir fyrir fjölrásagreiningu. Gefinn er inngangur að sjálfkóðurum og elstu tegundir sjálfkóðara eru
kynntar. Gefið er greinargott yfirlit yfir helstu birtar aðferðir fyrir fjölrásagreiningu
sem byggja á sjálfkóðurum og gerður er gangrýninn samburður á 11 mismunandi sjálfkóðaraaðferðum.The Icelandic Research Fund under Grants 174075-05 and 207233-05