Opportunities and risks of stochastic deep learning
This thesis studies opportunities and risks associated with stochasticity in deep learning that specifically manifest in the context of adversarial robustness and neural architecture search (NAS). On the one hand, opportunities arise because stochastic methods have a strong impact on robustness and generalisation, both from a theoretical and an empirical standpoint. In addition, they provide a framework for navigating non-differentiable search spaces, and for expressing data and model uncertainty. On the other hand, trade-offs (i.e., risks) that are coupled with these benefits need to be carefully considered. The three novel contributions that comprise the main body of this thesis are, by these standards, instances of opportunities and risks.
In the context of adversarial robustness, our first contribution proves that the impact of an adversarial input perturbation on the output of a stochastic neural network (SNN) is theoretically bounded. Specifically, we demonstrate that SNNs are maximally robust when they achieve weight-covariance alignment, i.e., when the weight vectors of their classifier layer are aligned with the eigenvectors of that layer's covariance matrix. Based on our theoretical insights, we develop a novel SNN architecture with excellent empirical adversarial robustness and show that our theoretical guarantees also hold experimentally.
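The alignment condition can be illustrated numerically. The sketch below is my own construction, not the thesis's architecture: `sigma` is an arbitrary positive-definite stand-in for a layer's covariance matrix, and `alignment` is an illustrative score that equals one exactly when every classifier weight vector coincides with some eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stochastic layer: pre-activations with covariance sigma.
A = rng.standard_normal((8, 8))
sigma = A @ A.T + np.eye(8)                # symmetric positive definite

eigvals, eigvecs = np.linalg.eigh(sigma)   # columns are orthonormal eigenvectors

def alignment(W, eigvecs):
    """Mean over classes of the largest |cosine| between each weight
    vector (row of W) and any eigenvector of the covariance matrix."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = np.abs(Wn @ eigvecs)             # shape: (classes, eigenvectors)
    return cos.max(axis=1).mean()

W_aligned = eigvecs[:, :3].T               # three "classes", perfectly aligned
W_random = rng.standard_normal((3, 8))     # generic, unaligned weights

print(alignment(W_aligned, eigvecs))       # ≈ 1.0: weight-covariance alignment
print(alignment(W_random, eigvecs))        # < 1.0 for generic weights
```

The score is a diagnostic only; the thesis's actual result is a robustness bound that is tightest under this alignment, not a recipe for measuring it this way.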
Furthermore, we discover that SNNs partially owe their robustness to having a noisy loss landscape. Gradient-based adversaries find this landscape difficult to ascend during adversarial perturbation search, and therefore fail to create strong adversarial examples. We show that inducing a noisy loss landscape is not an effective defence mechanism, as it is easy to circumvent. To demonstrate that point, we develop a stochastic loss-smoothing extension to state-of-the-art gradient-based adversaries that allows them to attack successfully. Interestingly, our loss-smoothing extension can also (i) be successful against non-stochastic neural networks that defend by altering their loss landscape in different ways, and (ii) strengthen gradient-free adversaries.
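The loss-smoothing idea can be sketched in miniature. The toy objective below is an illustrative stand-in (not the thesis's attack): each gradient query is corrupted by heavy noise, mimicking a noisy loss landscape, and averaging many stochastic queries recovers the underlying direction that a single query hides.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_grad(x):
    """One stochastic gradient query: the true gradient 2*(x - 3) plus
    large noise, standing in for a stochastic network's noisy landscape."""
    return 2.0 * (x - 3.0) + rng.normal(0.0, 5.0)

def smoothed_grad(x, samples=1024):
    """Monte Carlo loss smoothing: average many stochastic gradient
    queries to expose the true ascent direction."""
    return float(np.mean([noisy_grad(x) for _ in range(samples)]))

# A single query is dominated by noise; the smoothed estimate is close
# to the true gradient at x = 0, which is -6.
print(noisy_grad(0.0))
print(smoothed_grad(0.0))
```

This is the intuition behind why a noisy landscape is circumventable: the noise averages out, so it cannot hide the gradient from an adversary willing to sample.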
Our third and final contribution lies in the field of few-shot learning, where we develop a stochastic NAS method for adapting pre-trained neural networks to previously unseen classes by observing only a few training examples of each new class. We determine that adapting a pre-trained backbone is not as simple as adapting all of its parameters. In fact, adapting or fine-tuning the entire architecture is sub-optimal, as many layers already encode knowledge optimally. Our NAS algorithm searches for the optimal subset of pre-trained parameters to adapt or fine-tune, which yields a significant improvement over the existing paradigm for few-shot adaptation.
The ins and outs of open-angle glaucoma: drugs, diet, and defecation
Glaucoma is the leading cause of irreversible blindness and the second leading cause of blindness overall. The primary aim of this thesis is to provide insight into the role of systemic effects in the pathophysiology of open-angle glaucoma (OAG).
Development of variable and robust brain wiring patterns in the fly visual system
Precise generation of synapse-specific neuronal connections is crucial for establishing a robust and functional brain. Neuronal wiring patterns emerge from proper spatiotemporal regulation of axon branching and synapse formation during development. Several neuropsychiatric and neurodevelopmental disorders exhibit defects in neuronal wiring owing to synapse loss and/or dysregulated axon branching. Despite decades of research, how the two interdependent cellular processes, axon branching and synaptogenesis, are coupled locally in presynaptic arborizations is still unclear.
In my doctoral work, I investigated the possible role of EGF receptor (EGFR) activity in co-regulating axon branching and synapse formation in a spatiotemporally restricted fashion, locally in the medulla-innervating Dorsal Cluster Neuron (M-DCN)/LC14 axon terminals. I explored how genetically encoded EGFR randomly recycles in axon branch terminals, creating an asymmetric, non-deterministic distribution pattern. Asymmetric EGFR activity in the branches acts as a permissive signal for axon branch pruning. I observed that M-DCN branches that stochastically become EGFR-positive during development are synaptogenic, meaning they can recruit synaptic machinery such as Syd1 and Bruchpilot (Brp). My work showed that EGFR activity has a dual role in establishing proper M-DCN wiring: first, in regulating primary branch consolidation, possibly via actin regulation, prior to synaptogenesis; later, in maintaining and protecting the levels of the late active zone (AZ) protein Brp in presynaptic branches by suppressing basal autophagy during synaptogenesis. When M-DCNs lack optimal EGFR activity, basal autophagy increases, resulting in a loss of Brp-marked synapses, which in turn causes increased exploratory branching and loss of post-synaptic targets. Lack of EGFR activity thus alters the M-DCN wiring pattern, making adult flies more active and obsessive-compulsive-like in an object fixation assay. In the second part of my doctoral work, I asked how non-genetic factors such as developmental temperature affect adult brain wiring. To test this, I increased or decreased rearing temperature, which is known to inversely affect the pupal developmental rate, and asked whether the noisy cellular processes of neuronal assembly (filopodial dynamics, axon branching, synapse formation, and post-synaptic connectivity) scale up or down accordingly.
I observed that all of these cellular processes indeed slow down at lower developmental temperature and speed up at higher temperature, changing the DCN wiring pattern accordingly. Interestingly, the behavior of flies adapts to their developmental temperature: they perform best at the temperature at which they were raised. This shows that optimal brain function is an adaptation to robust brain wiring patterns that are specified by noisy developmental processes.
In conclusion, my doctoral work helps us better understand the developmental regulation of axon branching and synapse formation in establishing precise brain wiring patterns. All of the cell-intrinsic developmental processes need to be highly regulated in space and time. It is, in fact, the combinatorial effect of such stochastic processes and external factors that produces the final outcome: a functional and robust adult brain.
Game-theoretic statistics and safe anytime-valid inference
Safe anytime-valid inference (SAVI) provides measures of statistical evidence
and certainty -- e-processes for testing and confidence sequences for
estimation -- that remain valid at all stopping times, accommodating continuous
monitoring and analysis of accumulating data and optional stopping or
continuation for any reason. These measures crucially rely on test martingales,
which are nonnegative martingales starting at one. Since a test martingale is
the wealth process of a player in a betting game, SAVI centrally employs
game-theoretic intuition, language and mathematics. We summarize the SAVI goals
and philosophy, and report recent advances in testing composite hypotheses and
estimating functionals in nonparametric settings.
Comment: 25 pages. Under review.
Generalizing Backpropagation for Gradient-Based Interpretability
Many popular feature-attribution methods for interpreting deep neural
networks rely on computing the gradients of a model's output with respect to
its inputs. While these methods can indicate which input features may be
important for the model's prediction, they reveal little about the inner
workings of the model itself. In this paper, we observe that the gradient
computation of a model is a special case of a more general formulation using
semirings. This observation allows us to generalize the backpropagation
algorithm to efficiently compute other interpretable statistics about the
gradient graph of a neural network, such as the highest-weighted path and
entropy. We implement this generalized algorithm, evaluate it on synthetic
datasets to better understand the statistics it computes, and apply it to study
BERT's behavior on the subject-verb number agreement task (SVA). With this
method, we (a) validate that the amount of gradient flow through a component of
a model reflects its importance to a prediction and (b) for SVA, identify which
pathways of the self-attention mechanism are most important.
Comment: Long paper accepted at ACL 202
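The semiring observation can be sketched on a toy computation graph (my own example; the paper's algorithm operates on the gradient graphs of real networks). Ordinary backpropagation sums products of local gradients over all paths, i.e. it works in the (+, ×) semiring; swapping in (max, ×) yields the weight of the highest-weighted path instead, with the same dynamic program.

```python
# Edge weights play the role of local gradients on a tiny DAG.
edges = {  # node -> list of (child, local_gradient)
    "x":  [("h1", 2.0), ("h2", 0.5)],
    "h1": [("y", 3.0)],
    "h2": [("y", 4.0)],
    "y":  [],
}

def path_aggregate(src, dst, add, mul, zero, one):
    """Aggregate products of edge weights over all src->dst paths,
    parameterized by a semiring (add, mul, zero, one)."""
    memo = {}
    def value(node):
        if node == dst:
            return one
        if node in memo:
            return memo[node]
        total = zero
        for child, w in edges[node]:
            total = add(total, mul(w, value(child)))
        memo[node] = total
        return total
    return value(src)

# (+, *) semiring: standard backprop, the total gradient dy/dx.
grad = path_aggregate("x", "y", add=lambda a, b: a + b,
                      mul=lambda a, b: a * b, zero=0.0, one=1.0)
# (max, *) semiring: weight of the highest-weighted path.
best = path_aggregate("x", "y", add=max,
                      mul=lambda a, b: a * b, zero=float("-inf"), one=1.0)
print(grad)  # 2*3 + 0.5*4 = 8.0
print(best)  # max(2*3, 0.5*4) = 6.0
```

Statistics such as path entropy follow the same pattern with a different choice of semiring; only the `add`/`mul` operations change, not the traversal.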
A Comparative Analysis of Groundwater Vulnerability and PFAS Contamination in Maine
As more is learned about the long-term health effects of per- and polyfluoroalkyl substances (PFAS), increasing regulatory measures are being taken to protect the public from these chemicals. States like Maine are at the forefront of such legislation, banning land-applied biosolids in 2022 over fears of PFAS contamination, with plans to halt sales of all unnecessary PFAS products in the state by 2030. The state has conducted some testing of groundwater supplies, but the near-ubiquitous use of PFAS in manufacturing suggests contamination may be widespread. To prioritize testing in Maine’s most vulnerable aquifers, a groundwater susceptibility map was developed using a modified form of the EPA’s DRASTIC model. The model uses geological, atmospheric, and land-use data to estimate the relative vulnerability of groundwater across the state. Additionally, a heatmap of potential PFAS sources was created, in which each site was assigned a risk score based on the upper magnitude of PFAS contamination associated with its industry. These maps were compared with state PFAS test results to assess the validity of each method. Regional vulnerability trends indicate that karst features, coarse glacial/fluvial deposits, volcanic geology, and urban development are signs of high groundwater vulnerability. The density of potential PFAS sources was also found to be highest around urban centers, with PFAS test data affirming the relationship. Recommendations guided by the models are made for best management practices, such as protecting the most vulnerable aquifers through rezoning and siting factories on impermeable geologies. Future model development is encouraged with more robust datasets and additional fine-tuning of the statewide Depth to Groundwater, Depth to Bedrock, and Hydraulic Conductivity maps.
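The overlay logic of DRASTIC-style models reduces to a weighted sum of factor ratings per map cell. The sketch below uses the standard EPA DRASTIC weights for the seven hydrogeological factors; the thesis applies a modified form of the model, so the example cells and numbers here are illustrative, not the study's parameters.

```python
# Standard EPA DRASTIC weights: Depth to water, net Recharge, Aquifer media,
# Soil media, Topography, Impact of vadose zone, hydraulic Conductivity.
WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}

def drastic_index(ratings):
    """Vulnerability index for one map cell: sum of factor rating (1-10)
    times factor weight. A higher index means more vulnerable groundwater."""
    return sum(WEIGHTS[f] * ratings[f] for f in WEIGHTS)

# Two hypothetical cells: coarse glacial deposits (permeable, shallow water
# table) versus a clay-rich setting (low permeability, deeper water table).
coarse_cell = {"D": 9, "R": 6, "A": 8, "S": 6, "T": 9, "I": 8, "C": 8}
clay_cell   = {"D": 3, "R": 4, "A": 2, "S": 3, "T": 5, "I": 3, "C": 2}

print(drastic_index(coarse_cell))  # 178: high vulnerability
print(drastic_index(clay_cell))    # 69: low vulnerability
```

In a GIS workflow the same sum is evaluated raster-cell by raster-cell to produce the susceptibility map.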
Less is More: Restricted Representations for Better Interpretability and Generalizability
Deep neural networks are prevalent in supervised learning across a wide range of tasks, such as image classification, machine translation, and even scientific discovery.
Their success often comes at the expense of interpretability and generalizability. The increasing complexity of models and the involvement of pre-training make the problem of inexplicability ever more pressing. Outstanding performance when labeled data are abundant, paired with a tendency to overfit when labeled data are limited, demonstrates how difficult it is for deep neural networks to generalize across datasets.
This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, to understand which features contribute to BERT's predictions, and generalizability through effective methods for the low-data regime.
We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y via the latent representation z (i.e. x→z→y); adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L denotes the number of bits. In other words, the restriction is added either in the middle of the pipeline or at its end.
We first introduce how adding information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3.
We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis to understand cross-layer and cross-passage behavior.
Adding a bottleneck not only provides insight into deep neural networks but can also be used to increase generalizability.
In Chapter 5, we demonstrate the equivalence between adding a bottleneck and performing neural compression. We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV), and show how optimized neural compressors can be used for non-parametric image classification with few labeled data.
To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with the universal compressor gzip on text classification in Chapter 6.
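The compressor-based, parameter-free classification idea can be sketched with Python's standard library (an illustrative reconstruction, not the thesis's exact implementation): texts that share structure compress better together, and normalized compression distance (NCD) turns that into a dissimilarity usable for k-nearest-neighbour classification.

```python
import gzip
from collections import Counter

def ncd(a: str, b: str) -> float:
    """Normalized compression distance, with gzip's compressed length
    standing in for Kolmogorov complexity: small when a and b share structure."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, labelled, k: int = 3) -> str:
    """k-NN under NCD: predict the majority label among the k training
    texts closest to the query. labelled is a list of (text, label) pairs."""
    nearest = sorted(labelled, key=lambda item: ncd(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

labelled = [
    ("the cat sat on the warm mat by the door", "pets"),
    ("the dog chased the cat around the garden", "pets"),
    ("quarterly earnings rose and the stock price jumped", "finance"),
    ("the market rallied after strong earnings reports", "finance"),
]
print(classify("the cat sat on the mat by the door", labelled, k=1))
```

No training, no parameters, no GPU: the compressor is the model, which is exactly why it makes a strong low-resource baseline.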
In Chapter 7, we elucidate methods that adopt the compression perspective without performing actual compression, using T5.
Using experimental results in passage reranking, we show that our method is highly effective in a low-data regime when only one thousand query-passage pairs are available.
In addition to the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision --- in one-shot and zero-shot settings. The experiments show that, without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines; this is presented in Chapter 8.
This thesis tackles two major challenges in machine learning --- "interpretability" and "generalizability" --- by restricting representations. We provide both theoretical derivations and empirical results to show the effectiveness of information-theoretic approaches. We not only design new algorithms but also provide numerous insights into why and how "compression" is so important for understanding deep neural networks and improving generalizability.