3,817 research outputs found
Exploiting Structural Properties in the Analysis of High-dimensional Dynamical Systems
The physical and cyber domains with which we interact are filled with high-dimensional dynamical systems. In machine learning, for instance, the evolution of overparametrized neural networks can be seen as a dynamical system. In networked systems, numerous agents or nodes dynamically interact with each other. A deep understanding of these systems can enable us to predict their behavior, identify potential pitfalls, and devise effective solutions for optimal outcomes. In this dissertation, we will discuss two classes of high-dimensional dynamical systems with specific structural properties that aid in understanding their dynamic behavior.
In the first scenario, we consider the training dynamics of multi-layer neural networks. The high dimensionality comes from overparametrization: a typical network has a large depth and hidden layer width. We are interested in the following question regarding convergence: Do network weights converge to an equilibrium point corresponding to a global minimum of our training loss, and how fast is the convergence rate? The key to those questions is the symmetry of the weights, a critical property induced by the multi-layer architecture. Such symmetry leads to a set of time-invariant quantities, called weight imbalance, that restrict the training trajectory to a low-dimensional manifold defined by the weight initialization. A tailored convergence analysis is developed over this low-dimensional manifold, showing improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the convergence rate.
In the second scenario, we consider large-scale networked systems with multiple weakly-connected groups. Such a multi-cluster structure leads to a time-scale separation between the fast intra-group interaction due to high intra-group connectivity, and the slow inter-group oscillation, due to the weak inter-group connection. We develop a novel frequency-domain network coherence analysis that captures both the coherent behavior within each group, and the dynamical interaction between groups, leading to a structure-preserving model-reduction methodology for large-scale dynamic networks with multiple clusters under general node dynamics assumptions
Opportunities and risks of stochastic deep learning
This thesis studies opportunities and risks associated with stochasticity in deep learning that specifically manifest in the context of adversarial robustness and neural architecture search (NAS). On the one hand, opportunities arise because stochastic methods have a strong impact on robustness and generalisation, both from a theoretical and an empirical standpoint. In addition, they provide a framework for navigating non-differentiable search spaces, and for expressing data and model uncertainty. On the other hand, trade-offs (i.e., risks) that are coupled with these benefits need to be carefully considered. The three novel contributions that comprise the main body of this thesis are, by these standards, instances of opportunities and risks.
In the context of adversarial robustness, our first contribution proves that the impact of an adversarial input perturbation on the output of a stochastic neural network (SNN) is theoretically bounded. Specifically, we demonstrate that SNNs are maximally robust when they achieve weight-covariance alignment, i.e., when the vectors of their classifier layer are aligned with the eigenvectors of that layer's covariance matrix. Based on our theoretical insights, we develop a novel SNN architecture with excellent empirical adversarial robustness and show that our theoretical guarantees also hold experimentally.
Furthermore, we discover that SNNs partially owe their robustness to having a noisy loss landscape. Gradient-based adversaries find this landscape difficult to ascend during adversarial perturbation search, and therefore fail to create strong adversarial examples. We show that inducing a noisy loss landscape is not an effective defence mechanism, as it is easy to circumvent. To demonstrate that point, we develop a stochastic loss-smoothing extension to state-of-the-art gradient-based adversaries that allows them to attack successfully. Interestingly, our loss-smoothing extension can also (i) be successful against non-stochastic neural networks that defend by altering their loss landscape in different ways, and (ii) strengthen gradient-free adversaries.
Our third and final contribution lies in the field of few-shot learning, where we develop a stochastic NAS method for adapting pre-trained neural networks to previously unseen classes, by observing only a few training examples of each new class. We determine that the adaptation of a pre-trained backbone is not as simple as adapting all of its parameters. In fact, adapting or fine-tuning the entire architecture is sub-optimal, as a lot of layers already encode knowledge optimally. Our NAS algorithm searches for the optimal subset of pre-trained parameters to be adapted or fine-tuned, which yields a significant improvement over the existing paradigm for few-shot adaptation
On the Generation of Realistic and Robust Counterfactual Explanations for Algorithmic Recourse
This recent widespread deployment of machine learning algorithms presents many new challenges. Machine learning algorithms are usually opaque and can be particularly difficult to interpret. When humans are involved, algorithmic and automated decisions can negatively impact people’s lives. Therefore, end users would like to be insured against potential harm. One popular way to achieve this is to provide end users access to algorithmic recourse, which gives end users negatively affected by algorithmic decisions the opportunity to reverse unfavorable decisions, e.g., from a loan denial to a loan acceptance. In this thesis, we design recourse algorithms to meet various end user needs. First, we propose methods for the generation of realistic recourses. We use generative models to suggest recourses likely to occur under the data distribution. To this end, we shift the recourse action from the input space to the generative model’s latent space, allowing to generate counterfactuals that lie in regions with data support. Second, we observe that small changes applied to the recourses prescribed to end users likely invalidate the suggested recourse after being nosily implemented in practice. Motivated by this observation, we design methods for the generation of robust recourses and for assessing the robustness of recourse algorithms to data deletion requests. Third, the lack of a commonly used code-base for counterfactual explanation and algorithmic recourse algorithms and the vast array of evaluation measures in literature make it difficult to compare the per formance of different algorithms. To solve this problem, we provide an open source benchmarking library that streamlines the evaluation process and can be used for benchmarking, rapidly developing new methods, and setting up new
experiments. In summary, our work contributes to a more reliable interaction of end users and machine learned models by covering fundamental aspects of the recourse process and suggests new solutions towards generating realistic and robust counterfactual explanations for algorithmic recourse
Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks
Recommended from our members
Interpretable Machine Learning Architectures for Efficient Signal Detection with Applications to Gravitational Wave Astronomy
Deep learning has seen rapid evolution in the past decade, accomplishing tasks that were previously unimaginable. At the same time, researchers strive to better understand and interpret the underlying mechanisms of the deep models, which are often justifiably regarded as "black boxes". Overcoming this deficiency will not only serve to suggest better learning architectures and training methods, but also extend deep learning to scenarios where interpretability is key to the application. One such scenario is signal detection and estimation, with gravitational wave detection as a specific example, where classic methods are often preferred for their interpretability. Nonetheless, while classic statistical detection methods such as matched filtering excel in their simplicity and intuitiveness, they can be suboptimal in terms of both accuracy and computational efficiency. Therefore, it is appealing to have methods that achieve ``the best of both worlds'', namely enjoying simultaneously excellent performance and interpretability.
In this thesis, we aim to bridge this gap between modern deep learning and classic statistical detection, by revisiting the signal detection problem from a new perspective. First, to address the perceived distinction in interpretability between classic matched filtering and deep learning, we state the intrinsic connections between the two families of methods, and identify how trainable networks can address the structural limitations of matched filtering. Based on these ideas, we propose two trainable architectures that are constructed based on matched filtering, but with learnable templates and adaptivity to unknown noise distributions, and therefore higher detection accuracy. We next turn our attention toward improving the computational efficiency of detection, where we aim to design architectures that leverage structures within the problem for efficiency gains. By leveraging the statistical structure of class imbalance, we integrate hierarchical detection into trainable networks, and use a novel loss function which explicitly encodes both detection accuracy and efficiency. Furthermore, by leveraging the geometric structure of the signal set, we consider using signal space optimization as an alternative computational primitive for detection, which is intuitively more efficient than covering with a template bank. We theoretical prove the efficiency gain by analyzing Riemannian gradient descent on the signal manifold, which reveals an exponential improvement in efficiency over matched filtering. We also propose a practical trainable architecture for template optimization, which makes use of signal embedding and kernel interpolation.
We demonstrate the performance of all proposed architectures on the task of gravitational wave detection in astrophysics, where matched filtering is the current method of choice. The architectures are also widely applicable to general signal or pattern detection tasks, which we exemplify with the handwritten digit recognition task using the template optimization architecture. Together, we hope the this work useful to scientists and engineers seeking machine learning architectures with high performance and interpretability, and contribute to our understanding of deep learning as a whole
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity
Deep learning (DL) algorithms rely on massive amounts of labeled data.
Semi-supervised learning (SSL) and active learning (AL) aim to reduce this
label complexity by leveraging unlabeled data or carefully acquiring labels,
respectively. In this work, we primarily focus on designing an AL algorithm but
first argue for a change in how AL algorithms should be evaluated. Although
unlabeled data is readily available in pool-based AL, AL algorithms are usually
evaluated by measuring the increase in supervised learning (SL) performance at
consecutive acquisition steps. Because this measures performance gains from
both newly acquired instances and newly acquired labels, we propose to instead
evaluate the label efficiency of AL algorithms by measuring the increase in SSL
performance at consecutive acquisition steps. After surveying tools that can be
used to this end, we propose our neural pre-conditioning (NPC) algorithm
inspired by a Neural Tangent Kernel (NTK) analysis. Our algorithm incorporates
the classifier's uncertainty on unlabeled data and penalizes redundant samples
within candidate batches to efficiently acquire a diverse set of informative
labels. Furthermore, we prove that NPC improves downstream training in the
large-width regime in a manner previously observed to correlate with
generalization. Comparisons with other AL algorithms show that a
state-of-the-art SSL algorithm coupled with NPC can achieve high performance
using very few labeled data.Comment: NeurIPS 202
Do price trajectory data increase the efficiency of market impact estimation?
Market impact is an important problem faced by large institutional investor
and active market participant. In this paper, we rigorously investigate whether
price trajectory data from the metaorder increases the efficiency of
estimation, from an asymptotic view of statistical estimation. We show that,
for popular market impact models, estimation methods based on partial price
trajectory data, especially those containing early trade prices, can outperform
established estimation methods (e.g., VWAP-based) asymptotically. We discuss
theoretical and empirical implications of such phenomenon, and how they could
be readily incorporated into practice
- …