Joint Multivariate Modelling and Prediction for Genetic and Biomedical Data
In the area of statistical genetics, classical genome-wide association studies (GWAS) assess the association between a biological characteristic and genetic variants by testing one variant at a time in a regression model and reporting the most significant associations. These studies test genetic markers individually even though the data may exhibit multivariate structure, because genes are transmitted together from parents to offspring. Although covariates such as age and sex are included in the model, the classical GWAS does not account for the joint effects of genetic variants. Moreover, when multiple genetic variants within a gene each have a small effect on a phenotype, testing them individually can lack statistical power, whereas testing them together in a joint model pools all the evidence. In this thesis, I reviewed different multivariate testing procedures in joint multivariate model settings, explored their properties, and demonstrated them in real-life database applications, such as enhancing statistical power by conditioning on major variants.
I studied the mathematical properties of various multivariate test procedures, particularly within the context of multiple linear regression. Considering both their theoretical properties and their availability in the literature, I adapted various multivariate test procedures for canonical correlation to multiple regression settings. These procedures asymptotically follow the chi-square distribution and, importantly, are asymptotically equivalent to one another and to the Wald test statistic. This suggests that the Wald test statistic may be sufficient for future studies, given its equivalence to the multivariate test procedures.
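As a minimal sketch of the joint testing described above (the data, effect sizes, and sample size below are invented for illustration), the Wald statistic for testing all slopes jointly in a multiple linear regression can be computed directly:

```python
import numpy as np

# Joint Wald test of H0: beta_1 = ... = beta_k = 0 in multiple linear
# regression; under H0 the statistic is asymptotically chi-square with k
# degrees of freedom. All data here are simulated for illustration.
rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))
beta = np.array([0.5, 0.0, 0.3])              # true slopes (illustrative)
y = X @ beta + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta_hat
sigma2 = resid @ resid / (n - Xd.shape[1])    # residual variance estimate
cov = sigma2 * np.linalg.inv(Xd.T @ Xd)       # estimated Var(beta_hat)

b = beta_hat[1:]                              # test all slopes jointly
W = b @ np.linalg.inv(cov[1:, 1:]) @ b        # Wald statistic
print(W > 10)
```

Because the simulated signal is strong, the statistic lands far in the tail of its null chi-square distribution with k degrees of freedom.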
In many cases, there are known databases of major genetic variants with a substantial effect on the trait. In such situations, it makes statistical sense to condition on these major variants to improve power in detecting associations with new variants, yet this is not common practice in GWAS applications. In this study, we also showed theoretically and computationally how conducting a joint analysis of the genetic variants in a multiple regression model, where the estimated effect of a new variant is conditioned on major variants, can improve the performance of the model by reducing the standard error and increasing the power. The gain in power depends on the correlation between the response and the covariates, as well as the correlations among the covariates. I further show that conditional results can sometimes be obtained from publicly available summary statistics reported for univariate associations in published GWAS, even when the individual-level data are unavailable. A prominent example of such a trait is skin color, for which many studies consistently identify a handful of major genes. I analyzed a dataset of over 6,500 mixed-ethnicity Latin Americans to see how the conditioning process can improve the detection power of GWAS and identify new genetic variants in such a situation.
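A toy simulation can illustrate the conditioning effect described above; the variant frequencies and effect sizes are hypothetical, and the setup is deliberately simplified:

```python
import numpy as np

# Conditioning on a known major variant reduces the residual variance, and
# hence the standard error of a new variant's estimated effect. Allele
# frequencies and effect sizes below are hypothetical.
rng = np.random.default_rng(1)
n = 2000
major = rng.binomial(2, 0.3, size=n).astype(float)   # known major variant
new = rng.binomial(2, 0.3, size=n).astype(float)     # candidate new variant
y = 1.0 * major + 0.1 * new + rng.normal(size=n)

def slope_se(X, y):
    """Fit OLS with intercept; return the standard error of the last slope."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return np.sqrt(cov[-1, -1])

se_marginal = slope_se(new[:, None], y)                       # new variant alone
se_conditional = slope_se(np.column_stack([major, new]), y)   # conditioned on major
print(se_conditional < se_marginal)
```

Including the major variant removes its contribution from the residual variance, so the new variant's standard error shrinks and the test gains power.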
In practical applications, the statistical models I worked with for association testing can be carried forward for prediction in new datasets. In this thesis, I have also derived mathematical formulations of prediction error in different linear models, including simple linear regression as well as shrinkage methods such as ridge regression and lasso regression. These expressions for prediction error show the inherent trade-off between bias and variance, both at individual data points and across a set of observations. Moreover, these formulations reveal connections between prediction error and genetic heritability that can enhance prediction performance in genetic association studies. Additionally, I reviewed various statistical and machine-learning predictive models and, based on a dental morphology dataset, compared their performance using classification metrics such as the average error rate and the maximum classification error rate per specimen.
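The bias-variance trade-off mentioned above can be made concrete with a small ridge-regression simulation (problem sizes and penalty values are arbitrary illustrative choices):

```python
import numpy as np

# Bias-variance trade-off in ridge regression: estimate squared bias and
# total variance of the coefficient estimator over repeated simulated
# datasets, at several penalty strengths. Purely illustrative.
rng = np.random.default_rng(2)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)                      # fixed true coefficients

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

lams = [0.0, 10.0, 1000.0]
errors = {}
for lam in lams:
    ests = np.array([ridge(X, X @ beta + rng.normal(size=n), lam)
                     for _ in range(200)])     # refit on 200 fresh noise draws
    bias2 = np.sum((ests.mean(axis=0) - beta) ** 2)   # squared bias
    var = np.sum(ests.var(axis=0))                    # total estimator variance
    errors[lam] = (bias2, var)

# Variance shrinks as the penalty grows while squared bias grows.
print(errors[0.0][1] > errors[1000.0][1], errors[0.0][0] < errors[1000.0][0])
```

As the penalty grows, the variance term shrinks while the squared bias grows, which is exactly the trade-off the prediction-error expressions quantify.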
Robust interventions in network epidemiology
Which individual should we vaccinate to minimize the spread of a disease? Designing optimal interventions of this kind can be formalized as an optimization problem on networks, in which we must select a budgeted number of dynamically important nodes to receive a treatment that optimizes a dynamical outcome. Describing this optimization problem requires specifying the network, a model of the dynamics, and an objective for the outcome of the dynamics. In real-world contexts, these inputs are vulnerable to misspecification: the network and dynamics must be inferred from data, and the decision-maker must operationalize some (potentially abstract) goal into a mathematical objective function. Moreover, the tools for making reliable inferences, on the dynamical parameters in particular, remain limited due to computational problems and issues of identifiability. Given these challenges, models remain more useful for building intuition than for designing actual interventions. This thesis seeks to elevate complex dynamical models from intuition-building tools to methods for the practical design of interventions.
First, we circumvent the inference problem by searching for robust decisions that are insensitive to model misspecification. If these robust solutions work well across a broad range of structural and dynamic contexts, the issues associated with accurately specifying the problem inputs are largely moot. We explore the existence of these solutions across three facets of dynamic importance common in network epidemiology.
Second, we introduce a method for analytically calculating the expected outcome of a spreading process under various interventions. Our method is based on message passing, a technique from statistical physics that has received attention in a variety of contexts, from epidemiology to statistical inference. We combine several facets of the message-passing literature for network epidemiology. Our method allows us to test general probabilistic, temporal intervention strategies (such as seeding or vaccination). Furthermore, the method works on arbitrary networks without requiring the network to be locally tree-like. This method has the potential to improve our ability to discriminate between possible intervention outcomes.
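A Monte Carlo estimate offers a simple baseline for the kind of quantity such a method computes analytically: the expected outcome of a spreading process under an intervention. The network, transmission probability, and simulator below are illustrative assumptions, not the thesis's message-passing method:

```python
import numpy as np

# Expected final outbreak size of an SIR-style cascade, with and without
# vaccinating a hub node, estimated by Monte Carlo simulation on a toy
# star network. All parameters are illustrative.
rng = np.random.default_rng(3)
edges = [(0, i) for i in range(1, 10)]        # node 0 is the hub
p_transmit = 0.5

def outbreak_size(vaccinated, trials=2000):
    total = 0
    for _ in range(trials):
        infected = {1} - vaccinated           # epidemic seeded at a leaf node
        frontier = set(infected)
        while frontier:
            nxt = set()
            for u, v in edges:
                for a, b in ((u, v), (v, u)):
                    if a in frontier and b not in infected and b not in vaccinated:
                        if rng.random() < p_transmit:
                            nxt.add(b)
            infected |= nxt
            frontier = nxt
        total += len(infected)
    return total / trials

baseline = outbreak_size(set())               # no intervention
hub_vaccinated = outbreak_size({0})           # vaccinate the hub
print(hub_vaccinated < baseline)
```

Vaccinating the hub cuts the only transmission route on this network, so the expected outbreak collapses to the seed alone; an analytic message-passing method targets the same expectation without sampling.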
Overall, our work builds intuition about the decision landscape of designing interventions in spreading dynamics. This work also suggests a way forward for probing the decision-making landscape of other intervention contexts. More broadly, we provide a framework for exploring the boundaries of designing robust interventions with complex-systems modeling tools.
Towards Neuromorphic Gradient Descent: Exact Gradients and Low-Variance Online Estimates for Spiking Neural Networks
Spiking Neural Networks (SNNs) are biologically-plausible models that can run on low-powered non-Von Neumann neuromorphic hardware, positioning them as promising alternatives to conventional Deep Neural Networks (DNNs) for energy-efficient edge computing and robotics. Over the past few years, the Gradient Descent (GD) and Error Backpropagation (BP) algorithms used in DNNs have inspired various training methods for SNNs. However, the non-local and the reverse nature of BP, combined with the inherent non-differentiability of spikes, represent fundamental obstacles to computing gradients with SNNs directly on neuromorphic hardware. Therefore, novel approaches are required to overcome the limitations of GD and BP and enable online gradient computation on neuromorphic hardware.
In this thesis, I address the limitations of GD and BP with SNNs by proposing three algorithms. First, I extend a recent method that computes exact gradients with temporally-coded SNNs by relaxing the firing constraint of temporal coding and allowing multiple spikes per neuron. My proposed method generalizes the computation of exact gradients with SNNs and improves the trade-offs between performance and various other aspects of spiking neurons. Next, I introduce a novel alternative to BP that computes low-variance gradient estimates in a local and online manner. Compared to other alternatives to BP, the proposed method demonstrates an improved convergence rate and increased performance with DNNs. Finally, I combine these two methods and propose an algorithm that estimates gradients with SNNs in a manner compatible with the constraints of neuromorphic hardware. My empirical results demonstrate the effectiveness of the resulting algorithm in training SNNs without performing BP.
Backpropagation Beyond the Gradient
Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models
for which they could manually compute derivatives. Now, they can create sophisticated models with almost
no restrictions and train them using first-order, i.e. gradient, information. Popular libraries like PyTorch
and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of
code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the
gradient computation in these libraries. Their entire design centers around gradient backpropagation.
These frameworks are specialized around one specific task—computing the average gradient in a mini-batch.
This specialization often complicates the extraction of other information like higher-order statistical moments
of the gradient, or higher-order derivatives like the Hessian. It limits practitioners and researchers to methods
that rely on the gradient. Arguably, this hampers the field from exploring the potential of higher-order
information, and there is evidence that focusing solely on the gradient has not led to significant recent
advances in deep learning optimization.
To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient
must be made available at the same level of computational efficiency, automation, and convenience.
This thesis presents approaches to simplify experimentation with rich information beyond the gradient
by making it more readily accessible. We present an implementation of these ideas as an extension to the
backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use
cases by (i) building a diagnostic tool that shows how this information can inform our understanding of
neural network training, and (ii) enabling novel methods to efficiently compute and approximate curvature information.
First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation
which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-
diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order
derivatives is modular, and therefore simple to automate and extend to new operations.
Based on the insight that rich information beyond the gradient can be computed efficiently alongside it,
we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and
convenient access to statistical moments of the gradient and approximate curvature information, often at a
small overhead compared to computing just the gradient.
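As an illustration of what "statistical moments of the gradient" means (computed by hand for a linear least-squares model on synthetic data, not via BackPACK itself):

```python
import numpy as np

# Per-example gradients for least-squares on a linear model: their mean is
# what standard backpropagation returns, while their variance is a second
# moment that BackPACK-style extensions additionally expose. Synthetic data.
rng = np.random.default_rng(4)
n, d = 8, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

# Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w, one row per example.
per_sample = (X @ w - y)[:, None] * X          # shape (n, d)
grad_mean = per_sample.mean(axis=0)            # the usual mini-batch gradient
grad_var = per_sample.var(axis=0)              # a moment BP normally discards

# Sanity check against the gradient of the averaged loss.
loss_grad = X.T @ (X @ w - y) / n
print(np.allclose(grad_mean, loss_grad))
```

Standard backpropagation only ever materializes `grad_mean`; recovering `per_sample` and `grad_var` efficiently inside a deep network is what requires extending the framework.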
Next, we showcase the utility of such information to better understand neural network training. We build
the Cockpit library that visualizes what is happening inside the model during training through various
instruments that rely on BackPACK’s statistics. We show how Cockpit provides a meaningful statistical
summary report to the deep learning engineer to identify bugs in their machine learning pipeline, guide
hyperparameter tuning, and study deep learning phenomena.
Finally, we use BackPACK’s extended automatic differentiation functionality to develop ViViT, an approach
to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure
of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings in existing
curvature approximations. Through monitoring curvature noise, we demonstrate how ViViT’s information
helps in understanding challenges to make second-order optimization methods work in practice.
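The low-rank structure in question can be sketched in miniature: for mean-squared-error loss, the generalized Gauss-Newton matrix is G = (1/n) J^T J with J the n x p output Jacobian, so its nonzero spectrum is available from the small n x n Gram matrix. The Jacobian below is random; only the algebraic identity is general:

```python
import numpy as np

# Low-rank structure of the generalized Gauss-Newton (GGN) matrix for MSE:
# G = (1/n) J^T J has at most n nonzero eigenvalues, obtainable from the
# cheap n x n Gram matrix (1/n) J J^T instead of the p x p matrix itself.
rng = np.random.default_rng(5)
n, p = 6, 40                                   # few samples, many parameters
J = rng.normal(size=(n, p))                    # stand-in output Jacobian

G = J.T @ J / n                                # p x p GGN (expensive route)
gram = J @ J.T / n                             # n x n Gram matrix (cheap route)

eig_big = np.sort(np.linalg.eigvalsh(G))[-n:]  # top-n eigenvalues of G
eig_small = np.sort(np.linalg.eigvalsh(gram))  # all eigenvalues of the Gram matrix
print(np.allclose(eig_big, eig_small))
```

Working in the n-dimensional Gram space rather than the p-dimensional parameter space is what makes such curvature quantities tractable for large models.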
This work develops new tools to experiment more easily with higher-order information in complex deep
learning models. These tools have impacted works on Bayesian applications with Laplace approximations,
out-of-distribution generalization, differential privacy, and the design of automatic differentiation
systems. They constitute one important step towards developing and establishing more efficient deep
learning algorithms.
From a causal representation of multiloop scattering amplitudes to quantum computing in the Loop-Tree Duality
The perturbative approach to Quantum Field Theories has successfully provided incredibly accurate theoretical predictions in high-energy physics. Despite the development of several techniques to boost the efficiency of these calculations, some ingredients remain a hard bottleneck. This is the case of multiloop scattering amplitudes, which describe the quantum fluctuations in high-energy scattering processes.
The Loop-Tree Duality (LTD) is a novel method aimed at overcoming these difficulties by opening loop amplitudes into connected tree-level diagrams. In this thesis we present three core achievements: the reformulation of the Loop-Tree Duality to all orders in the perturbative expansion, a general methodology to obtain LTD expressions that are manifestly causal, and the first flagship application of a quantum algorithm to Feynman loop integrals.
The proposed strategy to implement the LTD framework consists of the iterated application of Cauchy's residue theorem to a series of multiloop topologies with arbitrary internal configurations. We derive an LTD representation exhibiting a factorized cascade form in terms of simpler subtopologies characterized by a well-known causal behaviour. Moreover, through a clever approach we extract analytic dual representations that are explicitly free of noncausal singularities. These properties make it possible to open any scattering amplitude of up to five loops in a factorized form, with better numerical stability than other representations due to the absence of noncausal singularities. Last but not least, we establish the connection between Feynman loop integrals and quantum computing by encoding the two on-shell states of a Feynman propagator in the two states of a qubit. We propose a modified Grover's quantum algorithm to unfold the causal singular configurations of multiloop Feynman diagrams, which are used to bootstrap the causal LTD representation of multiloop topologies.
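A statevector sketch conveys the amplitude-amplification idea: among the 2^n assignments of n on-shell states, an oracle marks a set of target configurations and Grover iterations concentrate probability on them. The marked set standing in for "causal configurations" is arbitrary, and this is textbook Grover rather than the modified algorithm proposed in the thesis:

```python
import numpy as np

# Grover amplitude amplification on a classical statevector: each qubit
# stands in for the two on-shell states of a propagator, and the oracle
# phase-flips a hypothetical set of "causal" configurations.
n = 4
N = 2 ** n
marked = {3, 12}                              # hypothetical causal configurations

state = np.full(N, 1 / np.sqrt(N))            # uniform superposition
oracle = np.ones(N)
oracle[list(marked)] = -1.0                   # oracle phase-flips marked states

# Near-optimal iteration count for |marked| targets among N items.
iters = int(round(np.pi / 4 * np.sqrt(N / len(marked))))
for _ in range(iters):
    state *= oracle                           # apply the oracle
    state = 2 * state.mean() - state          # diffusion: inversion about the mean

probs = state ** 2
print(probs[list(marked)].sum() > 0.9)        # mass concentrated on marked configs
```

After the amplification loop, measuring the register returns a marked configuration with high probability, which is the mechanism exploited to flag causal singular configurations.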
Factor Graph Neural Networks
In recent years, we have witnessed a surge of Graph Neural Networks (GNNs),
most of which can learn powerful representations in an end-to-end fashion with
great success in many real-world applications. They have resemblance to
Probabilistic Graphical Models (PGMs), but break free from some limitations of
PGMs. By aiming to provide expressive methods for representation learning
instead of computing marginals or most likely configurations, GNNs provide
flexibility in the choice of information flowing rules while maintaining good
performance. Despite their success and inspirations, they lack efficient ways
to represent and learn higher-order relations among variables/nodes. More
expressive higher-order GNNs which operate on k-tuples of nodes need increased
computational resources in order to process higher-order tensors. We propose
Factor Graph Neural Networks (FGNNs) to effectively capture higher-order
relations for inference and learning. To do so, we first derive an efficient
approximate Sum-Product loopy belief propagation inference algorithm for
discrete higher-order PGMs. We then neuralize the novel message passing scheme
into a Factor Graph Neural Network (FGNN) module by allowing richer
representations of the message update rules; this facilitates both efficient
inference and powerful end-to-end learning. We further show that with a
suitable choice of message aggregation operators, our FGNN is also able to
represent Max-Product belief propagation, providing a single family of
architecture that can represent both Max and Sum-Product loopy belief
propagation. Our extensive experimental evaluation on synthetic as well as real
datasets demonstrates the potential of the proposed model.
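The sum-product/max-product pair that the architecture is designed to represent can be checked exactly on a tiny factor graph (two binary variables with arbitrary factor values); on a tree, belief propagation must reproduce brute-force marginals:

```python
import numpy as np

# Sum-product and max-product message passing on a minimal factor graph:
# two binary variables x, y with unary factors and one pairwise factor.
# Factor values are arbitrary illustrative numbers.
f_x = np.array([0.6, 0.4])                    # unary factor on x
f_y = np.array([0.3, 0.7])                    # unary factor on y
f_xy = np.array([[0.9, 0.1],                  # pairwise factor f(x, y)
                 [0.2, 0.8]])

# Sum-product: message from y, through the pairwise factor, to x.
m_y_to_x = f_xy @ f_y
belief_x = f_x * m_y_to_x
belief_x /= belief_x.sum()                    # normalized belief = marginal

# Brute-force marginal of x from the full joint, for comparison.
joint = f_x[:, None] * f_xy * f_y[None, :]
marg_x = joint.sum(axis=1) / joint.sum()

# Max-product variant: replace the sum with a max to get MAP information.
map_score_x = f_x * (f_xy * f_y[None, :]).max(axis=1)

print(np.allclose(belief_x, marg_x))
```

Swapping the sum aggregation for a max is the only change between the two algorithms, which is why a single family of aggregation operators can represent both.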
Video Summarization Using Unsupervised Deep Learning
In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video and presenting them in a temporally ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs to model the temporal dependence of frames, namely: i) the small influence of the estimated frame-level importance scores on the created video summary, ii) the insufficiency of RNNs for modeling long-range dependence between frames, and iii) the small amount of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, which combines global and local multi-head attention mechanisms that take into account the temporal position of the video frames, in order to discover different modelings of the frames' dependencies at different levels of granularity.
Based on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals to advancing the state of the art in video summarization. Finally, we make a first attempt at producing explanations for video summarization results. Inspired by relevant works in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the advanced performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism.
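The underlying idea of attention-derived importance scores can be sketched as follows; the features, dimensions, and scoring rule are illustrative and are not the CA-SUM model:

```python
import numpy as np

# Frame importance from self-attention: frames are feature vectors, softmax
# attention weights say how much each frame is attended to overall, and the
# top-scoring frames form the summary. All quantities are synthetic.
rng = np.random.default_rng(6)
T, d = 12, 16                                  # number of frames, feature size
frames = rng.normal(size=(T, d))

scores = frames @ frames.T / np.sqrt(d)        # scaled dot-product attention
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax over keys; rows sum to 1

importance = weights.sum(axis=0)               # total attention each frame receives
summary = np.sort(np.argsort(importance)[-3:]) # pick 3 key-frames, in time order
print(len(summary))
```

Because the importance scores are read directly off the attention weights, the same weights can double as an explanation signal for why particular frames were selected.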
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical problem for ensuring real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties like performance, generalization, and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings: single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, and value-based and policy-based methods. In this work we propose the first results on several different problems, e.g. a tensorization of the Bellman equation which allows exponential sample-efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on the generalization properties of the agents under different frameworks. These results have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, and tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
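As a minimal illustration of the "optimize long-term utility" objective underlying all of the above, here is value iteration on a toy two-state MDP; the MDP itself is an invented example, not a construction from the thesis:

```python
import numpy as np

# Value iteration: repeatedly apply the Bellman optimality update to a toy
# two-state, two-action MDP until the value function converges, then read
# off the greedy policy. The MDP is purely illustrative.
P = np.array([                                 # P[a, s, s'] transition probs
    [[1.0, 0.0], [1.0, 0.0]],                  # action 0: always return to state 0
    [[0.2, 0.8], [0.0, 1.0]],                  # action 1: drift towards state 1
])
R = np.array([[0.0, 0.0],                      # R[a, s] immediate reward
              [1.0, 2.0]])
gamma = 0.9                                    # discount factor

V = np.zeros(2)
for _ in range(200):                           # Bellman optimality iteration
    Q = R + gamma * P @ V                      # Q[a, s] = R + gamma * E[V(s')]
    V = Q.max(axis=0)

policy = Q.argmax(axis=0)                      # greedy policy per state
print(policy.tolist())
```

Here the rewarded action dominates in both states, so the greedy policy is [1, 1]; in large state-action spaces this exact tabulation is infeasible, which is what motivates the scalable methods above.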
A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery
Semantic segmentation (classification) of Earth Observation imagery is a
crucial task in remote sensing. This paper presents a comprehensive review of
technical factors to consider when designing neural networks for this purpose.
The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Generative Adversarial Networks (GANs), and transformer
models, discussing prominent design patterns for these ANN families and their
implications for semantic segmentation. Common pre-processing techniques for
ensuring optimal data preparation are also covered. These include methods for
image normalization and chipping, as well as strategies for addressing data
imbalance in training samples, and techniques for overcoming limited data,
including augmentation techniques, transfer learning, and domain adaptation. By
encompassing both the technical aspects of neural network design and the
data-related considerations, this review provides researchers and practitioners
with a comprehensive and up-to-date understanding of the factors involved in
designing effective neural networks for semantic segmentation of Earth
Observation imagery.
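Two of the pre-processing steps the review covers, per-band normalization and chipping, can be sketched in a few lines (array sizes and value ranges are arbitrary):

```python
import numpy as np

# Common Earth Observation pre-processing: per-band standardization of a
# multispectral image, then chipping it into fixed-size tiles for network
# input. All sizes and value ranges are illustrative.
rng = np.random.default_rng(7)
image = rng.uniform(0, 4000, size=(256, 256, 4))   # H x W x bands

# Per-band standardization (zero mean, unit variance per band).
mean = image.mean(axis=(0, 1))
std = image.std(axis=(0, 1))
normalized = (image - mean) / std

# Chip into non-overlapping 64 x 64 tiles.
chip = 64
chips = [normalized[i:i + chip, j:j + chip]
         for i in range(0, 256, chip)
         for j in range(0, 256, chip)]
print(len(chips), chips[0].shape)
```

A 256 x 256 scene yields sixteen 64 x 64 chips; overlap, padding, and georeferenced chip indexing are the practical refinements discussed in the review.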
Geometric Learning on Graph Structured Data
Graphs provide a ubiquitous and universal data structure that can be applied in many domains, such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating the topological and feature information of a graph. Graph similarity learning employs similarity functions that compute the similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful yet efficient and theoretically guaranteed machine learning models that can leverage the rich topological structure of real-world graphs.
This thesis is structured into two parts. In the first part of the thesis, we will present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We will discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive, yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework by incorporating graph structural properties into an aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters. As a result, this model provides better localization on neighborhoods while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture: a forward-Euler solution and an invariable feature solution, and theoretically prove that our forward-Euler GNN architecture is guaranteed to converge to a stationary distribution.
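The 1-WL test used above as an expressiveness yardstick amounts to iterated color refinement: each node's color is rehashed together with the multiset of its neighbors' colors. A sketch on a small example graph (the graph itself is arbitrary):

```python
# One-dimensional Weisfeiler-Leman (1-WL) color refinement on a toy graph:
# a triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

colors = {v: 0 for v in adj}                    # uniform initial coloring
for _ in range(3):                              # refine for a few rounds
    # Signature = own color plus the sorted multiset of neighbor colors.
    signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    # Relabel distinct signatures with fresh consecutive color ids.
    palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    colors = {v: palette[signatures[v]] for v in adj}

print(sorted(set(colors.values())))
```

Refinement stabilizes with three color classes: the two symmetric triangle nodes share a color, while the attachment node and the pendant node each get their own; a GNN whose aggregation cannot separate nodes beyond this partition is at most 1-WL expressive.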
In the second part of this thesis, we will introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee the convergence and numerical stability of finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure on graphs by relaxing the optimal transport problem to a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces, based on a degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment and to preserve the global connectivity structure of graphs. We have evaluated our optimal transport-based graph kernel on different benchmark tasks. The experimental results show that our models considerably outperform the state-of-the-art methods on all benchmark tasks.
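Entropic regularization solved with Sinkhorn iterations is the standard computational device behind regularized optimal-transport distances of this kind; the sketch below uses a random cost matrix rather than graph-derived costs, so it illustrates the solver, not the thesis's graph kernel:

```python
import numpy as np

# Entropically regularized optimal transport via Sinkhorn iterations:
# alternately rescale the kernel matrix so the transport plan matches the
# source and target marginals. Cost matrix and sizes are illustrative.
rng = np.random.default_rng(8)
n, m = 5, 7
mu = np.full(n, 1 / n)                          # source distribution
nu = np.full(m, 1 / m)                          # target distribution
C = rng.uniform(size=(n, m))                    # ground cost matrix
eps = 0.2                                       # entropic regularization strength

K = np.exp(-C / eps)                            # Gibbs kernel
u = np.ones(n)
for _ in range(2000):                           # Sinkhorn fixed-point updates
    v = nu / (K.T @ u)
    u = mu / (K @ v)

plan = u[:, None] * K * v[None, :]              # optimal transport plan
cost = (plan * C).sum()                         # transport cost under the plan
print(np.allclose(plan.sum(axis=1), mu), np.allclose(plan.sum(axis=0), nu))
```

Strong convexity of the entropic term is what guarantees these alternating rescalings converge, mirroring the role the thesis's regularization terms play for its graph-to-graph assignments.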