
    Principled Weight Initialisation for Input-Convex Neural Networks

    Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in their input-output mapping. These networks have been successfully applied to energy-based modelling, optimal transport problems, and learning invariances. The convexity of ICNNs is achieved by using non-decreasing convex activation functions and non-negative weights. Because of these peculiarities, previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs. By studying signal propagation through layers with non-negative weights, we are able to derive a principled weight initialisation for ICNNs. Concretely, we generalise signal propagation theory by removing the assumption that weights are sampled from a centred distribution. In a set of experiments, we demonstrate that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation. Moreover, we find that, in contrast to common belief, ICNNs can be trained without skip-connections when initialised correctly. Finally, we apply ICNNs to a real-world drug discovery task and show that they allow for more effective molecular latent space exploration.
    Comment: Presented at NeurIPS 202
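
    As a rough illustration of the construction described above, the sketch below implements one input-convex layer in PyTorch: the layer weights are forced to be non-negative (here via a softplus reparameterisation, one of several possible choices) and combined with a convex, non-decreasing activation. The layer sizes and the specific non-negativity trick are assumptions for illustration and do not reproduce the initialisation scheme proposed in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNNLayer(nn.Module):
    # One hidden layer of an input-convex network: z_next = act(W_z z + W_x x + b).
    # W_z is kept non-negative and `act` is convex and non-decreasing, so the
    # output remains convex in the original input x.
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        # Raw parameter mapped through softplus, so the effective weights are >= 0;
        # the effective weight distribution is therefore no longer centred, which is
        # why standard initialisation heuristics break down for ICNNs.
        self.W_z_raw = nn.Parameter(0.1 * torch.randn(out_dim, hidden_dim))
        self.W_x = nn.Linear(in_dim, out_dim)  # unconstrained link to the raw input

    def forward(self, x, z):
        W_z = F.softplus(self.W_z_raw)             # non-negative weight matrix
        return F.relu(z @ W_z.t() + self.W_x(x))  # convex, non-decreasing activation

layer = ICNNLayer(in_dim=10, hidden_dim=32, out_dim=32)
out = layer(torch.randn(4, 10), torch.randn(4, 32))  # shapes are illustrative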

    Moment Dynamics in Self-Normalising Neural Networks

    In the past decade, deep learning has achieved state-of-the-art performance on various machine learning tasks. The wide applicability of deep learning mainly arises from the ability of deep neural networks to learn useful features by themselves. By combining multiple layers in a neural network, hierarchical representations can be created with an increasing level of abstraction. However, there is one fundamental difficulty in learning deep neural networks, known as the vanishing gradient problem. Today, this issue has been alleviated by new activation functions and better ways to initialise weights. Although less severe, internal covariate shift also slows down learning in deep networks. Several techniques, such as batch normalisation, have been proposed to counter this issue by normalising the data in each layer. An alternative approach is to construct the layers in a network so that their activations have the same mean and variance as the data in the input.

    Networks which are capable of doing so are called self-normalising neural networks (SNNs) and can be constructed by enforcing certain characteristics onto the mappings that each layer induces on the moments of its input data. In this thesis, we extend the idea of analysing the moments in SNNs to the backward pass, in which we both theoretically and empirically investigate the backward dynamics. Further, we extend SNNs to networks with bias units and show that similar dynamics hold as for the weights. Additionally, we compare the performance and learning behaviour of SNNs with respect to different data normalisation techniques, weight distributions for initialisation, and optimisers on several machine learning benchmark data sets.

    We find that (1) the variance of the weights steadily increases with very small steps in networks with random errors, (2) the architecture affects how the error signal propagates back through the network, and (3) an error signal with reduced variance in lower layers can be advantageous for learning. Furthermore, our analysis reveals that SNNs perform best with whitened data, that the choice of the initial weight distribution has no significant effect on the learning behaviour, and that most adaptive learning rate schedules do help, although they break the conditions for self-normalisation.
    Submitted by Pieter-Jan Hoedt. Universität Linz, Master's thesis (Masterarbeit), 2017. (VLID) 234494
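
    The self-normalising behaviour discussed in the abstract can be observed directly by tracking the moments of the activations through a deep stack of SELU layers. The following sketch (assuming PyTorch; depth, width, and the whitened random input are illustrative) prints the per-layer mean and variance in the forward pass; the thesis extends this kind of moment analysis to the backward pass.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Stack of SELU layers with LeCun-normal initialisation: under these conditions
# the layer mappings keep activations close to zero mean and unit variance.
layers = []
for _ in range(16):
    lin = nn.Linear(256, 256, bias=False)
    nn.init.normal_(lin.weight, mean=0.0, std=(1.0 / 256) ** 0.5)  # LeCun normal
    layers += [lin, nn.SELU()]

x = torch.randn(1024, 256)  # whitened input: zero mean, unit variance
with torch.no_grad():
    for i, layer in enumerate(layers):
        x = layer(x)
        if isinstance(layer, nn.SELU):
            print(f"layer {i // 2:2d}: mean={x.mean().item():+.3f}  var={x.var().item():.3f}")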

    HyperPCM: Robust Task-Conditioned Modeling of Drug–Target Interactions

    A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure–activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow drug–target interaction prediction models to be developed from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus accurately predict drug–target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti
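
    To make the HyperNetwork idea concrete, here is a stripped-down PyTorch sketch in which a hypernetwork maps a protein-target embedding to the weights of a small head that scores a drug-compound embedding, so that unseen targets can be handled zero-shot. The class name HyperPCMSketch, all dimensions, and the plain MLP hypernetwork are placeholders rather than the published architecture; see the linked repository for the actual implementation.

import torch
import torch.nn as nn

class HyperPCMSketch(nn.Module):
    # A hypernetwork turns a protein-target embedding into the parameters of a
    # per-target head; that head then scores a drug-compound embedding.
    def __init__(self, prot_dim=1024, drug_dim=512, hidden=128):
        super().__init__()
        self.drug_dim, self.hidden = drug_dim, hidden
        n_params = drug_dim * hidden + hidden  # weight matrix + bias of the head
        self.hyper = nn.Sequential(
            nn.Linear(prot_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, prot_emb, drug_emb):
        params = self.hyper(prot_emb)                       # (batch, n_params)
        W = params[:, : self.drug_dim * self.hidden]
        W = W.view(-1, self.hidden, self.drug_dim)          # per-target weights
        b = params[:, self.drug_dim * self.hidden :]
        h = torch.relu(torch.bmm(W, drug_emb.unsqueeze(-1)).squeeze(-1) + b)
        return h.sum(-1)                                    # scalar interaction score

model = HyperPCMSketch()
score = model(torch.randn(8, 1024), torch.randn(8, 512))  # one (target, compound) pair per row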