Initialization of ReLUs for Dynamical Isometry
Deep learning relies on good initialization schemes and hyperparameter
choices prior to training a neural network. Random weight initializations
induce random network ensembles, which determine the trainability, training
speed, and sometimes also the generalization ability of an instance. In addition,
such ensembles provide theoretical insights into the space of candidate models
of which one is selected during training. The results obtained so far rely on
mean field approximations that assume infinite layer width and that study
average squared signals. We derive the joint signal output distribution
exactly, without mean field assumptions, for fully-connected networks with
Gaussian weights and biases, and analyze deviations from the mean field
results. For rectified linear units, we further discuss limitations of the
standard initialization scheme, such as its lack of dynamical isometry, and
propose a simple alternative that overcomes these limitations through initial
parameter sharing.
Comment: NeurIPS 2019
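As a point of reference for the parameter-sharing idea, here is a minimal NumPy sketch of a mirrored ("looks-linear") ReLU initialization, in which shared weights of opposite sign make the network exactly linear, and hence isometric, at initialization. This is an illustration of the general idea, not necessarily the exact scheme proposed in the paper:

import numpy as np

rng = np.random.default_rng(0)

def orthogonal(d, rng):
    # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

d = 4
x = rng.standard_normal(d)

# Layer 1 duplicates its pre-activations with opposite signs, so no
# information is lost to ReLU clipping: h = [relu(W1 x); relu(-W1 x)].
W1 = orthogonal(d, rng)
h = np.maximum(0.0, np.concatenate([W1 @ x, -(W1 @ x)]))

# Layer 2 shares parameters across the mirrored blocks: [W2, -W2].
# Since relu(a) - relu(-a) = a, the two layers compose to W2 @ W1 @ x.
W2 = orthogonal(d, rng)
y = np.hstack([W2, -W2]) @ h

assert np.allclose(y, W2 @ W1 @ x)                       # exactly linear at init
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))  # orthogonal => isometric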
International crop trade networks: The impact of shocks and cascades
Analyzing available FAO data from 176 countries over 21 years, we observe an
increase of complexity in the international trade of maize, rice, soy, and
wheat. A larger number of countries play a role as producers or intermediaries,
either for trade or food processing. As a consequence, we find that the trade
networks become more prone to failure cascades caused by exogenous shocks. In
our model, countries compensate for demand deficits by imposing export
restrictions. To capture these effects, we construct higher-order trade dependency
networks for the different crops and years. These networks reveal hidden
dependencies between countries and allow us to discuss policy implications.
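To make the shock-propagation mechanism concrete, the following is a deliberately simplified Python sketch of a failure cascade driven by export restrictions; the trade matrix, shock size, and compensation rule are toy assumptions and do not reproduce the paper's calibrated higher-order dependency networks:

import numpy as np

# T[i, j]: crop volume exported from country i to country j (toy numbers).
T = np.array([[0.0, 5.0, 2.0],
              [0.0, 0.0, 4.0],
              [1.0, 0.0, 0.0]])
deficit = np.array([4.0, 0.0, 0.0])  # exogenous shock hits country 0

for step in range(10):
    exports = T.sum(axis=1)
    # Shocked countries restrict exports to compensate their own deficit.
    cut = np.minimum(deficit, exports)
    frac = np.divide(cut, exports, out=np.zeros_like(cut), where=exports > 0)
    withheld = T * frac[:, None]        # export volume withheld on each link
    T -= withheld
    deficit -= cut
    new_deficit = withheld.sum(axis=0)  # importers inherit the shortfall
    if new_deficit.sum() < 1e-9:
        break                           # the cascade has died out
    deficit += new_deficit

print("trade volumes after the cascade:\n", T)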
Convolutional and Residual Networks Provably Contain Lottery Tickets
The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for small-scale deep neural networks that solve modern deep learning tasks at competitive performance. These lottery tickets are identified by pruning large, randomly initialized neural networks with architectures as diverse as their applications. Yet, theoretical insights attesting to their existence have mostly focused on deep fully connected feedforward networks with ReLU activation functions. We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, can also contain lottery tickets with high probability.
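Existence proofs in this line of work typically reduce the approximation of a single target weight to a subset-sum problem over random weights. The toy Python sketch below illustrates that reduction with a greedy heuristic; it is our simplified illustration of why overparameterized random networks contain good subnetworks, not the construction from the paper:

import numpy as np

rng = np.random.default_rng(1)

def greedy_subset_sum(target, candidates):
    """Keep a subset of random weights whose sum approximates one target
    weight; pruning discards the rest."""
    total, kept = 0.0, []
    for c in sorted(candidates, key=abs, reverse=True):
        if abs(total + c - target) < abs(total - target):
            total += c
            kept.append(c)
    return total, kept

target = 0.731                                # one weight of a target network
candidates = rng.uniform(-1.0, 1.0, size=50)  # overparameterized random weights
approx, kept = greedy_subset_sum(target, candidates)
print(f"target={target:.4f} approx={approx:.4f} kept={len(kept)}/50")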
Modeling the formation of R\&D alliances: An agent-based model with empirical validation
We develop an agent-based model to reproduce the size distribution of R\&D
alliances of firms. Agents are selected uniformly at random to initiate an
alliance and to invite collaboration partners, who decide about acceptance by
comparing an individual threshold with the utility they expect from joining
the current alliance. The benefit of alliances results from the fitness of the
agents involved. Fitness is obtained from an empirical distribution of agents'
activities. The cost of an alliance reflects its coordination effort. Two free
parameters scale the costs and the individual thresholds, respectively. If
initiators receive too many rejections of invitations, where the maximum number
of rejections is a third free parameter, the alliance formation stops and
another initiator is selected. The three free parameters
are calibrated against a large-scale data set of about 15,000 firms engaging in
about 15,000 R\&D alliances over 26 years. For the validation of the model we
compare the empirical size distribution with the theoretical one, using
confidence bands, and find very good agreement. As an asset of our agent-based
model, we provide an analytical solution that allows us to reduce the simulation
effort considerably. The analytical solution applies to general forms of the
utility of alliances. Hence, the model can be extended to other cases of
alliance formation. While no information about the initiators of an alliance is
available, our results indicate that it is mostly firms with high fitness that
are able to attract newcomers and to establish larger alliances.
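A minimal Python sketch of the mechanism described above may help; the functional forms of utility and cost, the fitness distribution, and all parameter values below are illustrative assumptions, not the calibrated model:

import numpy as np

rng = np.random.default_rng(42)

N = 1000
fitness = rng.pareto(2.0, size=N)  # stand-in for the empirical activity data
threshold_scale = 0.5              # free parameter: scales acceptance thresholds
cost_scale = 0.05                  # free parameter: scales coordination costs
max_rejections = 3                 # free parameter: rejections before stopping

def utility(members):
    benefit = fitness[members].mean()      # benefit from members' fitness
    cost = cost_scale * len(members) ** 2  # coordination effort grows with size
    return benefit - cost

sizes = []
for _ in range(2000):
    initiator = int(rng.integers(N))       # uniformly selected initiator
    alliance, rejections = [initiator], 0
    while rejections < max_rejections:
        invitee = int(rng.integers(N))
        if invitee in alliance:
            continue
        if utility(alliance + [invitee]) > threshold_scale * fitness[invitee]:
            alliance.append(invitee)       # invitee accepts and joins
        else:
            rejections += 1                # counts toward the stopping rule
    sizes.append(len(alliance))

print("mean alliance size:", np.mean(sizes), "max:", max(sizes))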
Most Activation Functions Can Win the Lottery Without Excessive Depth
The strong lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning, which has inspired interesting practical and theoretical insights into how neural networks can represent functions. For networks with ReLU activation functions, it has been proven that a target network of depth L can be approximated by a subnetwork of a randomly initialized neural network that has double the target's depth, 2L, and is wider by a logarithmic factor. We show that a network of depth L + 1 is sufficient. This result indicates that we can expect to find lottery tickets at realistic, commonly used depths while requiring only logarithmic overparametrization. Our novel construction approach applies to a large class of activation functions and is not limited to ReLUs.
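For the special case of ReLUs, a classical identity gives some intuition for how a single extra layer can replace the factor-two depth overhead (a loose heuristic on our part, not the paper's general construction, which covers a much larger class of activations):
\[
\phi(x) - \phi(-x) = x \qquad \text{for } \phi(x) = \max(0, x),
\]
so a pair of mirrored units pruned out of one random layer can pass any signal through unchanged, letting the remaining layers match the target network one layer at a time.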
Cascade Size Distributions: Why They Matter and How to Compute Them Efficiently
Cascade models are central to understanding, predicting, and controlling
epidemic spreading and information propagation. Related optimization tasks,
including influence maximization, model parameter inference, and the development
of vaccination strategies, rely heavily on sampling from a model, which is
either inefficient or inaccurate. As an alternative, we present an efficient message
passing algorithm that computes the probability distribution of the cascade
size for the Independent Cascade Model on weighted directed networks and
generalizations. Our approach is exact on trees but can be applied to any
network topology. It approximates locally tree-like networks well, scales to
large networks, and can lead to surprisingly good performance even on denser
networks, as we also demonstrate on real-world data.
Comment: Accepted at AAAI 2021
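As an illustration of the message passing idea, the sketch below computes the exact cascade-size distribution on a small rooted tree by convolving the children's subtree-size distributions; the tree and edge probabilities are toy assumptions, and the paper's generalization to arbitrary weighted directed networks is not included:

import numpy as np

children = {0: [1, 2], 1: [3], 2: [], 3: []}  # toy rooted tree
p = {(0, 1): 0.5, (0, 2): 0.3, (1, 3): 0.8}   # edge activation probabilities
n = len(children)

def size_dist(u):
    """Distribution of the number of nodes activated in u's subtree,
    conditioned on u being active (entry k = probability of size k)."""
    dist = np.zeros(n + 1)
    dist[1] = 1.0                               # u itself counts as active
    for v in children[u]:
        child = size_dist(v)
        # The child's subtree joins the cascade with probability p[(u, v)],
        # otherwise it contributes size 0.
        msg = p[(u, v)] * child
        msg[0] += 1.0 - p[(u, v)]
        dist = np.convolve(dist, msg)[: n + 1]  # sum independent contributions
    return dist

dist = size_dist(0)  # seed the cascade at the root
print({k: round(float(dist[k]), 4) for k in range(1, n + 1) if dist[k] > 0})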
Are GATs Out of Balance?
While the expressive power and computational capabilities of graph neural
networks (GNNs) have been theoretically studied, their optimization and
learning dynamics, in general, remain largely unexplored. Our study focuses on
the Graph Attention Network (GAT), a popular GNN architecture in which a node's
neighborhood aggregation is weighted by parameterized attention coefficients.
We derive a conservation law of GAT gradient flow dynamics, which explains why
a large fraction of the parameters in GATs with standard initialization struggle to
change during training. This effect is amplified in deeper GATs, which perform
significantly worse than their shallow counterparts. To alleviate this problem,
we devise an initialization scheme that balances the GAT network. Our approach
i) allows more effective propagation of gradients and in turn enables
trainability of deeper networks, and ii) attains a considerable speedup in
training and convergence time in comparison to the standard initialization. Our
main theorem serves as a stepping stone to studying the learning dynamics of
positive homogeneous models with attention mechanisms.Comment: 25 pages. To be published in Advances in Neural Information
Processing Systems (NeurIPS), 202
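A small Python sketch of norm balancing for a plain two-layer ReLU block conveys the flavor of the approach; the paper's GAT-specific scheme additionally involves the attention parameters, so the following only illustrates the underlying rescaling invariance:

import numpy as np

rng = np.random.default_rng(0)

W1 = rng.standard_normal((64, 32)) * 0.01  # incoming weights, tiny scale
W2 = rng.standard_normal((16, 64))         # outgoing weights, much larger scale

x = rng.standard_normal(32)
y_before = W2 @ np.maximum(0.0, W1 @ x)

for j in range(64):  # balance incoming and outgoing norms of each hidden unit
    s = np.sqrt(np.linalg.norm(W2[:, j]) / np.linalg.norm(W1[j, :]))
    W1[j, :] *= s    # ReLU is positively homogeneous, relu(s*a) = s*relu(a),
    W2[:, j] /= s    # so this rescaling leaves the network function unchanged

y_after = W2 @ np.maximum(0.0, W1 @ x)
assert np.allclose(y_before, y_after)  # same function, balanced weight norms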