Network Parameterisation and Activation Functions in Deep Learning
Deep learning, the study of multi-layered artificial neural networks, has received tremendous attention over the last few years. Neural networks now outperform humans in a growing variety of tasks and increasingly affect our day-to-day lives. Among the many potential directions for advancing deep learning, this thesis investigates two:
(1) One of the key components of a network is its activation functions, which have a large impact on the overall mathematical form of the network. The \textit{first paper} studies generalisation of neural networks with rectified linear units (“ReLUs”). Such networks partition the input space into so-called linear regions, the maximally connected subsets on which the network is affine. In contrast to previous work, which focused on estimating the number of linear regions, we propose a tropical-algebra-based algorithm called TropEx that extracts the coefficients of the linear regions. Applied to fully-connected and convolutional neural networks, TropEx reveals significant differences between the linear regions of these network types. The \textit{second paper} proposes ERA, a parametric rational activation function that is learnable during network training. Although ERA adds only about ten parameters per layer, it significantly increases network expressivity and brings the performance of small architectures close to that of large ones, outperforming previous activations in small architectures. This matters because neural networks keep growing larger, and the computational resources they require drive up costs and electricity usage (which in turn increases the CO2 footprint).
(2) For a given network architecture, each parameter configuration gives rise to a mathematical function. This functional realisation is far from unique: many different parameterisations can give rise to the same function. Changes to the parameterisation that do not change the function are called symmetries. The \textit{third paper} theoretically studies and classifies all symmetries of 2-layer networks with the ReLU activation. Finally, the \textit{fourth paper} studies the effect of network parameterisation on training. We provide a theoretical analysis of the effect that scaling layers has on gradient updates, motivating a method we call Cooling, which automatically scales the network parameters during training. Cooling reduces the network's reliance on training tricks, in particular the use of a learning rate schedule.
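The best-known symmetry of this kind, positive rescaling of ReLU units, can be checked in a few lines. This is a toy sketch (not the paper's classification): it builds a random 2-layer ReLU network and verifies that scaling a hidden unit's incoming weights by a positive factor and its outgoing weights by the inverse factor leaves the realised function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def relu(z):
    return np.maximum(z, 0.0)

def f(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

# Positive rescaling symmetry: for any lam > 0, relu(lam * z) = lam * relu(z),
# so scaling unit i's incoming weights (and bias) by lam_i and its outgoing
# weights by 1 / lam_i leaves the network function unchanged.
lam = rng.uniform(0.5, 2.0, size=8)       # one positive factor per hidden unit
W1s, b1s = W1 * lam[:, None], b1 * lam    # scaled incoming weights and biases
W2s = W2 / lam[None, :]                   # inverse scaling on outgoing weights

x = rng.normal(size=4)
assert np.allclose(f(x, W1, b1, W2, b2), f(x, W1s, b1s, W2s, b2))
```

The same check fails for non-positively-homogeneous activations (e.g. sigmoid), which is why the classification depends on the activation being ReLU.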
Improving Model-Based Software Synthesis: A Focus on Mathematical Structures
Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis.
We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping
space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general.
This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We start with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking.
We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models such as Reactors, a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability.
The focus of the thesis is on the models and methods, but we evaluate them in diverse use cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, and the automotive and signal processing domains. We even go beyond embedded systems and discuss use cases in GPU programming and microservice-based architectures.
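The idea of exploiting symmetries of the mapping space can be illustrated with a toy sketch (not the thesis's tooling; the core and process names are invented for illustration): mapping processes to a heterogeneous platform, where permuting identical cores yields equivalent mappings, so canonicalising collapses the design space.

```python
from itertools import product

# Hypothetical platform: two "big" and two "little" cores; cores of the
# same type are interchangeable, which is a symmetry of the mapping space.
processes = ["src", "filter", "sink"]
cores = ["big0", "big1", "little0", "little1"]
core_type = {"big0": "big", "big1": "big",
             "little0": "little", "little1": "little"}

all_mappings = list(product(cores, repeat=len(processes)))  # 4^3 = 64

def canonical(mapping):
    """Relabel cores of each type in order of first use, so mappings that
    differ only by permuting identical cores get the same canonical form."""
    rename, counters, out = {}, {}, []
    for c in mapping:
        if c not in rename:
            t = core_type[c]
            counters[t] = counters.get(t, 0)
            rename[c] = f"{t}{counters[t]}"
            counters[t] += 1
        out.append(rename[c])
    return tuple(out)

# Quotienting by the symmetry shrinks the space a mapper must explore.
equivalence_classes = {canonical(m) for m in all_mappings}
```

Here the 64 raw mappings collapse to far fewer equivalence classes; a design-space exploration only needs to evaluate one representative per class.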
BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization
Despite the success of neural-based combinatorial optimization methods for
end-to-end heuristic learning, out-of-distribution generalization remains a
challenge. In this paper, we present a novel formulation of Combinatorial
Optimization Problems (COPs) as Markov Decision Processes (MDPs) that
effectively leverages common symmetries of COPs to improve out-of-distribution
robustness. Starting from a direct MDP formulation of a constructive method, we
introduce a generic way to reduce the state space, based on Bisimulation
Quotienting (BQ) in MDPs. Then, for COPs with a recursive nature, we specialize
the bisimulation and show how the reduced state exploits the symmetries of
these problems and facilitates MDP solving. Our approach is principled and we
prove that an optimal policy for the proposed BQ-MDP actually solves the
associated COPs. We illustrate our approach on five classical problems: the
Euclidean and Asymmetric Traveling Salesman, Capacitated Vehicle Routing,
Orienteering and Knapsack Problems. Furthermore, for each problem, we introduce
a simple attention-based policy network for the BQ-MDPs, which we train by
imitation of (near) optimal solutions of small instances from a single
distribution. We obtain new state-of-the-art results for the five COPs on both
synthetic and realistic benchmarks. Notably, in contrast to most existing
neural approaches, our learned policies show excellent generalization
performance to much larger instances than seen during training, without any
additional search procedure.
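The state-space reduction behind bisimulation quotienting can be seen on the TSP: the optimal way to finish a partial tour depends only on the current city and the set of unvisited cities, not on the order in which the visited cities were traversed. The brute-force check below is a toy illustration on a random Euclidean instance, not the paper's method.

```python
import itertools
import math
import random

random.seed(0)
n = 6
pts = [(random.random(), random.random()) for _ in range(n)]
d = [[math.dist(p, q) for q in pts] for p in pts]

def opt_suffix(prefix):
    """Best ordering of the unvisited cities, returning to city 0 at the end."""
    remaining = set(range(n)) - set(prefix)
    return min(itertools.permutations(remaining),
               key=lambda suf: sum(d[a][b] for a, b in
                                   zip((prefix[-1],) + suf, suf + (0,))))

# Two partial tours visiting the same cities in different orders, both ending
# at city 3: their optimal completions coincide, so the visited prefix can be
# dropped from the MDP state (the essence of bisimulation quotienting).
assert opt_suffix([0, 1, 2, 3]) == opt_suffix([0, 2, 1, 3])
```

The reduced state (current city, remaining set) is exactly a smaller TSP instance, which is the recursive structure the paper's specialised bisimulation exploits.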
Enhancing Graph Neural Networks with Quantum Computed Encodings
Transformers are increasingly employed for graph data, demonstrating
competitive performance in diverse tasks. To incorporate graph information into
these models, it is essential to enhance node and edge features with positional
encodings. In this work, we propose novel families of positional encodings
tailored for graph transformers. These encodings leverage the long-range
correlations inherent in quantum systems, which arise from mapping the topology
of a graph onto interactions between qubits in a quantum computer. Our
inspiration stems from the recent advancements in quantum processing units,
which offer computational capabilities beyond the reach of classical hardware.
We prove that some of these quantum features are theoretically more expressive
for certain graphs than the commonly used relative random walk probabilities.
Empirically, we show that the performance of state-of-the-art models can be
improved on standard benchmarks and large-scale datasets by computing tractable
versions of quantum features. Our findings highlight the potential of quantum computing to enhance the performance of transformers on graph data.