    On Neural Networks with Minimal Weights

    Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is the sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very large integers, exponential in the number of input variables. In practice, however, large weights are difficult to implement. The present literature distinguishes between two extreme cases: linear threshold functions with polynomial-size weights and those with exponential-size weights. The main contribution of this paper is to fill the gap by further refining that separation. Namely, we prove that the class of linear threshold functions with polynomial-size weights can be divided into subclasses according to the degree of the polynomial. In fact, we prove a more general result: there exists a minimal-weight linear threshold function for any number of inputs and any weight size. To prove these results, we have developed a novel technique for constructing linear threshold functions with minimal weights.
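
To make the definition concrete, the following is a minimal sketch of a single linear threshold element with integer weights. It is not the construction from the paper; the function names, the convention that a sum equal to the threshold maps to +1, and the two example functions are assumptions chosen for illustration. The comparison example uses powers of two as weights, so in this particular representation the weight magnitude grows exponentially with the number of input bits, while majority needs only unit weights.

```python
from itertools import product
from typing import Sequence


def linear_threshold(weights: Sequence[int], threshold: int, x: Sequence[int]) -> int:
    """Return +1 if the weighted sum of x reaches the threshold, else -1.

    Mapping a sum equal to the threshold to +1 is an assumed convention.
    """
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= threshold else -1


def majority(x: Sequence[int]) -> int:
    """Majority over +1/-1 inputs: every weight is 1 and the threshold is 0."""
    return linear_threshold([1] * len(x), 0, x)


def greater_or_equal(x: Sequence[int], y: Sequence[int]) -> int:
    """Compare two n-bit numbers (0/1 bits, most significant bit first).

    The weights are +2**k for the bits of x and -2**k for the bits of y,
    so the largest weight is 2**(n - 1).
    """
    n = len(x)
    powers = [2 ** (n - 1 - i) for i in range(n)]
    weights = powers + [-p for p in powers]
    return linear_threshold(weights, 0, list(x) + list(y))


if __name__ == "__main__":
    print(majority([1, 1, -1]))
    print(greater_or_equal([1, 0, 1], [0, 1, 1]))
```

Calling `greater_or_equal` on every pair of 3-bit inputs and checking the result against ordinary integer comparison is a quick way to confirm the weights behave as intended.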

    Approximation results for Gradient Descent trained Shallow Neural Networks in 1d

    Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in 1d with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.

    SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization

    Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds, under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network having Gaussian weights, the relative square quantization error exhibits a linear decay as the degree of over-parametrization increases. Furthermore, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case, using on the order of a mere $\log\log N$ bits per weight, where $N$ represents the largest number of neurons in a layer.
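
As a rough illustration of what a stochastic quantizer does, the sketch below implements plain unbiased stochastic rounding onto an unbounded grid of multiples of a step size. This is a generic technique and not the SPFQ algorithm itself: the greedy path-following step described in the abstract is not shown, and the names `stochastic_round`, `quantize_weights`, `step`, and `seed` are assumptions made for the example.

```python
import math
import random
from typing import List, Sequence


def stochastic_round(value: float, step: float, rng: random.Random) -> float:
    """Round value to a multiple of step.

    The value is rounded up with probability equal to its fractional
    position between the two neighbouring grid points, so the expected
    result equals the input (unbiased rounding).
    """
    scaled = value / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return step * (lower + (1 if rng.random() < frac else 0))


def quantize_weights(weights: Sequence[float], step: float, seed: int = 0) -> List[float]:
    """Quantize each weight independently with a seeded generator (assumed setup)."""
    rng = random.Random(seed)
    return [stochastic_round(w, step, rng) for w in weights]


if __name__ == "__main__":
    print(quantize_weights([0.12, -0.57, 0.98, 0.31], step=0.25))
```

Each quantized weight lands on one of the two grid points adjacent to the original value, and averaging many independent roundings of the same weight recovers that weight in expectation.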

    Optimal approximation of piecewise smooth functions using deep ReLU neural networks

    We study the necessary and sufficient complexity of ReLU neural networks, in terms of depth and number of weights, which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^\beta(\mathbb R^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by $C^\beta$ hypersurfaces. For dimension $d \geq 2$, regularity $\beta > 0$, and accuracy $\varepsilon > 0$, we construct artificial neural networks with ReLU activation function that approximate functions from $\mathcal{E}^\beta(\mathbb R^d)$ up to $L^2$ error of $\varepsilon$. The constructed networks have a fixed number of layers, depending only on $d$ and $\beta$, and they have $O(\varepsilon^{-2(d-1)/\beta})$ many nonzero weights, which we prove to be optimal. In addition to the optimality in terms of the number of weights, we show that in order to achieve the optimal approximation rate, one needs ReLU networks of a certain depth. Precisely, for piecewise $C^\beta(\mathbb R^d)$ functions, this minimal depth is given, up to a multiplicative constant, by $\beta/d$. Up to a log factor, our constructed networks match this bound. This partly explains the benefits of depth for ReLU networks by showing that deep networks are necessary to achieve efficient approximation of (piecewise) smooth functions. Finally, we analyze approximation in high-dimensional spaces where the function $f$ to be approximated can be factorized into a smooth dimension reducing feature map $\tau$ and classifier function $g$, defined on a low-dimensional feature space, as $f = g \circ \tau$. We show that in this case the approximation rate depends only on the dimension of the feature space and not the input dimension. Comment: Generalized some estimates to $L^p$ norms for $0<p<\infty$
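
To get a feel for the stated weight count, the exponent $2(d-1)/\beta$ can be evaluated for a few sample values. The choices of $d$ and $\beta$ below are illustrative assumptions, not cases singled out in the abstract; the symbol $W(\varepsilon)$ for the number of nonzero weights is likewise introduced only for this example.

```latex
\begin{align*}
W(\varepsilon) &= O\!\left(\varepsilon^{-2(d-1)/\beta}\right) \\
d = 2,\ \beta = 1 &: \quad W(\varepsilon) = O\!\left(\varepsilon^{-2}\right) \\
d = 2,\ \beta = 2 &: \quad W(\varepsilon) = O\!\left(\varepsilon^{-1}\right) \\
d = 3,\ \beta = 1 &: \quad W(\varepsilon) = O\!\left(\varepsilon^{-4}\right)
\end{align*}
```

Higher regularity $\beta$ lowers the exponent, while each additional input dimension raises it by $2/\beta$.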

    Dissecting the Biological Motherboard (Systems Biology and Beyond)

    Genome-scale molecular networks, including gene pathways, gene regulatory networks and protein interactions, are central to the investigation of the nascent disciplines of systems biology and bio-complexity. Dissecting these genome-scale molecular networks in all their possible manifestations is paramount in our quest for a genotype-input phenotype-output application which will also take environment-genome interactions into account. Machine learning approaches are now increasingly being used for reverse engineering such networks. Our work stresses the importance of a systems approach in biological research and how artificial neural networks are at the forefront of Artificial Intelligence techniques that are increasingly being used to construct as well as dissect molecular networks, the building blocks of the living system. Our paper will show the application of artificial neural networks to reverse engineer a temporal gene pathway. In this paper we will also explore the pruning of nodes of these artificial neural networks to simulate gene silencing and thus generate novel biological insight into these molecular networks (The Biological Motherboard). The research described is novel, in that this may be the first time that the application of neural networks to temporal gene expression data is described. It will be shown that a trained artificial neural network, with pruning, can also be described as a gene network with minimal re-interpretation, where the weights on links between nodes reflect the probability of one gene affecting another gene in time.

    Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation

    Deep neural networks virtually dominate the domain of most modern vision systems, providing high performance at a cost of increased computational complexity. Since these systems are often required to operate both in real time and with minimal energy consumption (e.g., for wearable devices, autonomous vehicles, edge Internet of Things (IoT), or sensor networks), various network optimisation techniques are used, e.g., quantisation, pruning, or dedicated lightweight architectures. Because the weights in neural network layers follow a logarithmic distribution, Power-of-Two (PoT) quantisation, whose levels are likewise logarithmically distributed, provides high performance with a significant reduction in computational precision (for 4-bit weights and less). This method also makes it possible to replace the Multiply and ACcumulate (MAC) units typical of neural networks, which perform, e.g., convolution operations, with more energy-efficient Bitshift and ACcumulate (BAC) units. In this paper, we show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 SoC FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version. To further reduce the actual power requirement by omitting part of the computation for zero weights, we also propose a new pruning method adapted to logarithmic quantisation. Comment: Accepted for the ICCVG 2022 conference
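
The idea of replacing multiplications with bit shifts can be sketched in a few lines. The code below is a software illustration only, not the accelerator design from the paper: the (sign, exponent) encoding, the exponent range of -7 to 0, rounding in the log domain, and the fixed-point accumulator with 7 fractional bits are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

PotWeight = Tuple[int, int]  # (sign, exponent); sign 0 marks a zero weight


def pot_quantize(weight: float, min_exp: int = -7, max_exp: int = 0) -> PotWeight:
    """Map a weight to the nearest power of two in the log domain.

    The exponent is clipped to [min_exp, max_exp] (an assumed range).
    """
    if weight == 0:
        return 0, 0
    exp = round(math.log2(abs(weight)))
    exp = max(min_exp, min(max_exp, exp))
    return (1 if weight > 0 else -1), exp


def bitshift_accumulate(activations: Sequence[int],
                        weights: Sequence[PotWeight],
                        frac_bits: int = 7) -> int:
    """Dot product of integer activations with PoT weights using shifts only.

    The result is a fixed-point integer with frac_bits fractional bits;
    divide by 2**frac_bits to recover the real value. Requires every
    exponent to be at least -frac_bits so that all shifts are non-negative.
    """
    acc = 0
    for a, (sign, exp) in zip(activations, weights):
        if sign == 0:
            continue  # zero weights contribute nothing and are skipped
        acc += sign * (a << (exp + frac_bits))
    return acc


if __name__ == "__main__":
    quantized: List[PotWeight] = [pot_quantize(w) for w in [0.5, -0.25, 0.0, 1.0]]
    acc = bitshift_accumulate([4, 8, 5, 3], quantized)
    print(acc, acc / 2 ** 7)
```

Skipping entries whose sign is 0 mirrors, in a very simplified way, the remark in the abstract that computation for zero weights can be omitted.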

    Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

    Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models. Comment: 30 pages, 5 figures, 1 algorithm