5 research outputs found
When constants are important
In this paper the authors discuss several complexity aspects pertaining to neural networks, commonly known as the curse of dimensionality. The focus is on: (1) size complexity and depth-size tradeoffs; (2) complexity of learning; and (3) precision and limited interconnectivity. Results have been obtained for each of these problems when treated separately, but little is known about the links among them. The authors start by presenting known results and try to establish connections between them. These show that very difficult problems arise, namely exponential growth in either space (i.e. precision and size) and/or time (i.e. learning and depth), when resorting to neural networks for solving general problems. The paper presents a solution for lowering some of the constants by exploiting the depth-size tradeoff.
Implementing size-optimal discrete neural networks requires analog circuitry
This paper starts by overviewing results dealing with the approximation capabilities of neural networks, as well as bounds on the size of threshold gate circuits. Based on a constructive solution for Kolmogorov's superpositions, the authors show that Boolean functions can be implemented using neurons having an identity transfer function. Because in this case the size of the network is minimized, it follows that size-optimal solutions for implementing Boolean functions can be obtained using analog circuitry. Conclusions and several comments on the required precision end the paper.
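For reference, the classical form of Kolmogorov's superposition theorem underlying such constructive solutions states that every continuous function on the n-dimensional unit cube can be written as a superposition of univariate functions; one standard formulation (the exact inner functions and constants differ between Kolmogorov's original proof and the later constructive variants by Sprecher and others) is

\[
  f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right),
\]

where the outer functions \(\Phi_q\) depend on \(f\), while the inner functions \(\psi_{q,p}\) are continuous univariate functions independent of \(f\).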
2D neural hardware versus 3D biological ones
This paper presents important limitations of hardware neural nets as opposed to biological neural nets (i.e. the real ones). The author starts by discussing neural structures and their biological inspirations, while mentioning the simplifications leading to artificial neural nets. Going further, the focus is on hardware constraints. The author presents recent results for three different alternatives for implementing neural networks: digital, threshold gate, and analog, relating area and delay to the neurons' fan-in and the weights' precision. Based on all of these, it is shown why hardware implementations cannot match their biological inspiration with respect to computational power: the mapping onto silicon lacks the third dimension of biological nets, which translates into reduced fan-in and, in turn, reduced precision. The main conclusion is that one is faced with the following alternatives: (1) try to cope with the limitations imposed by silicon by speeding up the computation of the elementary silicon neurons; (2) investigate solutions which would allow one to use the third dimension, e.g. optical interconnections.
How to build VLSI-efficient neural chips
This paper presents several upper and lower bounds on the number of bits required for solving a classification problem, as well as ways in which these bounds can be used to efficiently build neural network chips. The focus is on complexity aspects pertaining to neural networks: (1) size complexity and depth-size tradeoffs, and (2) precision of weights and thresholds as well as limited interconnectivity. The authors show that difficult problems arise, namely exponential growth in either space (precision and size) and/or time (learning and depth), when using neural networks for solving general classes of problems (particular cases may enjoy better performance). The bounds on the number of bits required for solving a classification problem represent the first step of a general class of constructive algorithms, showing how the quantization of the input space can be done in O(m^2 n) steps, where m is the number of examples and n is the number of dimensions. The second step of the algorithm has its roots in the implementation of a class of Boolean functions using threshold gates. It is substantiated by mathematical proofs for the size O(mn/Δ) and the depth O(log(mn)/log Δ) of the resulting network, where Δ is the maximum fan-in. Using the fan-in as a parameter, a full class of solutions can be designed. The third step of the algorithm reduces the size of the network and increases its generalization capabilities. Extensions using analogue comparisons allow for real-valued inputs and further increase the generalization capabilities, at the expense of longer training times. Finally, several solutions which can lower the size of the resulting neural network are detailed. The interesting aspect is that they are obtained for limited, or even constant, fan-ins. Many simulations have been performed in support of these claims.
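As a purely illustrative sketch of the size/depth tradeoff driven by the maximum fan-in Δ, the quoted asymptotic bounds can be tabulated for example values of m and n; the constants hidden by the O-notation are ignored here, so the numbers only indicate the trend, not the resources of an actual chip.

```python
import math

def size_depth_estimates(m, n, delta):
    """Illustrative evaluation of the asymptotic bounds quoted above.

    size  ~ O(m*n / delta)            -- number of threshold gates
    depth ~ O(log(m*n) / log(delta))  -- number of layers
    Constants hidden by the O-notation are dropped, so the values
    are only indicative of how size and depth trade off with fan-in.
    """
    size = m * n / delta
    depth = math.log(m * n) / math.log(delta)
    return size, depth

# Hypothetical example: m = 1000 training examples in n = 16 dimensions,
# swept over several maximum fan-in values delta.
for delta in (2, 4, 16, 64):
    size, depth = size_depth_estimates(1000, 16, delta)
    print(f"delta={delta:3d}  size~{size:9.0f}  depth~{depth:5.2f}")
```

Larger fan-ins shrink the estimated size and depth, which is exactly the tradeoff the paper exploits when the fan-in is used as a design parameter.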
An Application of Kolmogorov's Superposition Theorem to Function Reconstruction in Higher Dimensions
In this thesis we present a Regularization Network approach to reconstruct a continuous function ƒ:[0,1]^n → R from its function values ƒ(x_j) at discrete data points x_j, j=1,…,P. The ansatz is based on a new constructive version of Kolmogorov's superposition theorem. Typically, the numerical solution of mathematical problems suffers from the so-called curse of dimensionality, i.e. the exponential dependency of the involved numerical costs on the dimensionality n. To circumvent the curse at least to some extent, higher regularity assumptions on the function ƒ are usually made, which, however, are unrealistic in most cases. Therefore, we employ a representation of the function as a superposition of one-dimensional functions which requires no smoothness assumption on ƒ beyond continuity.

To this end, a constructive version of Kolmogorov's superposition theorem based on D. Sprecher is adapted in such a manner that one single outer function Φ and a universal inner function ψ suffice to represent the function ƒ. Here, ψ is the extension of a function defined by M. Köppen on a dense subset of the real line; the proofs of its existence, continuity, and monotonicity are given in this thesis. To compute the outer function Φ, we adapt a constructive algorithm by Sprecher such that in each iteration step, depending on ƒ, an element of a sequence of univariate functions {Φ_r}_r is computed. It is shown that this sequence converges to a continuous limit Φ:R→R. This constructively proves Kolmogorov's superposition theorem with a single outer and a single inner function.

Since the numerical complexity of computing the outer function Φ by this algorithm grows exponentially with the dimensionality, we alternatively present a Regularization Network approach based on this representation, in which the outer function is computed from discrete function samples (x_j, ƒ(x_j)), j=1,…,P. The model to reconstruct ƒ is introduced in two steps. First, the outer function Φ is represented in a finite basis with unknown coefficients, which are then determined by a variational formulation, i.e. by the minimization of a regularized empirical error functional. A detailed numerical analysis of this model shows that the dimensionality of ƒ is transformed by Kolmogorov's representation into oscillations of Φ. Thus, the use of locally supported basis functions leads to an exponential growth of the complexity, since the spatial mesh resolution has to resolve the strong oscillations. Furthermore, a numerical analysis of the Fourier transform of Φ shows that the locations of the relevant frequencies in Fourier space can be determined a priori and are independent of ƒ. It also reveals a product structure of the outer function and directly motivates the definition of the final model. Therefore, Φ is replaced in the second step by a product of functions, each factor of which is expanded in a Fourier basis with appropriate frequency numbers. Again, the coefficients in the expansions are determined by the minimization of a regularized empirical error functional. For both models, the underlying approximation spaces are developed by means of reproducing kernel Hilbert spaces, and the corresponding norms are the respective regularization terms in the empirical error functionals. Thus, both approaches can be interpreted as Regularization Networks.
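Schematically, the first model can be summarized as follows; the notation here is generic and only illustrative: ξ_q denotes the fixed, ƒ-independent inner part built from the universal inner function ψ, and H_K the reproducing kernel Hilbert space in which Φ is sought, while the exact construction of the inner part and of the basis for Φ follows the thesis.

\[
  F_{\Phi}(x) \;=\; \sum_{q=0}^{2n} \Phi\big(\xi_q(x)\big),
  \qquad
  \min_{\Phi \in \mathcal{H}_K}\;
  \frac{1}{P}\sum_{j=1}^{P}\big(f(x_j) - F_{\Phi}(x_j)\big)^2
  \;+\; \lambda\,\lVert \Phi\rVert_{\mathcal{H}_K}^{2}.
\]

The RKHS norm acts as the regularization term, which is what makes the approach a Regularization Network; the second (product) model replaces Φ by a product of Fourier expansions but keeps the same regularized least-squares structure.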
However, it is important to note that the error functional for the second model is not convex and that nonlinear minimizers have to be used to compute the model parameters. A detailed numerical analysis of the product model shows that it is capable of reconstructing functions which depend on up to ten variables.