On the equivalence between graph isomorphism testing and function approximation with GNNs
Graph neural networks (GNNs) have achieved great success on
graph-structured data. In light of this, there has been increasing interest
in studying their representation power. One line of work focuses on the
universal approximation of permutation-invariant functions by certain classes
of GNNs, and another demonstrates the limitation of GNNs via graph isomorphism
tests.
Our work connects these two perspectives and proves their equivalence. We
further develop a framework of the representation power of GNNs with the
language of sigma-algebra, which incorporates both viewpoints. Using this
framework, we compare the expressive power of different classes of GNNs as well
as other methods on graphs. In particular, we prove that order-2 Graph
G-invariant networks fail to distinguish non-isomorphic regular graphs with the
same degree. We then extend them to a new architecture, Ring-GNNs, which
succeeds in distinguishing these graphs and provides improvements on real-world
social network datasets.
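The limitation the abstract refers to can be seen concretely with the 1-dimensional Weisfeiler-Leman (1-WL) color refinement test, which upper-bounds the distinguishing power of standard message-passing GNNs. The sketch below (an illustration of 1-WL, not of the paper's Ring-GNN architecture) runs refinement on two non-isomorphic 3-regular graphs, K_{3,3} and the triangular prism, and shows they receive identical color histograms:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL color refinement; returns the final color histogram."""
    n = len(adj)
    colors = [0] * n
    for _ in range(rounds):
        # Each node's new color is its old color plus the multiset of neighbor colors.
        sigs = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in range(n)]
        palette = {s: i for i, s in enumerate(sorted(set(sigs)))}
        colors = [palette[s] for s in sigs]
    return Counter(colors)

# Two non-isomorphic 3-regular graphs on 6 vertices:
# K_{3,3} (bipartite, no triangles) and the triangular prism (has triangles).
k33 = [[3, 4, 5], [3, 4, 5], [3, 4, 5], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
prism = [[1, 2, 3], [0, 2, 4], [0, 1, 5], [0, 4, 5], [1, 3, 5], [2, 3, 4]]

print(wl_colors(k33) == wl_colors(prism))  # True: 1-WL cannot tell them apart
```

Because every vertex in a regular graph sees the same degree pattern, refinement never splits the color classes, which is exactly the failure mode the paper proves for order-2 graph G-invariant networks.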
A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization
Artificial neural networks (ANNs), inspired by the interconnection of real
neurons, have achieved unprecedented success in various fields such as computer
vision and natural language processing. Recently, a novel mathematical ANN
model, known as the dendritic neuron model (DNM), has been proposed to address
nonlinear problems by more accurately reflecting the structure of real neurons.
However, the single-output design limits its capability to handle multi-output
tasks, significantly limiting its applicability. In this paper, we propose a
novel multi-in and multi-out dendritic neuron model (MODN) to tackle
multi-output tasks. Our core idea is to introduce a filtering matrix to the
soma layer to adaptively select the desired dendrites to regress each output.
Because such a matrix is designed to be learnable, MODN can explore the
relationship between each dendrite and output to provide a better solution to
downstream tasks. We also incorporate a telodendron layer into MODN to better
simulate real neuron behavior. Importantly, MODN is a more general and
unified framework that can be naturally specialized as the DNM by customizing
the filtering matrix. To explore the optimization of MODN, we investigate both
heuristic and gradient-based optimizers and introduce a 2-step training method
for MODN. Extensive experiments on 11 datasets, covering both binary and
multi-class classification tasks, demonstrate the effectiveness of MODN with
respect to accuracy, convergence, and generality.
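A minimal forward pass can illustrate the core idea of the filtering matrix. The sketch below is a hypothetical reconstruction from the abstract alone (layer shapes, the sigmoid synapses, and the multiplicative dendrites follow the standard DNM design; the matrix `F` stands in for the learnable filtering matrix that selects dendrites per output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_dend, n_out = 4, 6, 3

# Synaptic layer: one sigmoid synapse per (dendrite, input) pair, as in the DNM.
W = rng.normal(size=(n_dend, n_in))
B = rng.normal(size=(n_dend, n_in))
# Learnable filtering matrix: row o weights how much each dendrite
# contributes to output o; with n_out == 1 this collapses back to the DNM.
F = rng.uniform(size=(n_out, n_dend))

def forward(x):
    syn = sigmoid(W * x + B)   # (n_dend, n_in) synaptic activations
    dend = syn.prod(axis=1)    # each dendrite multiplies its synaptic outputs
    membrane = F @ dend        # filtering matrix selects dendrites per output
    return sigmoid(membrane)   # soma nonlinearity, one value per output

y = forward(rng.normal(size=n_in))
print(y.shape)  # (3,)
```

Since `F` enters the forward pass linearly before the soma nonlinearity, it is differentiable and can be trained jointly with `W` and `B` by either gradient-based or heuristic optimizers, matching the two-step training setting the paper investigates.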
Modern Problems in Mathematical Signal Processing: Quantized Compressed Sensing and Randomized Neural Networks
We study two problems from mathematical signal processing. First, we consider the problem of approximately recovering signals on a smooth, compact manifold from one-bit linear measurements drawn from either a Gaussian ensemble, a partial circulant ensemble, or a bounded orthonormal ensemble, and quantized using Sigma-Delta or distributed noise-shaping schemes. We construct a convex optimization algorithm for signal recovery that, given a Geometric Multi-Resolution Analysis approximation of the manifold, guarantees signal recovery with high probability. We prove an upper bound on the recovery error which outperforms prior works that use memoryless scalar quantization, requires a simpler analysis, and extends the class of measurements beyond Gaussians.
Second, we consider the problem of approximating continuous functions on compact domains using neural networks. The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of B. Igelnik and Y. H. Pao, single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. We begin to fill this theoretical gap by providing a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with an approximation error that decays inversely proportionally to the number of network nodes; we then extend this result to the non-asymptotic setting using a concentration inequality for Monte-Carlo integral approximations.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic cases.
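The random-weights construction described in the second part is straightforward to sketch: hidden-layer weights and biases are drawn at random and frozen, and only the linear output layer is fit, which reduces training to a least-squares solve. The example below is a generic illustration of this idea (the sampling ranges, node count, and target function are arbitrary choices, not the dissertation's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: a smooth function on the compact domain [-1, 1].
f = lambda x: np.sin(np.pi * x)
x_train = np.linspace(-1, 1, 200)[:, None]
y_train = f(x_train).ravel()

n_nodes = 50
# Random input-to-hidden weights and biases, fixed and never trained.
w = rng.uniform(-5, 5, size=(1, n_nodes))
b = rng.uniform(-5, 5, size=n_nodes)
H = np.tanh(x_train @ w + b)  # hidden-layer feature matrix

# Only the output weights are learned, via a single linear least-squares solve.
c, *_ = np.linalg.lstsq(H, y_train, rcond=None)

x_test = np.linspace(-1, 1, 97)[:, None]
err = np.max(np.abs(np.tanh(x_test @ w + b) @ c - f(x_test).ravel()))
print("max test error:", err)
```

Avoiding iterative tuning of the hidden weights is exactly the speedup motivating this family of networks; the theoretical question the dissertation addresses is why, and at what rate, such random features suffice.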
ML4Chem: A Machine Learning Package for Chemistry and Materials Science
ML4Chem is an open-source machine learning library for chemistry and
materials science. It provides an extendable platform to develop and deploy
machine learning models and pipelines, and is targeted at both non-expert and
expert users. ML4Chem follows user-experience design principles and offers the
tools needed to go from data preparation to inference. Here we introduce its atomistic
module for the implementation, deployment, and reproducibility of atom-centered
models. This module is composed of six core building blocks: data,
featurization, models, model optimization, inference, and visualization. We
present their functionality and ease of use with demonstrations utilizing
neural networks and kernel ridge regression algorithms. Comment: 32 pages, 11 figures
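Of the two model families the abstract demonstrates, kernel ridge regression is compact enough to sketch from scratch. The example below is a generic, self-contained KRR implementation on toy data, not ML4Chem's API (its actual interface should be taken from the package documentation):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam*I) alpha = y for the dual coefficients."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy regression standing in for an atom-centered featurization -> energy fit.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = X[:, 0] ** 2 + X[:, 1]
alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
print("mean abs training error:", np.mean(np.abs(pred - y)))
```

In an atomistic pipeline of the kind the module targets, `X` would hold per-atom descriptors produced by the featurization block and `y` the reference energies or forces.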
Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem
Despite the fact that neural networks had been used extensively for decades with impressive results, a theoretical background that would explain their success was, until recently, elusive. In Chapter 2, we present the main results that settled this question, developed mostly in the early '90s. We prove Cybenko's theorem, which states that networks with a continuous sigmoidal activation function are universal approximators, and we also study several extensions of this result.
Chapter 3 is devoted to the study of stochastic approximation algorithms. The goal of these algorithms is to determine the fixed point of an operator when its values are not known to us, but are revealed perturbed by some noise. We also present the proof of convergence of the Q-Learning algorithm, which is based on this theory. Q-Learning generalizes the successive approximation method, used extensively in classical dynamic programming, to problems where we have no prior information on the underlying process (transition probabilities and cost functions) and can only draw and observe samples from it.
In the final chapter, we study the multi-armed bandit problem, a subfield of reinforcement learning, where the goal is to determine the most profitable action among a given set while simultaneously maximizing one's expected profit over time. We prove the Lai-Robbins lower bound, which shows that for a certain class of reward distributions there are limits to how fast one can approach the optimal profit, and we also present an algorithm that attains it. We conclude the chapter by studying the upper confidence bound algorithm, introduced by Auer et al., which resolves several issues of the Lai-Robbins approach.
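The upper confidence bound algorithm mentioned at the end of the chapter is short enough to state in full. The sketch below implements UCB1 of Auer et al. for Bernoulli arms (the arm means, horizon, and seed are arbitrary illustrative choices):

```python
import math
import random

def ucb1(means, horizon=20000, seed=0):
    """UCB1: pull each arm once, then always pull the arm with the
    largest empirical mean plus exploration bonus sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(means)
    n = [0] * k      # pull counts per arm
    s = [0.0] * k    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization phase: try every arm once
        else:
            a = max(range(k),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        n[a] += 1
        s[a] += 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
    return n

counts = ucb1([0.3, 0.5, 0.7])
print(counts.index(max(counts)))  # the 0.7 arm dominates the pull counts
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are pulled only O(log t) times, matching (up to constants) the Lai-Robbins lower bound discussed in the chapter.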
Low rank surrogates for polymorphic fields with application to fuzzy-stochastic partial differential equations
We consider a general form of fuzzy-stochastic PDEs depending on the interaction of probabilistic and non-probabilistic ("possibilistic") influences. Such a combined modelling of aleatoric and epistemic uncertainties can, for instance, be applied beneficially in an engineering context for real-world applications, where probabilistic modelling and expert knowledge have to be accounted for. We examine existence and well-definedness of polymorphic PDEs in appropriate function spaces. The fuzzy-stochastic dependence is described in a high-dimensional parameter space, thus easily leading to an exponential complexity in practical computations. To alleviate this severe obstacle in practice, a compressed low-rank approximation of the problem formulation and the solution is derived. This is based on the Hierarchical Tucker format, which is constructed from solution samples by a non-intrusive tensor reconstruction algorithm. The performance of the proposed model order reduction approach is demonstrated with two examples. One of these is the ubiquitous groundwater flow model with Karhunen-Loève coefficient field, which is generalized by a fuzzy correlation length.
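The compression principle behind such low-rank surrogates can be shown in its simplest matrix form: sample the parametric solution on a grid, then truncate its SVD. The snippet below is only an analogue of the paper's approach (a truncated SVD of a snapshot matrix rather than a Hierarchical Tucker decomposition, and with a made-up parametric field in place of the groundwater flow solution):

```python
import numpy as np

# Hypothetical parametric "solution" u(x; theta) on a 1-D grid, where theta
# plays the role of an uncertain parameter such as a correlation length.
x = np.linspace(0.0, 1.0, 200)
thetas = np.linspace(0.5, 2.0, 100)
snapshots = np.array([np.sin(np.pi * x) * np.exp(-t * x) for t in thetas])

# Non-intrusive low-rank surrogate: truncated SVD of the snapshot matrix.
U, S, Vt = np.linalg.svd(snapshots, full_matrices=False)
r = int(np.sum(S > 1e-8 * S[0]))        # numerical rank at relative tol 1e-8
approx = (U[:, :r] * S[:r]) @ Vt[:r]    # rank-r reconstruction

print("rank:", r, "max error:", np.max(np.abs(approx - snapshots)))
```

Because the solution depends smoothly on the parameter, the singular values decay rapidly and a small rank suffices; Hierarchical Tucker formats extend this rank truncation to the many-parameter (tensor) setting, where a full grid would be exponentially large.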
Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey
In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. In certain situations, the performance and accuracy of a DNN even surpass human intelligence. However, it is noteworthy that DNNs are computationally demanding in terms of both resources and time. Furthermore, general-purpose architectures like CPUs have difficulty handling such computationally intensive algorithms. Therefore, the research community has invested considerable effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of computationally intensive algorithms. This paper surveys the research carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs. The various accelerators discussed are also compared on factors such as power, area, and throughput. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators.
This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.
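The throughput comparisons such a survey makes can be grounded in a simple operation count. The back-of-envelope sketch below computes the multiply-accumulate (MAC) count of one convolution layer and divides by peak throughput; the TOPS figures are hypothetical placeholders for illustration, not measured specifications of any device:

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

# One ResNet-style 3x3 convolution layer at 56x56 spatial resolution.
macs = conv_macs(56, 56, 64, 64, 3)

# Hypothetical peak throughputs in TOPS (illustrative only).
peak_tops = {"CPU": 0.1, "GPU": 10.0, "FPGA": 1.0, "ASIC": 50.0}
for hw, tops in peak_tops.items():
    us = 2 * macs / (tops * 1e12) * 1e6  # 2 ops per MAC, in microseconds
    print(f"{hw}: {us:.1f} us at peak")
```

Real rankings depend on achieved (not peak) throughput, memory bandwidth, and precision, which is why the survey's comparison also weighs power and area rather than raw TOPS alone.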