
    On the equivalence between graph isomorphism testing and function approximation with GNNs

    Graph neural networks (GNNs) have achieved considerable success on graph-structured data. In light of this, there has been increasing interest in studying their representation power. One line of work focuses on the universal approximation of permutation-invariant functions by certain classes of GNNs; another demonstrates the limitations of GNNs via graph isomorphism tests. Our work connects these two perspectives and proves their equivalence. We further develop a framework for the representation power of GNNs in the language of sigma-algebras, which incorporates both viewpoints. Using this framework, we compare the expressive power of different classes of GNNs, as well as of other methods on graphs. In particular, we prove that order-2 Graph G-invariant networks fail to distinguish non-isomorphic regular graphs with the same degree. We then extend them to a new architecture, Ring-GNNs, which succeeds in distinguishing these graphs and yields improvements on real-world social network datasets.
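    For intuition on the isomorphism-test side of the equivalence, the following minimal sketch (our illustration, not the paper's code) runs Weisfeiler-Leman color refinement on two non-isomorphic 2-regular graphs, the 6-cycle and two disjoint triangles. The identical color histograms illustrate the kind of failure mode the paper proves for order-2 Graph G-invariant networks.

```python
# Sketch: 1-WL color refinement cannot separate two non-isomorphic
# 2-regular graphs on 6 nodes -- the 6-cycle C6 and two triangles 2*C3.
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Run Weisfeiler-Leman color refinement; return the final color counts."""
    colors = {v: 0 for v in adj}  # start with a uniform coloring
    for _ in range(rounds):
        # New color = (own color, multiset of neighbor colors).
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}   # C6
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],                # 2 * C3
             3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_histogram(cycle6) == wl_histogram(triangles))  # True: 1-WL cannot tell them apart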

    A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization

    Artificial neural networks (ANNs), inspired by the interconnection of real neurons, have achieved unprecedented success in fields such as computer vision and natural language processing. Recently, a novel mathematical ANN model, known as the dendritic neuron model (DNM), has been proposed to address nonlinear problems by more accurately reflecting the structure of real neurons. However, its single-output design limits its capability to handle multi-output tasks, significantly restricting its applicability. In this paper, we propose a novel multi-in and multi-out dendritic neuron model (MODN) to tackle multi-output tasks. Our core idea is to introduce a filtering matrix to the soma layer that adaptively selects the desired dendrites to regress each output. Because this matrix is learnable, MODN can explore the relationship between each dendrite and each output to provide a better solution to downstream tasks. We also add a telodendron layer to MODN to better simulate real neuron behavior. Importantly, MODN is a more general and unified framework that can be naturally specialized to the DNM by customizing the filtering matrix. To explore the optimization of MODN, we investigate both heuristic and gradient-based optimizers and introduce a two-step training method. Extensive experiments on 11 datasets, covering both binary and multi-class classification tasks, demonstrate the effectiveness of MODN with respect to accuracy, convergence, and generality.
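    The abstract's core mechanism, a learnable filtering matrix routing dendrites to outputs at the soma layer, can be sketched in a few lines. The layer equations and names below are our assumptions for illustration, not the authors' code.

```python
# Minimal numpy sketch of a MODN-style forward pass (all equations and
# names are assumptions): a learnable filtering matrix F gates which
# dendrites feed each output at the soma layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_dendrites, n_out = 4, 6, 3

# Synaptic weights and thresholds, one per (dendrite, input) pair.
W = rng.normal(size=(n_dendrites, n_in))
theta = rng.normal(size=(n_dendrites, n_in))
# Learnable filtering matrix: row o selects the dendrites that regress output o.
F = rng.normal(size=(n_out, n_dendrites))

def modn_forward(x):
    synapse = sigmoid(W * x - theta)   # (n_dendrites, n_in)
    dendrite = synapse.prod(axis=1)    # multiplicative dendritic integration
    membrane = F @ dendrite            # filtering matrix routes dendrites to outputs
    return sigmoid(membrane)           # (n_out,) soma activations

print(modn_forward(rng.normal(size=n_in)))
```

    Setting n_out to 1 and fixing F to a row of ones collapses this sketch to a single-output model, mirroring the claim that MODN specializes to the DNM by customizing the filtering matrix.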

    ML4Chem: A Machine Learning Package for Chemistry and Materials Science

    ML4Chem is an open-source machine learning library for chemistry and materials science. It provides an extensible platform to develop and deploy machine learning models and pipelines, and is targeted at both non-expert and expert users. ML4Chem follows user-experience design principles and offers the tools needed to go from data preparation to inference. Here we introduce its atomistic module for the implementation, deployment, and reproducibility of atom-centered models. This module is composed of six core building blocks: data, featurization, models, model optimization, inference, and visualization. We present their functionality and ease of use with demonstrations using neural networks and kernel ridge regression algorithms. (32 pages, 11 figures)
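    To illustrate the data-to-inference flow described above without guessing at ML4Chem's actual API, here is a generic sketch of the same pipeline stages using scikit-learn's KernelRidge on toy features; none of the names below belong to ML4Chem.

```python
# Generic data -> featurization -> model -> inference sketch, with
# scikit-learn's KernelRidge standing in for an atomistic model.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Data: toy "atomic environments" as fixed-length feature vectors
# (in practice this is where atom-centered featurization happens).
X = rng.normal(size=(200, 8))
y = np.sin(X).sum(axis=1) + 0.05 * rng.normal(size=200)  # toy target "energies"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model + optimization: kernel ridge regression with an RBF kernel.
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1)
model.fit(X_train, y_train)

# Inference.
pred = model.predict(X_test)
print("test RMSE:", np.sqrt(np.mean((pred - y_test) ** 2)))
```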

    Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem

    Despite the fact that neural networks had been used extensively for decades, a theoretical background that would explain their success was, until recently, elusive. In Chapter 2, we present the main results that settled this question, developed mostly in the early 1990s. We prove Cybenko's theorem, which states that continuous sigmoidal functions are universal approximators, and we study several extensions of this result. Chapter 3 is devoted to stochastic approximation algorithms, whose goal is to determine the fixed point of an operator when its values are not known exactly but are revealed to us perturbed by noise. We also present the convergence proof of the Q-Learning algorithm, which rests on this theory. Q-Learning generalizes the successive approximation method, widely used in classical dynamic programming, to problems where we have no prior knowledge of the underlying process (transition probabilities and cost structure) but can only draw and observe samples from it. In the final chapter, we study the multi-armed bandit problem, a subfield of reinforcement learning in which the goal is to identify the most profitable action in a given set while simultaneously maximizing one's expected profit over time. We prove the Lai-Robbins lower bound, which shows that for a certain class of reward distributions there are limits to how fast one can approach the optimal profit, and we present an algorithm that attains this bound. We conclude by studying the upper confidence bound algorithm of Auer et al., which resolves several practical issues of the Lai-Robbins approach.
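    As a concrete taste of the final chapter, here is a minimal sketch of the UCB1 rule of Auer et al.: play each arm once, then always pull the arm maximizing the empirical mean plus sqrt(2 ln t / n_i). The Bernoulli reward model below is assumed purely for illustration.

```python
# Minimal UCB1 sketch on Bernoulli arms.
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    random.seed(seed)
    k = len(arm_probs)
    counts = [0] * k      # n_i: number of pulls of arm i
    sums = [0.0] * k      # cumulative reward of arm i
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: try every arm once
        else:
            arm = max(range(k), key=lambda i:
                      sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

reward, pulls = ucb1([0.3, 0.5, 0.7], horizon=10_000)
print("pulls per arm:", pulls)   # the 0.7 arm should dominate
```

    The Lai-Robbins bound says the regret of any uniformly good policy must grow at least logarithmically in the horizon; UCB1 matches this rate up to constants.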

    Low rank surrogates for polymorphic fields with application to fuzzy-stochastic partial differential equations

    We consider a general form of fuzzy-stochastic PDEs depending on the interaction of probabilistic and non-probabilistic ("possibilistic") influences. Such combined modelling of aleatoric and epistemic uncertainties can, for instance, be applied beneficially in an engineering context for real-world applications where both probabilistic modelling and expert knowledge have to be accounted for. We examine the existence and well-definedness of polymorphic PDEs in appropriate function spaces. The fuzzy-stochastic dependence is described in a high-dimensional parameter space, easily leading to exponential complexity in practical computations. To alleviate this severe obstacle in practice, a compressed low-rank approximation of the problem formulation and of the solution is derived. It is based on the Hierarchical Tucker format, which is constructed from solution samples by a non-intrusive tensor reconstruction algorithm. The performance of the proposed model order reduction approach is demonstrated with two examples, one of which is the ubiquitous groundwater flow model with a Karhunen-Loève coefficient field, generalized here by a fuzzy correlation length.
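    To make the polymorphic parameter dependence concrete, the following 1D sketch (our illustration; an exponential covariance and a grid-based eigendecomposition are assumptions) evaluates a truncated Karhunen-Loève field whose correlation length is swept over its fuzzy support, showing how a single stochastic sample responds to the epistemic parameter.

```python
# Sketch: truncated Karhunen-Loeve expansion of a zero-mean 1D field
# whose correlation length ell is a fuzzy (interval-sampled) parameter.
import numpy as np

def kl_field(ell, xi, x):
    """Evaluate a truncated KL expansion on grid x for stochastic modes xi."""
    C = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)  # exponential covariance
    lam, phi = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1][: len(xi)]            # keep leading modes
    lam, phi = lam[order], phi[:, order]
    return phi @ (np.sqrt(np.maximum(lam, 0.0)) * xi)

x = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)
xi = rng.normal(size=10)                                # one stochastic sample

# Sweeping ell over its fuzzy support exposes the epistemic dependence:
for ell in (0.1, 0.3, 0.5):
    a = kl_field(ell, xi, x)
    print(f"ell={ell}: field std ~ {a.std():.3f}")
```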

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    In the modern era of technology, a paradigm shift has been witnessed in areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, and robotics. In the context of mature digital technologies and the availability of authentic data and data-handling infrastructure, DNNs have become a credible choice for solving complex real-life problems; in certain situations their performance and accuracy even exceed human intelligence. However, DNNs are computationally very demanding in terms of the resources and time needed to handle their computations, and general-purpose architectures such as CPUs struggle with such computationally intensive algorithms. Therefore, the research community has invested considerable effort in specialized hardware architectures such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) for the effective implementation of these algorithms. This paper surveys the research carried out on the development and deployment of DNNs using these specialized hardware architectures and embedded AI accelerators. The review describes in detail the specialized hardware-based accelerators used in the training and/or inference of DNNs, and compares the accelerators discussed on factors such as power, area, and throughput. Finally, future research and development directions, such as trends in DNN implementation on specialized hardware accelerators, are discussed. This review article is intended to serve as a guide to hardware architectures for accelerating and improving the effectiveness of deep learning research.