
397 research outputs found

NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

Author: Ciccone Marco
Gallieri Marco
Gomez Faustino
Masci Jonathan
Osendorfer Christian
Publication date: 01/01/2018

This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReL units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets. Comment: NIPS 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano
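
The unrolled block described in the NAIS-Net abstract above can be illustrated with a scalar toy. This is a minimal sketch, not the authors' implementation: the update rule `x <- x + h * tanh(a*x + b*u + c)`, the parameter names, the step size `h`, and the stopping tolerance are all assumptions chosen for illustration. The input `u` is fed into every stage (the skip connection), and with `a < 0` and a small enough `h` the iteration settles at an equilibrium that depends on `u` but not on the initial state.

```python
import math


def unroll_block(u, x0, a=-1.0, b=1.0, c=0.0, h=0.5, tol=1e-9, max_steps=10_000):
    """Iterate x <- x + h * tanh(a*x + b*u + c) until the update is below tol.

    Hypothetical scalar toy: u is re-injected at every stage, and the number
    of stages actually used depends on the input and the starting state.
    Returns the final state and the number of stages that were unrolled.
    """
    x = x0
    for step in range(1, max_steps + 1):
        delta = h * math.tanh(a * x + b * u + c)
        x += delta
        if abs(delta) < tol:
            return x, step
    return x, max_steps


def equilibrium(u, a=-1.0, b=1.0, c=0.0):
    """Fixed point of the toy update: the state where a*x + b*u + c = 0."""
    return -(b * u + c) / a
```

Calling `unroll_block(u=0.3, x0=5.0)` and `unroll_block(u=0.3, x0=-5.0)` should return states close to `equilibrium(0.3)`, while the number of stages used may differ between the two starting points.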

Machine Learning for Informed Representation Learning

Author: Samarin Maxim
Publication date: 01/01/2022

The way we view reality and reason about the processes surrounding us is intimately connected to our perception and the representations we form about our observations and experiences. The popularity of machine learning and deep learning techniques in that regard stems from their ability to form useful representations by learning from large sets of observations. Typical application examples include image recognition or language processing for which artificial neural networks are powerful tools to extract regularity patterns or relevant statistics. In this thesis, we leverage and further develop this representation learning capability to address relevant but challenging real-world problems in geoscience and chemistry, to learn representations in an informed manner relevant to the task at hand, and reason about representation learning in neural networks, in general. Firstly, we develop an approach for efficient and scalable semantic segmentation of degraded soil in alpine grasslands in remotely-sensed images based on convolutional neural networks. To this end, we consider different grassland erosion phenomena in several Swiss valleys. We find that we are able to monitor soil degradation consistent with state-of-the-art methods in geoscience and can improve detection of affected areas. Furthermore, our approach provides a scalable method for large-scale analysis which is infeasible with established methods. Secondly, we address the question of how to identify suitable latent representations to enable generation of novel objects with selected properties. For this, we introduce a new deep generative model in the context of manifold learning and disentanglement. Our model improves targeted generation of novel objects by making use of property cycle consistency in property-relevant and property-invariant latent subspaces. We demonstrate the improvements on the generation of molecules with desired physical or chemical properties. 
Furthermore, we show that our model facilitates interpretability and exploration of the latent representation. Thirdly, in the context of recent advances in deep learning theory and the neural tangent kernel, we empirically investigate the learning of feature representations in standard convolutional neural networks and corresponding random feature models given by the linearisation of the neural networks. We find that performance differences between standard and linearised networks generally increase with the difficulty of the task but decrease with the considered width or over-parametrisation of these networks. Our results indicate interesting implications for feature learning and random feature models as well as the generalisation performance of highly over-parametrised neural networks. In summary, we employ and study feature learning in neural networks and review how we may use informed representation learning for challenging tasks.

edoc

Using Linear Regression for Iteratively Training Neural Networks

Author: Khadilkar Harshad
Publication date: 11/07/2023

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for simple problems) the approach is more stable and faster than gradient-based backpropagation. Comment: 9 page

arXiv.org e-Print Archive
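
The key idea in the abstract above (work backwards through an invertible activation, then solve a linear least squares problem) can be sketched for the simplest possible case, a single neuron `y = tanh(w*x + b)`. This is an illustrative assumption, not the paper's algorithm: the function name `fit_neuron` and the toy data are hypothetical, and with only one neuron a single least-squares solve replaces the iteration between parameters and activations that the abstract describes for full networks.

```python
import math


def fit_neuron(xs, ys):
    """Fit y = tanh(w*x + b) without gradients.

    Step 1: invert the activation to get the ideal total input z = atanh(y).
    Step 2: solve the ordinary least squares problem z ~ w*x + b in closed form.
    Requires every target y to lie strictly inside (-1, 1).
    """
    zs = [math.atanh(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    w = cov_xz / var_x
    b = mean_z - w * mean_x
    return w, b


# Toy data generated from a known neuron (w = 0.8, b = -0.2), noise-free.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.tanh(0.8 * x - 0.2) for x in xs]
w, b = fit_neuron(xs, ys)
```

On noise-free data like this the recovered `w` and `b` should match the generating values up to floating-point error; with noisy targets the inversion step can amplify errors for outputs near the saturation limits of the activation.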

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

Author: Csordás Róbert
Irie Kazuki
Schmidhuber Jürgen
Publication date: 24/10/2023

Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public. Comment: Accepted to EMNLP 2023 (short paper

arXiv.org e-Print Archive
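
The equivalence mentioned in the abstract above, between linearised causal attention and an RNN-like processor with a fixed-size state, can be checked on a small example. The sketch below is a simplification under stated assumptions: it uses an identity feature map and omits attention normalisation, and the function names are hypothetical rather than taken from the authors' public code.

```python
def attention_form(keys, values, queries):
    """Causal linear attention: y_t = sum over i <= t of (q_t . k_i) * v_i."""
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            score = sum(qi * ki for qi, ki in zip(q, k))
            y = [yj + score * vj for yj, vj in zip(y, v)]
        outputs.append(y)
    return outputs


def fast_weight_form(keys, values, queries):
    """Recurrent form: W_t = W_{t-1} + outer(v_t, k_t), then y_t = W_t q_t.

    The state W has a fixed size (d_v x d_k) regardless of sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        outputs.append([sum(W[i][j] * q[j] for j in range(d_k)) for i in range(d_v)])
    return outputs


# Small hand-picked sequence (length 3, key size 2, value size 2).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
queries = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
```

Both functions should produce the same outputs on the same inputs; the attention form revisits all earlier positions at each step, whereas the fast-weight form only updates and reads the matrix `W`.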

Wide Field Imaging. I. Applications of Neural Networks to object detection and star/galaxy classification

Author: Arnaboldi
Baldi
Bazell
Bertin
Ferguson
Fritzke
G. Gargiulo
G. Longo
Godwin
Infante
Jarvis
Jutten
Karhunen
Koh
Kohonen
Lanzetta
Lipovetsky
Lloyd
Martinetz
Miller
N. Capuano
Naim
Odewahn
Oja
Pal
Plumbley
R. Tagliaferri
Rose
S. Andreon
Sanger
Tagliaferri
Publication venue: 'Wiley'
Publication date: 01/01/2000

[Abridged] Astronomical Wide Field Imaging performed with new large format CCD detectors poses data reduction problems of unprecedented scale which are difficult to handle with traditional interactive tools. We present here NExt (Neural Extractor): a new Neural Network (NN) based package capable of detecting objects and performing both deblending and star/galaxy classification in an automatic way. Traditionally, in astronomical images, objects are first discriminated from the noisy background by searching for sets of connected pixels having brightnesses above a given threshold, and then they are classified as stars or as galaxies through diagnostic diagrams whose variables are chosen according to the astronomer's taste and experience. In the extraction step, assuming that images are well sampled, NExt requires only the simplest a priori definition of "what an object is" (id est, it keeps all structures composed of more than one pixel) and performs the detection via an unsupervised NN, approaching detection as a clustering problem, which has been thoroughly studied in the artificial intelligence literature. In order to obtain an objective and reliable classification, instead of using an arbitrarily defined set of features, we use an NN to select the most significant features among the large number of measured ones, and then we use the selected features to perform the classification task. In order to optimise the performance of the system we implemented and tested several different models of NN. The comparison of the performance of NExt with that of the best detection and classification package known to the authors (SExtractor) shows that NExt is at least as effective as the best traditional packages. Comment: MNRAS, in press. Paper with higher resolution images is available at http://www.na.astro.it/~andreon/listapub.htm

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università della Basilicata

Archivio della Ricerca - Università di Salerno

CERN Document Server
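
The traditional detection step described in the abstract above (finding sets of connected pixels brighter than a threshold) can be sketched as a flood fill over a small grid. This illustrates only the classical thresholding approach, not NExt's neural-network detection; the function name, the 4-neighbour connectivity, and the `min_pixels` parameter (set to 2 to mirror the "more than one pixel" definition) are assumptions made for the example.

```python
from collections import deque


def detect_objects(image, threshold, min_pixels=2):
    """Group pixels brighter than threshold into 4-connected components.

    image is a list of equal-length rows of brightness values. Components
    with fewer than min_pixels pixels are discarded. Returns a list of
    components, each a list of (row, col) coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from an unvisited bright pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_pixels:
                objects.append(component)
    return objects


# Toy frame: one three-pixel source and one isolated bright pixel.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 0, 0, 6],
    [0, 0, 0, 0, 0],
]
```

With `threshold=5`, the toy frame contains one three-pixel structure and one isolated bright pixel; the default `min_pixels=2` keeps only the former, while `min_pixels=1` keeps both.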

Recommended from our members

An intelligent system for risk classification of stock investment projects

Author: Kalganova T
Khan T
Serguieva A
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2003

This paper demonstrates that a hybrid fuzzy neural network can serve as a risk classifier of stock investment projects. The training algorithm for the regular part of the network is based on bidirectional incremental evolution, which proves more efficient than direct evolution. The approach is compared with other crisp and soft investment appraisal and trading techniques, while building a multimodel domain representation for an intelligent decision support system. Thus the advantages of each model are utilised while looking at the investment problem from different perspectives. The empirical results are based on UK companies traded on the London Stock Exchange.

Brunel University Research Archive

Stationary solution of the ring-spinning balloon in zero air drag using a RBFN based mesh-free method

Author: Fraser W. Barrie
Phillips David G.
Tran Canh-Dung
Publication venue: 'Informa UK Limited'
Publication date: 01/02/2010

A technique for numerical analysis of the dynamics of the ring-spinning balloon based on Radial Basis Function Networks (RBFNs) is presented in this paper. This method uses a 'universal approximator' based on neural network methodology to solve the governing differential equations, which are derived from the conditions of the dynamic equilibrium of the yarn, to determine the shape of the balloon yarn. The method needs only a coarse, finite set of collocation points, without any finite element-type discretisation of the domain and its boundary, for numerical solution of the governing differential equations. This paper reports a first assessment of the validity and efficiency of the present mesh-less method in predicting the balloon shape across a wide range of spinning conditions.

University of Southern Queensland ePrints