
    A modular network treatment of Baars' Global Workspace consciousness model

    Network theory provides an alternative to the renormalization and phase-transition methods used in Wallace's (2005a) treatment of Baars' Global Workspace model. Like the earlier study, the new analysis produces the workspace itself, the tunable threshold of consciousness, and the essential role of embedding contexts, in an explicitly analytic 'necessary conditions' manner that suffers neither the mereological fallacy inherent in brain-only theories nor the sufficiency indeterminacy of neural-network or agent-based simulations. This suggests that the new approach and the earlier one represent different analytically solvable limits in a broad continuum of possible models, analogous to the difference between bond and site percolation or between the two-body and many-body limits of classical mechanics. The development significantly extends the theoretical foundations of an empirical general cognitive model (GCM) based on the Shannon-McMillan Theorem. Patterned after the general linear model, which reflects the Central Limit Theorem, the proposed technique should be useful both for the reduction of experimental data on consciousness and for the design of devices whose capacities may transcend those of conventional machines, and it may provide new perspectives on the varieties of biological consciousness.
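
    For reference, a compact statement of the Shannon-McMillan theorem (the asymptotic equipartition property) on which the proposed general cognitive model rests; the notation below is generic and is an editorial gloss, not taken from Wallace's papers.

```latex
% Shannon-McMillan (asymptotic equipartition property): for a stationary,
% ergodic information source X = X_1, X_2, ... with entropy rate H[X],
% the per-symbol log-probability of a long sample path converges to H[X].
\[
  -\frac{1}{n}\,\log_2 P(X_1, X_2, \dots, X_n)
  \;\longrightarrow\; H[\mathbf{X}]
  \qquad (n \to \infty,\ \text{in probability}),
\]
% so typical sample paths all have probability close to 2^{-n H[X]}.
```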

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumption of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed but generic feature maps. While still solvable in closed form, this generalisation is able to capture the learning curves of a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: first, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones, such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework. Comment: v3: NeurIPS camera-ready
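
    The following is a minimal illustrative sketch, not the authors' code, of the kind of experiment the abstract describes: a linear teacher acting on Gaussian inputs, a student restricted to a fixed random-feature map, and ridge regression used to read off one point of a learning curve. The sizes, the ReLU random-projection map, and the regularisation level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test = 50, 200, 400, 2000  # illustrative dimensions and sample sizes

# Teacher: a fixed linear rule acting on the raw Gaussian inputs.
theta_star = rng.standard_normal(d) / np.sqrt(d)
def teacher(Z):
    return Z @ theta_star

# Student feature map: a fixed random projection followed by a ReLU nonlinearity.
W = rng.standard_normal((p, d)) / np.sqrt(d)
def features(Z):
    return np.maximum(Z @ W.T, 0.0)

Z_tr, Z_te = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y_tr, y_te = teacher(Z_tr), teacher(Z_te)
X_tr, X_te = features(Z_tr), features(Z_te)

# Ridge regression in feature space (the "student").
lam = 1e-3
w_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)

train_err = np.mean((X_tr @ w_hat - y_tr) ** 2)
test_err = np.mean((X_te @ w_hat - y_te) ** 2)
print(f"train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

    Repeating this for a grid of n_train values traces the empirical learning curve that the closed-form asymptotic formula is meant to predict.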

    Surprises in High-Dimensional Ridgeless Least Squares Interpolation

    Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm (``ridgeless'') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization. Comment: 68 pages; 16 figures. This revision contains non-asymptotic version of earlier results, and results for general coefficient
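
    As a minimal sketch of the object studied here (not the paper's code), the minimum $\ell_2$-norm interpolator can be computed with a pseudoinverse, and sweeping the number of features $p$ past the sample size $n$ typically reproduces the double-descent shape of the test risk. The data model below (isotropic Gaussian features, a linear ground truth, illustrative sizes and noise level) is an assumption made for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_full, sigma = 100, 400, 0.5  # samples, max feature dimension, noise std (illustrative)

beta = rng.standard_normal(d_full) / np.sqrt(d_full)  # ground-truth coefficients
Z_tr = rng.standard_normal((n, d_full))
Z_te = rng.standard_normal((2000, d_full))
y_tr = Z_tr @ beta + sigma * rng.standard_normal(n)
y_te = Z_te @ beta + sigma * rng.standard_normal(2000)

for p in [20, 50, 90, 100, 110, 150, 300, 400]:
    X_tr, X_te = Z_tr[:, :p], Z_te[:, :p]
    # Minimum l2-norm solution of X_tr w = y_tr: ordinary least squares when
    # p < n, exact ("ridgeless") interpolation when p >= n.
    w_hat = np.linalg.pinv(X_tr) @ y_tr
    risk = np.mean((X_te @ w_hat - y_te) ** 2)
    print(f"p/n = {p / n:4.2f}   test MSE = {risk:.3f}")
```

    The printed risk usually peaks near the interpolation threshold p/n = 1 and decreases again as p grows, the qualitative "double descent" pattern the abstract refers to.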

    Invariance of Weight Distributions in Rectified MLPs

    An interesting approach to analyzing neural networks that has received renewed attention is to examine the equivalent kernel of the neural network. This is based on the fact that a fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons can be viewed as a mapping into a Hilbert space. We derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions, generalizing a previous result that required Gaussian weight distributions. Additionally, the Central Limit Theorem is used to show that for certain activation functions, kernels corresponding to layers with weight distributions having $0$ mean and finite absolute third moment are asymptotically universal, and are well approximated by the kernel corresponding to layers with spherical Gaussian weights. In deep networks, as depth increases the equivalent kernel approaches a pathological fixed point, which can be used to argue why training randomly initialized networks can be difficult. Our results also have implications for weight initialization. Comment: ICML 201
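
    A minimal sketch (not taken from the paper) comparing the closed-form equivalent kernel of a single ReLU layer with spherical Gaussian weights, i.e. the degree-1 arc-cosine kernel, against a Monte Carlo estimate from a wide finite layer; the layer width and the test vectors are arbitrary illustrative choices.

```python
import numpy as np

def relu_equiv_kernel(x, y):
    """Closed-form equivalent kernel of a ReLU layer with N(0, I) weights:
    E_w[relu(w.x) relu(w.y)] = ||x|| ||y|| (sin t + (pi - t) cos t) / (2 pi),
    where t is the angle between x and y (degree-1 arc-cosine kernel)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    t = np.arccos(cos_t)
    return nx * ny * (np.sin(t) + (np.pi - t) * cos_t) / (2.0 * np.pi)

def mc_kernel(x, y, n_hidden=200_000, seed=0):
    """Monte Carlo estimate of the same quantity with a finite ReLU layer."""
    W = np.random.default_rng(seed).standard_normal((n_hidden, x.size))
    return np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0))

x = np.array([1.0, 0.5, -0.3])
y = np.array([0.2, -1.0, 0.7])
print("closed form :", relu_equiv_kernel(x, y))
print("monte carlo :", mc_kernel(x, y))
```

    The paper's point is that the same limiting kernel is approached for any rotationally-invariant (and, asymptotically, any zero-mean, finite-third-moment) weight distribution, not only the Gaussian one used in this sketch.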

    FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

    The maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its applicability to large-scale problems. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components, based on Bochner's theorem and the Fourier transform (Rahimi & Recht, 2007). By sampling in the Fourier domain, FastMMD decreases the time complexity of the MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$ are the size and dimension of the sample set, respectively, and $L$ is the number of basis functions used to approximate the kernels, which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to $O(L N \log d)$ by using the Fastfood technique (Le et al., 2013). The uniform convergence of our method is also proved theoretically for both the unbiased and biased estimates. We further provide a geometric interpretation of our method, namely an ensemble of circular discrepancies, which offers insight into the MMD and may inspire further metrics for the two-sample test. Experimental results substantiate that FastMMD achieves accuracy similar to that of exact MMD, with faster computation and lower variance than existing MMD approximation methods.
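
    A minimal sketch of the underlying idea, not the FastMMD implementation itself: by Bochner's theorem the Gaussian kernel admits a random Fourier feature expansion (Rahimi & Recht, 2007), so a plug-in squared MMD can be computed from feature means in $O(L N d)$ time. The function name, the choice of Gaussian kernel, and all parameter values below are illustrative assumptions.

```python
import numpy as np

def rff_mmd2(X, Y, gamma=1.0, L=512, seed=0):
    """Approximate squared MMD for the Gaussian kernel k(x,y)=exp(-gamma*||x-y||^2)
    using L random Fourier features; runs in O(L*N*d) instead of O(N^2*d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral samples for this kernel: w ~ N(0, 2*gamma*I), b ~ Uniform[0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(L, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=L)

    def phi(Z):
        return np.sqrt(2.0 / L) * np.cos(Z @ W.T + b)  # feature map, E[phi(x).phi(y)] ~= k(x,y)

    diff = phi(X).mean(axis=0) - phi(Y).mean(axis=0)   # difference of kernel mean embeddings
    return float(diff @ diff)

X = np.random.default_rng(1).normal(0.0, 1.0, size=(1000, 5))
Y = np.random.default_rng(2).normal(0.5, 1.0, size=(1000, 5))
print("approximate MMD^2:", rff_mmd2(X, Y))
```

    Increasing L trades computation for approximation accuracy, which mirrors the role L plays in the abstract's complexity bound.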