1,272 research outputs found
A modular network treatment of Baars' Global Workspace consciousness model
Network theory provides an alternative to the renormalization and phase transition methods used in Wallace's (2005a) treatment of Baars' Global Workspace model. Like the earlier study, the new analysis produces the workspace itself, the tunable threshold of consciousness, and the essential role for embedding contexts, in an explicitly analytic 'necessary conditions' manner which suffers neither the mereological fallacy inherent to brain-only theories nor the sufficiency indeterminacy of neural network or agent-based simulations. This suggests that the new approach and the earlier one represent different analytically solvable limits in a broad continuum of possible models, analogous to the differences between bond and site percolation or between the two-body and many-body limits of classical mechanics. The development significantly extends the theoretical foundations for an empirical general cognitive model (GCM) based on the Shannon-McMillan Theorem. Patterned after the general linear model, which reflects the Central Limit Theorem, the proposed technique should be useful both for the reduction of experimental data on consciousness and for the design of devices whose capacities may transcend those of conventional machines, and should provide new perspectives on the varieties of biological consciousness.
Learning curves of generic features maps for realistic datasets with a teacher-student model
Teacher-student models provide a framework in which the typical-case
performance of high-dimensional supervised learning can be described in closed
form. The assumptions of Gaussian i.i.d. input data underlying the canonical
teacher-student model may, however, be perceived as too restrictive to capture
the behaviour of realistic data sets. In this paper, we introduce a Gaussian
covariate generalisation of the model where the teacher and student can act on
different spaces, generated with fixed, but generic feature maps. While still
solvable in a closed form, this generalization is able to capture the learning
curves for a broad range of realistic data sets, thus redeeming the potential
of the teacher-student framework. Our contribution is then two-fold: First, we
prove a rigorous formula for the asymptotic training loss and generalisation
error. Second, we present a number of situations where the learning curve of
the model captures the one of a realistic data set learned with kernel
regression and classification, with out-of-the-box feature maps such as random
projections or scattering transforms, or with pre-learned ones - such as the
features learned by training multi-layer neural networks. We discuss both the
power and the limitations of the framework.
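To make the setup concrete, the sketch below runs a minimal teacher-student experiment with a fixed, generic feature map: a linear teacher acting on the raw inputs and a ridge-regression student fitted on random ReLU-projection features, with the test error traced over a grid of sample sizes. The dimensions, noise level, regularisation strength, and the particular feature map are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_student_curve(d=200, p=300, ns=(50, 100, 200, 400, 800),
                          lam=1e-2, n_test=2000):
    """Empirical learning curve for a student fitted on fixed random features.

    Teacher: y = z @ theta_star / sqrt(d) + noise, acting on the raw inputs z.
    Student: ridge regression on x = relu(z @ F / sqrt(d)), i.e. a fixed,
    generic feature map (here a random projection followed by a ReLU).
    """
    theta_star = rng.standard_normal(d)            # teacher weights
    F = rng.standard_normal((d, p))                # fixed random feature map

    def sample(n):
        z = rng.standard_normal((n, d))
        y = z @ theta_star / np.sqrt(d) + 0.1 * rng.standard_normal(n)
        x = np.maximum(z @ F / np.sqrt(d), 0.0)    # student's feature space
        return x, y

    x_te, y_te = sample(n_test)
    curve = []
    for n in ns:
        x_tr, y_tr = sample(n)
        # Ridge estimator in the student's feature space.
        w = np.linalg.solve(x_tr.T @ x_tr + lam * np.eye(p), x_tr.T @ y_tr)
        curve.append((n, float(np.mean((x_te @ w - y_te) ** 2))))
    return curve

for n, err in teacher_student_curve():
    print(f"n = {n:4d}   test MSE = {err:.4f}")
```

Swapping the ReLU projection for scattering or pre-trained network features would follow the same pattern; only the `sample` function changes.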
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have
attracted growing attention in machine learning, mainly because state-of-the-art
neural networks appear to be models of this type. In this paper, we study
minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$
are obtained by applying a linear transform to a vector of i.i.d.\ entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model,
where the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover -- in a
precise quantitative way -- several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.
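The following is a small simulation sketch of the phenomenon, not the paper's analysis: a misspecified linear model is fitted with the minimum-norm (pseudoinverse) least-squares solution while the number of used features p sweeps through the interpolation threshold p = n, where the test risk typically spikes before descending again. The dimensions, noise level, and feature grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def double_descent_curve(n=100, d=500, sigma=0.5,
                         p_values=(20, 50, 80, 95, 105, 150, 300, 500),
                         n_test=5000):
    """Test risk of min-norm least squares as the number of used features p varies.

    Data follow a fixed d-dimensional linear model; the regression uses only the
    first p coordinates, so the fit is misspecified for p < d. The estimator is
    the minimum-norm ("ridgeless") interpolator beta_hat = pinv(X) @ y, which
    typically shows a double-descent spike in risk around p = n.
    """
    theta = rng.standard_normal(d) / np.sqrt(d)      # true signal on all d features
    Z_tr = rng.standard_normal((n, d))
    y_tr = Z_tr @ theta + sigma * rng.standard_normal(n)
    Z_te = rng.standard_normal((n_test, d))
    y_te = Z_te @ theta

    risks = []
    for p in p_values:
        X_tr, X_te = Z_tr[:, :p], Z_te[:, :p]
        beta_hat = np.linalg.pinv(X_tr) @ y_tr       # min-norm solution
        risks.append((p / n, float(np.mean((X_te @ beta_hat - y_te) ** 2))))
    return risks

for ratio, risk in double_descent_curve():
    print(f"p/n = {ratio:4.2f}   test risk = {risk:.3f}")
```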
Invariance of Weight Distributions in Rectified MLPs
An interesting approach to analyzing neural networks that has received
renewed attention is to examine the equivalent kernel of the neural network.
This is based on the fact that a fully connected feedforward network with one
hidden layer, a certain weight distribution, an activation function, and an
infinite number of neurons can be viewed as a mapping into a Hilbert space. We
derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for
all rotationally-invariant weight distributions, generalizing a previous result
that required Gaussian weight distributions. Additionally, the Central Limit
Theorem is used to show that for certain activation functions, kernels
corresponding to layers with weight distributions having zero mean and finite
absolute third moment are asymptotically universal, and are well approximated
by the kernel corresponding to layers with spherical Gaussian weights. In deep
networks, as depth increases the equivalent kernel approaches a pathological
fixed point, which can be used to argue why training randomly initialized
networks can be difficult. Our results also have implications for weight
initialization.
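As a rough illustration of the equivalent-kernel and universality claims, the sketch below compares the closed-form kernel of an infinite-width ReLU layer with spherical Gaussian weights (the order-1 arc-cosine kernel) against Monte Carlo estimates from wide random layers with Gaussian and with uniform weights of zero mean and unit variance. The input dimension, layer width, and the specific non-Gaussian distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu_equivalent_kernel(x, y):
    """Closed form for E[relu(w.x) relu(w.y)] with w ~ N(0, I): the order-1
    arc-cosine kernel up to a factor 1/2."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def empirical_kernel(x, y, m=200_000, dist="gaussian"):
    """Monte Carlo estimate of E[relu(w.x) relu(w.y)] for a finite random layer
    of width m, with i.i.d. weights of zero mean and unit variance."""
    d = x.shape[0]
    if dist == "gaussian":
        W = rng.standard_normal((m, d))
    else:  # uniform weights rescaled to unit variance (non-Gaussian example)
        W = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(m, d))
    return float(np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0)))

x = rng.standard_normal(30)
y = rng.standard_normal(30)
print("closed form      :", relu_equivalent_kernel(x, y))
print("Gaussian weights :", empirical_kernel(x, y, dist="gaussian"))
print("uniform weights  :", empirical_kernel(x, y, dist="uniform"))
```

With moderately large input dimension the uniform-weight estimate already lands close to the Gaussian closed form, which is the flavour of the asymptotic universality statement in the abstract.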
FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test
The maximum mean discrepancy (MMD) is a recently proposed test statistic for
the two-sample test. Its quadratic time complexity, however, greatly hampers its
applicability to large-scale applications. To accelerate the MMD calculation, in
this study we propose an efficient method called FastMMD. The core idea of
FastMMD is to equivalently transform the MMD with shift-invariant kernels into
the amplitude expectation of a linear combination of sinusoid components based
on Bochner's theorem and Fourier transform (Rahimi & Recht, 2007). Taking
advantage of sampling of the Fourier transform, FastMMD decreases the time
complexity of the MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and
$d$ are the size and dimension of the sample set, respectively. Here $L$ is the
number of basis functions for approximating kernels, which determines the
approximation accuracy. For kernels that are spherically invariant, the
computation can be further accelerated to $O(L N \log d)$ by using the Fastfood
technique (Le et al., 2013). The uniform convergence of our method has also
been theoretically proved for both the unbiased and biased estimates. We
further provide a geometric explanation of our method, namely an ensemble of
circular discrepancies, which helps clarify the insight behind MMD and may
motivate further metrics for assessing two-sample tests. Experimental results
substantiate that FastMMD achieves accuracy similar to that of the exact MMD,
with faster computation and lower variance than existing MMD approximation
methods.
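For intuition, here is a minimal sketch of the underlying idea: approximating the biased MMD^2 for a Gaussian kernel with random Fourier features (Rahimi & Recht, 2007), which reduces the cost from quadratic in the sample size to O(L N d). It is not the authors' FastMMD implementation; the sample sizes, bandwidth, and number of basis functions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def rff_features(X, omegas, phases):
    """Random Fourier features for the Gaussian (RBF) kernel:
    k(x, y) ~ phi(x) . phi(y), phi_l(x) = sqrt(2/L) * cos(omega_l . x + b_l)."""
    L = omegas.shape[0]
    return np.sqrt(2.0 / L) * np.cos(X @ omegas.T + phases)

def approx_mmd2(X, Y, gamma=0.5, L=512):
    """Biased MMD^2 estimate in O(L * N * d) time via Fourier features:
    the kernel mean embedding of each sample is the average feature vector."""
    d = X.shape[1]
    omegas = rng.standard_normal((L, d)) * np.sqrt(2.0 * gamma)  # spectral samples
    phases = rng.uniform(0.0, 2.0 * np.pi, size=L)
    mu_x = rff_features(X, omegas, phases).mean(axis=0)
    mu_y = rff_features(Y, omegas, phases).mean(axis=0)
    return float(np.sum((mu_x - mu_y) ** 2))

def exact_mmd2(X, Y, gamma=0.5):
    """Quadratic-time biased MMD^2 for the RBF kernel, for comparison."""
    def k(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean())

X = rng.standard_normal((1000, 10))
Y = rng.standard_normal((1000, 10)) + 0.2     # shifted distribution
print("exact  MMD^2:", exact_mmd2(X, Y))
print("approx MMD^2:", approx_mmd2(X, Y))
```

The FastMMD paper additionally replaces the dense Gaussian projection with the Fastfood construction for spherically invariant kernels, which is what brings the per-sample cost down to O(L log d); that refinement is omitted here.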
- …