
    A modular network treatment of Baars' Global Workspace consciousness model

    Network theory provides an alternative to the renormalization and phase-transition methods used in Wallace's (2005a) treatment of Baars' Global Workspace model. Like the earlier study, the new analysis produces the workspace itself, the tunable threshold of consciousness, and the essential role of embedding contexts, in an explicitly analytic 'necessary conditions' manner that suffers neither the mereological fallacy inherent in brain-only theories nor the sufficiency indeterminacy of neural-network or agent-based simulations. This suggests that the new approach and the earlier one represent different analytically solvable limits in a broad continuum of possible models, analogous to the difference between bond and site percolation or between the two-body and many-body limits of classical mechanics. The development significantly extends the theoretical foundations of an empirical general cognitive model (GCM) based on the Shannon-McMillan Theorem. Patterned after the general linear model, which reflects the Central Limit Theorem, the proposed technique should be useful both for the reduction of experimental data on consciousness and for the design of devices whose capacities may transcend those of conventional machines, and it may provide new perspectives on the varieties of biological consciousness.
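
    For reference, a compact statement of the Shannon-McMillan theorem (the asymptotic equipartition property) on which the proposed general cognitive model rests; the notation below is generic and is an editorial gloss, not taken from Wallace's papers.

```latex
% Shannon-McMillan (asymptotic equipartition property): for a stationary,
% ergodic information source X = X_1, X_2, ... with entropy rate H[X],
% the per-symbol log-probability of a long sample path converges to H[X].
\[
  -\frac{1}{n}\,\log_2 P(X_1, X_2, \dots, X_n)
  \;\longrightarrow\; H[\mathbf{X}]
  \qquad (n \to \infty,\ \text{in probability}),
\]
% so typical sample paths all have probability close to 2^{-n H[X]}.
```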

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumption of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed but generic feature maps. While still solvable in closed form, this generalisation is able to capture the learning curves of a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: first, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones, such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework. Comment: v3: NeurIPS camera-ready
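
    The following is a minimal illustrative sketch, not the authors' code, of the kind of experiment the abstract describes: a linear teacher acting on Gaussian inputs, a student restricted to a fixed random-feature map, and ridge regression used to read off one point of a learning curve. The sizes, the ReLU random-projection map, and the regularisation level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test = 50, 200, 400, 2000  # illustrative dimensions and sample sizes

# Teacher: a fixed linear rule acting on the raw Gaussian inputs.
theta_star = rng.standard_normal(d) / np.sqrt(d)
def teacher(Z):
    return Z @ theta_star

# Student feature map: a fixed random projection followed by a ReLU nonlinearity.
W = rng.standard_normal((p, d)) / np.sqrt(d)
def features(Z):
    return np.maximum(Z @ W.T, 0.0)

Z_tr, Z_te = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y_tr, y_te = teacher(Z_tr), teacher(Z_te)
X_tr, X_te = features(Z_tr), features(Z_te)

# Ridge regression in feature space (the "student").
lam = 1e-3
w_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)

train_err = np.mean((X_tr @ w_hat - y_tr) ** 2)
test_err = np.mean((X_te @ w_hat - y_te) ** 2)
print(f"train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

    Repeating this for a grid of n_train values traces the empirical learning curve that the closed-form asymptotic formula is meant to predict.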

    Surprises in High-Dimensional Ridgeless Least Squares Interpolation

    Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm (``ridgeless'') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization. Comment: 68 pages; 16 figures. This revision contains non-asymptotic version of earlier results, and results for general coefficient
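
    As a minimal sketch of the object studied here (not the paper's code), the minimum $\ell_2$-norm interpolator can be computed with a pseudoinverse, and sweeping the number of features $p$ past the sample size $n$ typically reproduces the double-descent shape of the test risk. The data model below (isotropic Gaussian features, a linear ground truth, illustrative sizes and noise level) is an assumption made for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_full, sigma = 100, 400, 0.5  # samples, max feature dimension, noise std (illustrative)

beta = rng.standard_normal(d_full) / np.sqrt(d_full)  # ground-truth coefficients
Z_tr = rng.standard_normal((n, d_full))
Z_te = rng.standard_normal((2000, d_full))
y_tr = Z_tr @ beta + sigma * rng.standard_normal(n)
y_te = Z_te @ beta + sigma * rng.standard_normal(2000)

for p in [20, 50, 90, 100, 110, 150, 300, 400]:
    X_tr, X_te = Z_tr[:, :p], Z_te[:, :p]
    # Minimum l2-norm solution of X_tr w = y_tr: ordinary least squares when
    # p < n, exact ("ridgeless") interpolation when p >= n.
    w_hat = np.linalg.pinv(X_tr) @ y_tr
    risk = np.mean((X_te @ w_hat - y_te) ** 2)
    print(f"p/n = {p / n:4.2f}   test MSE = {risk:.3f}")
```

    The printed risk usually peaks near the interpolation threshold p/n = 1 and decreases again as p grows, the qualitative "double descent" pattern the abstract refers to.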

    Invariance of Weight Distributions in Rectified MLPs

    An interesting approach to analyzing neural networks that has received renewed attention is to examine the equivalent kernel of the neural network. This is based on the fact that a fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons can be viewed as a mapping into a Hilbert space. We derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions, generalizing a previous result that required Gaussian weight distributions. Additionally, the Central Limit Theorem is used to show that for certain activation functions, kernels corresponding to layers with weight distributions having $0$ mean and finite absolute third moment are asymptotically universal, and are well approximated by the kernel corresponding to layers with spherical Gaussian weights. In deep networks, as depth increases the equivalent kernel approaches a pathological fixed point, which can be used to argue why training randomly initialized networks can be difficult. Our results also have implications for weight initialization. Comment: ICML 201
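
    A minimal sketch (not taken from the paper) comparing the closed-form equivalent kernel of a single ReLU layer with spherical Gaussian weights, i.e. the degree-1 arc-cosine kernel, against a Monte Carlo estimate from a wide finite layer; the layer width and the test vectors are arbitrary illustrative choices.

```python
import numpy as np

def relu_equiv_kernel(x, y):
    """Closed-form equivalent kernel of a ReLU layer with N(0, I) weights:
    E_w[relu(w.x) relu(w.y)] = ||x|| ||y|| (sin t + (pi - t) cos t) / (2 pi),
    where t is the angle between x and y (degree-1 arc-cosine kernel)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    t = np.arccos(cos_t)
    return nx * ny * (np.sin(t) + (np.pi - t) * cos_t) / (2.0 * np.pi)

def mc_kernel(x, y, n_hidden=200_000, seed=0):
    """Monte Carlo estimate of the same quantity with a finite ReLU layer."""
    W = np.random.default_rng(seed).standard_normal((n_hidden, x.size))
    return np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0))

x = np.array([1.0, 0.5, -0.3])
y = np.array([0.2, -1.0, 0.7])
print("closed form :", relu_equiv_kernel(x, y))
print("monte carlo :", mc_kernel(x, y))
```

    The paper's point is that the same limiting kernel is approached for any rotationally-invariant (and, asymptotically, any zero-mean, finite-third-moment) weight distribution, not only the Gaussian one used in this sketch.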

    FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

    The maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its applicability to large-scale problems. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components, based on Bochner's theorem and the Fourier transform (Rahimi & Recht, 2007). By sampling in the Fourier domain, FastMMD decreases the time complexity of the MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$ are the size and dimension of the sample set, respectively, and $L$ is the number of basis functions used to approximate the kernels, which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to $O(L N \log d)$ by using the Fastfood technique (Le et al., 2013). The uniform convergence of our method is also proved theoretically for both the unbiased and biased estimates. We further provide a geometric interpretation of our method, namely an ensemble of circular discrepancies, which offers insight into the MMD and may inspire further metrics for the two-sample test. Experimental results substantiate that FastMMD achieves accuracy similar to that of exact MMD, with faster computation and lower variance than existing MMD approximation methods.
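
    A minimal sketch of the underlying idea, not the FastMMD implementation itself: by Bochner's theorem the Gaussian kernel admits a random Fourier feature expansion (Rahimi & Recht, 2007), so a plug-in squared MMD can be computed from feature means in $O(L N d)$ time. The function name, the choice of Gaussian kernel, and all parameter values below are illustrative assumptions.

```python
import numpy as np

def rff_mmd2(X, Y, gamma=1.0, L=512, seed=0):
    """Approximate squared MMD for the Gaussian kernel k(x,y)=exp(-gamma*||x-y||^2)
    using L random Fourier features; runs in O(L*N*d) instead of O(N^2*d)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral samples for this kernel: w ~ N(0, 2*gamma*I), b ~ Uniform[0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(L, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=L)

    def phi(Z):
        return np.sqrt(2.0 / L) * np.cos(Z @ W.T + b)  # feature map, E[phi(x).phi(y)] ~= k(x,y)

    diff = phi(X).mean(axis=0) - phi(Y).mean(axis=0)   # difference of kernel mean embeddings
    return float(diff @ diff)

X = np.random.default_rng(1).normal(0.0, 1.0, size=(1000, 5))
Y = np.random.default_rng(2).normal(0.5, 1.0, size=(1000, 5))
print("approximate MMD^2:", rff_mmd2(X, Y))
```

    Increasing L trades computation for approximation accuracy, which mirrors the role L plays in the abstract's complexity bound.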