12 research outputs found

    Replica method for eigenvalues of real Wishart product matrices

    Full text link
    We show how the replica method can be used to compute the asymptotic eigenvalue spectrum of a real Wishart product matrix. For unstructured factors, this provides a compact, elementary derivation of a polynomial condition on the Stieltjes transform first proved by Müller [IEEE Trans. Inf. Theory 48, 2086-2091 (2002)]. We then show how this computation can be extended to ensembles where the factors have correlated rows. Finally, we derive polynomial conditions on the average values of the minimum and maximum eigenvalues, which match the results obtained by Akemann, Ipsen, and Kieburg [Phys. Rev. E 88, 052118 (2013)] for the complex Wishart product ensemble.
    Comment: 35 pages, 4 figures
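
    A minimal numerical sketch of the object being studied, with hypothetical dimensions and two unstructured Gaussian factors (a Monte Carlo check, not the paper's replica calculation): it samples the eigenvalues of a real Wishart product matrix and evaluates the empirical Stieltjes transform, the quantity that the polynomial condition constrains.

        import numpy as np

        # Hypothetical sizes; the asymptotic result holds as all dimensions grow proportionally.
        n, m, p = 400, 500, 600        # shapes of the two Gaussian factors
        trials = 20

        eigs = []
        for _ in range(trials):
            X1 = np.random.randn(n, m) / np.sqrt(m)   # i.i.d. N(0, 1/m) entries
            X2 = np.random.randn(m, p) / np.sqrt(p)   # i.i.d. N(0, 1/p) entries
            A = X1 @ X2                               # product of unstructured factors
            W = A @ A.T                               # real Wishart product matrix
            eigs.append(np.linalg.eigvalsh(W))
        eigs = np.concatenate(eigs)

        # Empirical Stieltjes transform g(z) = mean of 1/(lambda - z), evaluated off the real axis
        z = 1.0 + 0.05j
        print(np.mean(1.0 / (eigs - z)))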

    Asymptotics of representation learning in finite Bayesian neural networks

    Full text link
    Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden-layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and Gaussian likelihood have a largely universal form. We illustrate this explicitly for three tractable network architectures: deep linear fully-connected and convolutional networks, and networks with a single nonlinear hidden layer. Our results begin to elucidate how task-relevant learning signals shape the hidden-layer representations of wide Bayesian neural networks.
    Comment: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement
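
    As a rough illustration only (a Monte Carlo sketch under a Gaussian prior with hypothetical sizes, not the paper's perturbative treatment of the learned posterior kernels): the snippet estimates the hidden-layer feature kernel of a one-hidden-layer tanh network at several widths, showing how finite-width fluctuations around the infinite-width kernel shrink as the width grows.

        import numpy as np

        rng = np.random.default_rng(0)
        x1, x2 = rng.standard_normal(10), rng.standard_normal(10)   # two hypothetical inputs

        def kernel_samples(width, n_samples=2000, phi=np.tanh):
            # Draw samples of the hidden-layer kernel K(x1, x2) under the Gaussian prior
            ks = []
            for _ in range(n_samples):
                W = rng.standard_normal((width, x1.size)) / np.sqrt(x1.size)
                h1, h2 = phi(W @ x1), phi(W @ x2)
                ks.append(h1 @ h2 / width)
            return np.array(ks)

        for width in (10, 100, 1000):
            ks = kernel_samples(width)
            print(width, ks.mean(), ks.std())   # fluctuations shrink roughly like 1/sqrt(width)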

    Long Sequence Hopfield Memory

    Full text link
    Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed in which recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (the maximal length of the stored sequence) due to interference between the memories. Inspired by recent work on Dense Associative Memories, we expand the sequence capacity of these models by introducing a nonlinear interaction term that enhances separation between the patterns. We derive novel scaling laws for sequence capacity with respect to network size, significantly outperforming existing scaling laws for models based on traditional Hopfield networks, and verify these theoretical results with numerical simulations. Moreover, we introduce a generalized pseudoinverse rule to recall sequences of highly correlated patterns. Finally, we extend this model to store sequences with variable timing between state transitions and describe a biologically plausible implementation, with connections to motor neuroscience.
    Comment: NeurIPS 2023 Camera-Ready, 41 pages
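
    A minimal sketch of the mechanism described above, with hypothetical parameters (it is not the paper's exact model, capacity analysis, or pseudoinverse rule): an asymmetric, dense-memory-style update in which the overlap with pattern mu is passed through a power nonlinearity and drives the network toward pattern mu+1.

        import numpy as np

        rng = np.random.default_rng(1)
        N, P, power = 200, 30, 3                    # neurons, sequence length, separation exponent
        xi = rng.choice([-1.0, 1.0], size=(P, N))   # random sequence of binary patterns

        def step(s):
            overlaps = xi[:-1] @ s / N               # m_mu = <xi^mu, s> / N
            field = xi[1:].T @ overlaps ** power     # nonlinear separation of the overlaps
            return np.sign(field + 1e-12)

        s = xi[0].copy()
        for t in range(P - 1):
            s = step(s)
            print(t + 1, (s == xi[t + 1]).mean())    # fraction of bits matching the next pattern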

    Learning curves for deep structured Gaussian feature models

    Full text link
    In recent years, significant attention in deep learning theory has been devoted to analyzing the generalization performance of models with multiple layers of Gaussian random features. However, few works have considered the effect of feature anisotropy; most assume that features are generated using independent and identically distributed Gaussian weights. Here, we derive learning curves for models with many layers of structured Gaussian features. We show that allowing correlations between the rows of the first layer of features can aid generalization, while structure in later layers is generally detrimental. Our results shed light on how weight structure affects generalization in a simple class of solvable models.
    Comment: 28 pages, 3 figures
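
    A toy numerical sketch under assumed sizes and a random linear target (it only compares test error empirically and does not reproduce the paper's analytic learning curves or the precise conditions under which correlations help): ridge regression on two layers of Gaussian features, with first-layer rows drawn either i.i.d. or with a correlated covariance.

        import numpy as np

        rng = np.random.default_rng(2)
        d, n1, n2 = 50, 100, 100              # input dimension and the two feature-layer widths
        n_train, n_test, lam = 200, 1000, 1e-3

        def make_features(row_cov):
            # First-layer rows have covariance row_cov; the second layer is unstructured
            L = np.linalg.cholesky(row_cov)
            W1 = rng.standard_normal((n1, d)) @ L.T / np.sqrt(d)
            W2 = rng.standard_normal((n2, n1)) / np.sqrt(n1)
            return lambda X: X @ W1.T @ W2.T

        w_star = rng.standard_normal(d) / np.sqrt(d)   # random linear teacher
        Xtr, Xte = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
        ytr, yte = Xtr @ w_star, Xte @ w_star

        for name, rho in [("i.i.d. rows", 0.0), ("correlated rows", 0.5)]:
            row_cov = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
            feat = make_features(row_cov)
            Ftr, Fte = feat(Xtr), feat(Xte)
            a = np.linalg.solve(Ftr.T @ Ftr + lam * np.eye(n2), Ftr.T @ ytr)   # ridge fit
            print(name, np.mean((Fte @ a - yte) ** 2))                         # test error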

    Tripod gait example data.

    No full text
    MATLAB version 7.3 .mat file, created using MATLAB 9.4
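
    A short loading sketch with a hypothetical filename (the variable names inside the file are not documented here): MATLAB v7.3 .mat files are HDF5 containers, so scipy.io.loadmat cannot read them, but h5py can.

        import h5py

        # Hypothetical filename; replace with the actual data file
        with h5py.File("tripod_gait_example.mat", "r") as f:
            for name, item in f.items():   # top-level MATLAB variables
                shape = getattr(item, "shape", None)
                print(name, shape if shape is not None else type(item))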