25 research outputs found

    Amortizing intractable inference in large language models

    Full text link
    Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use. Comment: 23 pages; code: https://github.com/GFNOrg/gfn-lm-tunin
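
    The core algorithmic idea above is to fine-tune the LLM so that it samples sequences in proportion to a reward (an unnormalized posterior) rather than to maximize that reward. A common GFlowNet objective for this is trajectory balance; the minimal PyTorch-style sketch below illustrates it, with the model interface and log_reward left as placeholder assumptions rather than code from the linked repository.

        # Sketch of the trajectory-balance (TB) objective used for GFlowNet-style
        # fine-tuning. `log_reward` and the way logits/tokens are obtained from the
        # LLM are assumptions for illustration, not the authors' implementation.
        import torch

        def trajectory_balance_loss(step_logits, sampled_ids, log_reward, log_Z):
            """TB loss for one sampled sequence (trajectory).

            step_logits: [T, vocab] logits produced at each generation step
            sampled_ids: [T] token ids that were actually sampled
            log_reward:  scalar log R(x), e.g. an unnormalized log-posterior
            log_Z:       learned scalar estimate of the log partition function
            """
            log_probs = torch.log_softmax(step_logits, dim=-1)
            # log P_F(trajectory): sum of per-token log-probabilities under the policy
            log_pf = log_probs.gather(1, sampled_ids.unsqueeze(1)).sum()
            # (log Z + log P_F - log R)^2 drives the policy toward sampling x with
            # probability proportional to R(x)
            return (log_Z + log_pf - log_reward) ** 2

    Minimizing this loss over sampled trajectories, backpropagating into both the LLM parameters and log_Z, yields a sampler whose output distribution matches the reward; this is what allows chain-of-thought rationales to be treated as latent variables with a posterior to sample from.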

    Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

    Full text link
    Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.

    Kinematical analysis of the nutation speed reducer

    Get PDF
    This paper discusses the development of a Nutating Speed Reducer (NSR), which is characterized by a high reduction ratio, high tooth contact ratio, very high torque-to-weight/volume ratio, quiet and smooth operation under load, and very high efficiency. All of these advantages are due to the presence of conjugate face-gear pairs that mesh with each other in what is called a nutating/rotating gear mechanism. Details of the NSR, its kinematics, gear tooth load capacity, and mesh efficiency are explained. The component speeds and speed reduction ratios of the NSR are calculated. The effect of varying nutation angles on the geometry of the NSR is discussed and compared.
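
    To make the notion of reduction ratio concrete, the sketch below computes the input-to-output ratio for a generic compound nutating face-gear train with a stationary gear (Z1), two gears on the nutating member (Z2, Z3), and an output gear (Z4). Both the layout and the formula are illustrative assumptions about a typical nutating drive, not equations or tooth counts taken from this paper.

        # Illustrative reduction ratio for an assumed compound nutating face-gear train:
        # fixed gear Z1 meshes with nutating gear Z2, and nutating gear Z3 meshes with
        # output gear Z4. The formula below is the usual compound-train form and is an
        # assumption here, not the paper's derivation.

        def nutating_reduction_ratio(z1: int, z2: int, z3: int, z4: int) -> float:
            """Overall input:output reduction ratio of the assumed gear train."""
            denominator = z2 * z4 - z1 * z3
            if denominator == 0:
                raise ValueError("Z2*Z4 must differ from Z1*Z3 for a finite ratio")
            return (z2 * z4) / denominator

        # Nearly equal tooth-count products give a very large single-stage reduction:
        print(nutating_reduction_ratio(z1=35, z2=36, z3=41, z4=40))  # 1440 / 5 = 288.0

    The example shows why such drives can achieve high reduction ratios in a single stage: the ratio is set by the small mismatch between the two meshes rather than by the overall size of the gears.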

    Mythe de la femme chez Vigny [The myth of woman in Vigny]

    No full text

    Relationship between model eigenspectra and encoding performance.

    No full text
    Each curve shows the eigenspectrum of one layer from one DNN. The x-axis is the index of principal components, sorted in decreasing order of variance, and the y-axis is the variance along each principal component (scaled by a constant in order to align all curves for comparison). The black line is a reference for a power law function that decays as 1/i, where i is the principal component index. This 1/i power law was hypothesized in [40] to be a theoretical upper limit on the latent dimensionality of smooth representations. a: Eigenspectra are color-coded as a function of the corresponding encoding performance each model achieved. Models with more slowly decaying eigenspectra (i.e., higher latent dimensionality) are better predictors of cortical activity, with top-performing models approaching the theoretical upper bound on dimensionality proposed in [40]. Encoding performance is for the IT brain region, which had the strongest relationship with ED among regions we considered. b: Eigenspectra are color-coded as a function of their corresponding ED. Since ED is a summary statistic of an eigenspectrum meant to quantify its rate of decay, models with more slowly decaying eigenspectra tend to have higher ED.
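
    Two quantities underlie this figure: the eigenspectrum of a layer's feature matrix and its effective dimensionality (ED). The sketch below computes both, taking ED to be the participation ratio of the eigenvalues, a standard summary of how quickly the spectrum decays; whether that matches the paper's exact definition is an assumption.

        # Eigenspectrum of a (stimuli x units) feature matrix and an effective
        # dimensionality (ED) summary. ED is computed as the participation ratio,
        # a common definition assumed here for illustration.
        import numpy as np

        def eigenspectrum(features: np.ndarray) -> np.ndarray:
            """Variance along each principal component, sorted in decreasing order."""
            centered = features - features.mean(axis=0, keepdims=True)
            cov = centered.T @ centered / (centered.shape[0] - 1)
            eigvals = np.linalg.eigvalsh(cov)[::-1]
            return np.clip(eigvals, 0.0, None)

        def effective_dimensionality(eigvals: np.ndarray) -> float:
            """Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
            return float(eigvals.sum() ** 2 / np.square(eigvals).sum())

        # A spectrum decaying like the 1/i reference line in the figure:
        reference = 1.0 / np.arange(1, 1001)
        print(effective_dimensionality(reference))

    Slowly decaying spectra spread variance over many components and therefore yield a large participation ratio, which is the sense in which higher ED corresponds to higher latent dimensionality in panel b.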

    ED and representational similarity analysis.

    No full text
    Comparing ED to neural data using representational similarity analysis instead of encoding performance. (PDF)
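
    For context, representational similarity analysis compares a model to neural data by correlating their representational dissimilarity matrices (RDMs) rather than by fitting an encoding model. The sketch below is the generic recipe (correlation-distance RDMs compared with a Spearman correlation) and is an assumption about the analysis, not the supplementary pipeline itself.

        # Generic RSA comparison: build condensed RDMs for model features and neural
        # responses over the same stimuli, then correlate them. The distance metric
        # and rank correlation are common defaults, assumed for illustration.
        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        def rdm(responses: np.ndarray) -> np.ndarray:
            """Condensed pairwise correlation-distance matrix for (stimuli x features)."""
            return pdist(responses, metric="correlation")

        def rsa_score(model_features: np.ndarray, neural_responses: np.ndarray) -> float:
            """Spearman correlation between the model RDM and the neural RDM."""
            rho, _ = spearmanr(rdm(model_features), rdm(neural_responses))
            return float(rho)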

    Other correlates of DNN encoding model performance.

    No full text
    Comparison of encoding performance to DNN features other than ED, such as classification performance and other geometric statistics of the representations. (PDF)

    Theory of latent dimensionality and encoding performance.

    No full text
    Detailed description of the theory of ED and encoding performance sketched out in Section Dimensionality and alignment in computational brain models. (PDF)

    Additional simulation results.

    No full text
    Results for additional analyses that we conducted by varying parameters of the simulations described in Section Dimensionality and alignment in computational brain models. (PDF)

    Additional analyses of ED and encoding performance.

    No full text
    Analyses that examine the relationship between ED and encoding performance under different conditions from those described in the main paper. (PDF)