
    Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

    Offline reinforcement learning (RL), which aims to learn an optimal policy from a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime because of function approximation errors on out-of-distribution actions. A variety of regularization methods have been proposed to mitigate this issue. However, they are often constrained by policy classes with limited expressiveness, which can lead to highly suboptimal solutions. In this paper, we propose representing the policy as a diffusion model, a recent class of highly expressive deep generative models. We introduce Diffusion Q-learning (Diffusion-QL), which uses a conditional diffusion model to represent the policy. In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model. The resulting loss seeks optimal actions that stay close to the behavior policy. We show that two factors both contribute to the outstanding performance of Diffusion-QL: the expressiveness of the diffusion model-based policy, and the coupling of behavior cloning and policy improvement under the diffusion model. We illustrate the superiority of our method over prior work in a simple 2D bandit example with a multimodal behavior policy. We then show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
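The abstract describes a loss that adds an action-value term to a conditional diffusion model's training loss. The NumPy sketch below shows the shape of such a loss. It rests on several assumptions that are not taken from the abstract:

- a linear DDPM noise schedule;
- an ancestral DDPM sampler;
- a normalisation alpha = eta / E|Q(s, a)|;
- hypothetical placeholder callables `eps_model` (the noise predictor) and `q_fn` (the critic).

The sketch only evaluates the scalar loss. Actual training would differentiate through the sampling chain with an autodiff framework.

```python
import numpy as np


def make_schedule(T, beta_min=1e-4, beta_max=0.02):
    """Linear DDPM variance schedule (an assumed choice)."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def diffusion_bc_loss(eps_model, states, actions, alpha_bars, rng):
    """Behaviour-cloning (denoising) term: predict the noise added to dataset actions."""
    n, T = actions.shape[0], len(alpha_bars)
    i = rng.integers(0, T, size=n)                 # one random diffusion step per sample
    eps = rng.standard_normal(actions.shape)       # Gaussian noise the model must predict
    ab = alpha_bars[i][:, None]
    noisy = np.sqrt(ab) * actions + np.sqrt(1.0 - ab) * eps
    pred = eps_model(noisy, states, i)
    return float(np.mean(np.sum((eps - pred) ** 2, axis=1)))


def sample_actions(eps_model, states, schedule, action_dim, rng):
    """Ancestral DDPM sampling of a_0 ~ pi(. | s) by running the reverse chain."""
    betas, alphas, alpha_bars = schedule
    n = states.shape[0]
    x = rng.standard_normal((n, action_dim))       # start from pure noise a_T
    for i in reversed(range(len(betas))):
        eps_hat = eps_model(x, states, np.full(n, i))
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])
        noise = rng.standard_normal(x.shape) if i > 0 else 0.0   # no noise on the final step
        x = mean + np.sqrt(betas[i]) * noise
    return x


def diffusion_ql_loss(eps_model, q_fn, states, actions, schedule, eta, rng):
    """BC term minus a normalised Q term evaluated on actions sampled from the policy."""
    _, _, alpha_bars = schedule
    bc = diffusion_bc_loss(eps_model, states, actions, alpha_bars, rng)
    a0 = sample_actions(eps_model, states, schedule, actions.shape[1], rng)
    # Assumed normalisation of the Q term by the mean |Q| on dataset pairs.
    alpha = eta / np.mean(np.abs(q_fn(states, actions)))
    return bc - alpha * float(np.mean(q_fn(states, a0)))
```

Usage with a toy critic and a placeholder noise predictor, both hypothetical:

```python
rng = np.random.default_rng(0)
states, actions = rng.standard_normal((8, 3)), rng.uniform(-1, 1, (8, 2))
zero_eps = lambda x, s, t: np.zeros_like(x)
q_fn = lambda s, a: 1.0 - np.sum(a ** 2, axis=1)
loss = diffusion_ql_loss(zero_eps, q_fn, states, actions, make_schedule(5), eta=1.0, rng=rng)
```

With `eta = 0` the loss reduces to plain diffusion behaviour cloning. Increasing `eta` shifts the objective toward high-value actions near the data.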

    Statistical structure of lateral connections in the primary visual cortex

    The statistical structure of the visual world offers many useful clues for understanding how biological visual systems may interpret natural scenes. One particularly important early process in visual object recognition is grouping together edges that belong to the same contour. The layout of edges in natural scenes has strong statistical structure. One such statistical property is that edges tend to lie on a common circle, and this "co-circularity" can predict human performance at contour grouping. We therefore tested the hypothesis that long-range excitatory lateral connections in the primary visual cortex, which are believed to be involved in contour grouping, display a similar co-circular structure. We analyzed data from tree shrews, for which information on both lateral connectivity and the overall structure of the orientation map was available. We found a surprising diversity in the relevant statistical structure of the connections. In particular, the extent to which co-circularity was displayed varied significantly. Overall, these data suggest the intriguing possibility that V1 may contain both co-circular and anti-co-circular connections.
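Co-circularity has a simple geometric test. By the tangent-chord property, two edge elements are tangent to a common circle exactly when their undirected orientations θ1 and θ2 satisfy θ1 + θ2 ≡ 2φ (mod π). Here φ is the direction of the line joining their positions. A collinear pair is the limiting case of an infinitely large circle.

The sketch below measures how far a pair departs from this condition. The function name is hypothetical. It is the plain noise-free criterion, not the noise-resistant measure the work in this listing describes.

```python
import numpy as np


def cocircularity_deviation(p1, theta1, p2, theta2):
    """Angular deviation (radians, in [0, pi/2]) of two edge elements from co-circularity.

    p1, p2 are (x, y) positions; theta1, theta2 are undirected orientations (mod pi).
    The pair is co-circular when theta1 + theta2 == 2 * phi (mod pi), where phi is
    the direction of the line joining p1 to p2.
    """
    dx, dy = np.subtract(p2, p1)
    phi = np.arctan2(dy, dx)
    d = np.mod(theta1 + theta2 - 2.0 * phi, np.pi)   # residual in [0, pi)
    return float(min(d, np.pi - d))                   # fold so 0 and pi both mean "exact"
```

Edges sampled tangent to any circle give a deviation of (numerically) zero. Two horizontal edges placed side by side are collinear and also give zero. Rotating one of them by π/4 yields a deviation of π/4.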

    Natural scene statistics and the development of the primary visual cortex

    Vision is the dominant human sensory modality. Both visual input and visual brain areas are relatively easy to study and manipulate, so vision has become an important window for enlarging our understanding of biological sensory processing. Whether artificial or biological, visual processing systems must quickly and efficiently make sense of a large volume of noisy, high-dimensional input. To do this, they construct statistical models of the input and utilise these models to efficiently encode visual scenes, detect features and construct a model of the world. In this thesis, we combine the study of natural scene statistics with mathematical models, experimental analysis and visual psychophysics to gain a deeper understanding of the development and function of the mammalian primary visual cortex. We start by considering functional models of receptive field development. In agreement with previous work, we find that unsupervised learning models trained on natural scenes consistently learn oriented "edges" (Gabor-like filters) as the basic features of natural scenes. The similarity between these filters and primary visual cortex receptive fields is strong evidence that primary visual cortex receptive fields are optimal encoders of visual input. We then significantly extend this work by comparing the predictions of unsupervised learning models with the receptive fields of animals reared in unusual visual environments. We find good agreement, which is evidence that aspects of receptive fields are learned during development rather than innate. We also show that applying such unsupervised learning models to binocular visual input is not a simple extension of the monocular case. Inter-ocular correlations change the optimal encoding strategy for binocular input so that it depends on edge orientation. Intriguingly, such functional models predict an over-representation of vertically oriented receptive fields.
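A Gabor-like filter is a Gaussian envelope multiplied by an oriented sinusoid. The sketch below uses a common textbook parameterisation, which is assumed here. The argument names `wavelength`, `theta`, `sigma`, `phase` and `gamma` are illustrative and not taken from the thesis.

`theta` is the direction in which the sinusoid varies, so the stripes (the "edge") run perpendicular to it. The sketch generates such a filter directly; it does not reproduce the learned filters the thesis describes.

```python
import numpy as np


def gabor(size, wavelength, theta, sigma, phase=0.0, gamma=1.0):
    """Return a size x size Gabor patch: Gaussian envelope times an oriented cosine carrier."""
    coords = np.arange(size) - (size - 1) / 2.0          # pixel coordinates centred on 0
    y, x = np.meshgrid(coords, coords, indexing="ij")    # rows are y, columns are x
    # Rotate coordinates so the carrier varies along the direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # gamma < 1 elongates the envelope along the stripes (the aspect ratio).
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + phase)
    return envelope * carrier
```

Correlating such a filter with an image patch gives a large response only when the patch contains structure at the filter's orientation and spatial frequency. This orientation selectivity is what makes these filters resemble V1 simple-cell receptive fields.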
After establishing that oriented edges are the basic feature of natural scenes and the unit of primary visual cortex receptive fields, we consider the statistics of edge arrangements in natural scenes. Sigman et al. (2001) showed that, over short distances, edges in natural scenes tend to be tangent to a common circle, or co-circular. Edge arrangements in which edge position and orientation are dependent may be said to have "reduced symmetry": position and orientation cannot be rotated independently without changing the statistics of the arrangement. Co-circularity is one specific type of reduced symmetry. We extend previous work on natural scene co-circularity using a noise-resistant measure of co-circularity that we develop. We show that natural scenes contain significant co-circularity over extremely large angular distances (>14°). We also discuss preliminary work on variations in co-circularity statistics by scene type. Having established that co-circularity is pervasive in natural scenes, even over large distances, we return to the structure of the primary visual cortex, this time at the network level. Previous work has shown that, like edges in natural scenes, V1 orientation preference maps also have reduced symmetry. However, the precise form of this dependence between orientation and position has not been examined. We examine cat orientation preference maps from normal, stripe-reared and blind-reared animals. We find that, although these maps do contain reduced symmetry, it is not co-circularity. Moreover, the statistics of reduced symmetry in the maps are not affected by changes to visual input during development. Continuing our examination of V1 network structure, we consider the statistics of lateral connectivity in tree shrew V1.
Previous work demonstrated that long-range V1 lateral connections are more common between regions with similar orientation preferences (Bosking et al., 1997). We re-examine these connectivity data using our noise-resistant measure of co-circularity. We find evidence that lateral connections between cells in the primary visual cortex may use two opposite wiring strategies. Together, these strategies facilitate rapid processing of co-circular visual input while increasing the salience of less expected deviations from co-circularity. Finally, we use the psychophysics of binocular rivalry to test whether co-circularity statistics can affect the functional processing of visual input in humans. Using binocular rivalry dominance as an objective measure of salience, we show that randomly arranged edges are more salient than edge arrangements that contain co-circularity. This is evidence that early visual processing may functionally exploit edge arrangement statistics. Consistent with our findings on lateral connections, this may indicate a general strategy of increasing the salience of unexpected visual input. Overall, we demonstrate that early visual coding makes extensive use of natural scene statistics. We show that oriented edges are a key currency in early visual processing. The arrangement of edges in natural scenes contains rich statistical structure. We find that this structure influences wiring in the primary visual cortex during development and produces measurable changes in the salience of visual stimuli.

    Hyperbolic Deep Reinforcement Learning

    We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. Hyperbolic geometry provides deep RL models with a natural basis for precisely encoding this inherently hierarchical information. However, applying existing methods from the hyperbolic deep learning literature leads to fatal optimization instabilities. These arise from the non-stationarity and variance of RL gradient estimators. Hence, we design a new general method that counteracts these optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near-universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool. Comment: Preprint
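To make "latent representations in hyperbolic space" concrete, the sketch below uses the Poincaré ball model with curvature -c. The abstract does not name a specific model, so this choice is an assumption.

The sketch implements two standard operations:

- the exponential map at the origin, which sends an ordinary Euclidean encoder output into the ball;
- the geodesic distance between two points on the ball.

It is not the paper's stabilisation method. The `eps` clamps are illustrative guards against the boundary, where `tanh` saturates and the distance denominators vanish.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean vectors v (last axis = features) into the Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y of the Poincare ball of curvature -c."""
    sq = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    # Clamp the denominator: points pushed onto the boundary make it 0 in floating point.
    return np.arccosh(1.0 + 2.0 * c * sq / np.maximum(denom, eps)) / np.sqrt(c)
```

For any c, the distance from the origin to `expmap0(v, c)` equals 2‖v‖. Distances between points near the boundary, however, grow very quickly. This is the property that lets tree-like, hierarchical structure embed with low distortion. It also makes optimization sensitive to large, noisy gradients.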

    The combinatorics of neurite self-avoidance

    During neural development in Drosophila, the ability of neurite branches to recognize whether they come from the same neuron or from different neurons depends crucially on the molecule Dscam1. In particular, this recognition depends on the stochastic acquisition of a unique combination of Dscam1 isoforms out of a large set of possible isoforms. To interpret these findings properly, it is crucial to understand the combinatorics involved. This has previously been attempted only through stochastic simulations for a few specific parameter combinations. Here we present closed-form solutions for the general case. These reveal the relationships among the key variables and how these relationships constrain possible biological scenarios.
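The kind of combinatorics at stake can be illustrated under a deliberately simplified model. Each neuron draws k distinct isoforms uniformly and independently from N possible isoforms. This model, and the function names below, are assumptions for illustration and are not the paper's closed-form results.

Under this model:

- two neurons share at least one isoform with probability 1 - C(N-k, k) / C(N, k);
- m neurons all end up with different isoform sets with the birthday-problem probability, taken over the C(N, k) possible sets.

```python
from math import comb, prod


def p_shared_isoform(N, k):
    """P(two neurons share >= 1 isoform) when each draws k distinct isoforms uniformly from N."""
    # comb(N - k, k) counts second-neuron sets disjoint from the first; it is 0 when N < 2k.
    return 1 - comb(N - k, k) / comb(N, k)


def p_all_unique(N, k, m):
    """P(m neurons all express different isoform sets): birthday problem over C(N, k) sets."""
    M = comb(N, k)
    return prod(1 - i / M for i in range(m))
```

With N = 4 and k = 2 there are six possible sets. Two neurons then overlap with probability 5/6. Seven neurons cannot all have unique sets, so `p_all_unique(4, 2, 7)` is 0.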

    The Earliest Dipodomyine Heteromyid in North America and the Phylogenetic Relationships of Geomorph Rodents

    Dipodomyine heteromyids (kangaroo rats and mice) are a diverse group of arid-adapted ricochetal rodents of North America. Here, a new genus and species of a large dipodomyine is reported from early Miocene-aged deposits of the John Day Formation in Oregon; it represents the earliest record of the subfamily. The taxon is known from a single specimen consisting of a nearly complete skull, dentary, partial pes, and caudal vertebra. The specimen is characterized by a mosaic of ancestral and highly derived heteromyid cranial features. Specifically, the dental morphology and some cranial characteristics are similar to those of early heteromyids. Other aspects of its morphology, including the exceptionally inflated auditory bullae, are more similar to known dipodomyines. This specimen was included in a phylogenetic analysis comprising 96 characters and the broadest sampling of living and extinct geomorph rodents of any morphological phylogenetic analysis to date. Results support the monophyly of crown-group Heteromyidae exclusive of Geomyidae and place the new taxon within Dipodomyinae. The new heteromyid is the largest known member of the family. Analyses suggest that large body size evolved several times within Heteromyidae. Overall, the morphology of the new heteromyid supports a mosaic evolution of the open-habitat adaptations that characterize kangaroo rats and mice. Inflation of the auditory bulla appeared early in the group, and bipedality/ricochetal locomotion appeared later. We hypothesize that cooling and drying conditions in the late Oligocene and early Miocene favored adaptations for life in more open habitats. This resulted in increased locomotor specialization in this lineage over time from a terrestrial ancestor.