
    Learning long-range spatial dependencies with horizontal gated-recurrent units

    Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural network, are now approaching -- and sometimes even surpassing -- human accuracy on a variety of visual recognition tasks. Here, however, we show that these neural networks and their recent extensions struggle in recognition tasks where co-dependent visual features must be detected over long spatial ranges. We introduce the horizontal gated-recurrent unit (hGRU) to learn intrinsic horizontal connections -- both within and across feature columns. We demonstrate that a single hGRU layer matches or outperforms all tested feedforward hierarchical baselines, including state-of-the-art architectures with orders of magnitude more free parameters. We further discuss the biological plausibility of the hGRU in comparison to anatomical data from the visual cortex, as well as human behavioral data on a classic contour detection task.
    Comment: Published at NeurIPS 2018, https://papers.nips.cc/paper/7300-learning-long-range-spatial-dependencies-with-horizontal-gated-recurrent-unit
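The core idea of the abstract above, a gated recurrent update driven by lateral (horizontal) connections within a feature map, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the published hGRU: the learned horizontal-connection kernels are replaced by a fixed four-neighbor sum, and the layer's multiple gates are collapsed into a single update gate.

```python
import numpy as np

def lateral(h):
    """Sum of the four spatial neighbors: a fixed stand-in for the learned
    horizontal-connection kernels of the hGRU (hypothetical simplification)."""
    return (np.roll(h, 1, 0) + np.roll(h, -1, 0)
            + np.roll(h, 1, 1) + np.roll(h, -1, 1))

def hgru_step(h, x, w_gate=0.5, w_cand=0.5, b_gate=0.0):
    """One recurrent step: a sigmoid gate mixes the current state with a
    candidate computed from lateral input plus the feedforward drive x."""
    g = 1.0 / (1.0 + np.exp(-(w_gate * lateral(h) + b_gate)))  # update gate
    c = np.tanh(w_cand * lateral(h) + x)                       # candidate state
    return (1.0 - g) * h + g * c

# Iterating the step lets activity propagate horizontally across the grid
# over timesteps -- the mechanism the hGRU uses to link co-dependent
# features over long spatial ranges.
x = np.zeros((8, 8)); x[4, 1] = 1.0   # a single point of feedforward input
h = np.zeros_like(x)
for _ in range(5):
    h = hgru_step(h, x)
```

After a few steps, activity is no longer confined to the input location: each recurrent iteration extends the effective receptive field by one lateral hop, which is how a single recurrent layer can cover ranges that would require many stacked feedforward layers.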

    Fixing the problems of deep neural networks will require better training data and learning algorithms

    Bowers and colleagues argue that DNNs are poor models of biological vision because they often learn to rival human accuracy by relying on strategies that differ markedly from those of humans. We show that this problem is worsening as DNNs become larger in scale and more accurate, and we prescribe methods for building DNNs that can reliably model biological vision.
    Comment: Published as a commentary in Behavioral and Brain Sciences

    A revised framework for human scene recognition

    Thesis advisor: Sean P. MacEvoy
    For humans, healthy and productive living depends on navigating through the world and behaving appropriately along the way. But in order to do this, humans must first recognize their visual surroundings. The technical difficulty of this task is hard to comprehend: the number of possible scenes that can fall on the retina approaches infinity, and yet humans often effortlessly and rapidly recognize their surroundings. Understanding how humans accomplish this task has long been a goal of psychology and neuroscience and, more recently, has proven useful in inspiring and constraining the development of new algorithms for artificial intelligence (AI). In this thesis I begin by reviewing the current state of scene recognition research, drawing upon evidence from each of these areas and discussing an unchallenged assumption in the literature: that scene recognition emerges from independently processing information about scenes' local visual features (i.e., the kinds of objects they contain) and global visual features (i.e., their spatial parameters). Over the course of several projects, I challenge this assumption with a new framework for scene recognition that indicates a crucial role for information sharing between these resources. Development and validation of this framework will expand our understanding of scene recognition in humans and provide new avenues for research by extending these concepts to other domains spanning psychology, neuroscience, and AI.
    Thesis (PhD) — Boston College, 2016. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Psychology

    Large-scale discovery of visual features for object recognition

    A central goal in vision science is to identify features that are important for object and scene recognition. Reverse correlation methods have been used to uncover features important for recognizing faces and other stimuli with low intra-class variability. However, these methods are less successful when applied to natural scenes with high variability in appearance. To rectify this, we developed Clicktionary, a web-based game for identifying features for recognizing real-world objects. Pairs of participants play together in different roles to identify objects: a “teacher” reveals image regions diagnostic of the object’s category while a “student” tries to recognize the object. Aggregating game data across players yields importance maps for objects, in which each pixel is scored by its contribution to recognition. We found that these importance maps are consistent across participants and identify object features distinct from those used by deep convolutional networks (DCNs) for object recognition, and from those predicted by salience maps derived from both human participants and models. We also extended Clicktionary to support large-scale feature map discovery (http://clickme.ai), whereby human teachers play with DCN students. This has generated a dataset of tens of thousands of unique images, which we incorporate into DCN training routines to make them emphasize these features. This procedure changes DCN object representations, reducing their reliance on background information and highlighting features similar to those used by humans. Human feature importance maps identified by Clicktionary and our DCN models trained with this information will enable a richer understanding of the foundations of object recognition.
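The aggregation step described above can be sketched in a few lines: treating each teacher's revealed regions as a binary mask, a per-pixel importance score is the fraction of players who revealed that pixel. This is a hypothetical simplification of Clicktionary's scoring, not the game's actual pipeline.

```python
import numpy as np

def importance_map(reveal_masks):
    """Aggregate binary reveal masks (one per teacher) into a per-pixel
    importance score in [0, 1]: the fraction of players who revealed each
    pixel. A hypothetical simplification of Clicktionary's scoring."""
    stack = np.stack(reveal_masks).astype(float)  # shape: (players, H, W)
    return stack.mean(axis=0)

# Three players revealed overlapping regions of a toy 4x4 image.
m1 = np.zeros((4, 4)); m1[1:3, 1:3] = 1
m2 = np.zeros((4, 4)); m2[1:3, 1:3] = 1
m3 = np.zeros((4, 4)); m3[0:2, 0:2] = 1
imp = importance_map([m1, m2, m3])
```

Pixels revealed by every player score 1.0, pixels revealed by a single player score 1/3, and pixels never revealed score 0, giving a map whose consistency across participants can then be measured.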

    Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex

    One of the most impactful findings in computational neuroscience over the past decade is that the object recognition accuracy of deep neural networks (DNNs) correlates with their ability to predict neural responses to natural images in the inferotemporal (IT) cortex. This discovery supported the long-held theory that object recognition is a core objective of the visual cortex, and suggested that more accurate DNNs would serve as better models of IT neuron responses to images. Since then, deep learning has undergone a revolution of scale: billion-parameter DNNs trained on billions of images are rivaling or outperforming humans at visual tasks including object recognition. Have today's DNNs become more accurate at predicting IT neuron responses to images as they have grown more accurate at object recognition? Surprisingly, across three independent experiments, we find this is not the case. DNNs have become progressively worse models of IT as their accuracy has increased on ImageNet. To understand why DNNs experience this trade-off and to evaluate whether they remain an appropriate paradigm for modeling the visual system, we turn to recordings of IT that capture spatially resolved maps of neuronal activity elicited by natural images. These neuronal activity maps reveal that DNNs trained on ImageNet learn to rely on different visual features than those encoded by IT, and that this problem worsens as their accuracy increases. We successfully resolved this issue with the neural harmonizer, a plug-and-play training routine for DNNs that aligns their learned representations with humans. Our results suggest that harmonized DNNs break the trade-off between ImageNet accuracy and neural prediction accuracy that assails current DNNs, and offer a path to more accurate models of biological vision.
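The harmonizer described above is a training routine that adds an alignment term to the task objective. A minimal sketch of one plausible form, assumed for illustration and not the paper's published implementation: the task loss plus a penalty of one minus the cosine similarity between the model's saliency map and a human feature-importance map.

```python
import numpy as np

def _normalize(m, eps=1e-8):
    """Flatten and L2-normalize a map so the penalty compares shape, not scale."""
    v = m.ravel().astype(float)
    return v / (np.linalg.norm(v) + eps)

def harmonized_loss(task_loss, model_saliency, human_map, lam=1.0):
    """Task loss plus an alignment penalty (1 - cosine similarity) between
    the model's saliency map and a human importance map. A hypothetical
    sketch of the harmonization objective, not the authors' code."""
    align = 1.0 - float(_normalize(model_saliency) @ _normalize(human_map))
    return task_loss + lam * align

human = np.zeros((4, 4)); human[1:3, 1:3] = 1.0
aligned = harmonized_loss(0.2, human.copy(), human)    # saliency matches humans
misplaced = harmonized_loss(0.2, 1.0 - human, human)   # saliency on the wrong pixels
```

A model whose saliency matches the human map pays no alignment penalty, while one attending to complementary pixels pays the full penalty, so gradient descent on this objective pushes representations toward human-used features without changing the task loss itself.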