
    PeopleNet: A Novel People Counting Framework for Head-Mounted Moving Camera Videos

    Traditional crowd counting techniques based on optical flow or feature matching lack automatic feature extraction and produce low-precision results, and they have therefore been superseded by deep learning (DL) models. Most of these models, however, have been tested on surveillance crowd datasets captured with stationary cameras. Counting people in videos shot with a head-mounted moving camera is far more challenging, mainly because the temporal information of the moving crowd is mixed with the induced camera motion. This study proposes a transfer learning-based PeopleNet model to tackle this problem. We make significant changes to the standard VGG16 model, disabling its top convolutional blocks and replacing its standard fully connected layers with new fully connected and dense layers. The strong transfer learning capability of the VGG16 backbone yields high-quality density maps and, in turn, highly accurate crowd estimates. Because no public benchmark dataset exists for this task, the performance of the proposed model is evaluated on a self-generated image database prepared from moving-camera video clips. The proposed framework gives promising results on various crowd categories such as dense, sparse, and average. To demonstrate versatility, we perform self- and cross-evaluation on various crowd counting models and datasets, which underlines the value of the PeopleNet model for defense and public-safety applications.
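
    As a loose illustration of the architecture described above, the following minimal sketch builds a truncated, frozen VGG16 and attaches new fully connected layers. It assumes a Keras implementation; the cut point (block3_pool), the layer sizes, and the choice to regress a scalar count rather than a full density map are all assumptions for illustration, not the paper's actual configuration.

    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG16

    def build_peoplenet_sketch(input_shape=(224, 224, 3)):
        # Pretrained VGG16 trunk; "disabling top convolutional blocks" is
        # interpreted here as cutting the network at block3_pool (the exact
        # cut point is an assumption).
        base = VGG16(weights="imagenet", include_top=False,
                     input_shape=input_shape)
        trunk = Model(base.input, base.get_layer("block3_pool").output)
        trunk.trainable = False  # transfer learning: reuse pretrained filters

        # New fully connected / dense layers replacing VGG16's standard head.
        x = layers.Flatten()(trunk.output)
        x = layers.Dense(512, activation="relu")(x)
        x = layers.Dropout(0.5)(x)
        x = layers.Dense(128, activation="relu")(x)
        count = layers.Dense(1, activation="relu", name="crowd_count")(x)
        return Model(trunk.input, count)

    model = build_peoplenet_sketch()
    model.compile(optimizer="adam", loss="mse")  # regress per-image head count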

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that objects may be stored in and retrieved from a pre-attentional store during this task, lending further weight to that argument.

    Visual Cortex

    The neurosciences have experienced tremendous and wonderful progress in many areas, and the spectrum encompassing the neurosciences is expansive. Suffice it to mention a few classical fields: electrophysiology, genetics, physics, computer science, and, more recently, the social and marketing neurosciences. Of course, this rapid growth has resulted in the production of many books. Perhaps the visual system and the visual cortex were in the vanguard because most animals do not produce their own light and thus offer the invaluable advantage of allowing investigators to conduct experiments in full control of the stimulus. In addition, the fascinating evolution of scientific techniques, the immense productivity of recent research, and the ensuing literature make it virtually impossible to publish in a single volume all worthwhile work accomplished throughout the scientific world. The days when a single individual, such as Diderot, could undertake the production of an encyclopedia are gone forever. Indeed, most approaches to studying the nervous system are valid, and neuroscientists produce an almost astronomical amount of interesting data accompanied by extremely worthy hypotheses, which in turn generate new ventures in the search for brain functions. Yet it is fully justified to offer an encore and to publish a book dedicated to the visual cortex and beyond. Many reasons justify a book assembling chapters written by active researchers: each has the opportunity to bind together data and explore original ideas whose fate will not fall into the hands of uncompromising reviewers of traditional journals. This book focuses on the cerebral cortex with a strong emphasis on vision, yet it offers the reader diverse approaches employed to investigate the brain, for instance computer simulation, cellular responses, or rivalry between various targets and goal-directed actions. This volume thus covers a large spectrum of research, even though it is impossible to include all topics in the extremely diverse field of the neurosciences.

    Learning, Moving, And Predicting With Global Motion Representations

    In order to effectively respond to and influence the world they inhabit, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage with it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These representations are useful, but they are difficult to estimate reliably and limited in their applicability to real-world settings, where agents frequently must reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building the more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions, but that is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences, using learning rules derived from a formulation of image transformation in terms of its group properties; and (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, treating motion as a transformation of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action.
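
    The first of these methods can be loosely illustrated in code. The sketch below fits a global affine motion model to a noisy flow field by iteratively reweighted least squares; the affine parameterization and the robust weighting are illustrative assumptions, much simpler than the projective-geometry estimator the dissertation actually develops.

    import numpy as np

    def fit_global_affine(points, flow, iters=5, scale=1.0):
        """points: (N, 2) pixel coordinates; flow: (N, 2) measured flow.
        Returns affine parameters for u = a0 + a1*x + a2*y (likewise v)."""
        x, y = points[:, 0], points[:, 1]
        A = np.stack([np.ones_like(x), x, y], axis=1)
        w = np.ones(len(points))  # per-vector weights, initially uniform
        for _ in range(iters):
            sw = np.sqrt(w)[:, None]
            # Weighted least squares, solved separately for u and v.
            pu, *_ = np.linalg.lstsq(A * sw, flow[:, 0] * sw[:, 0], rcond=None)
            pv, *_ = np.linalg.lstsq(A * sw, flow[:, 1] * sw[:, 0], rcond=None)
            pred = np.stack([A @ pu, A @ pv], axis=1)
            resid = np.linalg.norm(flow - pred, axis=1)
            # Downweight outliers: independently moving objects, bad matches.
            w = 1.0 / (1.0 + (resid / scale) ** 2)
        return pu, pv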

    NASA JSC neural network survey results

    A survey of artificial neural systems was conducted in support of NASA Johnson Space Center's Automatic Perception for Mission Planning and Flight Control Research Program. Several of the world's leading researchers contributed papers containing their most recent results on artificial neural systems. These papers were grouped into categories, and descriptive accounts of their results make up a large part of this report. Also included is material on sources of information on artificial neural systems, such as books, technical reports, and software tools.

    Self-Organization of Spiking Neural Networks for Visual Object Recognition

    On one hand, the visual system can differentiate between very similar objects. On the other hand, we can also recognize the same object in images that vary drastically due to viewing angle, distance, or illumination. The ability to recognize the same object under different viewing conditions is called invariant object recognition. Such object recognition capabilities are not immediately available after birth, but are acquired through learning by experience in the visual world. In many viewing situations, different views of the same object are seen in a temporal sequence, e.g. when we move an object in our hands while watching it. This creates temporal correlations between successive retinal projections that can be used to associate different views of the same object. Theorists have therefore proposed a synaptic plasticity rule with a built-in memory trace (trace rule). In this dissertation I present spiking neural network models that offer possible explanations for the learning of invariant object representations. These models are based on the following hypotheses:
    1. Instead of a synaptic trace rule, persistent firing of recurrently connected groups of neurons can serve as a memory trace for invariance learning.
    2. Short-range excitatory lateral connections enable learning of self-organizing topographic maps that represent temporal as well as spatial correlations.
    3. When trained with sequences of object views, such a network can learn representations that enable invariant object recognition by clustering different views of the same object within a local neighborhood.
    4. Learning of representations for very similar stimuli can be enabled by adaptive inhibitory feedback connections.
    The study presented in chapter 3.1 details an implementation of a spiking neural network to test the first three hypotheses. This network was tested with stimulus sets designed in two feature dimensions to separate the impact of temporal and spatial correlations on learned topographic maps. The emerging topographic maps showed patterns that depended on the temporal order of object views during training. Our results show that pooling over local neighborhoods of the topographic map enables invariant recognition. Chapter 3.2 focuses on the fourth hypothesis. There we examine how adaptive feedback inhibition (AFI) can improve the ability of a network to discriminate between very similar patterns. The results show that with AFI learning is faster, and the network learns selective representations for stimuli with higher levels of overlap than without AFI. The results of chapter 3.1 suggest a functional role for the topographic object representations that are known to exist in the inferotemporal cortex, and suggest a mechanism for the development of such representations. The AFI model implements one aspect of predictive coding: the subtraction of a prediction from the actual input of a system. Its successful implementation in a biologically plausible network of spiking neurons shows that predictive coding can play a role in cortical circuits.
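
    For context, the classic trace rule that the first hypothesis replaces can be sketched as a Hebbian update gated by a low-pass-filtered memory trace of postsynaptic activity, so that temporally adjacent views of an object strengthen the same output units. This is a minimal rate-based sketch with assumed constants, not one of the dissertation's spiking models.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_out = 64, 8
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    trace = np.zeros(n_out)
    eta, lr = 0.2, 0.01  # trace decay and learning rate (assumed values)

    def trace_rule_step(x):
        global trace, W
        y = np.maximum(W @ x, 0.0)           # rectified postsynaptic activity
        trace = (1 - eta) * trace + eta * y  # running memory trace of activity
        W += lr * np.outer(trace, x)         # Hebbian update gated by the trace
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep weights bounded

    # Jittered successive views of one object share the same trace, so they
    # become associated with the same output units (invariance learning).
    view = rng.random(n_in)
    for _ in range(10):
        trace_rule_step(view + 0.05 * rng.normal(size=n_in))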

    Neural dynamics of invariant object recognition: relative disparity, binocular fusion, and predictive eye movements

    How does the visual cortex learn invariant object categories as an observer scans a depthful scene? Two neural processes that contribute to this ability are modeled in this thesis. The first model clarifies how an object is represented in depth. Cortical area V1 computes absolute disparity, which is the horizontal difference in retinal location of an image in the left and right foveas. Many cells in cortical area V2 compute relative disparity, which is the difference between the absolute disparities of two visible features. Relative, but not absolute, disparity is unaffected by the distance of visual stimuli from the observer and by vergence eye movements. A laminar cortical model of V2 that includes shunting lateral inhibition of disparity-sensitive layer 4 cells causes a peak shift in cell responses that transforms the absolute disparity signals from V1 into relative disparity in V2. The second model simulates how the brain maintains stable percepts of a 3D scene during binocular eye movements. The visual cortex initiates the formation of a 3D boundary and surface representation by binocularly fusing corresponding features from the left and right retinotopic images. However, after each saccadic eye movement, every scenic feature projects to a different combination of retinal positions than before the saccade. Yet the 3D representation resulting from the prior fusion remains stable through the post-saccadic re-fusion. One key to this stability is predictive remapping: the system anticipates the new retinal positions of features entailed by eye movements, using gain fields that are updated by eye movement commands. The 3D ARTSCAN model developed here simulates how perceptual, attentional, and cognitive interactions across different brain regions within the What and Where visual processing streams coordinate predictive remapping, stable 3D boundary and surface perception, spatial attention, and the learning of object categories that are invariant to changes in an object's retinal projections. Such invariant learning helps the system avoid treating each new view of the same object as a distinct object to be learned. The thesis thereby shows how a process that enables invariant object category learning can be extended to also enable stable 3D scene perception.
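
    The distinction between absolute and relative disparity can be illustrated numerically. The small-angle geometry below is a textbook simplification chosen for this sketch, not the thesis's laminar V2 circuit; it shows that changing fixation distance (vergence) shifts absolute disparities while leaving their difference unchanged.

    def absolute_disparity(depth, fixation_depth, iod=0.065):
        """Small-angle approximation: disparity (radians) of a point at
        `depth` meters when the eyes (inter-ocular distance `iod` meters)
        fixate a point at `fixation_depth` meters."""
        return iod * (1.0 / fixation_depth - 1.0 / depth)

    for fixation in (0.5, 1.0, 2.0):  # vergence varies with fixation depth
        d_a = absolute_disparity(1.2, fixation)  # feature A at 1.2 m
        d_b = absolute_disparity(1.5, fixation)  # feature B at 1.5 m
        # The relative disparity (difference of absolute disparities) stays
        # constant across fixations, as required of the model's V2 cells.
        print(f"fixate {fixation:.1f} m: A {d_a:+.4f}  B {d_b:+.4f}  "
              f"A-B {d_a - d_b:+.4f}")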