
    A Spiking Neural Network Based Cortex-Like Mechanism and Application to Facial Expression Recognition

    In this paper, we present a quantitative, highly structured cortex-like model: a feedforward, hierarchical simulation of the ventral stream of visual cortex built from a biologically plausible, computationally convenient spiking neural network (SNN). The motivation comes directly from recent pioneering work on the detailed functional decomposition of the feedforward pathway of the ventral visual stream and from developments in artificial SNNs. By combining the logical structure of the cortical hierarchy with the computing power of the spiking neuron model, we obtain a practical framework. As a proof of principle, we demonstrate the system on several facial expression recognition tasks. The proposed cortex-like feedforward hierarchy can handle complicated pattern recognition problems. This suggests that combining cognitive models with modern neurocomputational approaches can extend our knowledge of the brain mechanisms underlying cognitive analysis and advance theoretical models of how we recognize faces, or, more specifically, perceive other people's facial expressions in a rich, dynamic, and complex environment, providing a new starting point for improved models of cortex-like visual mechanisms.
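
    The abstract does not give implementation details, but the basic building block of such a feedforward spiking hierarchy is typically a layer of leaky integrate-and-fire (LIF) neurons. The following is a minimal sketch under that assumption; all names and parameter values are illustrative, not the authors' settings.

```python
import numpy as np

def lif_layer(spikes_in, weights, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """One feedforward layer of leaky integrate-and-fire neurons.

    spikes_in : (n_steps, n_in) binary input spike trains
    weights   : (n_in, n_out) feedforward synaptic weights
    Returns an (n_steps, n_out) binary array of output spikes.
    """
    n_steps, n_out = spikes_in.shape[0], weights.shape[1]
    v = np.zeros(n_out)                                  # membrane potentials
    spikes_out = np.zeros((n_steps, n_out))
    for t in range(n_steps):
        v += dt / tau * (-v) + spikes_in[t] @ weights    # leak + input current
        fired = v >= v_thresh
        spikes_out[t] = fired
        v[fired] = v_reset                               # reset neurons that spiked
    return spikes_out

# Two stacked layers form a small feedforward hierarchy (toy data).
rng = np.random.default_rng(0)
inp = (rng.random((100, 64)) < 0.05).astype(float)
hidden = lif_layer(inp, rng.random((64, 32)) * 0.2)
output = lif_layer(hidden, rng.random((32, 10)) * 0.2)
```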

    A specialized face-processing model inspired by the organization of monkey face patches explains several face-specific phenomena observed in humans

    Converging reports indicate that face images are processed through specialized neural networks in the brain, i.e., face patches in monkeys and the fusiform face area (FFA) in humans. These studies were designed to find out how faces are processed in the visual system compared to other objects, yet the underlying mechanism of face processing has not been fully revealed. Here, we show that a hierarchical computational model, inspired by electrophysiological evidence on face processing in primates, is able to generate representational properties similar to those observed in monkey face patches (posterior, middle, and anterior patches). Since the most important goal of sensory neuroscience is linking neural responses with behavioral outputs, we test whether the proposed model, which is designed to account for neural responses in monkey face patches, can also predict well-documented behavioral face phenomena observed in humans. We show that the proposed model reproduces several cognitive face effects, such as the composite face effect and the idea of canonical face views. Our model provides insights into the underlying computations that transfer visual information from posterior to anterior face patches.

    Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition

    People can innately recognize human facial expressions in unnatural forms, such as when depicted on the unusual faces drawn in cartoons or when applied to an animal's features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically-inspired mechanism for such transfer learning, which is based on norm-referenced encoding, where patterns are encoded in terms of difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of the expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency. We train our proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
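
    The norm-referenced encoding described above has a simple computational core: an input feature vector is expressed as a difference vector relative to a domain-specific reference (e.g. the neutral face of a given avatar), with the direction indicating the expression type and the vector norm its intensity. A minimal sketch of that idea follows; the function and variable names are ours, not the paper's.

```python
import numpy as np

def norm_referenced_encode(features, reference):
    """Encode features as (direction, magnitude) relative to a domain-specific
    reference vector, e.g. the neutral face of one avatar/head shape."""
    diff = features - reference
    magnitude = np.linalg.norm(diff)              # expression intensity readout
    direction = diff / magnitude if magnitude > 0 else diff
    return direction, magnitude

def classify_expression(features, reference, prototypes):
    """Pick the expression whose prototype direction best matches the input.

    prototypes : dict mapping expression label -> unit direction vector,
                 which can be learned from a single example per class.
    """
    direction, intensity = norm_referenced_encode(features, reference)
    label = max(prototypes, key=lambda k: float(direction @ prototypes[k]))
    return label, intensity
```

    Because only the reference vector changes between domains, transferring to a new head shape in this scheme requires, in principle, just one new reference image, which is consistent with the 12-image training regime reported above.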

    Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet

    Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Building on this neurophysiology, a computational account of how invariant representations are formed in the brain is then presented: a feature hierarchy model in which invariant representations are built by self-organizing learning based on the temporal and spatial statistics of the visual input produced by objects as they transform in the world. VisNet can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or spatial continuity in continuous spatial transformation learning, which does not require a temporal trace. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and also lighting. It has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects, such as looming, rotation, and object-based movement, and to incorporate top-down feedback connections that model the control of attention by biased competition in, for example, spatial and object search tasks. The approach has also been extended to account for how the visual system can select single objects in complex visual scenes and how multiple objects can be represented in a scene, and, with an additional layer, to the development of representations of spatial scenes of the type found in the hippocampus.
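
    The "associative synaptic learning rule with a short-term memory trace" mentioned above is usually written as a Hebbian update in which the postsynaptic term is an exponentially decaying trace of recent activity, so that successive views of a transforming object become bound to the same output neuron. A minimal sketch of that trace rule follows; parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def trace_rule_update(w, x_seq, alpha=0.1, eta=0.8):
    """Hebbian learning with a short-term memory trace (VisNet-style sketch).

    w     : (n_in,) synaptic weights of one output neuron
    x_seq : (T, n_in) input firing rates as an object transforms over time
    """
    y_bar = 0.0
    for x in x_seq:
        y = float(w @ x)                        # postsynaptic firing rate
        y_bar = (1.0 - eta) * y + eta * y_bar   # exponentially decaying trace
        w = w + alpha * y_bar * x               # associative (Hebbian) update
    return w / (np.linalg.norm(w) + 1e-12)      # normalize to keep weights bounded
```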

    A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

    Traditional architectures for solving computer vision problems, and the degree of success they enjoyed, have relied heavily on hand-crafted features. Of late, however, deep learning techniques have offered a compelling alternative: automatically learning problem-specific features. With this new paradigm, every problem in computer vision is being re-examined from a deep learning perspective. It has therefore become important to understand what kinds of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e., deep networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep network widely used in computer vision: convolutional neural networks (CNNs). We start with AlexNet as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep learning techniques for computer vision. (Published in Frontiers in Robotics and AI, http://goo.gl/6691Bm)
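
    As a point of reference for the AlexNet-style pattern the survey starts from (stacked convolution/ReLU/pooling stages followed by fully connected layers), here is a scaled-down sketch in PyTorch; the channel counts and layer sizes are illustrative, not AlexNet's original configuration.

```python
import torch
from torch import nn

class MiniAlexNet(nn.Module):
    """Scaled-down AlexNet-style CNN: conv/ReLU/pool stages, then a classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((3, 3)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 3 * 3, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = MiniAlexNet()(torch.randn(1, 3, 64, 64))   # -> shape (1, 10)
```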

    Deep Spiking Neural Network for Video-based Disguise Face Recognition Based on Dynamic Facial Movements

    With the increasing popularity of social media and smart devices, the face, as one of the key biometrics, becomes vital for person identification. Among face recognition algorithms, video-based methods can make use of both temporal and spatial information, just as humans do, to achieve better classification performance. However, they cannot identify individuals when certain key facial areas, like the eyes or nose, are disguised by heavy makeup or rubber/digital masks. To this end, we propose a novel deep spiking neural network architecture in this study. It takes dynamic facial movements, the facial muscle changes induced by speaking or other activities, as the sole input. An event-driven continuous spike-timing-dependent plasticity learning rule with adaptive thresholding is applied to train the synaptic weights. Experiments on our proposed video-based disguise face database (MakeFace DB) demonstrate that the proposed learning method performs very well: it achieves 95% to 100% correct classification rates under various realistic experimental scenarios.
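
    The learning rule named in the abstract, spike-timing-dependent plasticity (STDP) with adaptive thresholding, is commonly implemented with exponentially decaying pre- and postsynaptic traces. The sketch below shows that generic trace-based form; it is an assumption about the general technique, not the authors' exact rule, and all parameters are illustrative.

```python
import numpy as np

def stdp_step(w, pre_trace, post_trace, pre_spikes, post_spikes,
              a_plus=0.01, a_minus=0.012, tau=20.0, dt=1.0):
    """One trace-based STDP update for an (n_pre, n_post) weight matrix."""
    pre_trace = pre_trace * np.exp(-dt / tau) + pre_spikes
    post_trace = post_trace * np.exp(-dt / tau) + post_spikes
    w += a_plus * np.outer(pre_trace, post_spikes)    # pre-before-post: potentiate
    w -= a_minus * np.outer(pre_spikes, post_trace)   # post-before-pre: depress
    return np.clip(w, 0.0, 1.0), pre_trace, post_trace

def adapt_threshold(theta, post_spikes, theta_plus=0.05, decay=0.999):
    """Adaptive threshold: rises each time a neuron fires and slowly decays,
    keeping output firing rates roughly homeostatic."""
    return theta * decay + theta_plus * post_spikes
```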

    Informationsrouting, Korrespondenzfindung und Objekterkennung im Gehirn (Information Routing, Correspondence Finding, and Object Recognition in the Brain)

    The dissertation deals with the general problem of how the brain can establish correspondences between neural patterns stored in different cortical areas. Although this is an important capability in many cognitive domains, such as language understanding, abstract reasoning, or motor control, the thesis concentrates on invariant object recognition as the application of correspondence finding. One part of the work presents a correspondence-based, neurally plausible system for face recognition. Other parts address the question of how visual information can be routed over several stages, proposing optimal architectures for such routing ('switchyards') and deriving ontogenetic mechanisms for the growth of switchyards. Finally, the idea of multi-stage routing is united with the object recognition system introduced before, suggesting how the so far distinct feature-based and correspondence-based approaches to object recognition could be reconciled.

    Generally speaking, this thesis addresses the question of how the brain can find correspondences between activity patterns. This is a central topic in visual object recognition, but it matters for all areas of neural information processing, from hearing to abstract reasoning. Correspondence finding should be invariant to changes that alter the appearance of the patterns but not their meaning, and it should also work when the two patterns are not connected directly but only via intermediate stages. The prerequisites for invariant correspondence finding between patterns are, on the one hand, the existence of suitable connection structures and, on the other hand, a principled neural mechanism for finding correspondences. Chapter 2 of the thesis deals with such a correspondence-finding mechanism. It is based on dynamic links between the points of the two patterns, which are activated by local similarity of the patterns and by global consistency with neighboring links. In multi-layer systems, dynamic links can be used not only for correspondence finding but also for the controlled routing of information. Using this property, Chapter 2 develops a face recognition system that is invariant to translation, robust to deformations, and performs well on benchmark databases. Chapter 3 investigates the most economical way to connect neural patterns such that there is a path from every point of one pattern to every point of the other and visual information can be routed between them. The total amount of required neural resources, i.e., both connections and feature-representing units in the intermediate layers, is minimized. This leads to multi-stage structures with widely spread but sparsely populated branchings, which we call switchyards. Interpretation of the results shows that switchyards are compatible with the qualitative and quantitative properties of the primate brain, as far as these are known. Chapter 4 addresses the question of how such rather complex neural connection structures can arise ontogenetically. A possible mechanism based on chemical markers is presented: the markers are produced by the units of the lowest layer and diffuse upward through the emerging connections. Connections grow preferentially between units that contain very dissimilar chemical markers. The resulting connection structures are nearly identical to the architectures derived analytically in Chapter 3 and are biologically even more plausible. Chapter 5 brings the ideas of the preceding chapters together to realize correspondence finding between patterns across multi-stage routing structures. It is shown how switchyards can be used to find correspondences between 'normal' visual patterns, even though initially none of the individual stages of the switchyard has patterns on both sides that could be matched against each other. Finally, the principle is extended into a complete recognition system that can assign a given input pattern, in a position-invariant manner, to one of several stored patterns across multiple routing stages.
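
    The dynamic-link mechanism of Chapter 2, links activated by local pattern similarity and consistency with neighboring links, can be illustrated schematically as an iterative update over a correspondence matrix. The sketch below is our own simplified rendering of that idea, not the dissertation's implementation; all names and parameters are illustrative.

```python
import numpy as np

def dynamic_link_matching(feat_a, feat_b, adj_a, adj_b, n_iter=50, beta=0.5):
    """Iteratively refine link strengths between two patterns.

    feat_a : (n_a, d) feature vectors of pattern A (e.g. local Gabor responses)
    feat_b : (n_b, d) feature vectors of pattern B
    adj_a, adj_b : binary adjacency matrices encoding neighborhood structure
    Returns an (n_a, n_b) matrix of correspondence (link) strengths.
    """
    sim = feat_a @ feat_b.T
    links = np.exp(sim - sim.max())                  # initial pointwise similarity
    links /= links.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        support = adj_a @ links @ adj_b.T            # consistency with neighboring links
        links = links * np.exp(beta * support)
        links /= links.sum(axis=1, keepdims=True)    # competition among links per point
    return links
```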