375 research outputs found

    Biologically-inspired hierarchical architectures for object recognition

    PhD Thesis. Existing machine vision methods translate three-dimensional real-world objects into two-dimensional images. These methods achieve acceptable performance in recognising objects. However, recognition performance drops dramatically when objects are transformed, for instance in background, orientation, position in the image, or scale. The human visual cortex has evolved to form an efficient invariant representation of objects within a scene. The superior performance of humans can be explained by the feed-forward, multi-layer hierarchical structure of the human visual cortex, together with the use of different fields of vision depending on the recognition task. The research community has therefore investigated building systems that mimic the hierarchical architecture of the human visual cortex as an ultimate objective. The aim of this thesis is to develop hierarchical models of visual processing that tackle the remaining challenges of object recognition. To enhance existing models of object recognition and to overcome the above-mentioned issues, three major contributions are made: 1. building a hierarchical model within an abstract architecture that achieves good performance on challenging image object datasets; 2. investigating the contribution of each region of vision for object and scene images in order to increase recognition performance and decrease the size of the processed data; 3. further enhancing the performance of existing models of object recognition by introducing hierarchical topologies that use the context in which the object is found to determine its identity. Statement of Higher Committee For Education Development in Iraq (HCED)

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
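
The sparse coding step mentioned in this abstract (representing a signal as a linear combination of a few dictionary elements) can be illustrated with a small sketch. This is not taken from the monograph: it assumes a fixed, hand-written dictionary rather than a learned one, and it uses the iterative shrinkage-thresholding algorithm (ISTA) as one common way to solve the L1-penalised least-squares problem. The names `soft_threshold`, `sparse_code`, `atoms` and `signal` are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, dictionary, lam, step, n_iter):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x          : signal, list of m floats
    dictionary : list of k atoms, each a list of m floats (the columns of D)
    step       : gradient step size; assumed to be at most 1/L, where L is
                 the largest eigenvalue of D^T D
    Returns the coefficient list a (length k).
    """
    k, m = len(dictionary), len(x)
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual D a - x for the current coefficients.
        recon = [sum(a[j] * dictionary[j][i] for j in range(k)) for i in range(m)]
        resid = [recon[i] - x[i] for i in range(m)]
        # Gradient step on the quadratic term, then soft-thresholding.
        for j in range(k):
            grad_j = sum(dictionary[j][i] * resid[i] for i in range(m))
            a[j] = soft_threshold(a[j] - step * grad_j, step * lam)
    return a


# Toy dictionary (hypothetical): three unit-norm atoms in R^2.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
signal = [1.2, 1.6]  # exactly twice the third atom
coeffs = sparse_code(signal, atoms, lam=0.1, step=0.4, n_iter=2000)
print(coeffs)
```

On this toy input the weight should end up on the third atom alone, shrunk slightly below 2 by the L1 penalty, even though the first two atoms could also reconstruct the signal. Dictionary learning, the monograph's actual focus, would alternate a coding step like this with an update of the atoms.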

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.

    Deep Convolutional Neural Networks Outperform Feature-Based But Not Categorical Models in Explaining Object Similarity Judgments.

    Recent advances in Deep convolutional Neural Networks (DNNs) have enabled unprecedentedly accurate computational models of brain representations, and present an exciting opportunity to model diverse cognitive functions. State-of-the-art DNNs achieve human-level performance on object categorisation, but it is unclear how well they capture human behavior on complex cognitive tasks. Recent reports suggest that DNNs can explain significant variance in one such task, judging object similarity. Here, we extend these findings by replicating them for a rich set of object images, comparing performance across layers within two DNNs of different depths, and examining how the DNNs' performance compares to that of non-computational "conceptual" models. Human observers performed similarity judgments for a set of 92 images of real-world objects. Representations of the same images were obtained in each of the layers of two DNNs of different depths (8-layer AlexNet and 16-layer VGG-16). To create conceptual models, other human observers generated visual-feature labels (e.g., "eye") and category labels (e.g., "animal") for the same image set. Feature labels were divided into parts, colors, textures and contours, while category labels were divided into subordinate, basic, and superordinate categories. We fitted models derived from the features, categories, and from each layer of each DNN to the similarity judgments, using representational similarity analysis to evaluate model performance. In both DNNs, similarity within the last layer explains most of the explainable variance in human similarity judgments. The last layer outperforms almost all feature-based models. Late and mid-level layers outperform some but not all feature-based models. Importantly, categorical models predict similarity judgments significantly better than any DNN layer. Our results provide further evidence for commonalities between DNNs and brain representations. 
Models derived from visual features other than object parts perform relatively poorly, perhaps because DNNs more comprehensively capture the colors, textures and contours which matter to human object perception. However, categorical models outperform DNNs, suggesting that further work may be needed to bring high-level semantic representations in DNNs closer to those extracted by humans. Modern DNNs explain similarity judgments remarkably well considering they were not trained on this task, and are promising models for many aspects of human cognition.
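
The representational similarity analysis named in this abstract compares a model and human judgments through their pairwise dissimilarities rather than through raw activations. The following is a minimal sketch of that idea, not the paper's fitting procedure: it assumes correlation distance for the model's representational dissimilarity matrix (RDM) and a Spearman rank correlation as the comparison score. All function names and the toy numbers (`layer_patterns`, `human_rdm`) are hypothetical.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def rdm(patterns):
    """Upper triangle of the RDM: 1 - correlation for every pair of patterns."""
    return [1.0 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))


# Hypothetical activations of one network layer for three images.
layer_patterns = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
model_rdm = rdm(layer_patterns)

# Hypothetical human dissimilarity judgments for pairs (0,1), (0,2), (1,2).
human_rdm = [0.1, 0.9, 0.8]

score = spearman(model_rdm, human_rdm)
print(model_rdm, score)
```

Repeating the last step for each layer, and for RDMs built from feature or category labels, would give one score per candidate model that can then be compared, which is the kind of comparison the abstract reports.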

    Über die Selbstorganisation einer hierarchischen Gedächtnisstruktur für kompositionelle Objektrepräsentation im visuellen Kortex

    At present, there is a huge lag between artificial and biological information processing systems in terms of their capability to learn. This lag could certainly be reduced by gaining more insight into higher brain functions such as learning and memory. For instance, primate visual cortex is thought to provide the long-term memory for visual objects acquired by experience. The visual cortex effortlessly handles arbitrarily complex objects by rapidly decomposing them into constituent components of much lower complexity along hierarchically organized visual pathways. How this processing architecture self-organizes into a memory domain that employs such a compositional object representation by learning from experience remains largely a riddle. The study presented here approaches this question by proposing a functional model of a self-organizing hierarchical memory network. The model is based on hypothetical neuronal mechanisms involved in cortical processing and adaptation. The network architecture comprises two consecutive layers of distributed, recurrently interconnected modules. Each module is identified with a localized cortical cluster of fine-scale excitatory subnetworks. A single module performs competitive unsupervised learning on the incoming afferent signals to form a suitable representation of the locally accessible input space. The network employs an operating scheme in which ongoing processing consists of discrete successive fragments termed decision cycles, presumably identifiable with the fast gamma rhythms observed in the cortex. The cycles are synchronized across the distributed modules, which produce highly sparse activity within each cycle by instantiating a local winner-take-all-like operation. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is exposed to natural face images of different persons. 
The images are presented incrementally, one per cycle, to the lower network layer as a set of Gabor filter responses extracted from local facial landmarks. The images are presented without any person identity labels. In the course of unsupervised learning, the network simultaneously creates vocabularies of reusable local face appearance elements, captures relations between the elements by associatively linking those parts that encode the same face identity, develops higher-order identity symbols for the memorized compositions, and projects this information back onto the vocabularies in a generative manner. This learning corresponds to the simultaneous formation of bottom-up, lateral and top-down synaptic connectivity within and between the network layers. In the mature connectivity state, the network thus holds a full compositional description of the experienced faces in the form of sparse memory traces that reside in the feed-forward and recurrent connectivity. Due to the generative nature of the established representation, the network is able to recreate the full compositional description of a memorized face in terms of all its constituent parts given only its higher-order identity symbol or a subset of its parts. In the test phase, the network successfully demonstrates its ability to recognize the identity and gender of persons from alternative face views not shown before. An intriguing feature of the emerging memory network is its ability to self-generate activity spontaneously in the absence of external stimuli. In this sleep-like off-line mode, the network shows a self-sustaining replay of the memory content formed during the previous learning. Remarkably, the recognition performance is tremendously boosted after this off-line memory reprocessing. The performance boost is more pronounced for those face views that deviate more from the original view shown during learning. 
This indicates that the off-line memory reprocessing during the sleep-like state specifically improves the generalization capability of the memory network. The positive effect turns out to be surprisingly independent of synapse-specific plasticity, relying completely on the synapse-unspecific, homeostatic activity regulation across the memory network. The developed network thus demonstrates functionality not shown by any previous neuronal modeling approach. It forms and maintains a memory domain for compositional, generative object representation in an unsupervised manner through experience with natural visual images, using both on-line ("wake") and off-line ("sleep") learning regimes. This functionality offers a promising departure point for further studies aiming for deeper insight into the learning mechanisms employed by the brain and their consequent implementation in artificial adaptive systems for solving complex tasks not tractable so far. At present, there is still an enormous gap between the learning capability of artificial and biological information processing systems. This gap could be narrowed through better insight into higher brain functions such as learning and memory. In the visual cortex, for example, objects are decomposed within a very short time into their constituent parts along the hierarchical processing pathways and are thus represented as a composition of elements of lower complexity. Already familiar objects are retrieved from long-term memory in this way and recognized. How such a compositional, hierarchical memory structure can arise through visual experience is still largely unexplained. To pursue this question, a functional model of a recurrent neural network capable of learning is presented here. The network implements neuronal mechanisms that underlie cortical processing and plasticity. 
The hierarchical architecture of the network consists of two consecutively connected layers, each of which houses a number of distributed, recurrently interconnected modules. A module comprises several functionally separate subnetworks. Each such module is able to learn, without supervision, a suitable representation of the local input space from the incoming signals. The ongoing processing in the network is composed of discrete fragments, called decision cycles, which can be associated with the fast cortical rhythms in the gamma frequency range. The cycles are synchronized across the distributed modules. Within a cycle, a locally confined winner-take-all-like operation is carried out in the modules. The strength of the competition grows over the course of the cycle. Depending on the input signals, this operation activates a very small number of units and amplifies them at the expense of the others in order to map the presented stimulus onto the network activity. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is presented with natural face images of different persons. The images are fed to the lower network layer, one image per cycle, as a collection of Gabor filter responses from local facial landmarks, without providing information about person identity. In the course of the unsupervised learning procedure, the network shapes its connectivity structure such that the faces of all presented persons are stored in the network in the form of sparsely populated memory traces. For this purpose, feed-forward (bottom-up) and recurrent (lateral, top-down) synaptic connections within and between the layers are learned simultaneously. 
In the mature connectivity state, as a result of this learning, the individual faces are stored as compositions of their constituent parts in a generative manner. Thanks to the generative nature of the learned structure, the higher-order identity symbol alone, or a small subset of the associated face elements, suffices to retrieve all constituent parts of the stored faces from memory. In the test phase, the network can successfully recognize both the identity and the gender of persons from face views not shown before. A remarkable property of the resulting memory architecture is its ability to spontaneously generate activity patterns without presentation of external stimuli and to replay the contents stored in memory in this sleep-like "off-line" regime. Interestingly, the sleep phase yields a direct benefit for memory function. This benefit shows up as a drastically improved recognition rate after the sleep phase when the network is confronted with previously unseen views of already familiar persons. The performance improvement after the sleep phase is the more pronounced the more the alternative views deviate from the original. Moreover, this positive effect is completely independent of synapse-specific plasticity and can be explained solely by the synapse-unspecific, homeostatic regulation of activity in the network. The developed network thus demonstrates a functionality not previously shown in the field of neuronal modeling. It can form and maintain, without supervision, a memory domain for compositional, generative object representation through experience with natural images, both in the stimulus-driven, wake-like state and in the stimulus-decoupled, sleep-like state. 
This functionality offers a promising starting point for further studies that target the neuronal learning mechanisms of the brain and ultimately aim at their consistent implementation in technical, adaptive systems.
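
The competitive unsupervised learning with a winner-take-all-like operation described in this abstract can be illustrated in a heavily simplified form. The sketch below is a generic textbook competitive-learning rule, not the thesis's model: it assumes a single module with hard winner-take-all selection and omits the decision cycles, recurrent connectivity and homeostatic regulation. The names `winner`, `normalise`, `train` and the toy inputs are hypothetical.

```python
def winner(weights, x):
    """Index of the unit whose weight vector has the largest dot product with x."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=scores.__getitem__)


def normalise(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]


def train(weights, inputs, rate=0.2, epochs=20):
    """Hard winner-take-all learning: only the winning unit adapts."""
    weights = [normalise(w) for w in weights]
    for _ in range(epochs):
        for x in inputs:
            k = winner(weights, x)
            # Move the winner's weights toward the input, then renormalise.
            moved = [w + rate * (x_i - w) for w, x_i in zip(weights[k], x)]
            weights[k] = normalise(moved)
    return weights


# Two hypothetical input clusters, presented without labels.
inputs = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
initial = [[0.7, 0.3], [0.3, 0.7]]
trained = train(initial, inputs)
print(trained)
```

After training, each unit responds to one cluster, so the pair of units forms a small unsupervised vocabulary for the inputs. In the network described above, many such modules operate in parallel on Gabor filter responses from different facial landmarks.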