Search CORE

375 research outputs found

Objects and scenes classification with selective use of central and peripheral image content

Author: Alameer Ali
Degenaar Patrick
Nazarpour Kianoush
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

University of Salford Institutional Repository

Edinburgh Research Explorer

Biologically-inspired hierarchical architectures for object recognition

Author: Alameer Ali Munthr Abdulkareem
Publication venue: Newcastle University
Publication date: 01/01/2018

PhD Thesis
Existing machine vision methods translate three-dimensional objects in the real world into two-dimensional images. These methods have achieved acceptable performance in recognising objects. However, recognition performance drops dramatically when objects are transformed, for instance by changes in background, orientation, position in the image, or scale. The human visual cortex has evolved to form an efficient invariant representation of objects from within a scene. The superior performance of humans can be explained by the feed-forward, multi-layer hierarchical structure of the human visual cortex, in addition to the use of different fields of vision depending on the recognition task. The research community has therefore investigated building systems that mimic the hierarchical architecture of the human visual cortex as an ultimate objective. The aim of this thesis is to develop hierarchical models of visual processing that tackle the remaining challenges of object recognition. To enhance existing models of object recognition and to overcome the above-mentioned issues, three major contributions are made: 1. building a hierarchical model within an abstract architecture that achieves good performance on challenging image object datasets; 2. investigating the contribution of each region of vision for object and scene images in order to increase recognition performance and decrease the size of the processed data; 3. further enhancing the performance of all existing models of object recognition by introducing hierarchical topologies that use the context in which the object is found to determine the identity of the object.
Statement of Higher Committee For Education Development in Iraq (HCED

Newcastle University eTheses

Sparse Modeling for Image and Vision Processing

Author: Francis Bach
Jean Ponce
Julien Mairal
Publication date: 01/01/2014

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
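
To make the sparse coding idea in this abstract concrete (representing a signal as a linear combination of a few dictionary elements), here is a minimal sketch using iterative soft-thresholding (ISTA) on an L1-penalised least-squares objective. The function names, the toy dictionary, and the penalty value `lam` are illustrative assumptions and are not taken from the monograph; the dictionary is fixed here rather than learned.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, atoms, lam=0.1, n_iter=1000):
    """ISTA for min_a 0.5 * ||x - sum_j a[j] * atoms[j]||^2 + lam * ||a||_1.

    `atoms` is a list of dictionary elements, each a list as long as x.
    """
    # The squared Frobenius norm of the dictionary upper-bounds the
    # Lipschitz constant of the gradient, so 1 / it is a safe step size.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Current reconstruction and residual (reconstruction - signal).
        recon = [sum(a[j] * atoms[j][i] for j in range(len(atoms)))
                 for i in range(len(x))]
        resid = [r - xi for r, xi in zip(recon, x)]
        # Gradient of the quadratic term with respect to each coefficient.
        grad = [sum(u * r for u, r in zip(atom, resid)) for atom in atoms]
        # Gradient step followed by soft-thresholding.
        a = [soft_threshold(aj - step * gj, step * lam)
             for aj, gj in zip(a, grad)]
    return a


# Toy example (assumed data): the signal is exactly 2 x the third atom.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [1.2, 1.6]
code = sparse_code(x, atoms)
```

On this toy problem the code should end up using only the third atom, with its coefficient shrunk slightly below 2 (to about 1.9) by the L1 penalty, while the other two coefficients go to zero.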

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Representation Learning: A Review and New Perspectives

Author: Bengio Yoshua
Courville Aaron
Vincent Pascal
Publication date: 01/01/2014

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning

arXiv.org e-Print Archive

CiteSeerX

Deep Convolutional Neural Networks Outperform Feature-Based But Not Categorical Models in Explaining Object Similarity Judgments.

Author: Kamila M. Jozwik
Katherine R. Storrs
Marieke Mur
Nikolaus Kriegeskorte
Publication venue: Frontiers in Psychology
Publication date: 01/10/2017

Recent advances in Deep convolutional Neural Networks (DNNs) have enabled unprecedentedly accurate computational models of brain representations, and present an exciting opportunity to model diverse cognitive functions. State-of-the-art DNNs achieve human-level performance on object categorisation, but it is unclear how well they capture human behavior on complex cognitive tasks. Recent reports suggest that DNNs can explain significant variance in one such task, judging object similarity. Here, we extend these findings by replicating them for a rich set of object images, comparing performance across layers within two DNNs of different depths, and examining how the DNNs' performance compares to that of non-computational "conceptual" models. Human observers performed similarity judgments for a set of 92 images of real-world objects. Representations of the same images were obtained in each of the layers of two DNNs of different depths (8-layer AlexNet and 16-layer VGG-16). To create conceptual models, other human observers generated visual-feature labels (e.g., "eye") and category labels (e.g., "animal") for the same image set. Feature labels were divided into parts, colors, textures and contours, while category labels were divided into subordinate, basic, and superordinate categories. We fitted models derived from the features, categories, and from each layer of each DNN to the similarity judgments, using representational similarity analysis to evaluate model performance. In both DNNs, similarity within the last layer explains most of the explainable variance in human similarity judgments. The last layer outperforms almost all feature-based models. Late and mid-level layers outperform some but not all feature-based models. Importantly, categorical models predict similarity judgments significantly better than any DNN layer. Our results provide further evidence for commonalities between DNNs and brain representations. 
Models derived from visual features other than object parts perform relatively poorly, perhaps because DNNs more comprehensively capture the colors, textures and contours which matter to human object perception. However, categorical models outperform DNNs, suggesting that further work may be needed to bring high-level semantic representations in DNNs closer to those extracted by humans. Modern DNNs explain similarity judgments remarkably well considering they were not trained on this task, and are promising models for many aspects of human cognition
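
The comparison described in this abstract relies on representational similarity analysis. As a rough sketch of the basic idea only, the code below builds a dissimilarity vector from model features (1 minus Pearson correlation for each pair of items) and compares it with human dissimilarity judgments using Spearman rank correlation. The feature values and the human judgments are made-up toy numbers, the function names are assumptions, and the model-fitting step used in the paper is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length, non-constant lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den


def rdm_upper(features):
    """Upper-triangle dissimilarities (1 - Pearson r) for all item pairs."""
    return [1.0 - pearson(features[i], features[j])
            for i, j in combinations(range(len(features)), 2)]


def ranks(values):
    """Average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(u, v):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))


# Toy data (assumed): 4 items, 3 model features per item.
model_features = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
    [0.2, 0.1, 0.9],
]
# Hypothetical human dissimilarity ratings for the 6 item pairs,
# in the same pair order as combinations(range(4), 2).
human_dissim = [0.1, 0.9, 0.8, 0.7, 0.9, 0.2]

model_rdm = rdm_upper(model_features)
score = spearman(model_rdm, human_dissim)
```

A higher `score` means the model's pairwise dissimilarities are ordered more like the human judgments. Rank correlation is used in this sketch so that only the ordering of dissimilarities matters, not their scale.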

Scholarship@Western

Directory of Open Access Journals

Apollo (Cambridge)

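Über die Selbstorganisation einer hierarchischen Gedächtnisstruktur für kompositionelle Objektrepräsentation im visuellen Kortex (On the self-organization of a hierarchical memory structure for compositional object representation in the visual cortex)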

Author: Jitsev Evgueni
Publication date: 11/01/2011

At present, there is a huge gap between artificial and biological information processing systems in terms of their capability to learn. This gap could certainly be reduced by gaining more insight into higher brain functions such as learning and memory. For instance, the primate visual cortex is thought to provide long-term memory for visual objects acquired through experience. The visual cortex effortlessly handles arbitrarily complex objects by rapidly decomposing them into constituent components of much lower complexity along hierarchically organized visual pathways. How this processing architecture self-organizes into a memory domain that employs such compositional object representation by learning from experience remains, to a large extent, a riddle. The study presented here approaches this question by proposing a functional model of a self-organizing hierarchical memory network. The model is based on hypothetical neuronal mechanisms involved in cortical processing and adaptation. The network architecture comprises two consecutive layers of distributed, recurrently interconnected modules. Each module is identified with a localized cortical cluster of fine-scale excitatory subnetworks. A single module performs competitive unsupervised learning on the incoming afferent signals to form a suitable representation of the locally accessible input space. The network employs an operating scheme in which ongoing processing consists of discrete successive fragments termed decision cycles, presumably identifiable with the fast gamma rhythms observed in the cortex. The cycles are synchronized across the distributed modules, which produce highly sparse activity within each cycle by instantiating a local winner-take-all-like operation. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is exposed to natural face images of different persons. 
The images are presented incrementally, one per cycle, to the lower network layer as a set of Gabor filter responses extracted from local facial landmarks, without any person identity labels. In the course of unsupervised learning, the network simultaneously creates vocabularies of reusable local face appearance elements, captures relations between the elements by associatively linking those parts that encode the same face identity, develops higher-order identity symbols for the memorized compositions, and projects this information back onto the vocabularies in a generative manner. This learning corresponds to the simultaneous formation of bottom-up, lateral and top-down synaptic connectivity within and between the network layers. In the mature connectivity state, the network thus holds a full compositional description of the experienced faces in the form of sparse memory traces that reside in the feed-forward and recurrent connectivity. Owing to the generative nature of the established representation, the network is able to recreate the full compositional description of a memorized face in terms of all its constituent parts given only its higher-order identity symbol or a subset of its parts. In the test phase, the network successfully recognizes the identity and gender of persons from alternative face views not shown before. An intriguing feature of the emerging memory network is its ability to self-generate activity spontaneously in the absence of external stimuli. In this sleep-like off-line mode, the network shows a self-sustaining replay of the memory content formed during the previous learning. Remarkably, recognition performance is tremendously boosted after this off-line memory reprocessing. The boost is more pronounced for face views that deviate more from the original view shown during learning. 
This indicates that off-line memory reprocessing during the sleep-like state specifically improves the generalization capability of the memory network. The positive effect turns out to be surprisingly independent of synapse-specific plasticity, relying completely on the synapse-unspecific, homeostatic activity regulation across the memory network. The developed network thus demonstrates functionality not shown by any previous neuronal modeling approach. It forms and maintains a memory domain for compositional, generative object representation in an unsupervised manner through experience with natural visual images, using both on-line ("wake") and off-line ("sleep") learning regimes. This functionality offers a promising departure point for further studies aiming for deeper insight into the learning mechanisms employed by the brain and their consequent implementation in artificial adaptive systems for solving complex tasks not tractable so far.
At present, there is still an enormous gap between the learning capability of artificial and biological information processing systems. This gap could be narrowed through better insight into higher brain functions such as learning and memory. In the visual cortex, for example, objects are decomposed within a very short time into their constituents along the hierarchical processing pathways and are thus represented as a composition of elements of lower complexity. Objects that are already known are retrieved from long-term memory in this way and recognized. How such a compositional, hierarchical memory structure can arise through visual experience is still largely unexplained. To pursue this question, a functional model of a recurrent neural network capable of learning is presented here. The network implements neuronal mechanisms that underlie cortical processing and plasticity. 
The hierarchical architecture of the network consists of two consecutive layers, each of which houses a number of distributed, recurrently interconnected modules. A module comprises several functionally separate subnetworks. Each such module is able to learn, without supervision, a suitable representation of the local input space from the incoming signals. Ongoing processing in the network is composed of discrete fragments, called decision cycles, which can be related to the fast cortical rhythms in the gamma frequency range. The cycles are synchronized across the distributed modules. Within a cycle, a locally bounded winner-take-all-like operation is carried out in the modules. The strength of competition grows over the course of the cycle. Depending on the input signals, this operation activates a very small number of units and amplifies them at the expense of the others in order to map the presented stimulus onto the network activity. Equipped with adaptive mechanisms of bidirectional synaptic plasticity and homeostatic activity regulation, the network is presented with natural face images of different persons. The images are fed to the lower network layer, one image per cycle, as a collection of Gabor filter responses from local facial landmarks, without any information about person identity. In the course of the unsupervised learning procedure, the network shapes its connectivity such that the faces of all presented persons are stored in the network as sparse memory traces. To this end, feed-forward (bottom-up) and recurrent (lateral, top-down) synaptic connections within and between the layers are learned simultaneously. 
In the mature connectivity state, as a result of this learning, the individual faces are stored generatively as compositions of their constituents. Thanks to the generative nature of the learned structure, the higher-order identity symbol alone, or a small subset of the associated face elements, suffices to retrieve all constituents of the stored faces from memory. In the test phase, the network successfully recognizes both the identity and the gender of persons from face views not shown before. A remarkable property of the resulting memory architecture is its ability to generate activity patterns spontaneously, without presentation of external stimuli, and to replay the stored memory content in this sleep-like "off-line" regime. Interestingly, the sleep phase yields a direct benefit for memory function. This benefit shows up as a drastically improved recognition rate after the sleep phase when the network is confronted with previously unseen views of already known persons. The improvement after the sleep phase is the more pronounced the more strongly the alternative views deviate from the original. Moreover, this positive effect is completely independent of synapse-specific plasticity and can be explained solely by the synapse-unspecific, homeostatic regulation of activity in the network. The developed network thus demonstrates a functionality not previously shown in the field of neuronal modeling. Without supervision, it can form and maintain a memory domain for compositional, generative object representation through experience with natural images, both in the stimulus-driven, wake-like state and in the stimulus-decoupled, sleep-like state. 
This functionality offers a promising starting point for further studies that target the neuronal learning mechanisms of the brain and ultimately aim at their consistent implementation in technical, adaptive systems
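
The abstract above mentions competitive unsupervised learning with a local winner-take-all-like operation. The following is a generic textbook-style sketch of that idea, not the thesis model: it omits decision cycles, recurrent connectivity, bidirectional plasticity, and homeostatic regulation. The function names, the learning rate, and the toy weights are assumptions made purely for illustration.

```python
def winner_index(activations):
    """Index of the unit with the largest activation (hard winner-take-all)."""
    return max(range(len(activations)), key=activations.__getitem__)


def competitive_step(weights, x, rate=0.5):
    """One unsupervised update: only the winning unit moves toward the input.

    `weights` is a list of weight vectors, one per unit, updated in place.
    Returns the index of the winning unit.
    """
    # Each unit's activation is the dot product of its weights with x.
    activations = [sum(w * v for w, v in zip(unit, x)) for unit in weights]
    k = winner_index(activations)
    # Move the winner's weights a fraction `rate` of the way toward x.
    weights[k] = [w + rate * (v - w) for w, v in zip(weights[k], x)]
    return k


# Toy example (assumed data): two units, one two-dimensional input.
weights = [[1.0, 0.0], [0.0, 1.0]]
winner = competitive_step(weights, [0.9, 0.1])
```

In this toy run the first unit responds most strongly, so only its weights move toward the input; the second unit is left unchanged, which is what keeps the activity and the learning sparse.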

Hochschulschriftenserver - Universität Frankfurt am Main