13 research outputs found
Biologically-inspired hierarchical architectures for object recognition
PhD Thesis
Existing methods for machine vision translate three-dimensional objects in the real world into two-dimensional images, and have achieved acceptable performance in recognising objects. However, recognition performance drops dramatically when objects are transformed, for instance in background, orientation, position in the image, or scale. The human visual cortex has evolved to form an efficient invariant representation of objects within a scene. The superior performance of humans can be explained by the feed-forward multi-layer hierarchical structure of the human visual cortex, together with the utilisation of different fields of vision depending on the recognition task. The research community has therefore investigated, as an ultimate objective, building systems that mimic the hierarchical architecture of the human visual cortex.
The aim of this thesis can be summarised as developing hierarchical
models of the visual processing that tackle the remaining challenges of
object recognition. To enhance the existing models of object recognition
and to overcome the above-mentioned issues, three major contributions
are made, summarised as follows:
1. building a hierarchical model within an abstract architecture that
achieves good performances in challenging image object datasets;
2. investigating the contribution for each region of vision for object
and scene images in order to increase the recognition performance
and decrease the size of the processed data;
3. further enhancing the performance of existing models of object
recognition by introducing hierarchical topologies that utilise the
context in which the object is found to determine the identity of
the object.
Funded by the Higher Committee For Education Development in Iraq (HCED)
Methods and Apparatus for Autonomous Robotic Control
Sensory processing of visual, auditory, and other sensor information (e.g., visual imagery, LIDAR, RADAR) is conventionally based on "stovepiped," or isolated, processing, with little interaction between modules. Biological systems, on the other hand, fuse multi-sensory information to identify nearby objects of interest more quickly, more efficiently, and with higher signal-to-noise ratios. Similarly, examples of the OpenSense technology disclosed herein use neurally inspired processing to identify and locate objects in a robot's environment. This enables the robot to navigate its environment more quickly and with lower computational and power requirements.
Humanoid Robots
For many years, humans have tried in many ways to recreate the complex mechanisms that form the human body. This task is extremely complicated and the results are not yet fully satisfactory. However, with increasing technological advances grounded in theoretical and experimental research, we have managed, to some extent, to copy or imitate some systems of the human body. This research is intended not only to create humanoid robots, a great part of them autonomous systems, but also to deepen our knowledge of the systems that form the human body, with a view to possible applications in rehabilitation technology for human beings, drawing together studies related not only to robotics but also to biomechanics, biomimetics, cybernetics, and other areas. This book presents a series of studies inspired by this ideal, carried out by researchers worldwide, that analyse and discuss diverse subjects related to humanoid robots. The contributions explore aspects of robotic hands, learning, language, vision and locomotion.
Computational models of the human visual cortex: on individual differences and ecologically valid input statistics
Perception relies on cortical processes in response to sensory stimuli. Visual input entering the
eyes ascends a cascade of processing steps from the retina to high-level regions of the cortex.
Vision science investigates these transformations that give rise to high-level processing of
visual objects, such as object recognition. In this thesis I investigate computational models
of the human visual cortex with regard to their ability to predict cortical responses to visual
objects. In particular, I describe two factors playing an important role in using deep neural
networks (DNNs) to better understand cortical functioning: the initial weight state and
ecologically more valid input statistics.
In Chapter 1 of this thesis I will introduce relevant literature pertaining to deep neural
networks as a modeling framework for the visual cortex. Next, I will lay out the motivation
for the research questions investigated in this thesis and described in detail in Chapters 2, 3,
and 4.
Chapter 2 focuses on the impact of the initial weight state of a model on its ability
to predict cortical representations. I describe work in which we demonstrate that two DNN instances identical in every aspect but their initial weights yield very dissimilar representations. Relying on single network instances to predict cortical activation patterns in response to sensory stimuli poses a problem for computational neuroscience: depending on the initial set of weights, the ability to mirror the cortical representations of these stimuli might vary. Thus, results based on single ("off-the-shelf") model instances - as commonly used in computational neuroscience - may not generalize. In contrast, using multiple DNN instances might alleviate this problem, as they allow insights into the variability of a given model architecture in predicting cortical representations. These individual differences between model instances suggest that, to allow results to generalize more easily, model instances should be treated similarly to human experimental participants.
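The instability described in this chapter can be illustrated with a toy sketch (not the thesis's actual DNNs or data): two untrained one-layer networks that are identical except for their random seed, compared via representational dissimilarity matrices (RDMs), a standard tool in computational neuroscience. All names and sizes here are illustrative assumptions.

```python
import numpy as np

def init_weights(seed, d_in=64, d_hidden=32):
    # Identical architecture for every instance; only the seed differs
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_hidden))

def representation(W, X):
    # ReLU hidden-layer activations serve as the "representation"
    return np.maximum(X @ W, 0.0)

def rdm(acts):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    return 1.0 - np.corrcoef(acts)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))  # 20 synthetic "stimuli"

rdm_a = rdm(representation(init_weights(seed=1), X))
rdm_b = rdm(representation(init_weights(seed=2), X))

# Compare the two instances' RDMs on the upper triangle
iu = np.triu_indices(20, k=1)
sim = np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]
print(f"between-instance RDM correlation: {sim:.2f}")
```

The correlation is below 1.0 even for this trivial model; for trained deep networks the divergence between seeds can be considerably larger, which is the phenomenon the chapter quantifies.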
In Chapter 3 I focus on ecologically more valid input statistics (in the form of training
images) aiming to improve a model’s ability to predict cortical representations. The most
successful models of the human visual cortex to date are DNNs trained on object recognition
tasks designed with machine learning goals in mind. However, the image sets used for training
these DNNs are often not ecologically realistic. For example, training on the most widely used image set in computational neuroscience (ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) 2012) requires the fine-grained distinction of 120 dog breeds, but does
not contain visual object categories encountered frequently in everyday human life (e.g.
woman, man, or child). This suggests that taking into account the human visual experience
when training models of the human visual cortex on a categorization task might help to
predict cortical representations. In this Chapter I describe the creation of a set of images
aimed at mimicking the human visual diet: ecoset. Ecoset contains more than 1.5 million
images from 565 basic level categories and is the largest image set specifically designed for
computational neuroscience to date. Ecoset is freely available to allow the community to test
their own hypotheses of models trained with input statistics matched to the human visual
environment.
In Chapter 4 we build on the results from the previous two Chapters. Using multiple
DNN instances I investigate whether a brain-inspired model architecture (vNet) trained on
ecologically more valid input statistics (ecoset) might improve its ability to predict cortical
representations. I first demonstrate that ecoset might improve an architecture’s ability to
mirror cortical representations. Furthermore, ecoset-trained vNet also outperforms state-of-the-art computer vision and computational neuroscience models in terms of mirroring cortical
representations in the human brain. Thus, incorporating biological and ecological aspects,
such as brain-inspired architectural features and ecologically more valid input statistics, into
computational models may yield better predictions of response patterns in the human visual
cortex.
Treating DNN instances similar to human experimental participants and considering
ecological and biological factors for building these DNNs may be an important step towards
better models of the human visual cortex. Such models might allow a better understanding of
the cortical processes underlying high-level vision in the human brain.

Funding: Cambridge Trust - Vice Chancellor's Award 2015; Cambridge Philosophical Society; MRC Cognition and Brain Sciences Unit
Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.
By Wikipedia's definition, "big data is the term adopted for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The big data challenges typically include capture, curation, storage, search, sharing, transfer, analysis and visualization".
Proposed by the intergovernmental Group on Earth Observations (GEO), the visionary goal of the Global Earth Observation System of Systems (GEOSS) implementation plan for years 2005-2015 is systematic transformation of multi-source Earth Observation (EO) "big data" into timely, comprehensive and operational EO value-adding products and services, submitted to the GEO Quality Assurance Framework for Earth Observation (QA4EO) calibration/validation (Cal/Val) requirements. To date the GEOSS mission cannot be considered fulfilled by the remote sensing (RS) community. This is tantamount to saying that past and existing EO image understanding systems (EO-IUSs) have been outpaced by the rate of collection of EO sensory big data, whose quality and quantity are ever-increasing. This fact is supported by several observations. For example, no European Space Agency (ESA) EO Level 2 product has ever been systematically generated at the ground segment. By definition, an ESA EO Level 2 product comprises a single-date multi-spectral (MS) image radiometrically calibrated into surface reflectance (SURF) values corrected for geometric, atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM), whose thematic legend is general-purpose, user- and application-independent and includes quality layers, such as cloud and cloud-shadow. Since no GEOSS exists to date, present EO content-based image retrieval (CBIR) systems lack EO image understanding capabilities. Hence, no semantic CBIR (SCBIR) system exists to date either, where semantic querying is a synonym of semantics-enabled knowledge/information discovery in multi-source big image databases.
In set theory, if set A is a strict superset of (or strictly includes) set B, then A ⊃ B. This doctoral project moved from the working hypothesis that SCBIR ⊃ computer vision (CV), where vision is a synonym of scene-from-image reconstruction and understanding, ⊃ EO image understanding (EO-IU) in operating mode, synonym of GEOSS, ⊃ ESA EO Level 2 product, ⊃ human vision. Meaning that a necessary but not sufficient pre-condition for SCBIR is CV in operating mode, this working hypothesis has two corollaries. First, human visual perception, encompassing well-known visual illusions such as the Mach bands illusion, acts as a lower bound of CV within the multi-disciplinary domain of cognitive science, i.e., CV is conditioned to include a computational model of human vision. Second, a necessary but not sufficient pre-condition for the yet-unfulfilled GEOSS development is systematic generation at the ground segment of the ESA EO Level 2 product.
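The strict-inclusion relation this working hypothesis is built on maps directly onto, for example, Python's built-in set comparison operators; the sets below are purely illustrative:

```python
A = {"vision", "recognition", "segmentation"}
B = {"vision", "recognition"}

# A > B is Python's strict-superset test, i.e. A ⊃ B
print(A > B)   # True: A strictly includes B
print(A > A)   # False: a set is not a strict superset of itself
print(A >= A)  # True: it is a (non-strict) superset of itself
```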
Starting from this working hypothesis, the overarching goal of this doctoral project was to contribute research and technical development (R&D) toward filling an analytic and pragmatic information gap between EO big sensory data and EO value-adding information products and services. This R&D objective was conceived to be twofold. First, to develop an original EO-IUS in operating mode, synonym of GEOSS, capable of systematic ESA EO Level 2 product generation from multi-source EO imagery. EO imaging sources vary in terms of: (i) platform, either spaceborne, airborne or terrestrial; (ii) imaging sensor, either (a) optical, encompassing radiometrically calibrated or uncalibrated images, panchromatic or color images, either true- or false-color red-green-blue (RGB), multi-spectral (MS), super-spectral (SS) or hyper-spectral (HS) images, featuring spatial resolution from low (> 1 km) to very high (< 1 m), or (b) synthetic aperture radar (SAR), specifically bi-temporal RGB SAR imagery.
The second R&D objective was to design and develop a prototypical implementation of an integrated closed-loop EO-IU for semantic querying (EO-IU4SQ) system as a GEOSS proof-of-concept in support of SCBIR. The proposed closed-loop EO-IU4SQ system prototype consists of two subsystems for incremental learning. A primary (dominant, necessary but not sufficient) hybrid (combined deductive/top-down/physical model-based and inductive/bottom-up/statistical model-based) feedback EO-IU subsystem in operating mode requires no human-machine interaction to automatically transform in linear time a single-date MS image into an ESA EO Level 2 product as initial condition. A secondary (dependent) hybrid feedback EO Semantic Querying (EO-SQ) subsystem is provided with a graphic user interface (GUI) to streamline human-machine interaction in support of spatiotemporal EO big data analytics and SCBIR operations. EO information products generated as output by the closed-loop EO-IU4SQ system monotonically increase their added value with closed-loop iterations.
The Future of Humanoid Robots
This book provides state-of-the-art scientific and engineering research findings and developments in the field of humanoid robotics and its applications. It is expected that humanoids will change the way we interact with machines, and will have the ability to blend perfectly into an environment already designed for humans. The book contains chapters that aim to discover the future abilities of humanoid robots by presenting a variety of integrated research in various scientific and engineering fields, such as locomotion, perception, adaptive behavior, human-robot interaction, neuroscience and machine learning. The book is designed to be accessible and practical, with an emphasis on useful information for those working in the fields of robotics, cognitive science, artificial intelligence, computational methods and other fields of science directly or indirectly related to the development and usage of future humanoid robots. The editor of the book has extensive R&D experience, patents, and publications in the area of humanoid robotics, and his experience is reflected in the content of the book.
BEYOND MULTI-TARGET TRACKING: STATISTICAL PATTERN ANALYSIS OF PEOPLE AND GROUPS
Every day, millions of surveillance cameras monitor the world, recording and collecting huge amounts of data. The collected data can be extremely useful: from behavior analysis to prevent unpleasant events, to the analysis of urban traffic. However, these valuable data are seldom used, because of the amount of information that a human operator would have to attend to and examine manually. It would be like looking for a needle in a haystack.
The automatic analysis of data is becoming mandatory for extracting summarized high-level information (e.g., John, Sam and Anne are walking together in group at the playground near the station) from the available redundant low-level data (e.g., an image sequence).
The main goal of this thesis is to propose solutions and automatic algorithms that perform high-level analysis of a camera-monitored environment. In this way, the data are summarized in a high-level representation for a better understanding.
In particular, this work is focused on the analysis of moving people and their collective behaviors.
The title of the thesis, beyond multi-target tracking, mirrors the purpose of the work: we will propose methods that have the target tracking as common denominator, and go beyond the standard techniques in order to provide a high-level description of the data.
First, we investigate the target tracking problem, as it is the basis of all subsequent work. Target tracking estimates the position of each target in the image and its trajectory over time. We analyze the problem from two complementary perspectives: 1) the engineering point of view, where we deal with the problem in order to obtain the best results in terms of accuracy and performance; 2) the neuroscience point of view, where we propose an attentional model for tracking and recognition of objects and people, motivated by theories of the human perceptual system.
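The position-and-trajectory estimation described above is classically done with a Kalman filter. The following is a minimal, generic constant-velocity sketch for a single target in 1-D (an illustrative baseline under assumed noise parameters, not the thesis's attentional or multi-target models):

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter for one target in 1-D.
    State x = [position, velocity]; measurements are noisy positions."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
    H = np.array([[1.0, 0.0]])             # we observe position only
    Q = q * np.eye(2)                      # process noise covariance
    R = np.array([[r]])                    # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    estimates = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0, 0])
    return estimates

# Target moving at constant velocity 1.0, observed with noisy positions
rng = np.random.default_rng(0)
true_pos = np.arange(30, dtype=float)
noisy = true_pos + rng.normal(0.0, 0.5, size=30)
est = kalman_track(noisy)
print(f"final estimate: {est[-1]:.1f} (true: {true_pos[-1]:.1f})")
```

Multi-target tracking adds data association (which measurement belongs to which target) on top of such per-target filters, which is where the methods in this thesis depart from the standard machinery.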
Second, target tracking is extended to the camera network case, where the goal is to keep a unique identifier for each person across the whole network, i.e., to perform person re-identification. The goal is to recognize individuals in diverse locations across different non-overlapping camera views, or even within the same camera, considering a large set of candidates.
In this context, we propose a pipeline and appearance-based descriptors that enable us to define the problem properly and to reach state-of-the-art results.
Finally, the highest level of description investigated in this thesis is the analysis (discovery and tracking) of social interactions between people. In particular, we focus on finding small groups of people. We introduce methods that embed notions of social psychology into computer vision algorithms. Then, we extend the detection of social interactions over time, proposing novel probabilistic models that deal with (joint) individual-group tracking.
Aspects of algorithms and dynamics of cellular paradigms
Cellular paradigms, like Cellular Neural Networks (CNNs) and Cellular Automata (CA), are an excellent tool to perform computation, since they are equivalent to a Universal Turing Machine. The introduction of the Cellular Neural Network - Universal Machine (CNN-UM) allowed us to develop hardware whose computational core works according to the principles of cellular paradigms; such hardware has found application in a number of fields throughout the last decade. Nevertheless, there are still many open questions about how to define algorithms for a CNN-UM, and how to study the dynamics of Cellular Automata. In this dissertation both problems are tackled: first, we prove that it is possible to bound the space of all algorithms of the CNN-UM and explore it through genetic techniques; second, we explain the fundamentals of the nonlinear perspective on CA (according to Chua's definition), and we illustrate how this technique has allowed us to find novel results.
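The Turing-equivalence mentioned above is carried by specific rules; for elementary (1-D, binary, radius-1) cellular automata, Rule 110 is the classic example proven universal by Cook. A minimal generic simulator (a software sketch, unrelated to the CNN-UM hardware discussed in the thesis) looks like this:

```python
def step(cells, rule=110):
    """One synchronous update of an elementary cellular automaton
    with periodic boundaries. Rule 110 is Turing-universal."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right  # neighbourhood as 3-bit index
        out.append((rule >> idx) & 1)              # look up that bit of the rule number
    return out

# Single seed cell; print a few generations
cells = [0] * 15 + [1] + [0] * 15
for _ in range(8):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

The 8-bit rule number directly encodes the output for each of the 8 possible neighbourhoods, which is why a single integer (here 110) fully specifies the dynamics.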
Engineering for a Changing World: 59th IWK, Ilmenau Scientific Colloquium, Technische Universität Ilmenau, September 11-15, 2017 : programme
In 2017, the Ilmenau Scientific Colloquium is again organised by the Department of Mechanical Engineering. The title of this year's conference, "Engineering for a Changing World", refers to the limited natural resources of our planet and to massive changes in cooperation between continents, countries, institutions and people, enabled by the increased implementation of information technology as probably the most dominant driver in many fields. The Colloquium, complemented by workshops, is characterised by the following topics, among others:
– Precision Engineering and Metrology
– Industry 4.0 and Digitalisation in Mechanical Engineering
– Mechatronics, Biomechatronics and Mechanism Technology
– Systems Technology
– Innovative Metallic Materials
The topics are oriented towards key strategic aspects of research and teaching in Mechanical Engineering at our university.
Smart vision in system-on-chip applications
In the last decade the ability to design and manufacture integrated circuits with higher transistor densities has led to the integration of complete systems on a single silicon die. These are commonly referred to as System-on-Chip (SoC). As SoC processes can incorporate multiple technologies, it is now feasible to produce single-chip camera systems with embedded image processing, known as Imager-on-Chips (IoC). The development of IoCs is complicated due to the mixture of digital and analog components and the high cost of prototyping these designs using silicon processes. There are currently no re-usable prototyping platforms that specifically address the needs of IoC development. This thesis details a new prototyping platform specifically for use in the development of low-cost mass-market IoC applications. FPGA technology was utilised to implement a frame-based processing architecture suitable for supporting a range of real-time imaging and machine vision applications. To demonstrate the effectiveness of the prototyping platform, an example object counting and highlighting application was developed and functionally verified in real-time. A high-level IoC cost model was formulated to calculate the cost of manufacturing prototyped applications as a single IoC. This highlighted the requirement for careful analysis of optical issues, embedded imager array size and the silicon process used to ensure the desired IoC unit cost was achieved. A modified version of the FPGA architecture, which would improve DSP performance, is also proposed.
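As an illustration of the kind of frame-based processing such a platform supports, object counting on a binary frame can be sketched with connected-component labelling. This is a generic software sketch under assumed 4-connectivity, not the thesis's FPGA implementation:

```python
def count_objects(img):
    """Count 4-connected foreground blobs in a binary image
    (list of lists of 0/1) using iterative flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1                      # found a new blob
                stack = [(y, x)]
                while stack:                    # flood-fill the whole blob
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and img[cy][cx] and not seen[cy][cx]:
                        seen[cy][cx] = True
                        stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return count

frame = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
print(count_objects(frame))  # → 3
```

On an FPGA the same task is typically done with a streaming single-pass labelling scheme rather than flood fill, since random access to the whole frame is expensive in hardware.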