
    Spatial and temporal background modelling of non-stationary visual scenes

    PhD thesis. The prevalence of electronic imaging systems in everyday life has become increasingly apparent in recent years. Applications are to be found in medical scanning, automated manufacture, and perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic management all employ and benefit from an unprecedented quantity of video cameras for monitoring purposes. But the high cost and limited effectiveness of employing humans as the final link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques. Whilst the field of machine vision has enjoyed consistent rapid development in the last 20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner. Central to a great many vision applications is the concept of segmentation, and in particular, most practical systems perform background subtraction as one of the first stages of video processing. This involves separation of ‘interesting foreground’ from the less informative but persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and liable to be application specific. Furthermore, the background may be interpreted as including the visual appearance of normal activity of any agents present in the scene, human or otherwise. Thus a background model might be called upon to absorb lighting changes, moving trees and foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed, in ‘biologically-inspired’ vision, pre-attentive selection. This challenge is one of the Holy Grails of the computer vision field, and consequently the subject has received considerable attention.

    This thesis sets out to address some of the limitations of contemporary methods of background segmentation by investigating methods of inducing local mutual support amongst pixels in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the short-term time domain, and (3) locality in the domain of cyclic repetition frequency. Conventional per-pixel models, such as those based on Gaussian Mixture Models, offer no spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose a structure in which every image pixel bears the same relation to every other pixel. But Markov Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple learned local pattern hypotheses, whilst relying solely on monochrome image data.

    Many background models enforce temporal consistency constraints in an attempt to confirm background membership before a pixel is accepted as part of the model, and typically some control over this process is exercised by a learning rate parameter. But in busy scenes, a true background pixel may be visible for a relatively small fraction of the time and in a temporally fragmented fashion, thus hindering such background acquisition. However, support in terms of temporal locality may still be achieved by using combinatorial optimization to derive short-term background estimates which induce a similar consistency, but are considerably more robust to disturbance.
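
    As a rough illustration of such temporally local support, the sketch below (Python/NumPy, our own illustrative code) computes a pixel-wise median over a short sliding window of grayscale frames; the median tolerates a true background pixel being obscured for up to half of the window. It stands in for the idea only: the thesis derives its short-term estimates by combinatorial optimization, which this sketch does not reproduce.

        import numpy as np

        def short_term_background(frames):
            # frames: a short temporal window of equally sized grayscale
            # images (H x W arrays), e.g. the most recent 25 frames.
            stack = np.stack(list(frames), axis=0)   # shape (T, H, W)
            # Pixel-wise median: robust to transient foreground that
            # occupies a pixel for fewer than half the window's frames.
            return np.median(stack, axis=0)
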
    A novel technique is presented here in which the short-term estimates act as ‘pre-filtered’ data from which a far more compact eigen-background may be constructed.

    Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions employing traffic signals are among these, yet little is to be found in the literature regarding the explicit modelling of such periodic processes in a scene. Previous work focussing on gait recognition has demonstrated approaches based on recurrence of self-similarity by which local periodicity may be identified. The present work harnesses and extends this method in order to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal model. The model may then be used to highlight abnormality in scene activity. Furthermore, a Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to maintain correct synchronization with scene activity in spite of noise and drift of periodicity.

    This thesis contends that these three approaches are all manifestations of the same broad underlying concept: local support in each of the space, time and frequency domains, and furthermore, that this support can be harnessed practically, as will be demonstrated experimentally.
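
    To make the periodicity modelling concrete, here is a minimal, hypothetical sketch (function name and parameters are ours) that estimates the dominant cycle length of a one-dimensional scene-activity signal from the peak of its autocorrelation. It is a stand-in for the recurrence-of-self-similarity analysis described above and does not implement the thesis's Phase Locked Loop or its novel phase detector.

        import numpy as np

        def dominant_period(activity, min_lag=2):
            # activity: 1-D signal, e.g. mean frame difference per time step.
            x = np.asarray(activity, dtype=float)
            x = x - x.mean()
            # Autocorrelation restricted to non-negative lags.
            ac = np.correlate(x, x, mode='full')[x.size - 1:]
            # The strongest repeat beyond trivially small lags is taken
            # as the dominant period, in samples.
            return min_lag + int(np.argmax(ac[min_lag:]))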

    Information processing in neural systems: oscillations, network topologies and optimal representations

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defence: 1-07-200

    Cognitive-developmental learning for a humanoid robot: a caregiver's gift

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 319-341). By Artur Miguel Do Amaral Arsenio.

    Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental. The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated), thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots, using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes.

    Feature Binding of MPEG-7 Visual Descriptors Using Chaotic Series

    Due to advanced segmentation and tracking algorithms, a video can be divided into numerous objects. Segmentation and tracking algorithms output different low-level object features, resulting in a high-dimensional feature vector per object. The challenge is to generate a feature vector for each object that can be mapped to a human-understandable description, such as an object label, e.g., person or car. MPEG-7 provides visual descriptors to describe video contents. However, the MPEG-7 visual descriptors are generally highly redundant, and the feature coefficients in these descriptors need to be pre-processed for a domain-specific application. The ideal case would be if an MPEG-7 visual descriptor based feature vector could be processed in a manner similar to functional simulations of human brain activity. An established link exists between the analysis of temporal human brain oscillatory signals and chaotic dynamics in the electroencephalography (EEG) of brain neurons. Neural signals in limited brain activities, which previously appeared to be noise, have been found to be behaviorally relevant and can be simulated using chaotic series. A chaotic series is either a finite-difference or an ordinary differential equation which presents non-random, irregular fluctuations of parameter values over time in a dynamical system. The dynamics in a chaotic series can be high- or low-dimensional, and the dimensionality can be deduced from the topological dimension of the attractor of the chaotic series. An attractor is manifested by the tendency of a non-linear finite-difference equation or an ordinary differential equation, under various but delimited conditions, to go to a reproducible active state, and stay there. We propose a feature binding method, using chaotic series, to generate a new feature vector, C-MP7, to describe video objects. The proposed method considers MPEG-7 visual descriptor coefficients as dynamical systems. Dynamical systems are excited (similar to neuronal excitation) with either high- or low-dimensional chaotic series, and then histogram-based clustering is applied to the simulated chaotic series coefficients to generate C-MP7. The proposed feature binding offers a better feature vector with high-dimensional chaotic series simulation than with low-dimensional chaotic series, over an MPEG-7 visual descriptor based feature vector. Diverse video objects are grouped into four generic classes (has_person, has_group_of_persons, has_vehicle, and has_unknown) to observe how well C-MP7 describes different video objects compared to the MPEG-7 feature vector. In C-MP7, with high-dimensional chaotic series simulation, 1) descriptor coefficients are reduced dynamically by up to 37.05%, compared to 10% in MPEG-7; 2) higher variance is achieved than with MPEG-7; 3) multi-class discriminant analysis of C-MP7 with the Fisher criterion shows increased binary class separation for clustered video objects over that of MPEG-7; and 4) C-MP7 specifically provides good clustering of video objects for the has_vehicle class against other classes. To test C-MP7 in an application, we deploy a combination of multiple binary classifiers for video object classification. Related work on video object classification uses non-MPEG-7 features. We specifically observe classification of challenging surveillance video objects, e.g., incomplete objects, partial occlusion, background overlapping, scale- and resolution-variant objects, and indoor/outdoor lighting variations.
    C-MP7 is used to train different classes of video objects. Object classification accuracy is verified with both low-dimensional and high-dimensional chaotic series based feature binding for C-MP7. Testing of diverse video objects with high-dimensional chaotic series simulation shows that 1) classification accuracy improves significantly, to 83% on average compared to 62% with MPEG-7; 2) excellent clustering of vehicle objects leads to above 99% accuracy for vehicles against all other objects; and 3) with diverse video objects, including objects from poor segmentation, C-MP7 is more robust as a feature vector in classification than MPEG-7. Initial results on sub-group classification for male and female video objects in the has_person class are also presented as subjective observations. Chaotic series properties have previously been used in video processing applications for compression and digital watermarking. To the best of our knowledge, this work is the first to use chaotic series for video object description and to apply it to object classification.
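
    To illustrate the excitation step, the sketch below seeds a logistic map, a standard low-dimensional chaotic finite-difference equation, with each normalized descriptor coefficient and bins the simulated trajectories into a histogram, mirroring the histogram-based clustering described above. The choice of the logistic map, the function name, and all parameter values are our own illustrative assumptions, not the C-MP7 implementation.

        import numpy as np

        def excite_with_logistic_map(coeffs, r=3.9, n_iter=64, n_bins=16):
            # Treat each MPEG-7 descriptor coefficient as the initial state
            # of a logistic map x <- r*x*(1-x), chaotic for r near 3.9.
            c = np.asarray(coeffs, dtype=float)
            c = (c - c.min()) / (c.max() - c.min() + 1e-12)  # into [0, 1]
            x = 0.01 + 0.98 * c                              # keep off 0 and 1
            traj = []
            for _ in range(n_iter):
                x = r * x * (1.0 - x)
                traj.append(x.copy())
            # Histogram the pooled trajectories into a compact binned vector.
            hist, _ = np.histogram(np.concatenate(traj),
                                   bins=n_bins, range=(0.0, 1.0))
            return hist / hist.sum()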

    Mathematical models of cognitive processes

    The research activity carried out during the PhD course was focused on the development of mathematical models of some cognitive processes and their validation by means of data present in the literature, with a double aim: i) to achieve a better interpretation and explanation of the great amount of data obtained on these processes from different methodologies (electrophysiological recordings in animals; neuropsychological, psychophysical and neuroimaging studies in humans), and ii) to exploit model predictions and results to guide future research and experiments. In particular, the research activity has been focused on two different projects: 1) the first concerns the development of networks of neural oscillators, in order to investigate the mechanisms of synchronization of neural oscillatory activity during cognitive processes such as object recognition, memory, language and attention; 2) the second concerns the mathematical modelling of multisensory integration processes (e.g. visual-acoustic), which occur in several cortical and subcortical regions (in particular in a subcortical structure named the Superior Colliculus (SC)), and which are fundamental for orienting motor and attentive responses to external world stimuli. This activity has been realized in collaboration with the Center for Studies and Researches in Cognitive Neuroscience of the University of Bologna (in Cesena) and the Department of Neurobiology and Anatomy of the Wake Forest University School of Medicine (NC, USA).

    PART 1. The representation of objects in a number of cognitive functions, like perception and recognition, involves distributed processes in different cortical areas. One of the main neurophysiological questions concerns how the correlation between these disparate areas is realized, in order to succeed in grouping together the characteristics of the same object (the binding problem) and in keeping segregated the properties belonging to different objects simultaneously present (the segmentation problem). Different theories have been proposed to address these questions (Barlow, 1972). One of the most influential is the so-called “assembly coding” theory, postulated by Singer (2003), according to which: 1) an object is well described by a few fundamental properties, processed in different and distributed cortical areas; 2) recognition of the object is realized by means of the simultaneous activation of the cortical areas representing its different features; and 3) groups of properties belonging to different objects are kept separated in the time domain. In Chapter 1.1 and Chapter 1.2 we present two neural network models for object recognition, based on the “assembly coding” hypothesis. These models are networks of Wilson-Cowan oscillators which exploit: i) two high-level “Gestalt rules” (the similarity and previous-knowledge rules) to realize the functional link between elements of different cortical areas representing properties of the same object (the binding problem); and ii) the synchronization of neural oscillatory activity in the γ-band (30-100 Hz) to segregate in time the representations of different objects simultaneously present (the segmentation problem). These models are able to recognize and reconstruct multiple simultaneous external objects, even in difficult cases (some wrong or lacking features, shared features, superimposed noise). In Chapter 1.3 the previous models are extended to realize a semantic memory, in which sensory-motor representations of objects are linked with words.
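
    As background for the models just described, the sketch below integrates a single Wilson-Cowan excitatory/inhibitory pair with forward Euler; for suitable external drive such a pair settles onto a limit cycle, the building block that these networks couple and synchronize in the γ-band. The parameter values are generic textbook choices, not those used in the thesis.

        import numpy as np

        def wilson_cowan_pair(steps=5000, dt=5e-4, P=1.25, Q=0.0):
            # One excitatory (E) / inhibitory (I) Wilson-Cowan unit pair,
            # integrated with forward Euler; P, Q are external drives.
            def S(x, a, theta):
                return 1.0 / (1.0 + np.exp(-a * (x - theta)))

            wEE, wEI, wIE, wII = 16.0, 12.0, 15.0, 3.0  # coupling weights
            tau = 0.010                                 # time constant (s)
            E, I = 0.1, 0.1
            trace = np.empty(steps)
            for k in range(steps):
                dE = (-E + S(wEE * E - wEI * I + P, 1.2, 2.8)) / tau
                dI = (-I + S(wIE * E - wII * I + Q, 1.0, 4.0)) / tau
                E += dt * dE
                I += dt * dI
                trace[k] = E
            return trace  # E-activity; oscillatory for suitable drive P
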
    To this aim, the previously developed network, devoted to the representation of objects as collections of sensory-motor features, is reciprocally linked with a second network devoted to the representation of words (a lexical network). Synapses linking the two networks are trained via a time-dependent Hebbian rule, during a training period in which individual objects are presented together with the corresponding words. Simulation results demonstrate that, during the retrieval phase, the network can deal with the simultaneous presence of objects (from sensory-motor inputs) and words (from linguistic inputs), can correctly associate objects with words, and can segment objects even in the presence of incomplete information. Moreover, the network can realize some semantic links among words representing objects with shared features. These results support the idea that semantic memory can be described as an integrated process, whose content is retrieved by the co-activation of different multimodal regions. In perspective, extended versions of this model may be used to test conceptual theories, and to provide a quantitative assessment of existing data (for instance concerning patients with neural deficits).

    PART 2. The ability of the brain to integrate information from different sensory channels is fundamental to perception of the external world (Stein et al., 1993). It is well documented that a number of extraprimary areas have neurons capable of such a task; one of the best known of these is the superior colliculus (SC). This midbrain structure receives auditory, visual and somatosensory inputs from different subcortical and cortical areas, and is involved in the control of orientation to external events (Wallace et al., 1993). SC neurons respond to each of these sensory inputs separately, but are also capable of integrating them (Stein et al., 1993), so that the response to combined multisensory stimuli is greater than that to the individual component stimuli (enhancement). This enhancement is proportionately greater if the modality-specific paired stimuli are weaker (the principle of inverse effectiveness). Several studies have shown that the capability of SC neurons to engage in multisensory integration requires inputs from cortex, primarily the anterior ectosylvian sulcus (AES), but also the rostral lateral suprasylvian sulcus (rLS). If these cortical inputs are deactivated, the response of SC neurons to cross-modal stimulation is no different from that evoked by the most effective of its individual component stimuli (Jiang et al., 2001). This phenomenon can be better understood through mathematical models, which can place the mass of data that has been accumulated about this phenomenon and its underlying circuitry into a coherent theoretical structure. In Chapter 2.1 a simple neural network model of this structure is presented; this model is able to reproduce a large number of SC behaviours, such as multisensory enhancement, multisensory and unisensory depression, and inverse effectiveness. In Chapter 2.2 this model was improved by incorporating more neurophysiological knowledge about the neural circuitry underlying SC multisensory integration, in order to suggest possible physiological mechanisms through which it is effected. This endeavour was realized in collaboration with Professor B.E. Stein and Doctor B. Rowland during the six-month period spent at the Department of Neurobiology and Anatomy of the Wake Forest University School of Medicine (NC, USA), within the Marco Polo Project. The model includes four distinct unisensory areas that are devoted to a topological representation of external stimuli. Two of them represent subregions of the AES (i.e., FAES, an auditory area, and AEV, a visual area) and send descending inputs to the ipsilateral SC; the other two represent subcortical areas (one auditory and one visual) projecting ascending inputs to the same SC. Different competitive mechanisms, realized by means of populations of interneurons, are used in the model to reproduce the different behaviour of SC neurons in conditions of cortical activation and deactivation. With a single set of parameters, the model is able to mimic the behaviour of SC multisensory neurons in response to very different stimulus conditions (multisensory enhancement, inverse effectiveness, within- and cross-modal suppression of spatially disparate stimuli), with the cortex functional or deactivated, and with a particular type of membrane receptor (NMDA receptors) active or inhibited. All these results agree with the data reported in Jiang et al. (2001) and in Binns and Salt (1996). The model suggests that non-linearities in neural responses and synaptic (excitatory and inhibitory) connections can explain the fundamental aspects of multisensory integration, and provides a biologically plausible hypothesis about the underlying circuitry.
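
    A toy calculation can make the enhancement and inverse-effectiveness principles concrete: with a sigmoidal (supralinear near threshold) readout of summed cross-modal drive, the combined response exceeds the best unisensory response, and proportionately more so for weak stimuli. The sketch below is purely schematic, with made-up gains; the thesis's model instead uses topologically organized areas and interneuron competition.

        import numpy as np

        def sc_response(v, a, gain=1.5, theta=0.3, slope=0.1):
            # Toy SC neuron: sigmoidal readout of summed visual (v) and
            # auditory (a) drive; all parameter values are illustrative.
            x = gain * (v + a)
            return 1.0 / (1.0 + np.exp(-(x - theta) / slope))

        # Multisensory response relative to the best unisensory response:
        weak = sc_response(0.1, 0.1) / max(sc_response(0.1, 0.0),
                                           sc_response(0.0, 0.1))
        strong = sc_response(0.5, 0.5) / max(sc_response(0.5, 0.0),
                                             sc_response(0.0, 0.5))
        print(weak, strong)  # weak >> strong: inverse effectiveness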

    Human Activity Recognition and Control of Wearable Robots

    Wearable robotics has gained huge popularity in recent years due to its wide applications in rehabilitation, military, and industrial fields. The weakness of the skeletal muscles in the aging population and neurological injuries such as stroke and spinal cord injuries seriously limit the abilities of these individuals to perform daily activities. Therefore, there is increasing attention on the development of wearable robots to assist the elderly and patients with disabilities for motion assistance and rehabilitation. In the military and industrial sectors, wearable robots can increase the productivity of workers and soldiers. It is important for wearable robots to maintain smooth interaction with the user while evolving in complex environments with minimum effort from the user. Therefore, recognition of the user's activities, such as walking or jogging, in real time becomes essential to provide appropriate assistance based on the activity. This dissertation proposes two real-time human activity recognition algorithms, the intelligent fuzzy inference (IFI) algorithm and the amplitude omega (Aω) algorithm, to identify human activities, i.e., stationary and locomotion activities. The IFI algorithm uses knee angle and ground contact force (GCF) measurements from four inertial measurement units (IMUs) and a pair of smart shoes, whereas the Aω algorithm is based on thigh angle measurements from a single IMU. This dissertation also attempts to address the problem of online tuning of virtual impedance for an assistive robot based on real-time gait and activity measurement data, to personalize the assistance for different users. An automatic impedance tuning (AIT) approach is presented for a knee assistive device (KAD) in which the IFI algorithm is used for real-time activity measurements. This dissertation also proposes an adaptive oscillator method, known as the amplitude omega adaptive oscillator (AωAO) method, for HeSA (hip exoskeleton for superior augmentation), to provide bilateral hip assistance during human locomotion activities. The Aω algorithm is integrated into the adaptive oscillator method to make the approach robust for different locomotion activities. Experiments are performed on healthy subjects to validate the efficacy of the human activity recognition algorithms and control strategies proposed in this dissertation. Both activity recognition algorithms exhibited high classification accuracy with short update times. The results of AIT demonstrated that the KAD assistive torque was smoother and the EMG signal of the vastus medialis was reduced, compared to constant-impedance and finite state machine approaches. The AωAO method showed real-time learning of the locomotion activity signals for three healthy subjects while wearing HeSA. To understand the influence of the assistive devices on the inherent dynamic gait stability of the human, a stability analysis is performed, in which stability metrics derived from dynamical systems theory are used to evaluate unilateral knee assistance applied to the healthy participants. Doctoral Dissertation, Aerospace Engineering, 201
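
    The adaptive-oscillator idea can be sketched generically: a phase oscillator whose frequency and amplitude adapt to the error between its output and a periodic input (for example, a thigh-angle trace) will phase-lock to the gait cycle. The following is a Righetti-style adaptive phase oscillator written under our own assumptions; it is not the dissertation's AωAO controller, and the function name and gains are illustrative.

        import numpy as np

        def adaptive_oscillator(y, dt=0.01, k_phase=5.0, k_freq=5.0, k_amp=2.0):
            # y: sampled periodic input, e.g. thigh angle during locomotion.
            phi = 0.0               # oscillator phase (rad)
            omega = 2.0 * np.pi     # frequency estimate (rad/s), initial guess
            amp = 1.0               # amplitude estimate
            out = np.empty(len(y))
            for k, yk in enumerate(y):
                e = yk - amp * np.cos(phi)                 # tracking error
                phi += dt * (omega - k_phase * e * np.sin(phi))
                omega += dt * (-k_freq * e * np.sin(phi))  # frequency adapts
                amp += dt * (k_amp * e * np.cos(phi))      # amplitude adapts
                out[k] = amp * np.cos(phi)
            return out  # phase-locks to the dominant periodic component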

    Progress toward an understanding of cortical computation

    The additional data, perspectives, questions, and criticisms contributed by the commentaries strengthen our view that local cortical processors coordinate their activity with the context in which it occurs, using contextual fields and synchronized population codes. We therefore predict that whereas the specialization of function has been the keynote of this century, the coordination of function will be the keynote of the next.