
    The hippocampus and cerebellum in adaptively timed learning, recognition, and movement

    The concepts of declarative memory and procedural memory have been used to distinguish two basic types of learning. A neural network model suggests how such memory processes work together as recognition learning, reinforcement learning, and sensory-motor learning take place during adaptive behaviors. To coordinate these processes, the hippocampal formation and cerebellum each contain circuits that learn to adaptively time their outputs. Within the model, hippocampal timing helps to maintain attention on motivationally salient goal objects during variable task-related delays, and cerebellar timing controls the release of conditioned responses. This property is part of the model's description of how cognitive-emotional interactions focus attention on motivationally valued cues, and how this process breaks down due to hippocampal ablation. The model suggests that the hippocampal mechanisms that help to rapidly draw attention to salient cues would prematurely release motor commands if the release of those commands were not adaptively timed by the cerebellum. The model hippocampal system modulates cortical recognition learning without actually encoding the representational information that the cortex encodes. These properties avoid the difficulties faced by several models that propose a direct hippocampal role in recognition learning. Learning within the model hippocampal system controls adaptive timing and spatial orientation. Model properties hereby clarify how hippocampal ablations cause amnesic symptoms and difficulties with tasks that combine task delays, novelty detection, and attention towards goal objects amid distractions. When these model recognition, reinforcement, sensory-motor, and timing processes work together, they suggest how the brain can accomplish conditioning of multiple sensory events to delayed rewards, as during serial compound conditioning.
    Funding: Air Force Office of Scientific Research (F49620-92-J-0225, F49620-86-C-0037, 90-0128); Advanced Research Projects Agency (ONR N00014-92-J-4015); Office of Naval Research (N00014-91-J-4100, N00014-92-J-1309, N00014-92-J-1904); National Institute of Mental Health (MH-42900)
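
As a rough illustration of what "circuits that learn to adaptively time their outputs" can mean, the sketch below uses a population of units with a spectrum of time constants whose weighted sum learns to peak near a reinforced delay. The alpha-function unit responses, delta-rule update, and all parameter values are illustrative assumptions, not the model's actual equations.

```python
import numpy as np

def adaptively_timed_response(target_delay=0.5, dt=0.01, duration=1.0,
                              taus=np.linspace(0.05, 0.9, 40),
                              lr=0.5, epochs=500):
    """Toy adaptively timed circuit: units with a spectrum of time constants
    are weighted so that their summed output learns to peak near a
    reinforced delay after stimulus onset."""
    t = np.arange(0.0, duration, dt)
    # Each unit's response to the stimulus: an alpha function with its own tau.
    basis = np.stack([(t / tau) * np.exp(1.0 - t / tau) for tau in taus])
    # Reinforcement arrives around the target delay.
    target = np.exp(-((t - target_delay) ** 2) / (2 * 0.05 ** 2))
    w = np.zeros(len(taus))
    for _ in range(epochs):
        out = w @ basis
        w += lr * dt * (basis @ (target - out))   # simple delta-rule weight update
    return t, w @ basis

if __name__ == "__main__":
    t, out = adaptively_timed_response()
    print(f"learned output peaks at about t = {t[out.argmax()]:.2f} s")
```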

    DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image

    We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in the wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh. The model also incorporates a landmark prediction branch to take advantage of rich supervision and co-training of multiple related tasks. Experimentally, DAD-3DNet outperforms or is comparable to the state-of-the-art models in (i) 3D Head Pose Estimation on AFLW2000-3D and BIWI, (ii) 3D Face Shape Reconstruction on NoW and Feng, and (iii) 3D Dense Head Alignment and 3D Landmarks Estimation on the DAD-3DHeads dataset. Finally, the diversity of DAD-3DHeads in camera angles, facial expressions, and occlusions enables a benchmark to study in-the-wild generalization and robustness to distribution shifts. The dataset webpage is https://p.farm/research/dad-3dheads
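
As a loose illustration of the kind of multi-task objective the abstract describes (regressing FLAME-style shape, expression and pose parameters alongside an auxiliary landmark branch), here is a PyTorch-style sketch. The layer sizes, landmark count, and loss weights are assumptions for illustration, not the released DAD-3DNet architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadParamRegressor(nn.Module):
    """Illustrative regression head: predicts FLAME-style shape, expression and
    pose codes plus a set of 2D landmarks from backbone features."""
    def __init__(self, feat_dim=512, n_shape=300, n_expr=100, n_pose=6, n_lmk=68):
        super().__init__()
        self.n_lmk = n_lmk
        self.shape = nn.Linear(feat_dim, n_shape)
        self.expr = nn.Linear(feat_dim, n_expr)
        self.pose = nn.Linear(feat_dim, n_pose)
        self.lmk = nn.Linear(feat_dim, n_lmk * 2)   # auxiliary landmark branch

    def forward(self, feats):
        return {"shape": self.shape(feats),
                "expr": self.expr(feats),
                "pose": self.pose(feats),
                "lmk": self.lmk(feats).view(-1, self.n_lmk, 2)}

def multitask_loss(pred, gt, w_param=1.0, w_lmk=1.0):
    """Weighted sum of parameter-regression and landmark terms, so the landmark
    branch provides extra supervision for the mesh parameters."""
    l_param = sum(F.mse_loss(pred[k], gt[k]) for k in ("shape", "expr", "pose"))
    l_lmk = F.l1_loss(pred["lmk"], gt["lmk"])
    return w_param * l_param + w_lmk * l_lmk
```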

    Perceptual categorization

    The categorization of external stimuli lies at the heart of cognitive science. Existing models of perceptual categorization assume (a) that information about the absolute magnitude of a stimulus is used in the categorization decision, and (b) that the representation of a stimulus does not change with experience. The three experimental programs presented here challenge these two assumptions. The experiments in Chapter 2 demonstrate that existing models of categorization are unable to predict the classification of items intermediate between two categories. Chapter 3 provides empirical evidence that categorization responses are heavily influenced by the immediately preceding context, consistent with evidence from absolute identification showing that people have very poor access to absolute magnitude information. A memory-and-contrast model is presented in which each categorization decision is based on the perceived difference between the current stimulus and immediately preceding stimuli. This model is shown to account for the data from Chapters 2 and 3. Chapter 4 explores the claim that new features may be created through experience with novel stimuli, and that these features serve to alter the representation of stimuli to facilitate new categorization tasks. An alternative account is offered for existing feature-creation evidence. However, experimental work re-establishes a feature creation effect. Consideration is given to how the feature-creation and memory-and-contrast accounts of categorization may be integrated, together with extensive suggestions for the development of these ideas.
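
A minimal sketch of the memory-and-contrast idea described above: each response is based on the difference between the current stimulus and the immediately preceding ones rather than on absolute magnitude. The nearest-recent-exemplar rule, memory size, and toy stimuli are illustrative assumptions, not the thesis's fitted model.

```python
from collections import deque

def memory_contrast_categorize(stimuli, feedback, memory_size=2):
    """Categorise each stimulus relative to the immediately preceding trials.

    stimuli: sequence of scalar magnitudes (e.g. line lengths, tone intensities)
    feedback: true category (0 or 1) revealed after each trial
    Returns the model's responses; the first trial defaults to category 0."""
    recent = deque(maxlen=memory_size)   # (magnitude, category) of preceding trials
    responses = []
    for x, true_cat in zip(stimuli, feedback):
        if not recent:
            response = 0
        else:
            # Respond with the category of the most similar recent stimulus,
            # using only perceived differences, never absolute magnitude.
            _, response = min(recent, key=lambda m: abs(m[0] - x))
        responses.append(response)
        recent.append((x, true_cat))
    return responses

if __name__ == "__main__":
    stims = [1.0, 1.2, 3.0, 3.1, 1.1]
    cats = [0, 0, 1, 1, 0]
    print(memory_contrast_categorize(stims, cats))
```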

    Linear Regression and Unsupervised Learning for Tracking and Embodied Robot Control

    Computer vision problems, such as tracking and robot navigation, tend to be solved using models of the objects of interest to the problem. These models are often either hard-coded or learned in a supervised manner. In either case, an engineer is required to identify the visual information that is important to the task, which is both time consuming and problematic. Issues with these engineered systems relate to the ungrounded nature of the knowledge imparted by the engineer, where the systems have no meaning attached to the representations. This leads to systems that are brittle and prone to failure when expected to act in environments not envisaged by the engineer. The work presented in this thesis removes the need for hard-coded or engineered models of either visual information representations or behaviour. This is achieved by developing novel approaches for learning from example, in both input (percept) and output (action) spaces. This approach leads to the development of novel feature tracking algorithms and methods for robot control.

    Applying this approach to feature tracking, unsupervised learning is employed, in real time, to build appearance models of the target that represent the input space structure, and this structure is exploited to partition banks of computationally efficient, linear regression based target displacement estimators. This thesis presents the first application of regression-based methods to the problem of simultaneously modelling and tracking a target object. The computationally efficient Linear Predictor (LP) tracker is investigated, along with methods for combining and weighting flocks of LPs. The tracking algorithms developed operate with accuracy comparable to other state-of-the-art online approaches and with a significant gain in computational efficiency. This is achieved as a result of two specific contributions. First, novel online approaches for the unsupervised learning of modes of target appearance that identify aspects of the target are introduced. Second, a general tracking framework is developed within which the identified aspects of the target are adaptively associated with subsets of a bank of LP trackers. This results in the partitioning of LPs and the online creation of aspect-specific LP flocks that facilitate tracking through significant appearance changes.

    Applying the approach to the percept-action domain, unsupervised learning is employed to discover the structure of the action space, and this structure is used in the formation of meaningful perceptual categories and to facilitate the use of localised input-output (percept-action) mappings. This approach provides a realisation of an embodied and embedded agent that organises its perceptual space, and hence its cognitive process, based on interactions with its environment. Central to the proposed approach is the technique of clustering an input-output exemplar set based on output similarity, and using the resultant input exemplar groupings to characterise a perceptual category. All input exemplars that are coupled to a certain class of outputs form a category: the category of a given affordance, action or function. In this sense the formed perceptual categories have meaning and are grounded in the embodiment of the agent. The approach is shown to identify the relative importance of perceptual features and is able to solve percept-action tasks, defined only by demonstration, in previously unseen situations.
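
The tracking part of this abstract rests on Linear Predictors: regression matrices that map intensity differences at support pixels to a 2D displacement. The sketch below is a bare-bones version of that idea, learned by least squares from synthetic shifts; the sampling scheme, sizes and function names are assumptions rather than the thesis's implementation.

```python
import numpy as np

def train_linear_predictor(image, center, offsets, max_shift=5, n_samples=400, seed=0):
    """Learn a Linear Predictor: a matrix H mapping intensity differences at
    support pixels to the 2D displacement that produced them (least squares).

    image: 2D float array; center: (row, col); offsets: (k, 2) int support offsets,
    chosen so that all sampled positions stay inside the image."""
    rng = np.random.default_rng(seed)
    ref = image[center[0] + offsets[:, 0], center[1] + offsets[:, 1]]  # template samples
    shifts = rng.integers(-max_shift, max_shift + 1, size=(n_samples, 2)).astype(float)
    diffs = np.stack([
        image[center[0] + int(dy) + offsets[:, 0],
              center[1] + int(dx) + offsets[:, 1]] - ref
        for dy, dx in shifts
    ])
    H, *_ = np.linalg.lstsq(diffs, shifts, rcond=None)   # shifts ≈ diffs @ H
    return H, ref

def predict_displacement(H, ref, image, center, offsets):
    """Apply the predictor in a new frame: one matrix-vector product per update.
    Returns the estimated offset of the sampling window relative to the target;
    subtracting it from `center` re-centres the tracker on the target."""
    obs = image[center[0] + offsets[:, 0], center[1] + offsets[:, 1]]
    return (obs - ref) @ H
```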
    Within this percept-action learning framework, two alternative approaches are developed. The first approach employs hierarchical output-space clustering of point-to-point mappings to achieve search efficiency and input- and output-space generalisation, as well as a mechanism for identifying the important variance and invariance in the input space. The exemplar hierarchy provides, in a single structure, a mechanism for classifying previously unseen inputs and generating appropriate outputs. The second approach to a percept-action learning framework integrates the regression mappings used in the feature tracking domain with the action space clustering and imitation learning techniques developed in the percept-action domain. These components are utilised within a novel percept-action data-mining methodology that is able to discover the visual entities that are important to a specific problem, and to map from these entities onto the action space. Applied to the robot control task, this approach allows for real-time generation of continuous action signals, without the use of any supervision or definition of representations or rules of behaviour.
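
A small sketch of the output-similarity clustering idea: exemplars are grouped by the similarity of their actions, and each group's percepts then define one grounded perceptual category. The hierarchical clustering call, distance threshold, and nearest-neighbour assignment rule are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def form_perceptual_categories(percepts, actions, max_action_dist=0.5):
    """Cluster exemplars in the ACTION (output) space, then group the
    corresponding percepts: each cluster is one affordance-like category."""
    Z = linkage(actions, method="average")              # hierarchical clustering of outputs
    labels = fcluster(Z, t=max_action_dist, criterion="distance")
    categories = {}
    for percept, label in zip(percepts, labels):
        categories.setdefault(label, []).append(percept)
    return categories

def categorize(new_percept, categories):
    """Assign a new percept to the category whose stored percepts it is closest
    to (nearest neighbour); the category's associated actions then apply."""
    best, best_d = None, np.inf
    for label, members in categories.items():
        d = min(np.linalg.norm(np.asarray(m) - np.asarray(new_percept)) for m in members)
        if d < best_d:
            best, best_d = label, d
    return best
```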

    A Computational Model of Visual Recognition Memory via Grid Cells

    Models of face, object, and scene recognition traditionally focus on massively parallel processing of low-level features, with higher-order representations emerging at later processing stages. However, visual perception is tightly coupled to eye movements, which are necessarily sequential. Recently, neurons in entorhinal cortex have been reported with grid cell-like firing in response to eye movements, i.e., in visual space. Following the presumed role of grid cells in vector navigation, we propose a model of recognition memory for familiar faces, objects, and scenes, in which grid cells encode translation vectors between salient stimulus features. A sequence of saccadic eye-movement vectors, moving from one salient feature to the expected location of the next, potentially confirms an initial hypothesis (accumulating evidence toward a threshold) about stimulus identity, based on the relative feature layout (i.e., going beyond recognition of individual features). The model provides an explicit neural mechanism for the long-held view that directed saccades support hypothesis-driven, constructive perception and recognition; is compatible with holistic face processing; and constitutes the first quantitative proposal for a role of grid cells in visual recognition. The variance of grid cell activity along saccade trajectories exhibits 6-fold symmetry across 360 degrees, akin to recently reported fMRI data. The model suggests that disconnecting grid cells from occipitotemporal inputs may yield prosopagnosia-like symptoms. The mechanism is robust to partial visual occlusion, can accommodate size and position invariance, and suggests a functional explanation for medial temporal lobe involvement in visual memory for relational information and memory-guided attention.
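
A toy sketch of the evidence-accumulation mechanism described above: each executed saccade vector is compared with the stored feature-to-feature vector of a hypothesised identity, and the match score accumulates toward a recognition threshold. The Gaussian match score, threshold, and stored layouts are assumptions, not the paper's grid-cell implementation.

```python
import numpy as np

def recognise_by_saccades(stored_layouts, observed_vectors, threshold=2.5, sigma=0.2):
    """Accumulate evidence for each stored identity as saccade vectors arrive.

    stored_layouts: {identity: list of expected feature-to-feature 2D vectors}
    observed_vectors: executed saccade vectors, visiting features in the same order
    Returns (identity, step) once one hypothesis crosses the evidence threshold,
    or (None, n_steps) if none does."""
    evidence = {k: 0.0 for k in stored_layouts}
    for step, v in enumerate(observed_vectors, start=1):
        for identity, expected in stored_layouts.items():
            if step <= len(expected):
                err = np.linalg.norm(np.asarray(v) - np.asarray(expected[step - 1]))
                evidence[identity] += np.exp(-err ** 2 / (2 * sigma ** 2))  # match score
        best = max(evidence, key=evidence.get)
        if evidence[best] >= threshold:
            return best, step
    return None, len(observed_vectors)

if __name__ == "__main__":
    layouts = {"face_A": [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)],
               "face_B": [(0.5, 0.5), (0.5, -0.5), (-1.0, 0.0)]}
    saccades = [(0.95, 0.05), (0.05, 1.02), (-1.0, -0.02)]
    print(recognise_by_saccades(layouts, saccades))
```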

    Activity Analysis: Finding Explanations for Sets of Events

    Automatic activity recognition is the computational process of analysing visual input and reasoning about detections to understand the performed events. In all but the simplest scenarios, an activity involves multiple interleaved events, some related and others independent. The activity in a car park or at a playground would typically include many events. This research assumes that the possible events and any constraints between the events can be defined for the given scene. Analysing the activity should thus recognise a complete and consistent set of events; this is referred to as a global explanation of the activity. By seeking a global explanation that satisfies the activity’s constraints, infeasible interpretations can be avoided and ambiguous observations may be resolved. An activity’s events and any natural constraints are defined using a grammar formalism. Attribute Multiset Grammars (AMGs) are chosen because they allow defining hierarchies, as well as attribute rules and constraints. When used for recognition, detectors are employed to gather a set of detections. Parsing the set of detections with the AMG provides a global explanation. To find the best parse tree given a set of detections, a Bayesian network models the probability distribution over the space of possible parse trees. Heuristic and exhaustive search techniques are proposed to find the maximum a posteriori global explanation. The framework is tested on two activities: the activity in a bicycle rack, and around a building entrance. The first case study involves people locking bicycles onto a bicycle rack and picking them up later. The best global explanation for all detections gathered during the day resolves local ambiguities from occlusion or clutter. Intensive testing on five full days showed that global analysis achieves higher recognition rates. The second case study tracks people and any objects they are carrying as they enter and exit a building entrance. A complete sequence of the person entering and exiting multiple times is recovered by the global explanation.
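
A simplified sketch of the search for a "global explanation": enumerate sets of event hypotheses, discard those that violate the scene's constraints, and keep the most probable remainder. A real system parses detections with an attribute multiset grammar and scores parse trees with a Bayesian network; the scoring function, constraint check, and toy events below are simplifying assumptions.

```python
from itertools import combinations

def best_global_explanation(hypotheses, consistent, score):
    """Exhaustively search subsets of event hypotheses for the most probable
    explanation that satisfies the scene's constraints.

    hypotheses: list of candidate events (here, dicts naming the detections used)
    consistent: function(subset) -> bool, standing in for the grammar's constraints
    score:      function(subset) -> float, e.g. a log-posterior over detections."""
    best, best_score = (), float("-inf")
    for r in range(len(hypotheses) + 1):
        for subset in combinations(hypotheses, r):
            if consistent(subset) and score(subset) > best_score:
                best, best_score = subset, score(subset)
    return best, best_score

if __name__ == "__main__":
    # Toy example: one detection, two mutually exclusive ways to explain it.
    events = [{"name": "lock_bike_rack1", "uses": {"det1"}, "logodds": 2.3},
              {"name": "lock_bike_rack2", "uses": {"det1"}, "logodds": 0.5}]

    def no_shared_detections(subset):
        used = [d for e in subset for d in e["uses"]]
        return len(used) == len(set(used))   # each detection explained at most once

    score = lambda s: sum(e["logodds"] for e in s)
    print(best_global_explanation(events, no_shared_detections, score))
```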

    Image categorisation using parallel network constructs: an emulation of early human colour processing and context evaluation

    Traditional geometric scene analysis makes no attempt to address the understanding of human vision. Instead it adopts an algorithmic approach, concentrating on geometric model fitting. Human vision, however, is both quick and accurate, yet very little is known about how the recognition of objects is performed with such speed and efficiency. It is thought that there must be some process, for both coding and storage, which can account for these characteristics. In this thesis a stricter emulation of human vision, based on work derived from medical psychology and other fields, is proposed. Human beings must store perceptual information from which to make comparisons, derive structures and classify objects. It is widely thought by cognitive psychologists that some form of symbolic representation is inherent in this storage. Here a mathematical syntax is defined to perform this kind of symbolic description. The symbolic structures must be capable of manipulation, and a set of operators is defined for this purpose. The early visual cortex and geniculate body are both inherently parallel in operation and simple in structure. A broadly connectionist emulation of this kind of structure is described, using independent computing elements, which can perform segmentation, re-colouring and generation of the base elements of the description syntax. Primal colour information is then collected by a second network, which forms the visual topology, colouring and position information of areas in the image, as well as a full description of the scene in terms of a more complex symbolic set. The idea of different visual contexts is introduced and a model is proposed for the accumulation of context rules. This model is then applied to a database of natural images.
    EPSRC CASE award: Neural Computer Sciences, Southampton
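
As a rough sketch of the front end described here, the code below groups pixels into regions of similar colour and emits a simple symbolic descriptor (mean colour, centroid, area) per region, the kind of base element a later symbolic stage could reason over. The flood-fill grouping, threshold, and descriptor fields are assumptions, not the thesis's syntax.

```python
import numpy as np
from collections import deque

def colour_regions(image, threshold=0.1):
    """Group pixels into regions of similar colour (flood fill), then emit a
    symbolic descriptor per region: mean colour, centroid, and area.

    image: (h, w, 3) float array of colour values."""
    h, w, _ = image.shape
    labels = -np.ones((h, w), dtype=int)
    regions = []
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            label = len(regions)
            queue, pixels = deque([(sy, sx)]), []
            labels[sy, sx] = label
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and np.linalg.norm(image[ny, nx] - image[y, x]) < threshold):
                        labels[ny, nx] = label
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            regions.append({"colour": image[ys, xs].mean(axis=0),
                            "centroid": (float(np.mean(ys)), float(np.mean(xs))),
                            "area": len(pixels)})
    return labels, regions
```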