2,887 research outputs found

    Toward a Taxonomy and Computational Models of Abnormalities in Images

    The human visual system can spot an abnormal image and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicality in images more comprehensively than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all the different types of abnormality in images and outperforms prior art in abnormality recognition. Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016).
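
    As a purely illustrative aside (the abstract does not give the model's details), the reported taxonomy amounts to scoring an image along three separate axes and reporting the dominant one; the Python sketch below shows only that structure, with placeholder scoring functions invented for the example.

        # Illustrative only: the three abnormality categories from the paper's
        # taxonomy framed as separate scores. The scoring functions are
        # placeholders, not the paper's computational model.

        def object_centric_score(image):   # e.g. an object with odd shape, texture or pose
            return 0.1

        def scene_centric_score(image):    # e.g. the scene as a whole looks implausible
            return 0.2

        def contextual_score(image):       # e.g. a normal object in an unlikely context
            return 0.7

        def classify_abnormality(image):
            # Score the image along each axis of the taxonomy and report
            # the dominant category together with its score.
            scores = {
                "object-centric": object_centric_score(image),
                "scene-centric": scene_centric_score(image),
                "contextual": contextual_score(image),
            }
            category = max(scores, key=scores.get)
            return category, scores[category]

        print(classify_abnormality(image=None))  # ('contextual', 0.7)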

    The Neural Basis of Object-Context Relationships on Aesthetic Judgment

    The relationship between contextual information and object perception has received considerable attention in neuroimaging studies. In the work reported here, we used functional magnetic resonance imaging (fMRI) to investigate aesthetic judgment of images of objects in their normal contextual settings versus abnormal contextual settings, and the underlying brain activity. When object-context relationships are violated, changes in visual perception and aesthetic judgment emerge that expose the contribution of vision to interpretations shaped by previous experience. We found that the effects of context on aesthetic judgment modulate different memory sub-systems, while aesthetic judgment regardless of context recruits medial and lateral aspects of the orbitofrontal cortex, consistent with previous findings. Visual cortical areas traditionally associated with the processing of visual features are recruited in normal contexts, irrespective of aesthetic ratings, while prefrontal areas are significantly more engaged when objects are viewed in unaccustomed settings.

    Laminar fMRI: applications for cognitive neuroscience

    The cortex is a massively recurrent network, characterized by feedforward and feedback connections between brain areas as well as lateral connections within an area. Feedforward, horizontal and feedback responses largely activate separate layers of a cortical unit, meaning they can be dissociated by lamina-resolved neurophysiological techniques. Such techniques are invasive and are therefore rarely used in humans. However, recent developments in high spatial resolution fMRI allow for non-invasive, in vivo measurements of brain responses specific to separate cortical layers. This provides an important opportunity to dissociate between feedforward and feedback brain responses, and to investigate communication between brain areas at a more fine-grained level than previously possible in humans. In this review, we highlight recent studies that successfully used laminar fMRI to isolate layer-specific feedback responses in human sensory cortex. In addition, we review several areas of cognitive neuroscience that stand to benefit from this new technological development, highlighting contemporary hypotheses that yield testable predictions for laminar fMRI. We hope to encourage researchers to embrace this development in fMRI research, as we expect that many future advancements in our current understanding of human brain function will be gained from measuring lamina-specific brain responses.

    Activity understanding and unusual event detection in surveillance videos

    Computer scientists have made ceaseless efforts to replicate the cognitive video understanding abilities of human brains in autonomous vision systems. As video surveillance cameras become ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection in surveillance videos. Nevertheless, video content analysis in public scenes remains a formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded scenes and the poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve robust detection of unusual events, which are rare, ambiguous, and easily confused with noise. This thesis proposes solutions for resolving ambiguous visual observations and overcoming the unreliability of conventional activity analysis methods by exploiting multi-camera visual context and human feedback. The thesis first demonstrates the importance of learning visual context for establishing reliable reasoning about observed activity in a camera network. In the proposed approach, a new Cross Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time-delayed pairwise correlations of regional activities observed within and across multiple camera views. This thesis shows that learning time-delayed pairwise activity correlations offers valuable contextual information for (1) spatial and temporal topology inference of a camera network, (2) robust person re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in contrast to conventional methods, the proposed approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring severe inter-object occlusions. Second, to detect global unusual events across multiple disjoint cameras, this thesis extends visual context learning from pairwise relationships to global time-delayed dependencies between regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is proposed to model the multi-camera activities and their dependencies. Subtle global unusual events are detected and localised using the model as context-incoherent patterns across multiple camera views. In the model, different nodes represent activities in different decomposed regions from different camera views, and the directed links between nodes encode time-delayed dependencies between activities observed within and across camera views. In order to learn optimised time-delayed dependencies in a TD-PGM, a novel two-stage structure learning approach is formulated by combining constraint-based and score-based structure learning methods. Third, to cope with visual context changes over time, this two-stage structure learning approach is extended to permit tractable incremental update of both TD-PGM parameters and its structure. As opposed to most existing studies that assume a static model once learned, the proposed incremental learning allows a model to adapt itself to reflect changes in the current visual context, such as subtle behaviour drift over time or the removal/addition of cameras. Importantly, the incremental structure learning is achieved without either an exhaustive search in a large graph structure space or storing all past observations in memory, making the proposed solution memory and time efficient. Fourth, an active learning approach is presented to incorporate human feedback for on-line unusual event detection. Contrary to most existing unsupervised methods that perform passive mining for unusual events, the proposed approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e. it decides on the fly whether to request a label for each unlabelled sample observed in sequence. It adaptively selects between two active learning criteria, namely a likelihood criterion and an uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of the classification boundary. The effectiveness of the proposed approaches is validated using videos captured from busy public scenes such as underground stations and traffic intersections.
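
    The central contextual cue in the first contribution is the time-delayed pairwise correlation between regional activity profiles. The sketch below is a minimal, single-variate analogue of that idea, not the thesis's xCCA formulation: it scans candidate delays between two activity time series (for example, per-frame foreground measurements from two camera regions; the signal choice and all names here are assumptions) and returns the delay with the strongest Pearson correlation.

        import numpy as np

        def delayed_correlation(x, y, max_delay):
            # Scan candidate delays d and keep the one giving the strongest
            # Pearson correlation between x shifted by d and y.
            # A negative best delay means x leads y.
            best_delay, best_corr = 0, 0.0
            for d in range(-max_delay, max_delay + 1):
                if d >= 0:
                    a, b = x[d:], y[:len(y) - d]
                else:
                    a, b = x[:len(x) + d], y[-d:]
                if len(a) < 2:
                    continue
                corr = np.corrcoef(a, b)[0, 1]
                if abs(corr) > abs(best_corr):
                    best_delay, best_corr = d, corr
            return best_delay, best_corr

        # Synthetic example: region B echoes region A five frames later.
        rng = np.random.default_rng(0)
        a = rng.random(500)
        b = np.roll(a, 5) + 0.1 * rng.random(500)
        print(delayed_correlation(a, b, max_delay=20))  # delay close to -5 (A leads B)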

    Context-based scene recognition from visual data in smart homes: an Information Fusion approach

    Ambient Intelligence (AmI) aims at the development of computational systems that process data acquired by sensors embedded in the environment to support users in everyday tasks. Visual sensors, however, have been scarcely used in this kind of application, even though they provide very valuable information about scene objects: position, speed, color, texture, etc. In this paper, we propose a cognitive framework for the implementation of AmI applications based on visual sensor networks. The framework, inspired by the Information Fusion paradigm, combines a priori context knowledge represented with ontologies with real-time single-camera data to support logic-based, high-level local interpretation of the current situation. In addition, the system is able to automatically generate feedback recommendations to adjust data acquisition procedures. Information about recognized situations is eventually collected by a central node to obtain an overall description of the scene and consequently trigger AmI services. We show the extensible and adaptable nature of the approach with a prototype system in a smart home scenario. This research activity is supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
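
    The fusion step described above lends itself to a compact illustration. The following is only a hypothetical reduction of the "a priori context knowledge + single-camera observation -> high-level local interpretation" idea, with a plain dictionary standing in for the ontology; all names, rules and values are invented for this sketch and are not the paper's framework.

        # Invented example: combine context knowledge for a room with one
        # camera observation and return a high-level situation label.

        CONTEXT = {
            "kitchen": {
                "expected_objects": {"person", "kettle", "cup"},
                "night_activity_unusual": True,
            },
        }

        def interpret(room, detected_objects, hour):
            # Flag objects that the context does not expect in this room,
            # then apply a simple temporal rule; otherwise report normality.
            knowledge = CONTEXT.get(room, {})
            unexpected = detected_objects - knowledge.get("expected_objects", set())
            if unexpected:
                return f"unexpected objects in {room}: {sorted(unexpected)}"
            if knowledge.get("night_activity_unusual") and "person" in detected_objects and hour < 6:
                return f"unusual night-time activity in {room}"
            return f"normal activity in {room}"

        print(interpret("kitchen", {"person", "knife"}, hour=3))
        # -> unexpected objects in kitchen: ['knife']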

    Computational Modelling of Information Gathering

    This thesis describes computational modelling of information gathering behaviour under active inference, a framework for describing Bayes-optimal behaviour. Under active inference, perception, attention and action all serve the same purpose: minimising variational free energy. Variational free energy is an upper bound on surprise, and minimising it maximises an agent's evidence for its survival. An agent achieves this by acquiring information (resolving uncertainty) about the hidden states of the world and using the acquired information to act so as to bring about the outcomes it prefers. In this work I placed special emphasis on the resolution of uncertainty about the states of the world. I first created a visual search task called the scene construction task. In this task one needs to accumulate evidence for competing hypotheses (different visual scenes) through sequential sampling of a visual scene, categorising it once there is sufficient evidence. I showed that a computational agent attends to the most salient (epistemically valuable) locations in this task. Next, this task was performed by healthy humans, whose exploration strategies provided evidence for uncertainty-driven exploration. I also showed how different exploratory behaviours can be characterised using canonical correlation analysis. In the next study I showed how exploration of a visual scene under different instructions could be explained by appealing to computational mechanisms that may correspond to attention. This entailed manipulating the precision of task-irrelevant cues and their hidden causes as a function of instructions. In the final work, I was interested in characterising impulsive behaviour using a patch-leaving paradigm. By varying the parameters of the MDP (Markov decision process) model, I showed that there could be at least three distinct causes of impulsive behaviour, namely a lower depth of planning, a lower capacity to maintain and process information, and an increased perceived value of immediate rewards.
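
    The claim that variational free energy upper-bounds surprise can be written out with the standard identity from the active inference literature (notation assumed here: hidden states s, observations o, approximate posterior q(s), generative model p(o, s)):

        F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
          = -\ln p(o) + D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big]
          \ge -\ln p(o).

    Because the KL divergence is non-negative, minimising F simultaneously drives q(s) towards the true posterior (resolving uncertainty about hidden states) and minimises surprise, -\ln p(o), i.e. maximises the evidence p(o).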

    On the functions, mechanisms, and malfunctions of intracortical contextual modulation

    A broad neuron-centric conception of contextual modulation is reviewed and reassessed in the light of recent neurobiological studies of amplification, suppression, and synchronization. Behavioural and computational studies of perceptual and higher cognitive functions that depend on these processes are outlined, and evidence that those functions and their neuronal mechanisms are impaired in schizophrenia is summarized. Finally, we compare and assess the long-term biological functions of contextual modulation at the level of computational theory, as formalized by the theories of coherent infomax and free energy reduction. We conclude that those theories, together with the many empirical findings reviewed, show how contextual modulation at the neuronal level enables the cortex to flexibly adapt the use of its knowledge to current circumstances by amplifying and grouping relevant activities and by suppressing irrelevant activities.

    Video surveillance systems-current status and future trends

    This survey attempts to document the present status of video surveillance systems. The main components of a surveillance system are presented and studied thoroughly. Algorithms for image enhancement, object detection, object tracking, object recognition and item re-identification are presented. The most common modalities utilized by surveillance systems are discussed, with an emphasis on video, in terms of available resolutions and new imaging approaches such as High Dynamic Range video. The most important features and analytics are presented, along with the most common approaches to image/video quality enhancement. Distributed computational infrastructures (Cloud, Fog and Edge Computing) are discussed, describing the advantages and disadvantages of each approach. The most important deep learning algorithms are presented, along with the smart analytics that they enable. Augmented reality and the role it can play in a surveillance system are examined, before the challenges and future trends of surveillance are discussed.