1,046 research outputs found

    Neural Encoding and Decoding with Deep Learning for Natural Vision

    The overarching objective of this work is to bridge neuroscience and artificial intelligence to ultimately build machines that learn, act, and think like humans. In the context of vision, the brain enables humans to readily make sense of the visual world, e.g. by recognizing visual objects. Developing human-like machines requires understanding the working principles underlying human vision. In this dissertation, I ask how the brain encodes and represents dynamic visual information from the outside world, whether brain activity can be directly decoded to reconstruct and categorize what a person is seeing, and whether neuroscience theory can be applied to artificial models to advance computer vision. To address these questions, I used deep neural networks (DNNs) to establish encoding and decoding models describing the relationships between the brain and visual stimuli. Using the DNNs, the encoding models were able to predict functional magnetic resonance imaging (fMRI) responses throughout the visual cortex given video stimuli; the decoding models were able to reconstruct and categorize the visual stimuli based on fMRI activity. To further advance the DNN model, I implemented a new bidirectional and recurrent neural network based on predictive coding theory. As a theory in neuroscience, predictive coding explains the interaction among feedforward, feedback, and recurrent connections. The results showed that this brain-inspired model significantly outperforms feedforward-only DNNs in object recognition. These studies have a positive impact on understanding the neural computations underlying human vision and on improving computer vision with knowledge from neuroscience.

    Naturalistic stimuli reveal a dominant role for agentic action in visual representation

    Naturalistic, dynamic movies evoke strong, consistent, and information-rich patterns of activity over a broad expanse of cortex and engage multiple perceptual and cognitive systems in parallel. The use of naturalistic stimuli enables functional brain imaging research to explore cognitive domains that are poorly sampled in highly controlled experiments. These domains include perception and understanding of agentic action, which plays a larger role in visual representation than was appreciated from experiments using static, controlled stimuli.

    Using Multivariate Pattern Analysis to Identify Conceptual Knowledge Representation in the Brain

    Representation of semantic knowledge is an important aspect of cognitive function. The processing of concrete (e.g., book) and abstract (e.g., freedom) semantic concepts shows systematic differences on various behavioral measures in both healthy and clinical populations. However, previous studies examining differences in the neural substrates of abstract and concrete concept representations have reached inconsistent conclusions. This dissertation applied multiple novel data analysis approaches to functional magnetic resonance imaging (fMRI) data to investigate representational differences between abstract and concrete concepts and to provide converging evidence that the representations of abstract and concrete semantic knowledge in the brain rely on different mechanisms. Study 1 used a meta-analysis on a combined sample of 303 participants to quantitatively summarize published neuroimaging studies on brain regions with category-specific activations. Results suggested greater engagement of the working memory and language systems for processing abstract concepts, and greater engagement of the visual perceptual system for processing concrete concepts, likely via mental imagery. Study 2 showed that single-trial fMRI data could be successfully identified as being associated with the processing of either abstract or concrete concepts based on multivoxel activity patterns in widespread brain areas, suggesting that abstract vs. concrete differences were represented by multiple mechanisms. Study 3 investigated classification based on condition-specific connectivity patterns. Results showed that an individual's connectivity patterns could be successfully identified as abstract or concrete based on the connectivity patterns of other individuals, both from the connectivity of a priori selected seed regions and from whole-brain voxel-by-voxel connectivity patterns. The results indicated the existence of condition-specific connectivity patterns that were consistent across individuals on a whole-brain scale. Moreover, the results suggested that the representations of abstract and concrete concepts differ from the semantic association perspective in addition to differing in coding forms. Study 4 illustrated the application of multivoxel pattern analysis (MVPA) as a cross-modal prediction approach, a promising method for further investigation of semantic knowledge representation in the brain, by investigating the role of the general semantic system in person-specific knowledge. Overall, the work described in this dissertation provides converging evidence of the representational difference between abstract and concrete concepts. The differences are suggested to occur at various levels, including the dependence on modality-specific perceptual systems, the organization of associations among different semantic-related systems, and the difficulty and strategy of retrieving contextual information.

    Constraint-free Natural Image Reconstruction from fMRI Signals Based on Convolutional Neural Network

    In recent years, research on decoding brain activity based on functional magnetic resonance imaging (fMRI) has made remarkable progress. However, constraint-free natural image reconstruction from brain activity is still a challenge. Existing methods simplify the problem by using semantic prior information or by reconstructing only simple images such as letters and digits. Without semantic prior information, we present a novel method to reconstruct natural images from fMRI signals of the human visual cortex based on the computational model of a convolutional neural network (CNN). First, we extracted the unit outputs for viewed natural images in each layer of a pre-trained CNN as CNN features. Second, we transformed image reconstruction from fMRI signals into a CNN feature visualization problem by training a sparse linear regression to map from the fMRI patterns to the CNN features. By iteratively optimizing to find the matched image, whose CNN unit features become most similar to those predicted from the brain activity, we achieved promising results for the challenging task of constraint-free natural image reconstruction. As no semantic prior information about the stimuli was used when training the decoding model, images of any category (not constrained by the training set) could in theory be reconstructed. We found that the reconstructed images resembled the natural stimuli, especially in position and shape. The experimental results suggest that hierarchical visual features can effectively express the visual perception process of the human brain.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    How the brain grasps tools: fMRI & motion-capture investigations

    Humans’ ability to learn about and use tools is considered a defining feature of our species, yet most related neuroimaging investigations involve proxy 2D picture viewing tasks. Using a novel tool grasping paradigm across three experiments, participants grasped 3D-printed tools (e.g., a knife) in ways that were considered to be typical (i.e., by the handle) or atypical (i.e., by the blade) for subsequent use. As a control, participants also performed grasps in corresponding directions on a series of 3D-printed non-tool objects, matched for properties including elongation and object size. Project 1 paired a powerful fMRI block design with visual localiser Region of Interest (ROI) and searchlight Multivoxel Pattern Analysis (MVPA) approaches. Most remarkably, ROI MVPA revealed that hand-selective, but not anatomically overlapping tool-selective, areas of the left Lateral Occipital Temporal Cortex and Intraparietal Sulcus represented the typicality of tool grasping. Searchlight MVPA found similar evidence within left anterior temporal cortex as well as right parietal and temporal areas. Project 2 measured hand kinematics using motion capture during a highly similar procedure, finding hallmark grip scaling effects despite the unnatural task demands. Further, slower movements were observed when grasping tools, relative to non-tools, with grip scaling also being poorer for atypical tool, compared to non-tool, grasping. Project 3 used a slow event-related fMRI design to investigate whether representations of typicality were detectable during motor planning, but MVPA was largely unsuccessful, presumably due to a lack of statistical power. Taken together, the representations of typicality identified within areas of the ventral and dorsal, but not ventro-dorsal, pathways have implications for specific predictions made by leading theories about the neural regions supporting human tool use, including dual visual stream theory and the two-action systems model.

    A detection-based pattern recognition framework and its applications

    The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by studies in modern cognitive psychology and by real-world pattern recognition systems, the framework is proposed as an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a “divide-and-conquer” design paradigm for pattern recognition problems, which decomposes a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Information fusion strategies are employed to integrate the evidence from a lower level to form the evidence at a higher level. This fusion procedure continues until the top level is reached. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection, so that in such a component-based framework any primitive component can be replaced by a new one while the other components remain unchanged; (3) incremental information integration; (4) high-level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated. Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.
    Ph.D. Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task.

    QoS framework for video streaming in home networks

    In this thesis we present a new SNR scalable video coding scheme. An important advantage of the proposed scheme is that it requires just a standard video decoder for processing each layer. The quality of the delivered video depends on the allocation of bit rates to the base and enhancement layers. For a given total bit rate, the combination with a bigger base layer delivers higher quality. The absence of dependencies between frames in enhancement layers makes the system resilient to losses of arbitrary frames from an enhancement layer. Furthermore, that property can be used in a more controlled fashion. An important characteristic of any video streaming scheme is the ability to handle network bandwidth fluctuations. We developed a streaming technique that observes the network conditions and, based on these observations, adjusts the layer configuration to achieve the best possible quality. A change in the network conditions forces a change in the number of layers or in the bit rate of these layers. Knowledge of the network conditions allows delivery of higher-quality video by choosing an optimal layer configuration. When the network degrades, the amount of data transmitted per second is decreased by skipping frames from an enhancement layer on the sender side. The presented video coding scheme allows skipping any frame from an enhancement layer, thus enabling efficient real-time control over transmission at the network level and fine-grained control over the decoding of video data. The proposed methodology is not MPEG-2 specific and can be applied to other coding standards. We also developed a terminal resource manager that enables trade-offs between quality and resource consumption through the use of scalable video coding in combination with scalable video algorithms. The controller developed for the decoding process optimizes the perceived quality with respect to the available CPU power and the amount of input data. The controller does not depend on the type of scalability technique and can therefore be used with any scalable video. The controller uses a strategy that is created offline by means of a Markov Decision Process (MDP). During the evaluation it was found that the correctness of the controller's behavior depends on the correctness of the parameter settings for the MDP, so user tests should be employed to find the optimal settings.

    Max-Planck-Institute for Psycholinguistics: Annual Report 2003
