
    Visual Decoding of Targets During Visual Search From Human Eye Fixations

    What does human gaze reveal about a user's intents, and to what extent can these intents be inferred or even visualized? Gaze has been proposed as an implicit source of information to predict the target of visual search and, more recently, to predict the object class and attributes of the search target. In this work, we go one step further and investigate the feasibility of combining recent advances in encoding human gaze information using deep convolutional neural networks with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Such visual decoding is challenging for two reasons: 1) the search target only resides in the user's mind as a subjective visual pattern and most often cannot even be described verbally by the person, and 2) it is, as of yet, unclear whether gaze fixations contain sufficient information for this task at all. We show, for the first time, that visual representations of search targets can indeed be decoded from human gaze fixations alone. We propose to first encode fixations into a semantic representation and then decode this representation into an image. We evaluate our method on a recent gaze dataset of 14 participants searching for clothing in image collages and validate the model's predictions in two human studies. Our results show that users selected the correct category of the decoded image 62% of the time (chance level: 10%). In our second study, we show the importance of a local gaze encoding for decoding visual search targets.
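    To make the encode-then-decode idea concrete, below is a minimal PyTorch sketch of such a pipeline. The module names (FixationEncoder, ImageDecoder), layer sizes, patch resolution, and the simple averaging over fixations are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch of a fixation-to-image pipeline (hypothetical architecture).
import torch
import torch.nn as nn

class FixationEncoder(nn.Module):
    """Encodes image patches around gaze fixations into a semantic vector."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                             # small CNN over 64x64 RGB patches
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, patches):                               # patches: (B, N, 3, 64, 64)
        b, n = patches.shape[:2]
        feats = self.cnn(patches.flatten(0, 1)).flatten(1)    # (B*N, 64)
        feats = feats.view(b, n, -1).mean(dim=1)              # pool over the N fixations
        return self.fc(feats)                                 # (B, embed_dim)

class ImageDecoder(nn.Module):
    """Decodes the semantic vector into a 64x64 image (generator-style)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 4, 4)
        return self.deconv(x)                                 # (B, 3, 64, 64)

# Forward pass on dummy data: 8 fixation patches per trial.
encoder, decoder = FixationEncoder(), ImageDecoder()
patches = torch.randn(2, 8, 3, 64, 64)
decoded = decoder(encoder(patches))
print(decoded.shape)  # torch.Size([2, 3, 64, 64])
```

    In practice the encoder and decoder would be trained on pairs of fixation data and target images, for example with the decoder as a conditional generative model; the dummy forward pass here only checks tensor shapes.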

    Innovating with Artificial Intelligence: Capturing the Constructive Functional Capabilities of Deep Generative Learning

    As an emerging class of artificial intelligence, deep generative learning (DGL) models can generate an unprecedented variety of new outputs. Examples include the creation of music, text-to-image translation, or the imputation of missing data. Similar to other AI models that already evoke significant changes in society and the economy, there is a need to structure the constructive functional capabilities of DGL. To derive and discuss them, we conducted an extensive and structured literature review. Our results reveal a substantial scope of six constructive functional capabilities, demonstrating that DGL is not exclusively used to generate unseen outputs. Our paper further guides companies in capturing and evaluating DGL's potential for innovation. In addition, our paper fosters an understanding of DGL and provides a conceptual basis for further research.

    Large-scale Foundation Models and Generative AI for BigData Neuroscience

    Recent advances in machine learning have made revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscape of neuroscience research and make a significant impact on the future. Here we present a mini-review of recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shifting framework will open new avenues for many neuroscience research directions, and we discuss the accompanying challenges and opportunities.

    Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity

    Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance imaging (fMRI) data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine which position each voxel in a particular region of interest represents in the visual field based on its corresponding receptive-field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using a VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques.
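    As a rough illustration of the retinotopic projection step described above, the following sketch scatters voxel responses onto a 2D visual-field grid by binning their receptive-field centres. The grid size, the nearest-cell binning, and the helper name voxels_to_visual_field are assumptions for illustration, not the paper's implementation; the resulting "brain image" would then be fed to the image-to-image network trained with the VGG feature loss and adversarial regularizer.

```python
# Sketch of a retinotopic projection: scatter each voxel's response onto its
# estimated receptive-field location in the visual field, yielding a 2D
# "brain image" that an image-to-image network can consume.
import numpy as np

def voxels_to_visual_field(responses, rf_xy, grid=96):
    """responses: (V,) voxel activations for one time point.
    rf_xy: (V, 2) receptive-field centres in [0, 1] visual-field coordinates."""
    canvas = np.zeros((grid, grid), dtype=np.float32)
    counts = np.zeros((grid, grid), dtype=np.float32)
    cols = np.clip((rf_xy[:, 0] * grid).astype(int), 0, grid - 1)
    rows = np.clip((rf_xy[:, 1] * grid).astype(int), 0, grid - 1)
    np.add.at(canvas, (rows, cols), responses)   # accumulate responses per grid cell
    np.add.at(counts, (rows, cols), 1.0)         # count voxels landing in each cell
    return canvas / np.maximum(counts, 1.0)      # average voxels that share a cell

# Dummy data: 5000 voxels with random RF centres and responses.
rng = np.random.default_rng(0)
brain_image = voxels_to_visual_field(rng.standard_normal(5000), rng.random((5000, 2)))
print(brain_image.shape)  # (96, 96)
```

    Averaging voxels that fall into the same cell is only one possible aggregation choice; smoother interpolation schemes (e.g. weighting by receptive-field size) would serve the same purpose of turning voxel activity into a spatially organized input.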