Visual Decoding of Targets During Visual Search From Human Eye Fixations
What does human gaze reveal about a user's intents, and to what extent can
these intents be inferred or even visualized? Gaze was proposed as an implicit
source of information to predict the target of visual search and, more
recently, to predict the object class and attributes of the search target. In
this work, we go one step further and investigate the feasibility of combining
recent advances in encoding human gaze information using deep convolutional
neural networks with the power of generative image models to visually decode,
i.e. create a visual representation of, the search target. Such visual decoding
is challenging for two reasons: 1) the search target only resides in the user's
mind as a subjective visual pattern, and can most often not even be described
verbally by the person, and 2) it is, as of yet, unclear if gaze fixations
contain sufficient information for this task at all. We show, for the first
time, that visual representations of search targets can indeed be decoded only
from human gaze fixations. We propose to first encode fixations into a semantic
representation and then decode this representation into an image. We evaluate
our method on a recent gaze dataset of 14 participants searching for clothing
in image collages and validate the model's predictions using two human studies.
Our results show that participants were able to correctly select the category
of the decoded image 62% of the time (chance level: 10%). In our second study,
we show the importance of a local gaze encoding for decoding visual search
targets.
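The two-stage approach described above, encoding fixations into a semantic representation and then decoding that representation into an image, can be illustrated with a minimal sketch. The grid layout, per-patch features, and linear decoder below are hypothetical stand-ins for illustration only, not the paper's actual CNN encoder or generative image model:

```python
import numpy as np

def encode_fixations(fixations, patch_feats, grid=4):
    """Stage 1 (sketch): pool per-patch features at fixated locations
    into a single semantic vector.

    fixations:   list of (x, y, duration) with x, y in [0, 1).
    patch_feats: (grid*grid, d) array of precomputed features, one row
                 per collage patch (hypothetical stand-in for CNN features).
    """
    d = patch_feats.shape[1]
    acc = np.zeros(d)
    total = 0.0
    for x, y, dur in fixations:
        idx = int(y * grid) * grid + int(x * grid)  # patch under the fixation
        acc += dur * patch_feats[idx]               # duration-weighted pooling
        total += dur
    return acc / total if total > 0 else acc

def decode_to_image(z, W, shape=(8, 8)):
    """Stage 2 (sketch): a linear 'generator' mapping the semantic
    vector z to a small image; a real system would use a trained
    generative model instead."""
    return (W @ z).reshape(shape)
```

A usage example: with a single fixation of duration 2.0 on the top-left patch, `encode_fixations` returns exactly that patch's feature vector, which `decode_to_image` then maps to an 8x8 image.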
Innovating with Artificial Intelligence: Capturing the Constructive Functional Capabilities of Deep Generative Learning
As an emerging species of artificial intelligence, deep generative learning (DGL) models can generate an unprecedented variety of new outputs. Examples include the creation of music, text-to-image translation, or the imputation of missing data. Similar to other AI models that already evoke significant changes in society and economy, there is a need for structuring the constructive functional capabilities of DGL. To derive and discuss them, we conducted an extensive and structured literature review. Our results reveal a substantial scope of six constructive functional capabilities, demonstrating that DGL is not exclusively used to generate unseen outputs. Our paper further guides companies in capturing and evaluating DGL's potential for innovation. In addition, our paper fosters an understanding of DGL and provides a conceptual basis for further research.
Large-scale Foundation Models and Generative AI for BigData Neuroscience
Recent advances in machine learning have made revolutionary breakthroughs in
computer games, image and natural language understanding, and scientific
discovery. Foundation models and large-scale language models (LLMs) have
recently achieved human-like intelligence thanks to BigData. With the help of
self-supervised learning (SSL) and transfer learning, these models may
potentially reshape the landscapes of neuroscience research and make a
significant impact on the future. Here we present a mini-review on recent
advances in foundation models and generative AI models as well as their
applications in neuroscience, including natural language and speech, semantic
memory, brain-machine interfaces (BMIs), and data augmentation. We argue that
this paradigm-shift framework will open new avenues for many neuroscience
research directions and discuss the accompanying challenges and opportunities.
Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity
Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques.
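The first step of the pipeline, projecting voxel activity onto the visual field via receptive-field locations, can be sketched as below. The receptive-field centers, grid resolution, and mean-pooling rule are illustrative assumptions; the paper's actual mapping comes from retinotopic measurements and feeds a trained convolutional network rather than this toy projection:

```python
import numpy as np

def voxels_to_visual_field(activity, rf_centers, res=32):
    """Project voxel responses into a 2D visual-field image.

    activity:   (n_voxels,) response of each voxel at one time point.
    rf_centers: (n_voxels, 2) receptive-field centers in [0, 1)^2
                (hypothetical; in practice estimated retinotopically).
    Returns a (res, res) image holding, per cell, the mean activity
    of all voxels whose receptive field falls in that cell.
    """
    img = np.zeros((res, res))
    count = np.zeros((res, res))
    for a, (x, y) in zip(activity, rf_centers):
        i, j = int(y * res), int(x * res)  # grid cell for this voxel's RF
        img[i, j] += a
        count[i, j] += 1
    # Average where voxels landed; leave empty cells at zero.
    return np.divide(img, count, out=np.zeros_like(img), where=count > 0)
```

The resulting 2D image is what a fully convolutional image-to-image network would consume, since it preserves the spatial layout of the visual field rather than treating voxels as an unordered vector.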