Visual selective behavior can be triggered by a feed-forward process
The ventral visual pathway implements object recognition and categorization in a hierarchy of processing areas with neuronal selectivities of increasing complexity. The presence of massive feedback connections within this hierarchy raises the possibility that normal visual processing relies on the use of computational loops. It is not known, however, whether object recognition can be performed at all without such loops (i.e., in a purely feed-forward mode). By analyzing the time course of reaction times in a masked natural scene categorization paradigm, we show that the human visual system can generate selective motor responses based on a single feed-forward pass. We confirm these results using a more constrained letter discrimination task, in which the rapid succession of a target and mask is actually perceived as a distractor. We show that a masked stimulus presented for only 26 msec, and often not consciously perceived, can fully determine the earliest selective motor responses: The neural representations of the stimulus and mask are thus kept separated during a short period corresponding to the feed-forward "sweep." Therefore, feedback loops do not appear to be "mandatory" for visual processing. Rather, we found that such loops allow the masked stimulus to reverberate in the visual system and affect behavior for nearly 150 msec after the feed-forward sweep.
Competition and selection during visual processing of natural scenes and objects
When a visual scene, containing many discrete objects, is presented to our retinae, only a subset of these objects will be explicitly represented in visual awareness. The number of objects accessing short-term visual memory might be even smaller. Finally, it is not known to what extent "ignored" objects (those that do not enter visual awareness) will be processed, or recognized. By combining free recall, forced-choice recognition and visual priming paradigms for the same natural visual scenes and subjects, we were able to estimate these numbers, and provide insights as to the fate of objects that are not explicitly recognized in a single fixation. When presented for 250 ms with a scene containing 10 distinct objects, human observers can remember up to 4 objects with full confidence, and between 2 and 3 more when forced to guess. Importantly, the objects that the subjects consistently failed to report elicited a significant negative priming effect when presented in a subsequent task, suggesting that their identity was represented in high-level cortical areas of the visual system, before the corresponding neural activity was suppressed during attentional selection. These results shed light on neural mechanisms of attentional competition, and representational capacity at different levels of the human visual system.
Spacing affects some but not all visual searches: Implications for theories of attention and crowding
We investigated the effect of varying interstimulus spacing on an upright among inverted face search and a red-green among green-red bisected disk search. Both tasks are classic examples of serial search; however, spacing affects them very differently: As spacing increased, face discrimination performance improved significantly, whereas performance on the bisected disks remained poor. (No effect of spacing was observed for either a red among green or an L among + search task, two classic examples of parallel search.) In a second experiment, we precued the target location so that attention was no longer a limiting factor: Both serial search tasks were now equally affected by spacing, a result we attribute to a more classical form of crowding. The observed spacing effect in visual search suggests that for certain tasks, serial search may result from local neuronal competition between target and distractors, soliciting attentional resources; in other cases, serial search must occur for another reason, for example, because an item-by-item, attention-mediated recognition must take place. We speculate that this distinction may be based on whether or not there exist neuronal populations tuned to the relevant target-distractor distinction, and we discuss the possible relations between this spacing effect in visual search and other forms of crowding.
Visual Attention: A Rhythmic Process?
Vision involves constant exploration of the environment by eye movements. Recent evidence suggests that a rhythmic form of exploration also occurs under covert attention, in the absence of eye movements. Sustained attention naturally fluctuates, with a periodicity in the theta (4-8 Hz) frequency range.
The phase of ongoing EEG oscillations predicts visual perception
Oscillations are ubiquitous in electrical recordings of brain activity. While the amplitude of ongoing oscillatory activity is known to correlate with various aspects of perception, the influence of oscillatory phase on perception remains unknown. In particular, since phase varies on a much faster timescale than the more sluggish amplitude fluctuations, phase effects could reveal the fine-grained neural mechanisms underlying perception. We presented brief flashes of light at the individual luminance threshold while EEG was recorded. Although the stimulus on each trial was identical, subjects detected approximately half of the flashes (hits) and entirely missed the other half (misses). Phase distributions across trials were compared between hits and misses. We found that shortly before stimulus onset, each of the two distributions exhibited significant phase concentration, but at different phase angles. This effect was strongest in the theta and alpha frequency bands. In this time-frequency range, oscillatory phase accounted for at least 16% of variability in detection performance and allowed the prediction of performance on the single-trial level. This finding indicates that the visual detection threshold fluctuates over time along with the phase of ongoing EEG activity. The results support the notion that ongoing oscillations shape our perception, possibly by providing a temporal reference frame for neural codes that rely on precise spike timing.
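A minimal sketch of the kind of single-trial pre-stimulus phase analysis described above. The arrays `eeg_hits` and `eeg_misses`, the sampling rate, and the single-frequency Fourier projection are all illustrative assumptions (the study itself used a time-frequency decomposition whose exact parameters are not given here); phase concentration per condition is quantified with the circular resultant vector length.

```python
import numpy as np

def prestim_phase(epochs, fs, freq):
    """Estimate per-trial phase at `freq` (Hz) from pre-stimulus epochs.

    epochs : (n_trials, n_samples) array ending at stimulus onset
    Returns one phase (radians) per trial.
    """
    n = epochs.shape[1]
    t = np.arange(n) / fs
    # Project each trial onto a windowed complex sinusoid at the target
    # frequency (a single Fourier coefficient; a wavelet transform would
    # be the more common choice in the EEG literature).
    kernel = np.exp(-2j * np.pi * freq * t) * np.hanning(n)
    coeffs = epochs @ kernel
    return np.angle(coeffs)

def phase_concentration(phases):
    """Resultant vector length: 0 = uniform phases, 1 = identical phases."""
    return np.abs(np.mean(np.exp(1j * phases)))

# Hypothetical usage: compare phase concentration for hits vs. misses
fs = 500.0                                     # sampling rate (Hz), assumed
rng = np.random.default_rng(0)
eeg_hits = rng.standard_normal((200, 500))     # placeholder hit epochs
eeg_misses = rng.standard_normal((200, 500))   # placeholder miss epochs
for name, epochs in [("hits", eeg_hits), ("misses", eeg_misses)]:
    ph = prestim_phase(epochs, fs, freq=7.0)   # theta-band example
    print(name, "concentration:", phase_concentration(ph))
```

With real data, significant concentration in both conditions at different mean angles, as reported above, would show up as high resultant lengths with opposed mean phases.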
Attentional selection of noncontiguous locations: The spotlight is only transiently "split"
It is still a matter of debate whether observers can attend simultaneously to more than one location. Using essentially the same paradigm as was used previously by N. P. Bichot, K. R. Cave, and H. Pashler (1999), we demonstrate that their finding of an attentional "split" between separate target locations only reflects the early phase of attentional selection. Our subjects were asked to compare the shapes (circle or square) of 2 oddly colored targets within an array of 8 stimuli. After a varying stimulus onset asynchrony (SOA), 8 letters were flashed at the previous stimulus locations, followed by a mask. For a given SOA, the performance of subjects at reporting letters in each location was taken to reflect the distribution of spatial attention. In particular, by considering the proportion of trials in which none or both of the target letters were reported, we were able to infer the respective amount of attention allocated to each target without knowing, on a trial-by-trial basis, which location (if any) was receiving the most attentional resources. Our results show that for SOAs under 100-150 ms, attention can be equally split between the two targets, a conclusion compatible with previous reports. However, with longer SOAs, this attentional division can no longer be sustained and attention ultimately settles at the location of one single stimulus.
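A minimal sketch of the "none or both" inference described above, under the simplifying assumption that report probabilities at the two target locations are independent across trials; all numbers are illustrative, not the study's data. If p1 and p2 are the per-location report probabilities, then P(both) = p1*p2 and P(none) = (1-p1)(1-p2), which is enough to recover p1 and p2 without trial-by-trial knowledge of which location was favored.

```python
import numpy as np

def infer_report_probs(p_both, p_none):
    """Recover per-location report probabilities (p1, p2) from the observed
    proportions of trials where both / neither target letter was reported,
    assuming independence across locations.

    p_both = p1 * p2 and p_none = (1 - p1) * (1 - p2)
    imply p1 + p2 = 1 + p_both - p_none, so p1 and p2 are the roots of
    x**2 - (p1 + p2) * x + p_both = 0.
    """
    s = 1 + p_both - p_none
    disc = s**2 - 4 * p_both
    if disc < 0:
        raise ValueError("inconsistent with independent report probabilities")
    r = np.sqrt(disc)
    return (s + r) / 2, (s - r) / 2

# Illustrative numbers: an even split vs. attention settled on one target
print(infer_report_probs(p_both=0.36, p_none=0.16))  # -> (0.6, 0.6): split
print(infer_report_probs(p_both=0.09, p_none=0.09))  # -> (0.9, 0.1): settled
```

Equal roots indicate attention divided evenly between the targets; strongly unequal roots indicate attention settled at one location.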
The Continuous Wagon Wheel Illusion Is Associated with Changes in Electroencephalogram Power at ~13 Hz
Continuously moving objects sometimes appear to spontaneously reverse their motion direction. The mechanisms underlying this bistable phenomenon (the "continuous wagon wheel illusion") are heavily debated, but one interpretation suggests that motion information is perceived in discrete episodes at a rate between 10 and 15 Hz. Here, we asked observers to report the perceived direction of a continuously rotating wheel while 32-channel electroencephalogram (EEG) was recorded. We then separated periods of perceived true from illusory (reversed) motion and compared the EEG power spectrum under these two perceptually distinct yet physically identical conditions. The only reliable difference was observed at ~13 Hz over centroparietal electrodes, independent of the temporal frequency of the wheel. Thus, it is likely to reflect internal processes rather than purely stimulus-driven activity. EEG power at ~13 Hz decreased before the onset of illusory motion and increased before transitions back to real motion. Using this relationship, it was possible to predict above chance, on a trial-by-trial basis, the direction of the upcoming perceptual transition. These data are compatible with the idea that motion perception occurs in snapshots less than 100 ms in duration.
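A minimal sketch of the kind of spectral comparison described above, contrasting ~13 Hz power between perceived-real and perceived-illusory motion periods. The epoch arrays, sampling rate, and band edges are illustrative assumptions, and Welch's method (via SciPy) stands in for whatever spectral estimator the study actually used.

```python
import numpy as np
from scipy.signal import welch

def band_power(epochs, fs, f_lo, f_hi):
    """Mean Welch power in [f_lo, f_hi] Hz, one value per epoch.

    epochs : (n_epochs, n_samples) array of single-channel EEG
    """
    freqs, psd = welch(epochs, fs=fs, nperseg=256, axis=-1)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[:, band].mean(axis=1)

# Hypothetical usage: compare ~13 Hz power between the two percepts
fs = 256.0
rng = np.random.default_rng(1)
real_motion = rng.standard_normal((100, 512))      # placeholder epochs
illusory_motion = rng.standard_normal((100, 512))  # placeholder epochs
p_real = band_power(real_motion, fs, 12.0, 14.0)
p_illusory = band_power(illusory_motion, fs, 12.0, 14.0)
print("mean 12-14 Hz power, real vs. illusory:",
      p_real.mean(), p_illusory.mean())
```

On real data, per-epoch band power of this kind could also feed a simple classifier to attempt the trial-by-trial prediction of upcoming perceptual transitions mentioned above.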
Transcranial Magnetic Stimulation Reveals Attentional Feedback to Area V1 during Serial Visual Search
Visual search tasks have been used to understand how, where and when attention influences visual processing. Current theories suggest the involvement of a high-level "saliency map" that selects a candidate location on which to focus attentional resources. For a parallel (or "pop-out") task, the first chosen location is systematically the target, but for a serial (or "difficult") task, the system may cycle through a few distractors before finally focusing on the target. This implies that attentional effects upon early visual areas, involving feedback from higher areas, should be visible at longer latencies during serial search. A previous study by Juan & Walsh (2003) used Transcranial Magnetic Stimulation (TMS) to support this conclusion; however, only a few post-stimulus delays were compared, and no control TMS location was used. Here we applied sub-threshold TMS double-pulses to induce a transient inhibition of area V1 at every post-stimulus delay between 100 ms and 500 ms (in 50 ms steps). The search array was presented either at the location affected by the TMS pulses (previously identified by applying several pulses at supra-threshold intensity to induce phosphene perception), or in the opposite hemifield, which served as a retinotopically defined control location. Two search tasks were used: a parallel one (+ among Ls) and a serial one (T among Ls). TMS specifically impaired the serial, but not the parallel, search. We highlight an involvement of V1 in serial search around 300 ms after stimulus onset; conversely, V1 did not contribute to parallel search at delays beyond 100 ms. This study supports the idea that serial search differs from parallel search by the presence of additional cycles of a select-and-focus iterative loop between V1 and higher-level areas.
Semi-supervised Multimodal Representation Learning through a Global Workspace
Recent deep learning models can efficiently combine inputs from different
modalities (e.g., images and text) and learn to align their latent
representations, or to translate signals from one domain to another (as in
image captioning, or text-to-image generation). However, current approaches
mainly rely on brute-force supervised training over large multimodal datasets.
In contrast, humans (and other animals) can learn useful multimodal
representations from only sparse experience with matched cross-modal data. Here
we evaluate the capabilities of a neural network architecture inspired by the
cognitive notion of a "Global Workspace": a shared representation for two (or
more) input modalities. Each modality is processed by a specialized system
(pretrained on unimodal data, and subsequently frozen). The corresponding
latent representations are then encoded to and decoded from a single shared
workspace. Importantly, this architecture is amenable to self-supervised
training via cycle-consistency: encoding-decoding sequences should approximate
the identity function. For various pairings of vision-language modalities and
across two datasets of varying complexity, we show that such an architecture
can be trained to align and translate between two modalities with very little
need for matched data (from 4 to 7 times less than a fully supervised
approach). The global workspace representation can be used advantageously for
downstream classification tasks and for robust transfer learning. Ablation
studies reveal that both the shared workspace and the self-supervised
cycle-consistency training are critical to the system's performance.Comment: Under revie
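A minimal sketch of the architecture and objectives described above, assuming frozen pretrained unimodal encoders that produce fixed-size latents; the module names, dimensions, and simple linear encoders/decoders are hypothetical stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GlobalWorkspace(nn.Module):
    """Two modality-specific latents mapped into one shared workspace.

    `enc_*` project a frozen unimodal latent into the workspace;
    `dec_*` map a workspace vector back to that modality's latent space.
    """
    def __init__(self, dim_vision=512, dim_text=768, dim_gw=256):
        super().__init__()
        self.enc_v = nn.Linear(dim_vision, dim_gw)
        self.dec_v = nn.Linear(dim_gw, dim_vision)
        self.enc_t = nn.Linear(dim_text, dim_gw)
        self.dec_t = nn.Linear(dim_gw, dim_text)

def cycle_losses(gw, z_v, z_t):
    """Self-supervised objectives: encode-decode paths should be identity.

    These need unpaired unimodal latents only, which is what allows
    training with very little matched cross-modal data.
    """
    mse = nn.functional.mse_loss
    # Demi-cycle: within one modality, through the workspace and back
    demi = (mse(gw.dec_v(gw.enc_v(z_v)), z_v)
            + mse(gw.dec_t(gw.enc_t(z_t)), z_t))
    # Full cycle: translate to the other modality's latent space and back
    v_back = gw.dec_v(gw.enc_t(gw.dec_t(gw.enc_v(z_v))))
    t_back = gw.dec_t(gw.enc_v(gw.dec_v(gw.enc_t(z_t))))
    full = mse(v_back, z_v) + mse(t_back, z_t)
    return demi, full

def supervised_loss(gw, z_v, z_t):
    """Alignment/translation loss, used only on the few matched pairs."""
    mse = nn.functional.mse_loss
    return (mse(gw.enc_v(z_v), gw.enc_t(z_t))       # align in workspace
            + mse(gw.dec_t(gw.enc_v(z_v)), z_t)     # translate v -> t
            + mse(gw.dec_v(gw.enc_t(z_t)), z_v))    # translate t -> v

# Hypothetical usage with placeholder latents from frozen unimodal systems
gw = GlobalWorkspace()
z_v, z_t = torch.randn(32, 512), torch.randn(32, 768)
demi, full = cycle_losses(gw, z_v, z_t)
loss = demi + full + supervised_loss(gw, z_v, z_t)  # loss weightings omitted
loss.backward()
```

In this scheme the cycle terms apply to all (unpaired) data while the supervised term applies only to the small matched subset, which is the source of the reduced need for paired data reported above.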
- …