How our brains utilize real-world structures to create coherent visual experiences
We live in a structured world, where objects rarely exist in isolation but typically appear in familiar, recurring environments. When objects consistently co-occur with certain other objects and scene contexts, our neural systems can implicitly extract and learn these real-world regularities. Predictive processing theories propose that the brain uses such learned statistical regularities to predict the structure of incoming sensory input across space and time during visual processing. These predictions may allow us to recognize objects and understand scenes efficiently, thus forming coherent visual experiences in natural vision.
In this dissertation, we conducted three studies using neuroimaging techniques (EEG and fMRI) and multivariate pattern analyses (MVPA) to explore how our brains use real-world structures to create coherent visual experiences. Study 1 investigated how scene context affects object processing across time by recording EEG signals while participants viewed semantically consistent or inconsistent objects within scenes. The results reveal that semantically consistent scenes facilitate object representations, but that this facilitation is task-dependent rather than automatic. In Study 2, we investigated how cortical feedback mediates the integration of visual information across space by manipulating the spatiotemporal coherence of naturalistic video stimuli shown in both visual hemifields. By analytically combining EEG and fMRI data, we demonstrated that spatial integration of naturalistic visual inputs is mediated by cortical feedback, carried by alpha dynamics, that fully traverses the visual hierarchy. In Study 3, we further investigated what level of spatiotemporal coherence is needed to trigger such integration-related alpha dynamics. The findings suggest that integration-related alpha dynamics are flexible enough to accommodate information from videos belonging to the same basic-level category. Together, the dissertation provides multimodal evidence that contextual information facilitates object perception and scene integration, highlighting the critical role of predictions based on real-world regularities in constructing coherent visual experiences.
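As a rough illustration of the EEG-fMRI fusion approach mentioned above, the sketch below shows one common way to relate the two modalities via representational similarity analysis: time-resolved EEG representational dissimilarity matrices (RDMs) are correlated with an fMRI RDM from a region of interest. The array names, shapes, and random placeholder data are assumptions for illustration, not the dissertation's actual pipeline.

```python
# Hedged sketch: correlating time-resolved EEG RDMs with an fMRI ROI RDM
# (representational similarity analysis), as one common way to fuse the
# two modalities. Array names and shapes are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_conditions, n_times = 24, 200

# Placeholder data: in practice these would come from pairwise decoding of
# EEG patterns over time and from voxel-pattern dissimilarities in an fMRI ROI.
eeg_rdms = rng.random((n_times, n_conditions, n_conditions))
fmri_rdm = rng.random((n_conditions, n_conditions))

iu = np.triu_indices(n_conditions, k=1)  # upper triangle, excluding diagonal

# Correlate the EEG RDM at each time point with the ROI's fMRI RDM,
# yielding a time course of EEG-fMRI representational correspondence.
fusion = np.array([
    spearmanr(eeg_rdms[t][iu], fmri_rdm[iu]).correlation
    for t in range(n_times)
])
print(fusion.shape)  # (200,) one correlation value per EEG time point
```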
Anaphoric distance dependencies in visual narrative structure and processing
Linguistic syntax has often been claimed as uniquely complex due to features like anaphoric relations and distance dependencies. However, visual narratives of sequential images, like those in comics, have been argued to use sequencing mechanisms analogous to those in language. These narrative structures include "refiner" panels that "zoom in" on the contents of another panel. Similar to anaphora in language, refiners indexically connect inexplicit referential information in one unit (refiner, pronoun) to a more informative "antecedent" elsewhere in the discourse. Also like in language, refiners can follow their antecedents (anaphoric) or precede them (cataphoric), along with having either proximal or distant connections. We here explore the constraints on visual narrative refiners created by modulating these features of order and distance. Experiment 1 examined participants' preferences for where refiners are placed in a sequence using a forced-choice test, which revealed that refiners are preferred to follow their antecedents and have proximal distances from them. Experiment 2 then showed that distance dependencies lead to slower self-paced viewing times. Finally, measurements of event-related brain potentials (ERPs) in Experiment 3 revealed that these patterns evoke similar brain responses as referential dependencies in language (i.e., N400, LAN, Nref). Across all three studies, the constraints and (neuro)cognitive responses to refiners parallel those shown to anaphora in language, suggesting domain-general constraints on the sequencing of referential dependencies.
An electrophysiological investigation of co-referential processes in visual narrative comprehension
Visual narratives make use of various means to convey referential and co-referential meaning, so comprehenders must recognize that different depictions across sequential images represent the same character(s). In this study, we investigated how the order in which different types of panels in visual sequences are presented affects how the unfolding narrative is comprehended. Participants viewed short comic strips while their electroencephalogram (EEG) was recorded. We analyzed evoked and induced EEG activity elicited by both full panels (showing a full character) and refiner panels (showing only a zoom of that full panel), and took into account whether they preceded or followed the panel to which they were co-referentially related (i.e., were cataphoric or anaphoric). We found that full panels elicited both larger N300 amplitude and increased gamma-band power compared to refiner panels. Anaphoric panels elicited a sustained negativity compared to cataphoric panels, which appeared to be sensitive to the referential status of the anaphoric panel. In the time-frequency domain, anaphoric panels elicited reduced 8–12 Hz alpha power and increased 45–65 Hz gamma-band power compared to cataphoric panels. These findings are consistent with models in which the processes involved in visual narrative comprehension partially overlap with those in language comprehension.
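As a rough sketch of the kind of induced-power analysis described above, the following code computes Morlet-wavelet power in the alpha (8–12 Hz) and gamma (45–65 Hz) bands with MNE-Python. The simulated epochs, channel counts, and wavelet settings are placeholder assumptions rather than the study's actual parameters.

```python
# Hedged sketch of an alpha/gamma induced-power analysis with MNE-Python,
# using simulated epochs in place of the real comic-panel EEG data.
import numpy as np
import mne

# Simulated epochs: 40 trials, 32 channels, 1 s at 250 Hz (placeholder data).
sfreq, n_epochs, n_channels, n_times = 250.0, 40, 32, 250
info = mne.create_info(n_channels, sfreq, ch_types="eeg")
data = np.random.randn(n_epochs, n_channels, n_times) * 1e-6
events = np.column_stack([np.arange(n_epochs) * n_times,
                          np.zeros(n_epochs, int),
                          np.ones(n_epochs, int)])
epochs = mne.EpochsArray(data, info, events, tmin=-0.2)

# Morlet-wavelet power across both frequency bands of interest.
freqs = np.concatenate([np.arange(8, 13), np.arange(45, 66, 2)])
power = mne.time_frequency.tfr_morlet(epochs, freqs=freqs,
                                      n_cycles=freqs / 2.0,
                                      return_itc=False, average=True)

# Average power within each band; conditions would be compared on these maps.
alpha = power.data[:, freqs <= 12, :].mean(axis=1)
gamma = power.data[:, freqs >= 45, :].mean(axis=1)
print(alpha.shape, gamma.shape)  # (channels, times) each
```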
Body perception in social environments: the neural basis of body expression perception in social threat, social interaction and self-identity
Zooming in on the cognitive neuroscience of visual narrative.
Visual narratives like comics and films often shift between showing full scenes and close, zoomed-in viewpoints. These zooms are similar to the "spotlight of attention" cast across a visual scene in perception. We here measured ERPs to visual narratives (comic strips) that used zoomed-in and full-scene panels either throughout the whole sequence context or at specific critical panels. Zoomed-in panels were automatically generated on the basis of fixations from prior participants' eye movements to the crucial content of panels (Foulsham & Cohn, 2020). We found that these fixation panels evoked a smaller N300 than full scenes, indicative of reduced cost for object identification, but that they also evoked a slightly larger-amplitude N400 response, suggesting a greater cost for accessing semantic memory with constrained content. Panels in sequences where fixation panels persisted across all positions of the sequence also evoked larger posterior P600s, implying that constrained views required more updating or revision processes throughout the sequence. Altogether, these findings suggest that constraining a visual scene to its crucial parts triggers various processes related not only to the density of its information but also to its integration into a sequential context.
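As a hypothetical illustration of how a zoomed-in "fixation panel" could be derived from prior viewers' eye movements, the sketch below crops a panel around the centroid of recorded fixations. The image, fixation coordinates, and crop size are invented for illustration and do not reproduce the procedure of Foulsham & Cohn (2020).

```python
# Hedged sketch: cropping a comic panel around the centroid of prior
# viewers' fixations to create a zoomed-in "fixation panel". The image,
# fixation coordinates, and crop size are illustrative assumptions.
from PIL import Image
import numpy as np

def crop_around_fixations(image, fixations_xy, crop_frac=0.5):
    """Crop a square region centred on the fixation centroid."""
    w, h = image.size
    cx, cy = np.asarray(fixations_xy, dtype=float).mean(axis=0)
    half = int(min(w, h) * crop_frac / 2)
    # Keep the crop window inside the image bounds.
    left = int(np.clip(cx - half, 0, w - 2 * half))
    top = int(np.clip(cy - half, 0, h - 2 * half))
    return image.crop((left, top, left + 2 * half, top + 2 * half))

panel = Image.new("RGB", (800, 600))              # stand-in for a comic panel
fixations = [(410, 300), (390, 280), (420, 310)]  # hypothetical (x, y) gaze points
zoomed = crop_around_fixations(panel, fixations)
print(zoomed.size)
```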
Mapping the time-course and content of visual predictions with a novel object-scene associative memory paradigm
In the current thesis, we present a series of three ERP experiments investigating the time-course and nature of contextual facilitation effects in visual object processing. In all three experiments, participants studied novel object-scene pairs in a paired-associate memory paradigm. At test, we presented the scene first, followed after a delay by the test object, which either matched or mismatched the scene. We manipulated two key factors. 1) In all three experiments, we manipulated the severity of the contextual mismatch between the presented object and the scene, including categorical violations as well as more subtle visual distortions. In this way, we probed the level of detail at which participants were reactivating the contextually congruent target object in response to the scene. 2) We manipulated the scene preview timing parameters both between subjects (Experiments 2.1 and 3.1) and within subjects (Experiment 3.2). Our rationale for doing this was as follows. Rather than assuming that contextual facilitation effects reflect an entirely predictive or reactive/integrative process, we tested the hypothesis that contextual facilitation was predictive in nature. If contextual facilitation were entirely integrative (i.e., if people waited until the object was presented before relating it to the scene context), we would expect that the amount of scene preview time would not modulate contextual facilitation effects. What we found instead is that additional scene preview time leads to enhanced contextual facilitation effects, suggesting that participants use the additional time during which they observe the scene alone (beyond 200 ms, which is sufficient to extract the gist of the scene) to prepare to process the upcoming object and determine whether it matches the scene. We strengthened our findings by testing this both between subjects, using only two time points, and within subjects, using a parametric gradation of preview times (which also allowed us to test whether our findings generalized to cases of temporal uncertainty). We also took advantage of our use of ERPs to examine dependent measures tied to specific stages of cognition. We particularly focus our analysis and discussion on contextual priming of higher-level visual features, examining how contextual congruency modulates the amplitude of the N300 component under various conditions and timing constraints. We also present a set of novel visual similarity analyses relying on V1-like features, which allow us to test for context effects on visual object understanding in a component-neutral fashion. Lastly, we present analyses of context effects on other components of the waveform: the N400, as an index of semantic priming, and the LPC, as an index of response-related processing. Overall, our findings are consistent with a predictive account, in which participants use scene information to preactivate features of the upcoming object (including higher-level visual form features as well as semantic features) in order to facilitate visual object understanding. Future work will further disentangle predictive vs. integrative accounts of contextual facilitation effects on visual object processing.
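As a rough sketch of what a "V1-like" visual similarity analysis might look like, the code below describes each image by the energy of a small Gabor filter bank and compares images by correlating their feature vectors. Filter frequencies, orientations, and image sizes are illustrative assumptions, not the thesis's exact feature model.

```python
# Hedged sketch of a "V1-like" visual similarity measure: images are
# described by the energy of a small Gabor filter bank and compared by
# correlation. Parameters and inputs are placeholder assumptions.
import numpy as np
from skimage.filters import gabor
from skimage.transform import resize

def v1_like_features(image, frequencies=(0.1, 0.2), n_orientations=4):
    """Concatenate Gabor energy maps over frequencies and orientations."""
    img = resize(image, (64, 64), anti_aliasing=True)
    feats = []
    for f in frequencies:
        for theta in np.linspace(0, np.pi, n_orientations, endpoint=False):
            real, imag = gabor(img, frequency=f, theta=theta)
            feats.append(np.hypot(real, imag).ravel())  # local filter energy
    return np.concatenate(feats)

def visual_similarity(img_a, img_b):
    """Pearson correlation between the two images' V1-like feature vectors."""
    return np.corrcoef(v1_like_features(img_a), v1_like_features(img_b))[0, 1]

# Placeholder grayscale images standing in for the target and a mismatching object.
rng = np.random.default_rng(1)
print(visual_similarity(rng.random((128, 128)), rng.random((128, 128))))
```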
Top-down modulation of visual processing and knowledge after 250 ms supports object constancy of category decisions
People categorize objects more slowly when visual input is highly impoverished than when it is optimal. While bottom-up models may explain a decision with optimal input, perceptual hypothesis testing (PHT) theories implicate top-down processes with impoverished input. The brain mechanisms and time course of PHT are largely unknown. This event-related potential study used a neuroimaging paradigm that implicated prefrontal cortex in top-down modulation of occipitotemporal cortex. Subjects categorized more impoverished and less impoverished real and pseudo objects. PHT theories predict larger impoverishment effects for real than pseudo objects because top-down processes modulate knowledge only for real objects, but different PHT variants predict different timing. Consistent with parietal-prefrontal PHT variants, around 250 ms, the earliest interaction between impoverishment and object type (real vs. pseudo) started on an N3 complex, which reflects interactive cortical activity for object cognition. N3 impoverishment effects localized to both prefrontal and occipitotemporal cortex for real objects only. The N3 also showed knowledge effects by 230 ms that localized to occipitotemporal cortex. Later effects reflected (a) word meaning in temporal cortex during the N400, (b) internal evaluation of the prior decision and memory processes, and secondary higher-order memory involving anterotemporal parts of a default mode network, during the posterior positivity (P600), and (c) response-related activity in posterior cingulate during an anterior slow wave (SW) after 700 ms. Finally, response activity in the supplementary motor area during a posterior SW after 900 ms showed impoverishment effects that correlated with RTs. Convergent evidence from studies of vision, memory, and mental imagery, which reflects purely top-down inputs, indicates that the N3 reflects the critical top-down processes of PHT. A hybrid multiple-state interactive, PHT, and decision theory best explains the visual constancy of object cognition.
Neural mechanisms underlying the influence of sequential predictions on scene gist recognition
Doctor of Philosophy, Department of Psychological Sciences, Lester C. Loschky. Rapid scene categorization is typically argued to be a purely feed-forward process. Yet, when navigating in our environment, we usually see predictable sequences of scene categories (e.g., offices followed by hallways, parking lots followed by sidewalks, etc.). Previous work showed that scenes are easier to categorize when they are shown in ecologically valid, predictable sequences compared to when they are shown in randomized sequences (Smith & Loschky, 2019). Given the number of stages involved in constructing a scene representation, we asked a novel research question: when in the time course of scene processing do sequential predictions begin to facilitate scene categorization? We addressed this question by measuring the temporal dynamics of scene categorization with electroencephalography. Participants saw scenes in either spatiotemporally coherent sequences (a first-person viewpoint of navigating from, say, an office to a classroom) or their randomized versions. Participants saw 10 scenes, presented in rapid serial visual presentation (RSVP), on each trial, while we recorded their visual event-related potentials (vERPs). They categorized 1 of the 10 scenes from an 8-alternative forced-choice (AFC) array of scene category labels. We first compared event-related potentials evoked by scenes in coherent and randomized sequences. In a subsequent, more detailed analysis, we constructed scene category decoders based on the temporally resolved neural activity. Using confusion matrices, we tracked how well the pattern of errors from the neural decoders explained the behavioral responses over time and compared this ability when scenes were shown in coherent or randomized sequences. We found reduced vERP amplitudes for targets in coherent sequences roughly 150 milliseconds after scene onset, when vERPs first index rapid scene categorization, and during the N400 component, suggesting a reduced semantic integration cost in coherent sequences. Critically, we also found that the confusions made by neural decoders and human responses correlated more strongly in coherent sequences, beginning around 100 milliseconds. Taken together, these results suggest that predictions of upcoming scene categories influence even the earliest stages of scene processing, affecting both the extraction of visual properties and meaning.
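As a rough sketch of the decoding logic described above, the code below trains a classifier on (simulated) EEG patterns at each time point, computes its confusion matrix, and correlates the decoder's error pattern with a (simulated) behavioral confusion matrix. The classifier choice, cross-validation scheme, and all data are placeholder assumptions, not the study's actual analysis.

```python
# Hedged sketch: time-resolved category decoding with confusion matrices
# correlated against behavioral confusions. All data here are simulated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_trials, n_channels, n_times, n_categories = 160, 32, 50, 8

eeg = rng.standard_normal((n_trials, n_channels, n_times))  # trials x channels x time
labels = rng.integers(0, n_categories, n_trials)             # scene-category labels
behavioral_cm = rng.random((n_categories, n_categories))     # stand-in for human confusions

off_diag = ~np.eye(n_categories, dtype=bool)  # compare error patterns only
correspondence = []
for t in range(n_times):
    # Cross-validated predictions from channel patterns at this time point.
    pred = cross_val_predict(LinearDiscriminantAnalysis(), eeg[:, :, t],
                             labels, cv=5)
    neural_cm = confusion_matrix(labels, pred, labels=np.arange(n_categories))
    r = spearmanr(neural_cm[off_diag], behavioral_cm[off_diag]).correlation
    correspondence.append(r)

# One correlation per time point: how well decoder confusions track behavior.
print(len(correspondence))
```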
