
    Functional representation of vision within the mind: A visual consciousness model based in 3D default space

    The human eyes and brain, which have finite boundaries, create a "virtual" space within our central nervous system that interprets and perceives a space that appears boundless and infinite. Using insights from studies on the visual system, we propose a novel fast-processing mechanism involving the eyes, visual pathways, and cortex in which external vision is imperceptibly processed in the brain in real time, creating an internal representation of external space that appears as an external view. We introduce the existence of a three-dimensional default space consisting of intrapersonal body space that serves as the framework in which visual and non-visual sensory information is sensed and experienced. We propose that the thalamus integrates processed information from corticothalamic feedback loops and fills in the neural component of this 3D default space with an internal visual representation of external space, leading to the experience of visual consciousness. Because this visual space inherently evades perception, we introduce three simple clinical tests that can assist in experiencing it. We also review visual neuroanatomical pathways, binocular vision, neurological disorders, and visual phenomena to elucidate how the representation of external visible space is recreated within the mind.

    A Visual Language for Web Querying and Reasoning

    As XML is increasingly used to represent information on the Web, query and reasoning languages for such data are needed. This article argues that, in contrast to the navigational approach taken in particular by XPath and XQuery, a positional approach as used in the language Xcerpt is better suited for a straightforward visual representation. The constructs of the pattern- and rule-based query language Xcerpt are introduced, and it is shown how the visual representation visXcerpt renders these constructs to form a visual query language for XML.
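
    The contrast between the navigational and the positional style can be illustrated with a toy example. The sketch below is not Xcerpt or visXcerpt syntax; it only mimics the idea that a positional query term mirrors the shape of the data it matches, and the names pattern, match, and VAR are invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<books><book><title>Faust</title><author>Goethe</author></book></books>"
)

# Navigational style (XPath-like): a path of steps through the document.
titles_nav = [t.text for t in doc.findall("./book/title")]

# Positional style: a template whose nesting mirrors the data; VAR marks the
# position whose content should be extracted.
pattern = {"book": {"title": "VAR"}}

def match(elem, pat):
    """Return the text at the position marked VAR if `elem` fits the pattern."""
    (tag, sub), = pat.items()
    if elem.tag != tag:
        return None
    if sub == "VAR":
        return elem.text
    for key, inner in sub.items():
        for child in elem:
            if child.tag == key:
                found = match(child, {key: inner})
                if found is not None:
                    return found
    return None

titles_pos = [match(book, pattern) for book in doc]
print(titles_nav, titles_pos)  # ['Faust'] ['Faust']
```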

    Digital Timeline: German Romanticism

    This assignment served as a review for a class on German Romanticism. The information on the syllabus was not organized thematically; students were able to review and chronologically organize this information and gain a better visual representation of the content they had reviewed.

    Visual Decoding of Targets During Visual Search From Human Eye Fixations

    What does human gaze reveal about a user's intents, and to what extent can these intents be inferred or even visualized? Gaze was proposed as an implicit source of information to predict the target of visual search and, more recently, to predict the object class and attributes of the search target. In this work, we go one step further and investigate the feasibility of combining recent advances in encoding human gaze information using deep convolutional neural networks with the power of generative image models to visually decode, i.e. create a visual representation of, the search target. Such visual decoding is challenging for two reasons: 1) the search target only resides in the user's mind as a subjective visual pattern, and can most often not even be described verbally by the person, and 2) it is, as of yet, unclear whether gaze fixations contain sufficient information for this task at all. We show, for the first time, that visual representations of search targets can indeed be decoded from human gaze fixations alone. We propose to first encode fixations into a semantic representation and then decode this representation into an image. We evaluate our method on a recent gaze dataset of 14 participants searching for clothing in image collages and validate the model's predictions using two human studies. Our results show that users were able to correctly select the category of the decoded image 62% of the time (chance level: 10%). In our second study we show the importance of a local gaze encoding for decoding visual search targets.
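
    The two-stage pipeline described in the abstract (encode fixations into a semantic representation, then decode that representation into an image) can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the patch size, the use of a torchvision ResNet classifier as the fixation encoder, and the `generator` stand-in for a pretrained conditional image generator are all assumptions.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Hypothetical inputs: `collage` is the stimulus image as a (C, H, W) tensor,
# `fixations` is a list of (x, y) gaze points, and `generator` stands in for a
# pretrained conditional image generator; none of these are specified here.

weights = models.ResNet18_Weights.DEFAULT
classifier = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

def encode_fixations(collage, fixations, patch=224):
    """Pool per-fixation class predictions into one semantic representation."""
    probs = []
    for x, y in fixations:
        # Crop a square patch centred on the fixation, clamped to the image.
        top = max(0, min(int(y) - patch // 2, collage.shape[1] - patch))
        left = max(0, min(int(x) - patch // 2, collage.shape[2] - patch))
        crop = TF.crop(collage, top, left, patch, patch)
        with torch.no_grad():
            logits = classifier(preprocess(crop).unsqueeze(0))
        probs.append(torch.softmax(logits, dim=1))
    return torch.cat(probs).mean(dim=0)  # averaged class distribution

def decode_target(semantic, generator):
    """Render the inferred target with a (hypothetical) conditional generator."""
    with torch.no_grad():
        return generator(semantic.unsqueeze(0))
```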

    Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    Modeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations. We hypothesize that these methods are not as expressive as an outer product of the visual and textual vectors. As the outer product is typically infeasible due to its high dimensionality, we instead propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and expressively combine multimodal features. We extensively evaluate MCB on the visual question answering and grounding tasks. We consistently show the benefit of MCB over ablations without MCB. For visual question answering, we present an architecture which uses MCB twice, once for predicting attention over spatial features and again to combine the attended representation with the question representation. This model outperforms the state of the art on the Visual7W dataset and the VQA challenge.
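
    The core MCB operation (a Count Sketch projection of each feature vector followed by an FFT-domain circular convolution that approximates their outer product) can be written compactly in NumPy. This is a minimal sketch: the output dimension, seed, and names are illustrative, and the normalization steps typically applied afterwards are omitted.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x into R^d with the Count Sketch defined by hash h and signs s."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)  # accumulate signed entries at hashed positions
    return y

def mcb(v, q, d=16000, seed=0):
    """Multimodal Compact Bilinear pooling of two feature vectors.

    Approximates the (flattened) outer product of v and q by sketching each
    vector and convolving the sketches in the frequency domain.
    """
    rng = np.random.default_rng(seed)
    h_v = rng.integers(0, d, size=v.shape[0])       # random hash, visual features
    s_v = rng.choice([-1.0, 1.0], size=v.shape[0])  # random signs
    h_q = rng.integers(0, d, size=q.shape[0])
    s_q = rng.choice([-1.0, 1.0], size=q.shape[0])

    sv = count_sketch(v, h_v, s_v, d)
    sq = count_sketch(q, h_q, s_q, d)

    # Circular convolution of the two sketches via FFT equals the sketch of
    # the outer product (Tensor Sketch identity).
    return np.fft.irfft(np.fft.rfft(sv) * np.fft.rfft(sq), n=d)

# Example: fuse a 2048-d visual feature with a 300-d question embedding.
fused = mcb(np.random.randn(2048), np.random.randn(300))
print(fused.shape)  # (16000,)
```

    The fixed seed keeps the hash and sign vectors identical across calls, which matters because the same random projections must be reused for every sample.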