27,218 research outputs found

    The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping

    Full text link
    Many tasks performed by autonomous vehicles such as road marking detection, object tracking, and path planning are simpler in bird's-eye view. Hence, Inverse Perspective Mapping (IPM) is often applied to remove the perspective effect from a vehicle's front-facing camera and to remap its images into a 2D domain, resulting in a top-down view. Unfortunately, however, this leads to unnatural blurring and stretching of objects at further distance, due to the resolution of the camera, limiting applicability. In this paper, we present an adversarial learning approach for generating a significantly improved IPM from a single camera image in real time. The generated bird's-eye-view images contain sharper features (e.g. road markings) and a more homogeneous illumination, while (dynamic) objects are automatically removed from the scene, thus revealing the underlying road layout in an improved fashion. We demonstrate our framework using real-world data from the Oxford RobotCar Dataset and show that scene understanding tasks directly benefit from our boosted IPM approach.Comment: equal contribution of first two authors, 8 full pages, 6 figures, accepted at IV 201

    Speaker-following Video Subtitles

    Full text link
    We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain

    Collocating Interface Objects: Zooming into Maps

    Get PDF
    May, Dean and Barnard [10] used a theoretically based model to argue that objects in a wide range of interfaces should be collocated following screen changes such as a zoom-in to detail. Many existing online maps do not follow this principle, but move a clicked point to the centre of the subsequent display, leaving the user looking at an unrelated location. This paper presents three experiments showing that collocating the point clicked on a map so that the detailed location appears in the place previously occupied by the overview location makes the map easier to use, reducing eye movements and interaction duration. We discuss the benefit of basing design principles on theoretical models so that they can be applied to novel situations, and so designers can infer when to use and not use them

    Investigating the effectiveness of an efficient label placement method using eye movement data

    Get PDF
    This paper focuses on improving the efficiency and effectiveness of dynamic and interactive maps in relation to the user. A label placement method with an improved algorithmic efficiency is presented. Since this algorithm has an influence on the actual placement of the name labels on the map, it is tested if this efficient algorithms also creates more effective maps: how well is the information processed by the user. We tested 30 participants while they were working on a dynamic and interactive map display. Their task was to locate geographical names on each of the presented maps. Their eye movements were registered together with the time at which a given label was found. The gathered data reveal no difference in the user's response times, neither in the number and the duration of the fixations between both map designs. The results of this study show that the efficiency of label placement algorithms can be improved without disturbing the user's cognitive map. Consequently, we created a more efficient map without affecting its effectiveness towards the user

    Individual Differences in the Allocation of Visual Attention during Navigation

    Get PDF
    There are large individual differences in the ability to create an accurate mental representation (i.e., a cognitive map) of a novel environment, yet the factors underlying cognitive map accuracy remain unclear. Given the roles that landmarks and cognitive map accuracy play in successful navigation, the current study examined whether differences in the landmarks that individuals look at while navigating are related to differences in cognitive map accuracy. Participants completed a battery of spatial tests: some that assessed spatial skills prior to a navigation task, and others that tested memory for the environment following exploration of a virtual world. Results indicated that individuals with inaccurate maps had weak perspective-taking abilities, struggled to create shortcuts, and remembered fewer landmarks despite having looked at target buildings and objects in the environment for the same duration as individuals with accurate cognitive maps. These findings suggest that memory capabilities underlie differences in cognitive map accuracy

    Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation

    Full text link
    How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams starting from early visual areas through medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goalmodulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes.CELEST, an NSF Science of Learning Center (SBE-0354378); SyNAPSE program of Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011

    Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction

    Get PDF
    The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situations the participants either look at each other or at an object of interest; therefore their eyes are not always visible. Consequently both gaze and VFOA estimation cannot be based on eye detection and tracking. We propose a method that exploits the correlation between eye gaze and head movements. Both VFOA and gaze are modeled as latent variables in a Bayesian switching state-space model. The proposed formulation leads to a tractable learning procedure and to an efficient algorithm that simultaneously tracks gaze and visual focus. The method is tested and benchmarked using two publicly available datasets that contain typical multi-party human-robot and human-human interactions.Comment: 15 pages, 8 figures, 6 table
    • …
    corecore