The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping
Many tasks performed by autonomous vehicles such as road marking detection,
object tracking, and path planning are simpler in bird's-eye view. Hence,
Inverse Perspective Mapping (IPM) is often applied to remove the perspective
effect from a vehicle's front-facing camera and to remap its images into a 2D
domain, resulting in a top-down view. Unfortunately, this leads to unnatural
blurring and stretching of objects at larger distances, owing to the limited
camera resolution, which restricts applicability. In this paper, we present an
adversarial learning approach for generating a significantly improved IPM from
a single camera image in real time. The generated bird's-eye-view images
contain sharper features (e.g. road markings) and a more homogeneous
illumination, while (dynamic) objects are automatically removed from the scene,
thus revealing the underlying road layout in an improved fashion. We
demonstrate our framework using real-world data from the Oxford RobotCar
Dataset and show that scene understanding tasks directly benefit from our
boosted IPM approach.
Comment: equal contribution of first two authors, 8 full pages, 6 figures,
accepted at IV 201
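For context, the classical IPM that the paper improves on can be written as a single planar homography that remaps road-plane pixels from the perspective view into a bird's-eye view. The sketch below is a minimal OpenCV version under that assumption; the point correspondences are illustrative placeholders, not calibration values from the paper.

```python
# Minimal sketch of classical IPM via a planar homography, assuming a flat
# road and a fixed front-facing camera. The point correspondences below are
# illustrative placeholders, not calibration values from the paper.
import cv2
import numpy as np

def inverse_perspective_mapping(frame: np.ndarray) -> np.ndarray:
    h, w = frame.shape[:2]
    # Four points on the road plane in the perspective image (a trapezoid) ...
    src = np.float32([[0.45 * w, 0.60 * h], [0.55 * w, 0.60 * h],
                      [0.90 * w, 0.95 * h], [0.10 * w, 0.95 * h]])
    # ... and where they should land in the top-down view (a rectangle).
    dst = np.float32([[0.25 * w, 0.0], [0.75 * w, 0.0],
                      [0.75 * w, float(h)], [0.25 * w, float(h)]])
    H = cv2.getPerspectiveTransform(src, dst)
    # Distant pixels get stretched here, producing the blur the paper's
    # adversarial approach is designed to remove.
    return cv2.warpPerspective(frame, H, (w, h))
```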
Speaker-following Video Subtitles
We propose a new method for improving the presentation of subtitles in video
(e.g. TV and movies). With conventional subtitles, the viewer has to constantly
look away from the main viewing area to read the subtitles at the bottom of the
screen, which disrupts the viewing experience and causes unnecessary eyestrain.
Our method places on-screen subtitles next to the respective speakers to allow
the viewer to follow the visual content while simultaneously reading the
subtitles. We use novel identification algorithms to detect the speakers based
on audio and visual information. Then the placement of the subtitles is
determined using global optimization. A comprehensive usability study indicated
that our subtitle placement method outperformed both conventional
fixed-position subtitling and another previous dynamic subtitling method in
terms of enhancing the overall viewing experience and reducing eyestrain.
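The abstract does not spell out the optimization, but the placement idea can be caricatured with a toy, greedy stand-in: for each subtitle, pick the candidate screen position that minimizes a cost combining distance to the detected speaker and overlap with already-placed subtitles. The function name, candidate grid, and cost weights below are assumptions for illustration only, not the paper's method.

```python
# Toy stand-in for the paper's global optimization: greedily place each
# subtitle at the candidate position closest to its speaker while avoiding
# positions that are already occupied. Weights and grid step are made up.
from itertools import product

def place_subtitle(speaker_xy, occupied, frame_wh,
                   step=80, w_dist=1.0, w_overlap=1000.0):
    sx, sy = speaker_xy
    best, best_cost = None, float("inf")
    for x, y in product(range(0, frame_wh[0], step), range(0, frame_wh[1], step)):
        dist = ((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
        overlap = sum(1 for ox, oy in occupied
                      if abs(ox - x) < step and abs(oy - y) < step)
        cost = w_dist * dist + w_overlap * overlap
        if cost < best_cost:
            best, best_cost = (x, y), cost
    return best

# Speaker detected near (400, 300) in a 1920x1080 frame; that spot is occupied
# by the speaker's face, so the subtitle lands on a nearby free position.
print(place_subtitle((400, 300), occupied=[(400, 300)], frame_wh=(1920, 1080)))
```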
Collocating Interface Objects: Zooming into Maps
May, Dean and Barnard [10] used a theoretically based model to argue that objects in a wide range of interfaces should be collocated following screen changes such as a zoom-in to detail. Many existing online maps do not follow this principle, but move a clicked point to the centre of the subsequent display, leaving the user looking at an unrelated location. This paper presents three experiments showing that collocating the point clicked on a map, so that the detailed location appears in the place previously occupied by the overview location, makes the map easier to use, reducing eye movements and interaction duration. We discuss the benefit of basing design principles on theoretical models so that they can be applied to novel situations, and so that designers can infer when to use them and when not to.
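A minimal sketch of the collocation principle discussed above: when zooming in, compute the new viewport so that the clicked map location reprojects to the same screen position it occupied before the zoom, rather than jumping to the centre. The simple linear viewport model and the names used are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of collocated zooming: keep the clicked location at the
# same screen position after a zoom-in. The viewport model is an assumption.
def zoom_collocated(viewport, click_screen, zoom_factor):
    """viewport: (x0, y0, width, height) in map coordinates;
    click_screen: (u, v) as fractions of the screen in [0, 1]."""
    x0, y0, w, h = viewport
    u, v = click_screen
    cx, cy = x0 + u * w, y0 + v * h            # clicked point in map coordinates
    nw, nh = w / zoom_factor, h / zoom_factor  # shrink the visible extent
    # Shift the new viewport so (cx, cy) still appears at screen position (u, v).
    return (cx - u * nw, cy - v * nh, nw, nh)

# Clicking 70% across and 30% down stays put after a 2x zoom-in.
print(zoom_collocated((0, 0, 1000, 1000), (0.7, 0.3), 2.0))  # (350.0, 150.0, 500.0, 500.0)
```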
Investigating the effectiveness of an efficient label placement method using eye movement data
This paper focuses on improving the efficiency and effectiveness of dynamic and interactive maps in relation to the user. A label placement method with improved algorithmic efficiency is presented. Since this algorithm influences the actual placement of the name labels on the map, we tested whether the more efficient algorithm also creates more effective maps: how well is the information processed by the user? We tested 30 participants while they were working on a dynamic and interactive map display. Their task was to locate geographical names on each of the presented maps. Their eye movements were registered together with the time at which a given label was found. The gathered data reveal no difference in the users' response times, nor in the number and duration of fixations, between the two map designs. The results of this study show that the efficiency of label placement algorithms can be improved without disturbing the user's cognitive map. Consequently, we created a more efficient map without affecting its effectiveness for the user.
Individual Differences in the Allocation of Visual Attention during Navigation
There are large individual differences in the ability to create an accurate mental representation (i.e., a cognitive map) of a novel environment, yet the factors underlying cognitive map accuracy remain unclear. Given the roles that landmarks and cognitive map accuracy play in successful navigation, the current study examined whether differences in the landmarks that individuals look at while navigating are related to differences in cognitive map accuracy. Participants completed a battery of spatial tests: some that assessed spatial skills prior to a navigation task, and others that tested memory for the environment following exploration of a virtual world. Results indicated that individuals with inaccurate maps had weak perspective-taking abilities, struggled to create shortcuts, and remembered fewer landmarks, despite having looked at target buildings and objects in the environment for the same duration as individuals with accurate cognitive maps. These findings suggest that memory capabilities underlie differences in cognitive map accuracy.
Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation
How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams starting from early visual areas through medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes.
Funding: CELEST, an NSF Science of Learning Center (SBE-0354378); SyNAPSE program of Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011)
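The search dynamics described above (a gist-based hypothesis about the target location that is incrementally refined as the scene is scanned) can be caricatured in a few lines. The sketch below is a toy illustration of that two-stage idea only, not the ARTSCENE Search model; the priors, likelihood values, and stopping rule are invented for the example.

```python
# Toy illustration of contextual cueing: a gist-based prior ranks candidate
# locations, and each fixation reweights the belief until the target is
# confidently located. Numbers and the stopping rule are invented.
import numpy as np

def contextual_search(prior, target_likeness, threshold=0.9):
    belief = prior.astype(float).copy()
    scanned = []
    while True:
        # Fixate the most promising location that has not been scanned yet.
        i = max((j for j in range(len(belief)) if j not in scanned),
                key=lambda j: belief[j])
        scanned.append(i)
        # Accumulate evidence from the fixation and renormalize the belief.
        belief[i] *= target_likeness[i]
        belief /= belief.sum()
        if belief[i] > threshold:
            return i, scanned                       # confident hit
        if len(scanned) == len(belief):
            return int(np.argmax(belief)), scanned  # best guess after a full scan

prior = np.array([0.1, 0.6, 0.2, 0.1])      # scene gist: location 1 is typical
likeness = np.array([0.2, 0.9, 0.5, 0.1])   # what a fixation at each location reveals
print(contextual_search(prior, likeness))
```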
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
The visual focus of attention (VFOA) has been recognized as a prominent
conversational cue. We are interested in estimating and tracking the VFOAs
associated with multi-party social interactions. We note that in this type of
situation the participants either look at each other or at an object of
interest; therefore their eyes are not always visible. Consequently, neither
gaze nor VFOA estimation can be based on eye detection and tracking. We propose a
method that exploits the correlation between eye gaze and head movements. Both
VFOA and gaze are modeled as latent variables in a Bayesian switching
state-space model. The proposed formulation leads to a tractable learning
procedure and to an efficient algorithm that simultaneously tracks gaze and
visual focus. The method is tested and benchmarked using two publicly available
datasets that contain typical multi-party human-robot and human-human
interactions.
Comment: 15 pages, 8 figures, 6 tables
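A heavily simplified, one-dimensional sketch of the modeling idea follows: the VFOA target is treated as a discrete latent state, the observed head direction is assumed to be an attenuated, noisy version of the gaze direction toward that target, and an HMM-style forward filter tracks the target over time. All parameter values, and the reduction to a plain HMM, are assumptions for illustration; the paper itself uses a richer Bayesian switching state-space model.

```python
# Simplified 1-D illustration (not the paper's model): infer which target a
# person attends to from head direction alone, assuming the head performs
# only a fraction `kappa` of the rotation needed to gaze at the target.
import numpy as np

def track_vfoa(head_angles, target_angles, kappa=0.6, sigma=10.0, stay=0.8):
    n = len(target_angles)
    # Transition matrix: attention tends to stay on the same target.
    A = np.full((n, n), (1.0 - stay) / (n - 1))
    np.fill_diagonal(A, stay)
    belief = np.full(n, 1.0 / n)
    track = []
    for h in head_angles:
        # Gaussian likelihood of the observed head angle under each target.
        lik = np.exp(-0.5 * ((h - kappa * np.asarray(target_angles)) / sigma) ** 2)
        belief = lik * (A.T @ belief)   # predict with A, then update with the likelihood
        belief /= belief.sum()
        track.append(int(np.argmax(belief)))
    return track

targets = [-30.0, 0.0, 40.0]       # e.g. person on the left, robot, person on the right
heads = [-20, -18, -2, 1, 25, 24]  # the head turns only part of the way toward each target
print(track_vfoa(heads, targets))  # prints the index of the attended target per time step
```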