The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping
Many tasks performed by autonomous vehicles such as road marking detection,
object tracking, and path planning are simpler in bird's-eye view. Hence,
Inverse Perspective Mapping (IPM) is often applied to remove the perspective
effect from a vehicle's front-facing camera and to remap its images into a 2D
domain, resulting in a top-down view. Unfortunately, this leads to unnatural
blurring and stretching of objects at larger distances, owing to the limited
camera resolution, which restricts applicability. In this paper, we present an
adversarial learning approach for generating a significantly improved IPM from
a single camera image in real time. The generated bird's-eye-view images
contain sharper features (e.g. road markings) and a more homogeneous
illumination, while (dynamic) objects are automatically removed from the scene,
thus revealing the underlying road layout in an improved fashion. We
demonstrate our framework using real-world data from the Oxford RobotCar
Dataset and show that scene understanding tasks directly benefit from our
boosted IPM approach.
Comment: equal contribution of first two authors, 8 full pages, 6 figures,
accepted at IV 201
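For context, the classical IPM that the paper improves on can be written as a single planar homography that remaps road-plane pixels from the perspective view into a bird's-eye view. The sketch below is a minimal OpenCV version under that assumption; the point correspondences are illustrative placeholders, not calibration values from the paper.

```python
# Minimal sketch of classical IPM via a planar homography, assuming a flat
# road and a fixed front-facing camera. The point correspondences below are
# illustrative placeholders, not calibration values from the paper.
import cv2
import numpy as np

def inverse_perspective_mapping(frame: np.ndarray) -> np.ndarray:
    h, w = frame.shape[:2]
    # Four points on the road plane in the perspective image (a trapezoid) ...
    src = np.float32([[0.45 * w, 0.60 * h], [0.55 * w, 0.60 * h],
                      [0.90 * w, 0.95 * h], [0.10 * w, 0.95 * h]])
    # ... and where they should land in the top-down view (a rectangle).
    dst = np.float32([[0.25 * w, 0.0], [0.75 * w, 0.0],
                      [0.75 * w, float(h)], [0.25 * w, float(h)]])
    H = cv2.getPerspectiveTransform(src, dst)
    # Distant pixels get stretched here, producing the blur the paper's
    # adversarial approach is designed to remove.
    return cv2.warpPerspective(frame, H, (w, h))
```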
Speaker-following Video Subtitles
We propose a new method for improving the presentation of subtitles in video
(e.g. TV and movies). With conventional subtitles, the viewer has to constantly
look away from the main viewing area to read the subtitles at the bottom of the
screen, which disrupts the viewing experience and causes unnecessary eyestrain.
Our method places on-screen subtitles next to the respective speakers to allow
the viewer to follow the visual content while simultaneously reading the
subtitles. We use novel identification algorithms to detect the speakers based
on audio and visual information. Then the placement of the subtitles is
determined using global optimization. A comprehensive usability study indicated
that our subtitle placement method outperformed both conventional
fixed-position subtitling and another previous dynamic subtitling method in
terms of enhancing the overall viewing experience and reducing eyestrain.
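The abstract does not spell out the optimization, but the placement idea can be caricatured with a toy, greedy stand-in: for each subtitle, pick the candidate screen position that minimizes a cost combining distance to the detected speaker and overlap with already-placed subtitles. The function name, candidate grid, and cost weights below are assumptions for illustration only, not the paper's method.

```python
# Toy stand-in for the paper's global optimization: greedily place each
# subtitle at the candidate position closest to its speaker while avoiding
# positions that are already occupied. Weights and grid step are made up.
from itertools import product

def place_subtitle(speaker_xy, occupied, frame_wh,
                   step=80, w_dist=1.0, w_overlap=1000.0):
    sx, sy = speaker_xy
    best, best_cost = None, float("inf")
    for x, y in product(range(0, frame_wh[0], step), range(0, frame_wh[1], step)):
        dist = ((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
        overlap = sum(1 for ox, oy in occupied
                      if abs(ox - x) < step and abs(oy - y) < step)
        cost = w_dist * dist + w_overlap * overlap
        if cost < best_cost:
            best, best_cost = (x, y), cost
    return best

# Speaker detected near (400, 300) in a 1920x1080 frame; that spot is occupied
# by the speaker's face, so the subtitle lands on a nearby free position.
print(place_subtitle((400, 300), occupied=[(400, 300)], frame_wh=(1920, 1080)))
```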
Collocating Interface Objects: Zooming into Maps
May, Dean and Barnard [10] used a theoretically based model to argue that objects in a wide range of interfaces should be collocated following screen changes such as a zoom-in to detail. Many existing online maps do not follow this principle, but move a clicked point to the centre of the subsequent display, leaving the user looking at an unrelated location. This paper presents three experiments showing that collocating the point clicked on a map, so that the detailed location appears in the place previously occupied by the overview location, makes the map easier to use, reducing eye movements and interaction duration. We discuss the benefit of basing design principles on theoretical models so that they can be applied to novel situations, and so that designers can infer when to use them and when not to.
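A minimal sketch of the collocation principle discussed above: when zooming in, compute the new viewport so that the clicked map location reprojects to the same screen position it occupied before the zoom, rather than jumping to the centre. The simple linear viewport model and the names used are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of collocated zooming: keep the clicked location at the
# same screen position after a zoom-in. The viewport model is an assumption.
def zoom_collocated(viewport, click_screen, zoom_factor):
    """viewport: (x0, y0, width, height) in map coordinates;
    click_screen: (u, v) as fractions of the screen in [0, 1]."""
    x0, y0, w, h = viewport
    u, v = click_screen
    cx, cy = x0 + u * w, y0 + v * h            # clicked point in map coordinates
    nw, nh = w / zoom_factor, h / zoom_factor  # shrink the visible extent
    # Shift the new viewport so (cx, cy) still appears at screen position (u, v).
    return (cx - u * nw, cy - v * nh, nw, nh)

# Clicking 70% across and 30% down stays put after a 2x zoom-in.
print(zoom_collocated((0, 0, 1000, 1000), (0.7, 0.3), 2.0))  # (350.0, 150.0, 500.0, 500.0)
```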
Investigating the effectiveness of an efficient label placement method using eye movement data
This paper focuses on improving the efficiency and effectiveness of dynamic and interactive maps in relation to the user. A label placement method with improved algorithmic efficiency is presented. Since this algorithm influences the actual placement of the name labels on the map, we tested whether the more efficient algorithm also creates more effective maps: how well is the information processed by the user? We tested 30 participants while they were working on a dynamic and interactive map display. Their task was to locate geographical names on each of the presented maps. Their eye movements were registered together with the time at which a given label was found. The gathered data reveal no difference in the users' response times, nor in the number and duration of fixations, between the two map designs. The results of this study show that the efficiency of label placement algorithms can be improved without disturbing the user's cognitive map. Consequently, we created a more efficient map without affecting its effectiveness for the user.
Individual Differences in the Allocation of Visual Attention during Navigation
There are large individual differences in the ability to create an accurate mental representation (i.e., a cognitive map) of a novel environment, yet the factors underlying cognitive map accuracy remain unclear. Given the roles that landmarks and cognitive map accuracy play in successful navigation, the current study examined whether differences in the landmarks that individuals look at while navigating are related to differences in cognitive map accuracy. Participants completed a battery of spatial tests: some that assessed spatial skills prior to a navigation task, and others that tested memory for the environment following exploration of a virtual world. Results indicated that individuals with inaccurate maps had weak perspective-taking abilities, struggled to create shortcuts, and remembered fewer landmarks, despite having looked at target buildings and objects in the environment for the same duration as individuals with accurate cognitive maps. These findings suggest that memory capabilities underlie differences in cognitive map accuracy.
Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation
How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams starting from early visual areas through medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes.
Funding: CELEST, an NSF Science of Learning Center (SBE-0354378); SyNAPSE program of Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011)
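The search dynamics described above (a gist-based hypothesis about the target location that is incrementally refined as the scene is scanned) can be caricatured in a few lines. The sketch below is a toy illustration of that two-stage idea only, not the ARTSCENE Search model; the priors, likelihood values, and stopping rule are invented for the example.

```python
# Toy illustration of contextual cueing: a gist-based prior ranks candidate
# locations, and each fixation reweights the belief until the target is
# confidently located. Numbers and the stopping rule are invented.
import numpy as np

def contextual_search(prior, target_likeness, threshold=0.9):
    belief = prior.astype(float).copy()
    scanned = []
    while True:
        # Fixate the most promising location that has not been scanned yet.
        i = max((j for j in range(len(belief)) if j not in scanned),
                key=lambda j: belief[j])
        scanned.append(i)
        # Accumulate evidence from the fixation and renormalize the belief.
        belief[i] *= target_likeness[i]
        belief /= belief.sum()
        if belief[i] > threshold:
            return i, scanned                       # confident hit
        if len(scanned) == len(belief):
            return int(np.argmax(belief)), scanned  # best guess after a full scan

prior = np.array([0.1, 0.6, 0.2, 0.1])      # scene gist: location 1 is typical
likeness = np.array([0.2, 0.9, 0.5, 0.1])   # what a fixation at each location reveals
print(contextual_search(prior, likeness))
```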
Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction
The visual focus of attention (VFOA) has been recognized as a prominent
conversational cue. We are interested in estimating and tracking the VFOAs
associated with multi-party social interactions. We note that in this type of
situation the participants either look at each other or at an object of
interest; therefore their eyes are not always visible. Consequently, neither
gaze nor VFOA estimation can be based on eye detection and tracking. We propose a
method that exploits the correlation between eye gaze and head movements. Both
VFOA and gaze are modeled as latent variables in a Bayesian switching
state-space model. The proposed formulation leads to a tractable learning
procedure and to an efficient algorithm that simultaneously tracks gaze and
visual focus. The method is tested and benchmarked using two publicly available
datasets that contain typical multi-party human-robot and human-human
interactions.
Comment: 15 pages, 8 figures, 6 tables
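A heavily simplified, one-dimensional sketch of the modeling idea follows: the VFOA target is treated as a discrete latent state, the observed head direction is assumed to be an attenuated, noisy version of the gaze direction toward that target, and an HMM-style forward filter tracks the target over time. All parameter values, and the reduction to a plain HMM, are assumptions for illustration; the paper itself uses a richer Bayesian switching state-space model.

```python
# Simplified 1-D illustration (not the paper's model): infer which target a
# person attends to from head direction alone, assuming the head performs
# only a fraction `kappa` of the rotation needed to gaze at the target.
import numpy as np

def track_vfoa(head_angles, target_angles, kappa=0.6, sigma=10.0, stay=0.8):
    n = len(target_angles)
    # Transition matrix: attention tends to stay on the same target.
    A = np.full((n, n), (1.0 - stay) / (n - 1))
    np.fill_diagonal(A, stay)
    belief = np.full(n, 1.0 / n)
    track = []
    for h in head_angles:
        # Gaussian likelihood of the observed head angle under each target.
        lik = np.exp(-0.5 * ((h - kappa * np.asarray(target_angles)) / sigma) ** 2)
        belief = lik * (A.T @ belief)   # predict with A, then update with the likelihood
        belief /= belief.sum()
        track.append(int(np.argmax(belief)))
    return track

targets = [-30.0, 0.0, 40.0]       # e.g. person on the left, robot, person on the right
heads = [-20, -18, -2, 1, 25, 24]  # the head turns only part of the way toward each target
print(track_vfoa(heads, targets))  # prints the index of the attended target per time step
```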