Towards End-to-End Lane Detection: an Instance Segmentation Approach
Modern cars are incorporating an increasing number of driver assist features,
including automatic lane keeping. The latter allows the car to properly
position itself within the road lanes, which is also crucial for any subsequent
lane departure or trajectory planning decision in fully autonomous cars.
Traditional lane detection methods rely on a combination of highly-specialized,
hand-crafted features and heuristics, usually followed by post-processing
techniques that are computationally expensive and prone to scalability problems
due to road scene variations. More recent approaches leverage deep learning models,
trained for pixel-wise lane segmentation, even when no markings are present in
the image due to their large receptive field. Despite their advantages, these
methods are limited to detecting a pre-defined, fixed number of lanes, e.g.
ego-lanes, and cannot cope with lane changes. In this paper, we go beyond the
aforementioned limitations and propose to cast the lane detection problem as an
instance segmentation problem - in which each lane forms its own instance -
that can be trained end-to-end. To parametrize the segmented lane instances
before fitting the lane, we further propose to apply a learned perspective
transformation, conditioned on the image, in contrast to a fixed "bird's-eye
view" transformation. By doing so, we ensure a lane fitting which is robust
against road plane changes, unlike existing approaches that rely on a fixed,
pre-defined transformation. In summary, we propose a fast lane detection
algorithm, running at 50 fps, which can handle a variable number of lanes and
cope with lane changes. We verify our method on the tuSimple dataset and
achieve competitive results.
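
The fitting step mentioned in this abstract (transform the pixels of one lane instance, then fit a curve) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network, the form of the transformation, or the curve model, so the 3x3 matrix H is a hand-written placeholder for the learned transformation, the polynomial model is an assumption, and the names project, solve and fit_lane are hypothetical.

```python
def project(H, pts):
    """Apply a 3x3 homography H (nested lists) to a list of (x, y) points."""
    out = []
    for x, y in pts:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        out.append((u / w, v / w))
    return out


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_lane(H, lane_pixels, degree=2):
    """Least-squares fit of x' = sum_k c[k] * y'**k in the transformed space.

    H stands in for the learned transformation (assumption: a plain homography).
    Returns the coefficients c in ascending order of power.
    """
    warped = project(H, lane_pixels)
    n = degree + 1
    # Normal equations (V^T V) c = V^T x for the Vandermonde matrix V of y'.
    A = [[sum(y ** (i + j) for _, y in warped) for j in range(n)] for i in range(n)]
    b = [sum(x * y ** i for x, y in warped) for i in range(n)]
    return solve(A, b)
```

Calling fit_lane once per segmented instance gives one curve per lane, which is how a variable number of lanes would be handled in this sketch. Mapping the fitted curve back to image coordinates would additionally need the inverse of H, which is omitted here.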
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for egocentric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better compared to traditional features, (4)
as opposed to hand regions, the manipulation point is a strongly influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, the manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects.
Comment: presented at WACV 201
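
The recurrent model outlined in this abstract (per-frame deep features passed through a gated recurrent unit that predicts the next fixation) can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give layer sizes, the feature extractor, or the output encoding, so the feature vectors are taken as given, biases are omitted, the fixation is assumed to be a normalized (x, y) pair, and the names gru_step and predict_fixations are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gru_step(params, x, h):
    """One GRU update with update gate z and reset gate r (biases omitted)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    # Blend the previous state with the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def predict_fixations(params, W_out, frame_features, hidden_size):
    """Run the GRU over per-frame features; emit one (x, y) in (0, 1) per frame.

    Assumption: the output is a normalized image coordinate read out from the
    hidden state by a linear layer W_out followed by a sigmoid.
    """
    h = [0.0] * hidden_size
    preds = []
    for x in frame_features:
        h = gru_step(params, x, h)
        preds.append(tuple(sigmoid(v) for v in matvec(W_out, h)))
    return preds
```

With untrained weights the predictions carry no meaning. The sketch only shows how the hidden state integrates information across frames before each fixation is read out.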