Efficient Pedestrian Detection in Urban Traffic Scenes
Pedestrians are important participants in urban traffic environments, and thus an important category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collision. In this thesis, we investigate and develop novel approaches that interpret the spatial and temporal characteristics of pedestrians in three different aspects: shape, cognition and motion. The distinctive upright human body shape, especially the geometry of the head and shoulder area, is what most clearly distinguishes pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which likewise exhibit a uniform shape structure, we propose to design dedicated Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local differences in the shape structure. Cognition theories aim to explain how the human visual system processes visual input both accurately and quickly. By emulating the center-surround mechanism of the human visual system, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector acts as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics of moving pedestrians and employ motion information both for feature design and for region-of-interest (ROI) selection. Motion segmentation on optical flow fields allows us to select the blobs most likely to contain moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self-difference features then enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds.
The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed in other applications, such as indoor robotics or public surveillance.
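To make the Haar-like template idea concrete, here is a minimal sketch (not the thesis implementation): a two-rectangle Haar-like feature over a grayscale patch can be evaluated in constant time with an integral image. The 8x8 patch and the top-minus-bottom template below are illustrative assumptions, standing in for the head-shoulder templates described above.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum costs at most four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from integral image ii (exclusive ends)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect_vertical(img, r0, c0, h, w):
    """Haar-like response: top half minus bottom half of an h x w window.
    A bright-above-dark edge (e.g., background above shoulders) responds strongly."""
    ii = integral_image(img.astype(np.int64))
    top = rect_sum(ii, r0, c0, r0 + h // 2, c0 + w)
    bottom = rect_sum(ii, r0 + h // 2, c0, r0 + h, c0 + w)
    return top - bottom

# Bright band above a dark band -> large positive response.
patch = np.zeros((8, 8), dtype=np.int64)
patch[:4, :] = 10          # bright upper region
response = haar_two_rect_vertical(patch, 0, 0, 8, 8)
print(response)  # 10*4*8 - 0 = 320
```

In a detector, many such template responses would be computed per sliding window and fed to a boosted classifier; the integral image is what keeps this cheap at scale.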
Mono Video-Based AI Corridor for Model-Free Detection of Collision-Relevant Obstacles
The detection of previously unseen, unexpected obstacles on the road is a
major challenge for automated driving systems. Different from the detection of
ordinary objects with pre-definable classes, detecting unexpected obstacles on
the road cannot be resolved by upscaling the sensor technology alone (e.g.,
high resolution video imagers / radar antennas, denser LiDAR scan lines). This
is because unexpected obstacles come in a wide variety of types and do not
share a common appearance (e.g., lost cargo such as a suitcase or bicycle,
tire fragments, a tree stem). Adding object classes one by one, or lumping
"all" of these objects into a common "unexpected obstacle" class, does not
scale either. In this contribution, we study the feasibility of using a deep
learning video-based lane corridor (called the "AI ego-corridor") to ease the
challenge by inverting the problem: instead of
detecting a previously unseen object, the AI ego-corridor detects that the
ego-lane ahead ends. A smart ground-truth definition enables an easy
feature-based classification of an abrupt end of the ego-lane. We propose two
neural network designs and investigate, among other things, the potential of
training with synthetic data. We evaluate our approach on a test vehicle
platform. It is shown that the approach is able to detect numerous previously
unseen obstacles at a distance of up to 300 m with a detection rate of 95%.
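The inverted problem can be illustrated with a toy rule (a sketch under stated assumptions, not the paper's network): rather than classifying the obstacle itself, classify whether the predicted ego-corridor terminates abruptly short of the sensing range. The corridor representation, the 300 m range, and the margin below are illustrative assumptions.

```python
import numpy as np

def corridor_ends_abruptly(corridor_pts, sensing_range_m=300.0, margin_m=20.0):
    """Toy version of the inverted detection problem: flag a collision-relevant
    situation when the predicted ego-corridor stops well short of sensing range.

    corridor_pts: (N, 2) array of (x_forward_m, y_lateral_m) centerline points;
    here they stand in for the output of a corridor-prediction model.
    """
    pts = np.asarray(corridor_pts, dtype=float)
    reach = pts[:, 0].max()           # furthest forward extent of the corridor
    return bool(reach < sensing_range_m - margin_m)

# Corridor predicted out to only 120 m: something blocks the ego-lane ahead.
blocked = corridor_ends_abruptly([[x, 0.0] for x in range(0, 121, 10)])
# Corridor predicted out to the full 300 m: lane is free.
free = corridor_ends_abruptly([[x, 0.0] for x in range(0, 301, 10)])
print(blocked, free)  # True False
```

The point of the inversion is visible even in this toy: no property of the obstacle appears anywhere; only the free-space prediction is classified.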
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
Humans can robustly recognize and localize objects by integrating visual and
auditory cues. While machines are able to do the same now with images, less
work has been done with sounds. This work develops an approach for dense
semantic labelling of sound-making objects, purely based on binaural sounds. We
propose a novel sensor setup and record a new audio-visual dataset of street
scenes with eight professional binaural microphones and a 360 degree camera.
The co-existence of visual and audio cues is leveraged for supervision
transfer. In particular, we employ a cross-modal distillation framework that
consists of a vision `teacher' method and a sound `student' method -- the
student method is trained to generate the same results as the teacher method.
This way, the auditory system can be trained without using human annotations.
We also propose two auxiliary tasks, namely a) a novel Spatial Sound
Super-resolution task to increase the spatial resolution of sounds, and b)
dense depth prediction of the scene. We then formulate the three tasks into
one end-to-end trainable multi-task network aimed at boosting the overall
performance. Experimental results on the dataset show that 1) our method
achieves promising results for semantic prediction and the two auxiliary
tasks; 2) the three tasks are mutually beneficial -- training them together
achieves the best performance; and 3) the number and orientations of the
microphones are both important. The data and code will be released to facilitate the
research in this new direction. Project page:
https://www.trace.ethz.ch/publications/2020/sound_perception/index.htm
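A minimal numpy sketch of the cross-modal distillation step described above: the sound "student" is pushed toward the vision "teacher" via a distillation loss on temperature-softened per-pixel class distributions, so no human labels are needed. The temperature, shapes, and toy logits are illustrative assumptions; the actual system uses a trained vision network as the teacher.

```python
import numpy as np

def softmax(logits, T=1.0, axis=-1):
    z = logits / T
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) over pixels on temperature-softened class
    distributions -- the supervision-transfer signal: the student is trained
    to reproduce the teacher's predictions, not ground-truth labels."""
    p = softmax(teacher_logits, T)            # teacher's soft targets
    q = softmax(student_logits, T)            # student's predictions
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return kl.mean()

# (H*W, num_classes) logits for a tiny 2x2 "image" with 3 classes.
teacher = np.array([[4.0, 0.0, 0.0]] * 4)
student_far = np.zeros((4, 3))                 # uniform, far from the teacher
student_close = teacher.copy()                 # already matches the teacher
print(distillation_loss(student_far, teacher) >
      distillation_loss(student_close, teacher))  # True
```

In the multi-task setting, this loss would be summed with the spatial sound super-resolution and depth-prediction losses and backpropagated through the shared audio encoder.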
Milestones in Autonomous Driving and Intelligent Vehicles Part II: Perception and Planning
Growing interest in autonomous driving (AD) and intelligent vehicles (IVs) is
fueled by their promise for enhanced safety, efficiency, and economic benefits.
While previous surveys have captured progress in this field, a comprehensive
and forward-looking summary is needed. Our work fills this gap through three
distinct articles. The first part, a "Survey of Surveys" (SoS), outlines the
history, surveys, ethics, and future directions of AD and IV technologies. The
second part, "Milestones in Autonomous Driving and Intelligent Vehicles Part I:
Control, Computing System Design, Communication, HD Map, Testing, and Human
Behaviors" delves into the development of control, computing system,
communication, HD map, testing, and human behaviors in IVs. This third part
reviews perception and planning in the context of IVs. Aiming to
provide a comprehensive overview of the latest advancements in AD and IVs, this
work caters to both newcomers and seasoned researchers. By integrating the SoS
and Part I, we offer unique insights and strive to serve as a bridge between
past achievements and future possibilities in this dynamic field. 17 pages, 6 figures. IEEE Transactions on Systems, Man, and
Cybernetics: Systems.