Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection
Pedestrian detection is an important component for safety of autonomous
vehicles, as well as for traffic and street surveillance. There are extensive
benchmarks on this topic, and it has been shown to be a challenging problem when
applied in real use-case scenarios. In purely image-based pedestrian detection
approaches, the state-of-the-art results have been achieved with convolutional
neural networks (CNN) and surprisingly few detection frameworks have been built
upon multi-cue approaches. In this work, we develop a new pedestrian detector
for autonomous vehicles that exploits LiDAR data, in addition to visual
information. In the proposed approach, LiDAR data is utilized to generate
region proposals by processing the three dimensional point cloud that it
provides. These candidate regions are then further processed by a
state-of-the-art CNN classifier that we have fine-tuned for pedestrian
detection. We have extensively evaluated the proposed detection process on the
KITTI dataset. The experimental results show that the proposed LiDAR space
clustering approach provides a very efficient way of generating region
proposals leading to higher recall rates and fewer misses for pedestrian
detection. This indicates that LiDAR data can provide auxiliary information for
CNN-based approaches
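
The abstract describes clustering the 3D point cloud to produce candidate regions that are then passed to a CNN classifier. A minimal sketch of that proposal-generation step is below, using DBSCAN as a stand-in clustering algorithm; the function name, parameters, and the synthetic point cloud are illustrative assumptions, not the authors' implementation (which would also project the 3D boxes into the image plane for the CNN stage):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def lidar_region_proposals(points, eps=0.5, min_samples=10):
    """Cluster a LiDAR point cloud (N x 3 array) and return one 3D
    bounding box (xmin, ymin, zmin, xmax, ymax, zmax) per cluster."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    boxes = []
    for k in set(labels):
        if k == -1:  # DBSCAN labels noise points as -1; skip them
            continue
        cluster = points[labels == k]
        boxes.append(np.concatenate([cluster.min(axis=0), cluster.max(axis=0)]))
    return boxes

# Synthetic cloud: two well-separated blobs standing in for two objects.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0, 0.0], 0.1, size=(50, 3))
blob_b = rng.normal([5.0, 5.0, 0.0], 0.1, size=(50, 3))
cloud = np.vstack([blob_a, blob_b])
proposals = lidar_region_proposals(cloud)
print(len(proposals))  # two clusters -> two candidate regions
```

Each box would then be projected onto the camera image and the resulting crop scored by the fine-tuned CNN classifier.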
Emotion Recognition for Intelligent Tutoring
Abstract. Individual teaching has been considered the most successful educational form since ancient times. This form continues to exist nowadays within intelligent systems intended to provide adapted tutoring for each student. Although recent research has shown that emotions can affect students' learning, the adaptation skills of tutoring systems are still imperfect due to weak emotional intelligence. To support ongoing research on improving tutoring adaptation based on both the student's knowledge and emotional state, the paper presents an analysis of emotion recognition methods used in recent developments. The study reveals that a sensor-lite approach can serve as a solution to problems related to emotion identification accuracy. To provide ground-truth data for the emotional state, we have explored and implemented a self-assessment method
Combining Multiple Views for Visual Speech Recognition
Visual speech recognition is a challenging research problem with a particular
practical application of aiding audio speech recognition in noisy scenarios.
Multiple camera setups can be beneficial for the visual speech recognition
systems in terms of improved performance and robustness. In this paper, we
explore this aspect and provide a comprehensive study on combining multiple
views for visual speech recognition. The thorough analysis covers fusion of all
possible view angle combinations both at feature level and decision level. The
employed visual speech recognition system in this study extracts features
through a PCA-based convolutional neural network, followed by an LSTM network.
Finally, these features are processed in a tandem system, being fed into a
GMM-HMM scheme. The decision fusion acts after this point by combining the
Viterbi path log-likelihoods. The results show that the complementary
information contained in recordings from different view angles improves the
results significantly. For example, the sentence correctness on the test set is
increased from 76% for the highest performing single view () to up to
83% when combining this view with the frontal and view angles
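
The decision-level fusion described above, combining Viterbi path log-likelihoods from the per-view GMM-HMM recognizers, can be sketched as a (optionally weighted) sum of log-likelihoods per hypothesis followed by an argmax. The hypothesis strings and scores below are made up for illustration; this is not the paper's code:

```python
def fuse_viterbi_loglikes(per_view_loglikes, weights=None):
    """Decision-level fusion: sum weighted Viterbi log-likelihoods for
    each hypothesis across views and return the best-scoring one.

    per_view_loglikes: list over views of {hypothesis: log-likelihood}.
    """
    if weights is None:
        weights = [1.0] * len(per_view_loglikes)
    fused = {}
    for w, view in zip(weights, per_view_loglikes):
        for hyp, ll in view.items():
            fused[hyp] = fused.get(hyp, 0.0) + w * ll
    return max(fused, key=fused.get), fused

# Two views scoring two candidate sentences (toy numbers).
frontal = {"bin blue": -120.0, "bin green": -125.0}
side = {"bin blue": -118.0, "bin green": -130.0}
best, scores = fuse_viterbi_loglikes([frontal, side])
print(best)  # "bin blue"
```

Summing log-likelihoods corresponds to treating the views as independent evidence for the same utterance, which is why complementary view angles can raise sentence correctness above the best single view.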
Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition
In this paper, we aim to address the large domain gap between high-resolution
face images, e.g., from professional portrait photography, and low-quality
surveillance images, e.g., from security cameras. Establishing an identity
match between such disparate sources is a classical surveillance face
identification scenario, which continues to be a challenging problem for modern
face recognition techniques. To that end, we propose a method that combines
face super-resolution, resolution matching, and multi-scale template
accumulation to reliably recognize faces from long-range surveillance footage,
including from low quality sources. The proposed approach does not require
training or fine-tuning on the target dataset of real surveillance images.
Extensive experiments show that our proposed method is able to outperform even
existing methods fine-tuned to the SCFace dataset
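
The multi-scale template accumulation idea, embedding the same face at several resolutions and pooling the features into one template before matching, can be sketched as follows. Everything here is an illustrative assumption: `toy_embed` (an intensity histogram) stands in for a real face-embedding CNN, and `rescale` (nearest-neighbour) stands in for proper super-resolution and downsampling:

```python
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour rescale of a 2-D array (stand-in for proper
    super-resolution / downsampling in a real pipeline)."""
    h, w = img.shape
    rows = (np.arange(int(h * factor)) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(int(w * factor)) / factor).astype(int).clip(0, w - 1)
    return img[np.ix_(rows, cols)]

def accumulate_template(embed, image, scales=(0.5, 1.0, 2.0)):
    """Embed the same face at several resolutions and average the
    L2-normalised features into a single multi-scale template."""
    feats = []
    for s in scales:
        f = embed(rescale(image, s))
        feats.append(f / np.linalg.norm(f))
    template = np.mean(feats, axis=0)
    return template / np.linalg.norm(template)

def cosine(a, b):
    return float(a @ b)

# Toy embedding: fixed-length intensity histogram, defined at any
# resolution (a real system would use a face-recognition CNN).
def toy_embed(img):
    hist, _ = np.histogram(img, bins=16, range=(0.0, 1.0), density=True)
    return hist + 1e-6

rng = np.random.default_rng(1)
gallery = rng.random((64, 64))          # high-resolution gallery face
probe = rescale(gallery, 0.25)          # simulated low-res surveillance crop
t_gallery = accumulate_template(toy_embed, gallery)
t_probe = accumulate_template(toy_embed, probe)
print(round(cosine(t_gallery, t_probe), 3))
```

Pooling over scales makes the template less sensitive to the resolution mismatch between the probe and the gallery, which is the core intuition behind matching surveillance crops against high-quality portraits.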