Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)
Systems for the automatic recognition and detection of automotive parts are
crucial in several emerging research areas in the development of intelligent
vehicles. They enable, for example, the detection and modelling of interactions
between humans and the vehicle. In this paper, we quantitatively and
qualitatively explore the efficacy of deep learning architectures for the
classification and localisation of 29 interior and exterior vehicle regions on
three novel datasets. Furthermore, we experiment with joint and transfer
learning approaches across datasets and point out potential applications of our
systems. Our best network architecture achieves an F1 score of 93.67 % for
recognition, while our best localisation approach utilising state-of-the-art
backbone networks achieves a mAP of 63.01 % for detection. The MuSe-CAR-Part
dataset, which is based on a large variety of human-car interactions in videos,
the weights of the best models, and the code are publicly available to academic
parties for benchmarking and future research.
Comment: Demonstration and instructions to obtain data and models:
https://github.com/lstappen/GoCar
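For readers unfamiliar with the recognition metric quoted above, the following is a minimal sketch of an F1 computation over class labels. It assumes macro-averaging (the abstract does not state which averaging was used), and the part labels and the `macro_f1` helper are hypothetical illustrations, not taken from the Go-CaRD code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight.

    Assumption: macro-averaging is used here for illustration only.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, not drawn from the datasets in the paper.
y_true = ["door", "door", "wheel", "mirror"]
y_pred = ["door", "wheel", "wheel", "mirror"]
score = macro_f1(y_true, y_pred)
print(f"macro F1: {score:.4f}")
```

Macro-averaging weights each of the classes equally regardless of how often it occurs; a micro- or weighted average would give different numbers on imbalanced data.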
In Car Audio
This chapter presents implementations of advanced in-car audio applications. The system is composed of three main applications addressing the in-car listening and communication experience. Starting from a high-level description of the algorithms, several implementations at different levels of hardware abstraction are presented, along with empirical results on both the design process and the performance achieved.
A Comparative Analysis of Neural-Based Visual Recognisers for Speech Activity Detection
Recent advances in neural networks have offered effective solutions for automating various detection tasks, including speech activity detection (SAD). However, the existing literature on SAD highlights different neural approaches but does not provide a comprehensive comparison of them. This is important because such neural approaches often require hardware-intensive resources.
To address this, the project provides a comparative analysis of three different approaches: classification with still images (CNN), classification based on previous images (CRNN) and classification based on a sequence of images (Seq2Seq). The project aims to find a modest approach: one that provides the highest accuracy without requiring expensive computation, while also giving the quickest prediction times. Such an approach can then be adapted for real-time applications such as the activation of infotainment systems or interactive robots.
Results show that, within the problem domain (dataset, resources, etc.), the use of still images can achieve an accuracy of 97% for SAD. With the addition of an RNN, classification accuracy increases by a further 2 percentage points, as both recurrent architectures (classification based on previous images and classification of a sequence of images) achieve 99% classification accuracy.
These results show that the use of history (previous images) improves accuracy compared to the use of still images. Furthermore, because the RNN has memory, the network can be made smaller, which results in quicker training and prediction times. Experiments also showed that the CRNN is almost as accurate as the Seq2Seq architecture (99.1% vs 99.6% classification accuracy, respectively) but faster to train (326s vs 761s per epoch) and 28% faster at prediction (3.7s vs 5.19s per prediction). These results indicate that the CRNN can be a suitable choice for real-time applications, such as the activation of infotainment systems, based on classification accuracy, training time and prediction time.
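The timing comparison above can be checked with a short calculation. This is a minimal sketch that uses only the figures reported in the abstract; the `time_reduction` helper is a hypothetical name, and it assumes "faster" is measured as time saved relative to the slower Seq2Seq model.

```python
# Figures as reported in the abstract above, in seconds.
crnn_train_s, seq2seq_train_s = 326.0, 761.0  # training time per epoch
crnn_pred_s, seq2seq_pred_s = 3.7, 5.19       # time per prediction


def time_reduction(fast: float, slow: float) -> float:
    """Fraction of time saved by the faster model relative to the slower one."""
    return (slow - fast) / slow


train_saving = time_reduction(crnn_train_s, seq2seq_train_s)
pred_saving = time_reduction(crnn_pred_s, seq2seq_pred_s)

print(f"training time saved per epoch: {train_saving:.1%}")
print(f"prediction time saved: {pred_saving:.1%}")
```

Under this reading, the prediction-time saving comes out between 28% and 29%, consistent with the figure quoted in the abstract, and the training-time saving per epoch is a little over half.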
Overview of the CLEF 2022 JOKER Task 1: Classify and Explain Instances of Wordplay
As a multidisciplinary field of study, humour remains one of the most difficult aspects of
intercultural communication. Understanding humour often involves understanding implicit
cultural references and/or double meanings, which raises the question of how to detect
and classify instances of this complex phenomenon. This paper provides an overview of
Pilot Task 1 of the CLEF 2022 JOKER track, where participants had to classify and explain
instances of wordplay. We introduce a new classification of wordplay and a new annotation
scheme for wordplay interpretation suitable both for phrase-based wordplay and wordplay in
named entities. We describe the collection of our data, our task setup, and the evaluation
procedure, and we give a brief overview of the participating teams’ approaches and results.
- …