7 research outputs found
Lightweight Object Detection Ensemble Framework for Autonomous Vehicles in Challenging Weather Conditions
The computer vision systems driving autonomous vehicles are judged by their ability to detect objects and obstacles in the vicinity of the vehicle in diverse environments. Enhancing the ability of a self-driving car to distinguish between the elements of its environment under adverse conditions is an important challenge in computer vision. For example, poor weather conditions like fog and rain lead to image corruption, which can cause a drastic drop in object detection (OD) performance. The primary navigation of autonomous vehicles depends on the effectiveness of the image processing techniques applied to the data collected from various visual sensors. Therefore, it is essential to develop the capability to detect objects like vehicles and pedestrians under challenging conditions such as adverse weather. Ensembling multiple baseline deep learning models under different voting strategies for object detection and utilizing data augmentation to boost the models' performance is proposed to solve this problem. The data augmentation technique is particularly useful when training data for OD applications is limited. Furthermore, because of transfer learning, using the baseline models significantly speeds up the OD process compared to custom models. Therefore, the ensembling approach can be highly effective in resource-constrained devices deployed for autonomous vehicles in uncertain weather conditions. The applied techniques demonstrated an increase in accuracy over the baseline models and were able to identify objects in images captured in adverse foggy and rainy weather, reaching 32.75% mean average precision (mAP) and 52.56% average precision (AP) in detecting cars under the fog and rain conditions present in the dataset.
The effectiveness of multiple voting strategies for bounding box predictions on the dataset is also demonstrated. These strategies help increase the explainability of object detection in autonomous systems and improve the performance of the ensemble techniques over the baseline models.
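As an illustration of what voting over bounding box predictions can look like, the following is a minimal sketch and not the authors' implementation. It assumes boxes in `(x1, y1, x2, y2)` format, groups overlapping boxes from different models by intersection over union (IoU), and keeps a group depending on how many models voted for it. The strategy names (`affirmative`, `consensus`, `unanimous`), the function names, and the 0.5 IoU threshold are assumptions chosen for this example; the abstract does not specify them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_vote(per_model: List[List[Box]],
                  strategy: str = "consensus",
                  iou_thr: float = 0.5) -> List[Box]:
    """Fuse detections from several models with a voting rule (hypothetical helper)."""
    # Greedily group boxes that overlap the first box of an existing group.
    groups: List[List[Tuple[int, Box]]] = []
    for model_idx, boxes in enumerate(per_model):
        for box in boxes:
            for group in groups:
                if iou(group[0][1], box) >= iou_thr:
                    group.append((model_idx, box))
                    break
            else:
                groups.append([(model_idx, box)])

    n_models = len(per_model)
    # Number of distinct models that must agree for a detection to be kept.
    needed = {
        "affirmative": 1,              # any single model is enough
        "consensus": n_models // 2 + 1,  # strict majority of models
        "unanimous": n_models,         # every model must agree
    }[strategy]

    fused: List[Box] = []
    for group in groups:
        votes = len({model_idx for model_idx, _ in group})
        if votes >= needed:
            k = len(group)
            # Represent the group by the coordinate-wise mean of its boxes.
            fused.append(tuple(sum(b[i] for _, b in group) / k for i in range(4)))
    return fused
```

With three models, a box reported by two of them survives `consensus` voting but not `unanimous` voting, which makes it easy to trace why a given detection was kept or discarded.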
How well do self-supervised models transfer to medical imaging?
Self-supervised learning approaches have seen success transferring between similar medical imaging datasets; however, there has been no large-scale attempt to compare the transferability of self-supervised models against each other on medical images. In this study, we compare the generalisability of seven self-supervised models, two of which were trained in-domain, against supervised baselines across eight different medical datasets. We find that ImageNet pretrained self-supervised models are more generalisable than their supervised counterparts, scoring up to 10% better on medical classification tasks. The two in-domain pretrained models outperformed other models by over 20% on in-domain tasks; however, they suffered significant loss of accuracy on all other tasks. Our investigation of the feature representations suggests that this trend may be due to the models learning to focus too heavily on specific areas.
Cascaded complementary filter architecture for sensor fusion in attitude estimation
Copyright: © 2021 by the authors. Attitude estimation is the process of computing the orientation angles of an object with respect to a fixed frame of reference. The gyroscope, accelerometer, and magnetometer are some of the fundamental sensors used in attitude estimation. The orientation angles computed from these sensors are combined using sensor fusion methodologies to obtain accurate estimates. The complementary filter is a widely adopted technique whose performance is highly dependent on the appropriate selection of its gain parameters. This paper presents a novel cascaded architecture of the complementary filter that employs a nonlinear and a linear version of the complementary filter within one framework. The nonlinear version is used to correct the gyroscope bias, while the linear version estimates the attitude angle. The significant advantage of the proposed architecture is its independence from the filter parameters, which avoids tuning the filter’s gain parameters. The proposed architecture does not require any mathematical modeling of the system and is computationally inexpensive. The proposed methodology is applied to real-world datasets, and the estimation results were found to be promising compared to other state-of-the-art algorithms.
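For readers unfamiliar with the technique, the following minimal sketch shows a standard single-stage linear complementary filter for one attitude angle. It is not the cascaded architecture proposed in the paper. The function name, the fixed gain `alpha`, and the sample period `dt` are assumptions made for this example; the fixed gain is exactly the kind of parameter the paper aims to avoid tuning.

```python
from typing import Iterable, List


def complementary_filter(gyro_rates: Iterable[float],
                         accel_angles: Iterable[float],
                         dt: float,
                         alpha: float = 0.98,
                         angle0: float = 0.0) -> List[float]:
    """Fuse gyroscope rates (rad/s) with accelerometer-derived angles (rad).

    The integrated gyroscope term is trusted at high frequency (weight alpha),
    and the accelerometer angle at low frequency (weight 1 - alpha).
    """
    angle = angle0
    estimates: List[float] = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        predicted = angle + rate * dt  # short-term propagation from the gyroscope
        angle = alpha * predicted + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


# Usage: a constant gyroscope bias with a true angle of zero leaves a residual
# steady-state error of alpha * bias * dt / (1 - alpha) in this simple filter.
bias = 0.01  # rad/s, hypothetical bias value
est = complementary_filter([bias] * 2000, [0.0] * 2000, dt=0.01)
```

The residual error in the usage example illustrates why a separate bias-correction stage, such as the nonlinear stage described in the abstract, can be useful.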
Development and deployment of a generative model-based framework for text to photorealistic image generation
The task of generating photorealistic images from their textual descriptions is quite challenging. Most existing work in this domain focuses on generating images such as flowers or birds from their textual descriptions, especially for validating generative models based on Generative Adversarial Network (GAN) variants and for recreational purposes. However, such work is limited in the domain of photorealistic face image generation, and the results obtained have not been satisfactory. This is partly due to the absence of concrete data in this domain and the large number of highly specific features/attributes involved in face generation compared to birds or flowers. In this paper, we propose an Attention Generative Adversarial Network (AttnGAN) for fine-grained text-to-face generation that enables attention-driven multi-stage refinement by employing the Deep Attentional Multimodal Similarity Model (DAMSM). Through extensive experimentation on the CelebA dataset, we evaluated our approach using the Frechet Inception Distance (FID) score. The output files for the Face2Text dataset are also compared with those of the T2F GitHub project. According to the visual comparison, AttnGAN generated higher-quality images than T2F. Additionally, we compare our methodology with existing approaches, with a specific focus on the CelebA dataset, and demonstrate that our approach achieves a better FID score, indicating more realistic image generation. Such an approach can be applied in criminal identification, where faces are generated from an eyewitness's textual description. Such a method can bring consistency and eliminate the individual biases of an artist drawing the faces from the description given by the eyewitness. Finally, we discuss the deployment of the models on a Raspberry Pi to test how effective the models would be on a standalone device to facilitate portability and timely task completion.
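For reference, the FID score mentioned above is commonly defined as the Fréchet distance between two Gaussians fitted to Inception feature embeddings of real and generated images. The symbols below are notation assumed for this note, not taken from the paper: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g are those of the generated-image features. A lower value indicates that the generated images are statistically closer to the real ones.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```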