
    Multi-Modal Sensor Fusion-Based Semantic Segmentation for Snow Driving Scenarios

    In recent years, autonomous driving technology and advanced driver assistance systems have played a key role in improving road safety. However, weather conditions such as snow pose severe challenges for autonomous driving and remain an active research area. Advances in computation and sensor technology have paved the way for deep learning and neural-network-based techniques that can replace classical approaches, thanks to their superior reliability, detection resilience, and improved accuracy. In this research, we investigate the semantic segmentation of roads in snowy environments. We propose a multi-modal fused RGB-T semantic segmentation network that takes a color (RGB) image and a thermal map (T) as inputs. This paper introduces a novel fusion module that combines the feature maps from both inputs. We evaluate the proposed model on a new snow dataset that we collected and on other publicly available datasets. The segmentation results show that the proposed fused RGB-T input segments human subjects in snowy environments better than an RGB-only input. The fusion module plays a vital role in improving the efficiency of multi-input neural networks for person detection. Our results show that the proposed network achieves a higher success rate than other state-of-the-art networks. The combination of our fusion module and pyramid supervision path produced the best results in both mean accuracy and mean intersection over union on every dataset.
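
    To make the RGB-T fusion idea concrete, the sketch below shows one possible gated fusion block in PyTorch. It is a minimal, hypothetical illustration rather than the paper's actual module: the class name RGBTFusionBlock, the gating scheme, and the layer sizes are assumptions, and the real network also includes a pyramid supervision path that is not shown here.

```python
import torch
import torch.nn as nn

class RGBTFusionBlock(nn.Module):
    """Gated fusion of RGB and thermal feature maps at one encoder stage.

    Hypothetical sketch, not the paper's exact module: a gate computed from
    both modalities decides, per channel and pixel, how much weight to give
    the RGB features versus the thermal features before a 3x3 projection.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # weights in [0, 1]; 1 -> rely on RGB, 0 -> rely on thermal
        )
        self.project = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb_feat, thermal_feat], dim=1))
        fused = w * rgb_feat + (1.0 - w) * thermal_feat
        return self.project(fused)


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 120, 160)      # stage features from an RGB encoder
    thermal = torch.randn(1, 64, 120, 160)  # stage features from a thermal encoder
    print(RGBTFusionBlock(64)(rgb, thermal).shape)  # torch.Size([1, 64, 120, 160])
```

    A gate of this kind lets the network lean on thermal features where snow washes out RGB contrast, which is one plausible reason a fused input would separate people from a snowy background better than RGB alone.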

    DeepMetaForge: A Deep Vision-Transformer Metadata-Fusion Network for Automatic Skin Lesion Classification

    Skin cancer is a dangerous form of cancer that develops slowly in skin cells. Delays in diagnosing and treating these malignant skin conditions may have serious repercussions. Likewise, early skin cancer detection has been shown to improve treatment outcomes. This paper proposes DeepMetaForge, a deep-learning framework for skin cancer detection from metadata-accompanied images. The proposed framework utilizes BEiT, a vision transformer pre-trained with a masked image modeling objective, as the image-encoding backbone. We further propose merging the encoded metadata with the derived visual features while simultaneously compressing the aggregate information, simulating how photos with metadata are interpreted. Experiments on four public datasets of dermoscopic and smartphone skin lesion images show that the best configuration of our proposed framework yields 87.1% macro-average F1 on average. The empirical scalability analysis further shows that the proposed framework can be implemented in a variety of machine-learning paradigms, including applications on low-resource devices and as services. The findings not only shed light on the possibility of implementing nationwide telemedicine solutions for skin cancer that could benefit those in need of quality healthcare, but also open doors to many intelligent applications in medicine where images and metadata are collected together, such as disease detection from CT-scan images and patients' expression recognition from facial images.
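
    To illustrate the metadata-fusion idea, the sketch below shows a minimal fusion head in PyTorch. It is a hypothetical example, not DeepMetaForge itself: the class name MetadataFusionHead, the bottleneck sizes, and the stand-in 768-dimensional image embedding (matching the pooled output of a BEiT-base encoder) are assumptions; in the paper the image features come from a pretrained BEiT backbone and the merging/compression design differs.

```python
import torch
import torch.nn as nn

class MetadataFusionHead(nn.Module):
    """Illustrative metadata-image fusion head (not DeepMetaForge's exact design).

    Assumes an image embedding (e.g. from a pretrained BEiT backbone, as in the
    paper) and a metadata vector; the metadata is encoded, concatenated with the
    image embedding, compressed through a bottleneck, and classified.
    """
    def __init__(self, img_dim: int, meta_dim: int, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.meta_encoder = nn.Sequential(
            nn.Linear(meta_dim, hidden), nn.ReLU(),
        )
        self.compress = nn.Sequential(  # joint compression of the aggregated features
            nn.Linear(img_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden // 2, num_classes)

    def forward(self, img_emb: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        meta_emb = self.meta_encoder(metadata)
        joint = self.compress(torch.cat([img_emb, meta_emb], dim=-1))
        return self.classifier(joint)


if __name__ == "__main__":
    img_emb = torch.randn(8, 768)   # e.g. pooled output of a BEiT-base image encoder
    metadata = torch.randn(8, 16)   # e.g. encoded age, sex, anatomical site features
    logits = MetadataFusionHead(img_dim=768, meta_dim=16)(img_emb, metadata)
    print(logits.shape)             # torch.Size([8, 2])
```

    Compressing the concatenated features through a bottleneck, rather than classifying the raw concatenation, is one simple way to force the model to combine the two sources of evidence instead of relying on either alone.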