27 research outputs found

    RGB-X Object Detection via Scene-Specific Fusion Modules

    Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion with only a small number of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion. Comment: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024).
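
    For illustration, the sketch below shows one common way to fuse features from two pretrained single-modal backbones with a small learned gate. It is a minimal sketch in the spirit of the modular RGB-X idea, not the paper's architecture; the channel counts and layer sizes are assumptions.

```python
# Minimal sketch of a gated fusion block: features from two pretrained
# single-modal backbones are mixed by a small, learned per-pixel gate.
# Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Predict a per-pixel mixing weight from the concatenated features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_x):
        w = self.gate(torch.cat([f_rgb, f_x], dim=1))  # (N, 1, H, W) in [0, 1]
        return w * f_rgb + (1 - w) * f_x

# Usage: fused = GatedFusion(256)(rgb_features, thermal_features)
```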

    RGB-D Salient Object Detection: A Survey

    Salient object detection (SOD), which simulates the human visual perception system to locate the most attractive object(s) in a scene, has been widely applied to various computer vision tasks. Now, with the advent of depth sensors, depth maps with rich spatial information that can be beneficial in boosting the performance of SOD can easily be captured. Although various RGB-D based SOD models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this topic remains lacking. In this paper, we provide a comprehensive survey of RGB-D based SOD models from various perspectives, and review related benchmark datasets in detail. Further, considering that light fields can also provide depth maps, we review SOD models and popular benchmark datasets from this domain as well. Moreover, to investigate the SOD ability of existing models, we carry out a comprehensive evaluation, as well as an attribute-based evaluation, of several representative RGB-D based SOD models. Finally, we discuss several challenges and open directions of RGB-D based SOD for future research. All collected models, benchmark datasets, source code links, datasets constructed for attribute-based evaluation, and codes for evaluation will be made publicly available at https://github.com/taozh2017/RGBDSODsurvey. Comment: 24 pages, 12 figures. Accepted by Computational Visual Media.

    Object Segmentation and Reconstruction Using Infrastructure Sensor Nodes for Autonomous Mobility

    This thesis focuses on Lidar point cloud processing for infrastructure sensor nodes that serve as the perception system for autonomous robots with general mobility in indoor applications. Compared with typical schemes that mount sensors on the robots, this method acquires data from infrastructure sensor nodes, providing a more comprehensive view of the environment, which benefits the robots' navigation. The number of sensors does not need to increase even for multiple robots, significantly reducing costs. In addition, with a central perception system using the infrastructure sensor nodes to navigate every robot, a more comprehensive understanding of the current environment and of all the robots' locations can be obtained for the control and operation of the autonomous robots. For a robot in the detection range of a sensor node, the node can detect and segment obstacles in the robot's driveable area and reconstruct the incomplete, sparse point clouds of objects as they move. The complete shapes obtained by reconstruction benefit the localization and path planning that follow the perception stage of the robot's system. Considering the sparse Lidar data and the variety of object categories in the environment, a model-free scheme is selected for object segmentation. Point segmentation starts with background filtering. Considering the complexity of the indoor environment, a depth-matching-based background removal approach is first proposed. However, later tests imply that the method is adequate but not time-efficient. Therefore, based on the depth-matching-based method, a process that only considers the driveable area of the robot is proposed, and the computational complexity is significantly reduced. With this optimization, the computation time for processing one frame of data is greatly reduced, from 0.2 seconds with the first approach to 0.01 seconds with the second. After background filtering, the remaining points belonging to objects are segmented into separate clusters using an object clustering algorithm. With independent object clusters, an object tracking algorithm then assigns IDs to the point clusters and arranges them in a time sequence. With a stream of clusters for a specific object over time, point registration is deployed to aggregate the clusters into a complete shape. As noticed during the experiments, one difference between indoor and outdoor environments is that contact between objects is much more common indoors. Objects in contact are likely to be segmented as a single cluster by the model-free clustering algorithm, which needs to be avoided in the reconstruction process. Therefore, an improvement is made to the tracking algorithm for when contact happens. The algorithms in this thesis have been experimentally evaluated and presented.
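
    A minimal sketch of the model-free segmentation stage described above, assuming Open3D for clustering; the driveable-area bounds and DBSCAN parameters are illustrative, not those used in the thesis.

```python
# Minimal sketch: crop points to the robot's driveable area (coarse background
# removal), then model-free clustering of the remaining object points.
# Area bounds and DBSCAN parameters are illustrative assumptions.
import numpy as np
import open3d as o3d

def segment_objects(xyz: np.ndarray,
                    area_min=(-5.0, -5.0, 0.05),
                    area_max=(5.0, 5.0, 2.0)):
    # Keep only points inside the driveable-area box.
    lo, hi = np.asarray(area_min), np.asarray(area_max)
    mask = np.all((xyz >= lo) & (xyz <= hi), axis=1)
    fg = xyz[mask]

    # Cluster the remaining points; each cluster is one object candidate.
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(fg)
    labels = np.asarray(pcd.cluster_dbscan(eps=0.3, min_points=10))

    # Return one point array per cluster (label -1 is noise).
    n = int(labels.max()) + 1 if labels.size else 0
    return [fg[labels == k] for k in range(n)]
```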

    3D data fusion by depth refinement and pose recovery

    Refining depth maps from different sources into a single refined depth map, and aligning rigid point clouds from different views, are two core techniques in 3D data fusion. Existing depth fusion algorithms do not provide a general framework for obtaining a highly accurate depth map. Furthermore, existing rigid point cloud registration algorithms do not always align noisy point clouds robustly and accurately, especially when there are many outliers and large occlusions. In this thesis, we present a general depth fusion framework based on supervised, semi-supervised, and unsupervised adversarial network approaches. We show that the refined depth maps produced by depth fusion are more accurate than the source depth maps. We also develop a new rigid point cloud registration algorithm that aligns two uncertainty-based Gaussian mixture models representing the structures of the two point clouds, and we show that it registers rigid point clouds more accurately over a larger range of perturbations. Subsequently, the new supervised depth fusion algorithm and the new rigid point cloud registration algorithm are integrated into the ROS system of a real gardening robot (called TrimBot) for practical use in real environments. All the proposed algorithms have been evaluated on multiple existing datasets, demonstrating their superiority over prior work in the field.
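
    As a toy illustration of depth fusion, the sketch below combines two depth maps by per-pixel confidence weighting. The thesis itself uses adversarial networks for this step; the confidence maps here are assumptions.

```python
# Minimal sketch: confidence-weighted fusion of two depth maps. This shows
# the general idea of depth fusion only; the confidence maps are assumed
# inputs, and the thesis's actual framework is learning-based.
import numpy as np

def fuse_depths(d1, d2, c1, c2, eps=1e-8):
    """Fuse depth maps d1, d2 (H x W) using per-pixel confidences c1, c2.

    Invalid pixels should carry zero confidence.
    """
    w = c1 + c2
    fused = (c1 * d1 + c2 * d2) / np.maximum(w, eps)
    # Where neither source is confident, mark the pixel invalid (0).
    fused[w < eps] = 0.0
    return fused
```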

    Active SLAM: A Review On Last Decade

    This article presents a comprehensive review of the Active Simultaneous Localization and Mapping (A-SLAM) research conducted over the past decade. It explores the formulation, applications, and methodologies employed in A-SLAM, particularly in trajectory generation and control-action selection, drawing on concepts from Information Theory (IT) and the Theory of Optimal Experimental Design (TOED). This review includes both qualitative and quantitative analyses of various approaches, deployment scenarios, configurations, path-planning methods, and utility functions within A-SLAM research. Furthermore, this article introduces a novel analysis of Active Collaborative SLAM (AC-SLAM), focusing on collaborative aspects within SLAM systems. It includes a thorough examination of collaborative parameters and approaches, supported by both qualitative and statistical assessments. This study also identifies limitations in the existing literature and suggests potential avenues for future research. This survey serves as a valuable resource for researchers seeking insights into A-SLAM methods and techniques, offering a current overview of A-SLAM formulation. Comment: 34 pages, 8 figures, 6 tables.
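
    Many of the utility functions surveyed in A-SLAM are information-theoretic. The sketch below scores a candidate view by the Shannon entropy of the occupancy cells it would observe; the grid, visibility set, and probabilities are illustrative assumptions, not any specific surveyed method.

```python
# Minimal sketch: an entropy-based utility for active SLAM. A candidate view
# is scored by the total Shannon entropy of the occupancy cells it would
# observe; picking the highest-scoring view targets the most uncertain map
# regions. All inputs here are illustrative assumptions.
import numpy as np

def cell_entropy(p):
    # Shannon entropy of a Bernoulli occupancy probability, in bits.
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def view_utility(occupancy_grid, visible_cells):
    """Sum of entropies over the (row, col) cells a candidate view observes."""
    probs = occupancy_grid[tuple(np.asarray(visible_cells).T)]
    return float(np.sum(cell_entropy(probs)))

# Usage: best = max(candidates, key=lambda v: view_utility(grid, v.cells))
```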

    Medical Image Classification using Deep Learning Techniques and Uncertainty Quantification

    The emergence of medical image analysis using deep learning techniques has introduced multiple challenges in terms of developing robust and trustworthy systems for automated grading and diagnosis. Several works have been presented to improve classification performance. However, these methods lack the diversity needed to capture different levels of contextual information among image regions, strategies to introduce diversity in learning via ensemble-based techniques, or uncertainty measures for the predictions generated by automated systems. Consequently, the presented methods provide sub-optimal results, which is not sufficient for clinical practice. To enhance classification performance and introduce trustworthiness, deep learning techniques and uncertainty quantification methods are required to provide diversity in contextual learning and an initial stage of explainability, respectively. This thesis aims to explore and develop novel deep learning techniques, accompanied by uncertainty quantification, for developing actionable automated grading and diagnosis systems. More specifically, the thesis provides the following three main contributions. First, it introduces a novel entropy-based elastic ensemble of Deep Convolutional Neural Networks (DCNNs), termed 3E-Net, for classifying grades of invasive breast carcinoma microscopic images. 3E-Net is based on a patch-wise network for feature extraction and image-wise networks for final image classification, and uses an elastic ensemble based on Shannon entropy as an uncertainty quantification method for measuring the level of randomness in image predictions. As the second contribution, the thesis presents a novel multi-level context- and uncertainty-aware deep learning architecture, named MCUa, for the classification of breast cancer microscopic images. MCUa consists of multiple feature extractors and multi-level context-aware models in a dynamic ensemble fashion to learn the spatial dependencies among image patches and enhance learning diversity. The architecture also uses Monte Carlo (MC) dropout to measure the uncertainty of image predictions and to decide, based on the generated uncertainty score, whether the prediction for an input image can be trusted. The third contribution introduces a novel model-agnostic method (AUQantO) that establishes an actionable strategy for optimising uncertainty quantification for deep learning architectures. AUQantO optimises a hyperparameter threshold against which uncertainty scores from Shannon entropy or MC dropout are compared; the optimal threshold is found using single- and multi-objective functions, optimised with multiple optimisation methods. A comprehensive set of experiments has been conducted using multiple medical imaging datasets and multiple novel evaluation metrics to demonstrate the effectiveness of the three contributions for clinical practice. First, the 3E-Net versions achieved accuracies of 96.15% and 99.50% on the invasive breast carcinoma dataset. The second contribution, MCUa, achieved an accuracy of 98.11% on the breast cancer histology images dataset. Lastly, AUQantO showed significant improvements in the performance of state-of-the-art deep learning models, with average accuracy improvements of 1.76% and 2.02% on the breast cancer histology images dataset, and of 5.67% and 4.24% on the skin cancer dataset, using the two uncertainty quantification techniques. AUQantO also demonstrated the ability to determine the optimal number of excluded images for a particular dataset.
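
    A minimal sketch of the MC-dropout-plus-threshold idea underlying these contributions: run several stochastic forward passes, score predictive entropy, and abstain above a threshold. The model, number of passes, and threshold value are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch: MC-dropout uncertainty with an abstention threshold, in the
# spirit of the entropy/MC-dropout scoring described above. All hyperparameters
# here are illustrative assumptions.
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, passes=20):
    model.train()  # keep dropout active at inference (also affects batch norm
                   # in this simple sketch)
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(passes)])
    mean = probs.mean(dim=0)                              # (batch, classes)
    entropy = -(mean * mean.clamp_min(1e-9).log()).sum(dim=1)
    return mean, entropy

def predict_or_abstain(model, x, threshold=0.5):
    mean, entropy = mc_dropout_predict(model, x)
    accept = entropy < threshold  # reject (exclude) highly uncertain inputs
    return mean.argmax(dim=1), accept
```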

    Design of a Curtain and Lamp Device for a Microcontroller- and Android-Based Smart Home

    Technological developments in this era have brought real changes to human life. Many devices have been created to ease human work, to the point that people depend heavily on technology. One application of modern technology is the Internet of Things. A smart home is part of the Internet of Things and aims to monitor and control everything in a residence via a laptop or smartphone. For this smart-home implementation, this journal article discusses how to design automatic curtains and lamps using a light sensor and a human body-temperature sensor, based on an Arduino and controlled from a smartphone. The device opens or closes the curtains and turns the lamps on or off in two modes, automatic and manual: the automatic mode operates on inputs from the sensors, while the manual mode operates on inputs from the smartphone. A DC motor drives the curtains, a relay controls the lamps, and an LCD displays the mode currently in use.
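
    A minimal, platform-agnostic sketch of the two-mode control logic in Python (the actual device is Arduino-based); the light threshold and the actuator and command names are illustrative assumptions.

```python
# Minimal sketch of the two-mode curtain/lamp control loop. The curtain, lamp,
# and lcd objects, the phone commands, and the light threshold are hypothetical
# stand-ins for the Arduino device's hardware interfaces.
AUTO, MANUAL = "auto", "manual"

def control_step(mode, light_level, body_temp_detected, phone_cmd,
                 curtain, lamp, lcd):
    lcd.show(mode)  # display the active mode, as on the device's LCD
    if mode == AUTO:
        # Daylight opens the curtain and turns the lamp off; darkness closes
        # the curtain and lights the lamp only when a person is detected.
        if light_level > 600:
            curtain.open()
            lamp.off()
        else:
            curtain.close()
            lamp.on() if body_temp_detected else lamp.off()
    else:  # MANUAL: obey the smartphone command
        if phone_cmd == "curtain_open": curtain.open()
        elif phone_cmd == "curtain_close": curtain.close()
        elif phone_cmd == "lamp_on": lamp.on()
        elif phone_cmd == "lamp_off": lamp.off()
```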

    Toward Efficient Rendering: A Neural Network Approach

    Physically-based image synthesis has attracted considerable attention due to its wide applications in visual effects, video games, design visualization, and simulation. However, obtaining visually satisfactory renderings with ray tracing algorithms often requires casting a large number of rays and thus takes a vast amount of computation. The extensive computational and memory requirements of ray tracing pose a challenge, especially when running these rendering algorithms on resource-constrained platforms, and impede applications that require high resolutions and refresh rates. This thesis presents three methods to address the challenge of efficient rendering. First, we present a hybrid rendering method to speed up Monte Carlo rendering algorithms. Our method first generates two versions of a rendering: one at a low resolution with a high sample rate (LRHS) and the other at a high resolution with a low sample rate (HRLS). We then develop a deep convolutional neural network to fuse these two renderings into a high-quality image as if it were rendered at a high resolution with a high sample rate. Specifically, we formulate this fusion task as a super-resolution problem that generates a high-resolution rendering from the low-resolution input (LRHS), assisted by the HRLS rendering. The HRLS rendering provides critical high-frequency details that are difficult for any super-resolution method to recover from the LRHS alone. Our experiments show that our hybrid rendering algorithm is significantly faster than state-of-the-art Monte Carlo denoising methods while rendering high-quality images, when tested on both our own BCR dataset and the Gharbi dataset. Second, we investigate super-resolution as a way to reduce the number of pixels to render and thus speed up Monte Carlo rendering algorithms. While great progress has been made in super-resolution technologies, it is an essentially ill-posed problem and cannot recover high-frequency details in renderings. To address this problem, we exploit high-resolution auxiliary features to guide the super-resolution of low-resolution renderings. These high-resolution auxiliary features can be quickly rendered by a rendering engine and, at the same time, provide valuable high-frequency details to assist super-resolution. To this end, we develop a cross-modality Transformer network that consists of an auxiliary feature branch and a low-resolution rendering branch; the two branches fuse the high-resolution auxiliary features with the corresponding low-resolution rendering. Furthermore, we design residual densely-connected Swin Transformer groups that learn to extract representative features for high-quality super-resolution. Our experiments show that our auxiliary-feature-guided super-resolution method outperforms both state-of-the-art super-resolution methods and Monte Carlo denoising methods in producing high-quality renderings. Third, we present a deep-learning-based Monte Carlo denoising method for stereoscopic images. Research on deep-learning-based Monte Carlo denoising has made significant progress in recent years. However, existing methods are mostly designed for single-image Monte Carlo denoising, and stereoscopic-image Monte Carlo denoising is less explored. Traditional methods require first rendering a noiseless image for one view, which is time-consuming. Recent deep-learning-based methods achieve promising results on single-image Monte Carlo denoising, but their performance on stereoscopic images is compromised because they do not consider the spatial correspondence between the left and right images. Our method takes low samples-per-pixel (spp) stereoscopic images as inputs and estimates a high-quality result. Specifically, we extract features from the two stereoscopic images and warp the features from one image to the other using a disparity map fine-tuned from the disparity computed from scene geometry. To train our network, we collected a large-scale Blender Cycles Stereo Ray-tracing dataset. Our experiments show that our method outperforms state-of-the-art methods when sampling rates are low.
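
    A minimal sketch of the first method's LRHS/HRLS fusion idea: upsample the low-resolution, high-sample rendering to the target size and fuse it with the high-resolution, low-sample one. The tiny network below is an illustrative stand-in for the thesis's model, not its actual architecture.

```python
# Minimal sketch of LRHS + HRLS fusion as guided super-resolution: the clean
# but small LRHS rendering is upsampled, and the noisy but detailed HRLS
# rendering supplies high-frequency guidance. Layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, lrhs, hrls):
        # Bring the clean-but-small rendering up to the target resolution.
        up = F.interpolate(lrhs, size=hrls.shape[-2:], mode="bilinear",
                           align_corners=False)
        # Predict a residual over the upsampled image from both inputs.
        return up + self.body(torch.cat([up, hrls], dim=1))

# Usage: fused = FusionNet()(lrhs_image, hrls_image)  # (N, 3, H, W) tensors
```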

    Design a CPW antenna on rubber substrate for multiband applications

    This paper presents a compact CPW monopole antenna on a rubber substrate for multiband applications. Multiband operation (2.45 and 3.65 GHz) is achieved with this antenna design, with good antenna performance. In particular, the antenna targets ISM-band applications, while slots (S1, S2, S3) are used to obtain an additional frequency band at 3.65 GHz for WiMAX applications. The design achieves a bandwidth of 520 MHz for the first band and 76 MHz for the second (WiMAX) band, with a radiation efficiency of around 90%. Moreover, the realized gain of 4.27 dBi exceeds that of most existing designs in this field. CST Microwave Studio was used for the antenna simulation.