
    Non-local Aggregation for RGB-D Semantic Segmentation

    Exploiting both RGB (2D appearance) and Depth (3D geometry) information can improve the performance of semantic segmentation. However, due to the inherent differences between the two modalities, integrating RGB-D features effectively remains a challenging problem. In this letter, to address this issue, we propose a Non-local Aggregation Network (NANet), with a well-designed Multi-modality Non-local Aggregation Module (MNAM), to better exploit the non-local context of RGB-D features at multiple stages. Compared with most existing RGB-D semantic segmentation schemes, which only exploit local RGB-D features, the MNAM enables the aggregation of non-local RGB-D information along both spatial and channel dimensions. The proposed NANet achieves performance comparable to state-of-the-art methods on the popular RGB-D benchmarks NYUDv2 and SUN-RGBD.
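    As context for the non-local aggregation idea this abstract describes, the sketch below shows a generic spatial non-local block (in the style of Wang et al., 2018). It is not the authors' MNAM: the reduction ratio, layer names, and the additive RGB-D fusion at the end are assumptions for illustration only.

```python
# Minimal sketch of a spatial non-local block in PyTorch; NOT the paper's MNAM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value
        self.out = nn.Conv2d(inter, channels, kernel_size=1)    # restore channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.phi(x).flatten(2)                    # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)      # (B, HW, C')
        attn = F.softmax(q @ k, dim=-1)               # (B, HW, HW): each position
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)  # attends to all others
        return x + self.out(y)                        # residual connection

# Hypothetical RGB-D usage: aggregate non-local context over fused features
# (simple additive fusion here is an assumption, not the paper's scheme).
rgb_feat = torch.randn(1, 64, 30, 40)
depth_feat = torch.randn(1, 64, 30, 40)
fused = NonLocalBlock(64)(rgb_feat + depth_feat)
```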

    Exploring Audio Compression as Image Completion in Time-Frequency Domain

    Audio compression is usually achieved with algorithms that exploit spectral properties of the given signal, such as frequency or temporal masking. In this paper, we propose to tackle the problem from a different point of view, considering the time-frequency domain of an audio signal as an intensity map to be reconstructed via a data-driven approach. The compression stage removes selected input values from the time-frequency representation of the original signal; decompression then reconstructs the missing samples as an image completion task. Our method is divided into two main parts. First, we analyse the feasibility of data-driven audio reconstruction with missing samples in the time-frequency representation. To do so, we exploit an existing CNN model designed for depth completion, involving a sequence of sparse convolutions to deal with absent values. Second, we propose a method to select the values to be removed at the compression stage, maximizing the perceived audio quality of the decompressed signal. In the experimental section, we validate the proposed technique on standard audio datasets and provide an extensive study of the quality of the reconstructed signal under different conditions.
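    The sparse convolutions mentioned here are the sparsity-invariant kind used in depth-completion CNNs (Uhrig et al., 2017): the filter is applied only to observed values and renormalized by the count of valid entries in each receptive field. Below is a minimal sketch of one such layer; the toy spectrogram pipeline and all shapes are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of a sparsity-invariant convolution; NOT the paper's network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)  # propagates validity
        self.k = k

    def forward(self, x: torch.Tensor, mask: torch.Tensor):
        # Convolve only the observed values, then renormalize by how many
        # valid entries fell inside each receptive field.
        x = self.conv(x * mask)
        valid = F.avg_pool2d(mask, self.k, stride=1, padding=self.k // 2) * self.k ** 2
        x = x / valid.clamp(min=1e-5) + self.bias.view(1, -1, 1, 1)
        return x, self.pool(mask)  # output mask: 1 where any input was valid

# Toy usage: a magnitude spectrogram with ~70% of bins removed at compression.
spec = torch.randn(1, 1, 257, 400).abs()      # |STFT| treated as an intensity map
mask = (torch.rand_like(spec) > 0.7).float()  # kept samples
recon, new_mask = SparseConv2d(1, 16)(spec, mask)
```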

    Accurate semantic segmentation of RGB-D images for indoor navigation

    We introduce a semantic segmentation approach to detect various objects for the mobile robot system “ROSWITHA” (RObot System WITH Autonomy). Developing a semantic segmentation method is a challenging research field in machine learning and computer vision. Compared with traditional state-of-the-art methods for understanding the surroundings, semantic segmentation is robust: it provides the richest information about an object, namely classification and localization at both the image level and the pixel level, thus precisely depicting the shape and position of the object in space. In this work, we verify the effectiveness of semantic segmentation as an aid to robust indoor navigation. To make the output segmentation map meaningful and enhance model accuracy, point cloud data were extracted from the depth camera, fusing the RGB and depth streams to improve speed and accuracy compared with different machine learning algorithms. We compared our modified approach with state-of-the-art methods when trained on the public NYUv2 dataset. Moreover, the model was trained on our customized indoor dataset 1 (three classes) and dataset 2 (seven classes) to achieve robust classification of objects in the dynamic environment of the Frankfurt University of Applied Sciences laboratories. The model attains a global accuracy of 98.2% with a mean intersection over union (mIoU) of 90.9% on dataset 1; on dataset 2, it achieves a global accuracy of 95.6% with an mIoU of 72%. Furthermore, the evaluations were performed in our indoor scenario.
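    The abstract reports global (pixel) accuracy and mean IoU. As context, here is a minimal sketch of how these standard metrics are computed from a confusion matrix; it reflects the common definitions, not the authors' evaluation code, and the toy data are invented.

```python
# Standard segmentation metrics from a confusion matrix; illustrative only.
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """pred, gt: integer label maps of identical shape."""
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(gt.ravel() * num_classes + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    global_acc = np.diag(cm).sum() / cm.sum()
    # Per-class IoU = TP / (TP + FP + FN), then averaged over classes.
    iou = np.diag(cm) / (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm) + 1e-12)
    return global_acc, iou.mean()

# Toy example with three classes, matching dataset 1 in the abstract.
gt = np.random.randint(0, 3, size=(480, 640))
pred = gt.copy()
pred[::10] = 0  # corrupt every tenth row to simulate prediction errors
acc, miou = segmentation_metrics(pred, gt, num_classes=3)
print(f"global accuracy: {acc:.3f}, mIoU: {miou:.3f}")
```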

    Road layout understanding by generative adversarial inpainting

    Autonomous driving is becoming a reality, yet vehicles still need to rely on complex sensor fusion to understand the scene they act in. The ability to discern the static environment from dynamic entities provides a comprehension of the road layout that imposes constraints on the reasoning process about moving objects. We pursue this through a GAN-based semantic segmentation inpainting model that removes all dynamic objects from the scene and focuses on understanding its static components, such as streets, sidewalks, and buildings. We evaluate this task on the Cityscapes dataset and on a novel synthetically generated dataset, obtained with the CARLA simulator and specifically designed to quantitatively evaluate semantic segmentation inpainting. We compare our method with a variety of baselines working in both the RGB and segmentation domains.
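    A minimal sketch of the masking step this abstract implies: pixels belonging to dynamic Cityscapes classes are removed from a semantic map so that an inpainting model can fill in the static layout. The class IDs follow the standard Cityscapes train-ID convention; the ignore label, function name, and the omitted inpainting network are assumptions for illustration.

```python
# Masking dynamic classes in a Cityscapes-style label map; illustrative only.
import torch

# Cityscapes train IDs for dynamic categories: person, rider, car, truck,
# bus, train, motorcycle, bicycle.
DYNAMIC_IDS = torch.tensor([11, 12, 13, 14, 15, 16, 17, 18])
IGNORE_ID = 255  # placeholder label for regions to be inpainted (assumption)

def mask_dynamic(seg: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """seg: (H, W) integer label map -> (masked map, binary hole mask)."""
    hole = torch.isin(seg, DYNAMIC_IDS)
    masked = seg.clone()
    masked[hole] = IGNORE_ID
    return masked, hole

seg = torch.randint(0, 19, (512, 1024))  # toy label map
masked, hole = mask_dynamic(seg)
# `masked` and `hole` would then be fed to the inpainting generator.
```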