12,977 research outputs found

    Hand gesture recognition based on signals cross-correlation


    Multimodal Deep Learning for Robust RGB-D Object Recognition

    Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams, one for each modality, which are consecutively combined with a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs. The first is an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second is a data augmentation scheme for robust learning with depth images, based on corrupting them with realistic noise patterns. We present state-of-the-art results on the RGB-D object dataset and show recognition in challenging real-world noisy RGB-D settings.
    Comment: Final version submitted to IROS'2015; results unchanged; reformulation of some text passages in abstract and introduction.
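
    The two-stream, late-fusion design described above lends itself to a compact sketch. Below is a minimal illustration, assuming PyTorch; the layer sizes, module names, and the 51-class output (the number of categories in the RGB-D object dataset) are hypothetical stand-ins, not the authors' architecture. Depth is assumed to be pre-encoded as a three-channel image, in the spirit of the encoding the abstract mentions.

```python
# Hypothetical sketch of a two-stream RGB-D network with late fusion.
# Layer sizes and names are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """One per-modality processing stream (RGB, or depth encoded as 3 channels)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return torch.flatten(self.features(x), 1)

class LateFusionNet(nn.Module):
    """Concatenates both stream outputs and classifies with a fusion head."""
    def __init__(self, num_classes=51, feat_dim=64 * 56 * 56):
        super().__init__()
        self.rgb_stream = StreamCNN()
        self.depth_stream = StreamCNN()
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, rgb, depth):
        # Late fusion: each modality is processed separately, then combined.
        fused = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.fusion(fused)

# Example: a batch of 224x224 RGB images and colorized depth maps.
net = LateFusionNet()
logits = net(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
```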

    Data mining and fusion


    Towards 3D modeling of brain tumors using endoneurosonography and neural networks

    Minimally invasive surgeries have become popular because they reduce the typical risks of traditional interventions. In neurosurgery, recent trends suggest the combined use of endoscopy and ultrasound, a technique called endoneurosonography (ENS), for 3D virtualization of brain structures in real time. The ENS information can be used to generate 3D models of brain tumors during surgery. This paper introduces a methodology for 3D modeling of brain tumors using ENS and unsupervised neural networks, studying in particular the use of self-organizing maps (SOM) and neural gas networks (NGN). Compared to other techniques, 3D modeling using neural networks offers advantages: tumor morphology is encoded directly in the synaptic weights of the network, no a priori knowledge is required, and the representation can be developed in two stages, off-line training and on-line adaptation. Experimental tests were performed using medical phantoms of brain tumors, and the results of 3D modeling from an ENS database are presented.
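
    To make the "tumor morphology encoded in synaptic weights" idea concrete, here is a minimal NumPy sketch of fitting a self-organizing map to a 3D point cloud. The grid size, decay schedules, and synthetic stand-in points are illustrative assumptions, not the paper's setup; the same loop would apply to real ENS surface samples.

```python
# Minimal self-organizing map (SOM) sketch: encode a sampled 3D surface
# in the weights of a small 2D grid of neurons. All hyperparameters here
# (grid size, learning rate, neighborhood decay) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5000, 3))  # stand-in for ENS surface points

rows, cols, iters = 10, 10, 20000
weights = rng.uniform(-1.0, 1.0, size=(rows, cols, 3))
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

for t in range(iters):
    lr = 0.5 * np.exp(-t / iters)                      # decaying learning rate
    sigma = max(rows, cols) / 2 * np.exp(-t / iters)   # shrinking neighborhood
    p = points[rng.integers(len(points))]
    # Best-matching unit: the neuron whose weight vector is closest to p.
    dists = np.linalg.norm(weights - p, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Pull the BMU and its grid neighbors toward the sample point.
    h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
    weights += lr * h[..., None] * (p - weights)

# After training, `weights` approximates the sampled surface: the morphology
# is encoded directly in the network's synaptic weights, as the abstract notes.
```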

    Deep Projective 3D Semantic Segmentation

    Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelization of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets. In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first project the point cloud onto a set of synthetic 2D images. These images are then used as input to a 2D-CNN designed for semantic segmentation. Finally, the obtained prediction scores are re-projected to the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state of the art, achieving a relative gain of 7.9% over the previous best approach.
    Comment: Submitted to CAIP 2017.
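
    The project / re-project pipeline can be illustrated with a short NumPy sketch. The orthographic projection, image resolution, and the random score map standing in for the 2D-CNN output are all simplifying assumptions, not the authors' implementation, which renders multiple synthetic views.

```python
# Sketch of the project / re-project step: rasterize a point cloud into a
# synthetic 2D depth image, then copy per-pixel prediction scores back to
# the points. A dummy random score map stands in for the 2D-CNN output.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(10000, 3))  # (x, y, z), z treated as depth
H = W = 128
num_classes = 8

# Project: map (x, y) to pixel coordinates; keep the nearest point per pixel.
px = np.clip((points[:, 0] * (W - 1)).astype(int), 0, W - 1)
py = np.clip((points[:, 1] * (H - 1)).astype(int), 0, H - 1)
depth_img = np.full((H, W), np.inf)
for i in np.argsort(-points[:, 2]):   # write far-to-near, so near points win
    depth_img[py[i], px[i]] = points[i, 2]

# A 2D-CNN would take the rendered image(s) and produce per-pixel class
# scores; here we substitute random scores of the same shape.
scores_2d = rng.standard_normal((H, W, num_classes))

# Re-project: each 3D point inherits the scores of the pixel it landed in.
scores_3d = scores_2d[py, px]          # shape (N, num_classes)
labels = scores_3d.argmax(axis=1)      # per-point segmentation result
```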