9 research outputs found

    SkinningNet: two-stream graph convolutional neural network for skinning prediction of synthetic characters

    This work presents SkinningNet, an end-to-end Two-Stream Graph Neural Network architecture that computes skinning weights from an input mesh and its associated skeleton, without making any assumptions about the shape class or structure of the provided mesh. Whereas previous methods pre-compute handcrafted features that relate the mesh and the skeleton or assume a fixed skeleton topology, the proposed method extracts this information in an end-to-end learnable fashion by jointly learning the best relationship between mesh vertices and skeleton joints. The proposed method builds on the novel Multi-Aggregator Graph Convolution, which combines the results of different aggregators during the summarizing step of the Message-Passing scheme, helping the operation generalize to unseen topologies. Experimental results demonstrate the effectiveness of the contributions of our novel architecture, with SkinningNet outperforming current state-of-the-art alternatives. This work has been partially supported by the project PID2020-117142GB-I00, funded by MCIN/AEI/10.13039/501100011033. Peer Reviewed. Postprint (author's final draft).
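
    As a rough illustration of what "combining the results of different aggregators" can look like in a message-passing step, here is a minimal NumPy sketch. The abstract does not say which aggregators are used or how their outputs are merged; the choice of mean, max and sum, the concatenation, and the name `multi_aggregate` are all assumptions made for this example, not the authors' layer.

```python
import numpy as np

def multi_aggregate(features, neighbours):
    """Concatenate mean, max and sum aggregations of each node's neighbour features.

    features:   (N, F) array of node features.
    neighbours: list of N non-empty lists of neighbour indices.
    Returns an (N, 3F) array. The aggregator set is an assumption for illustration.
    """
    out = []
    for idx in neighbours:
        nb = features[idx]  # (k, F) features of this node's neighbours
        out.append(np.concatenate([nb.mean(axis=0), nb.max(axis=0), nb.sum(axis=0)]))
    return np.stack(out)

# Toy graph with three nodes and two features per node.
features = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
neighbours = [[1, 2], [0], [0, 1]]
agg = multi_aggregate(features, neighbours)
print(agg.shape)  # (3, 6)
```

    In a learnable layer the concatenated vector would typically be passed through a trainable transformation; that part is omitted here.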

    2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph Convolution for RGB-D Indoor Scene Classification

    Multi-modal fusion has been shown to improve the performance of scene classification tasks. This paper presents a 2D-3D Fusion stage that combines 3D Geometric Features with 2D Texture Features obtained by 2D Convolutional Neural Networks. To obtain a robust 3D Geometric embedding, a network that uses two novel layers is proposed. The first layer, Multi-Neighbourhood Graph Convolution, aims to learn a more robust geometric descriptor of the scene by combining two different neighbourhoods: one in the Euclidean space and the other in the Feature space. The second proposed layer, Nearest Voxel Pooling, improves the performance of the well-known Voxel Pooling. Experimental results, using the NYU-Depth-V2 and SUN RGB-D datasets, show that the proposed method outperforms the current state of the art in the RGB-D indoor scene classification task.
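
    The idea of using two neighbourhoods per point, one in Euclidean space and one in feature space, can be sketched as follows. This is only an assumed reading of the abstract: the k-nearest-neighbour construction, the mean aggregation, the concatenation, and the names `knn` and `multi_neighbourhood_aggregate` are hypothetical and stand in for whatever the paper actually does.

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared pairwise distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def multi_neighbourhood_aggregate(xyz, feats, k):
    """Aggregate features over a Euclidean and a feature-space neighbourhood.

    xyz:   (N, 3) float array of point coordinates.
    feats: (N, F) float array of point features.
    Returns (N, 2F): mean over spatial neighbours, then mean over feature neighbours.
    """
    euclidean_nb = knn(xyz, k)    # neighbours that are close in 3D space
    feature_nb = knn(feats, k)    # neighbours that have similar features
    agg_euclidean = feats[euclidean_nb].mean(axis=1)
    agg_feature = feats[feature_nb].mean(axis=1)
    return np.concatenate([agg_euclidean, agg_feature], axis=1)

xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0], [11.0, 0.0, 0.0]])
feats = np.array([[0.0], [10.0], [0.5], [10.5]])
out = multi_neighbourhood_aggregate(xyz, feats, k=1)
print(out.shape)  # (4, 2)
```

    In the toy data the spatial neighbour of each point differs from its feature-space neighbour, so the two halves of the output carry different information.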

    Graph convolutional neural networks for 3D data analysis

    Deep Learning allows the extraction of complex features directly from raw input data, eliminating the need for the hand-crafted features of the classical Machine Learning pipeline. This new paradigm has boosted performance across several domains, including computer vision, natural language processing and audio processing. However, there are still challenges when dealing with unorganized structures. This thesis addresses this challenge using Graph Convolutional Neural Networks, a new set of techniques capable of managing graph structures that can be used for processing 3D data. The first part of the thesis focuses on the Graph Analysis task, in which we study the capabilities of Graph Convolutional Neural Networks to capture the intrinsic geometric information of 3D data. We propose the Attention Graph Convolution layer, which learns to infer the kernel used during the convolution, taking into account the particularities of each neighbourhood of the graph. We explore two variants of the Attention Graph Convolution layer: one that follows a residual approach and another that allows the convolution to combine different neighbourhood domains. Furthermore, we propose a set of 3D pooling layers that mimic the behaviour of the pooling layers found in common 2D Convolutional Neural Network architectures. Finally, we present a 2D-3D Fusion block capable of merging the 3D geometric information obtained from a Graph Convolutional Neural Network with the texture information obtained by a 2D Convolutional Neural Network. We evaluate the presented contributions on the RGB-D Scene Classification task. The second part of this thesis focuses on the Node Analysis task, which consists of extracting features at the node level, taking into account the neighbourhood structure. We present the Multi-Aggregator Graph Convolution layer, which uses multiple aggregators to generalize better to unseen topologies and learn better local representations. In addition, it reduces the memory footprint with respect to the Attention Graph Convolution layer. Finally, we analyze the capabilities of our proposed Graph Convolution layers to deal with heterogeneous graphs where the nodes of the graph may belong to different modalities. We evaluate the presented contributions on the Computer Graphics process of skinning a character mesh. Specifically, we propose a Two-Stream Graph Neural Network capable of predicting the skinning weights of a 3D character. Postprint (published version).
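
    The thesis abstract does not describe its 3D pooling layers in detail. As a point of reference only, plain voxel-grid max pooling, the 3D analogue of 2D max pooling, can be sketched as below. The function name `voxel_max_pool`, the use of the centroid as the pooled position, and the grid anchored at the origin are assumptions for this example and are not taken from the thesis.

```python
import numpy as np

def voxel_max_pool(xyz, feats, voxel_size):
    """Max-pool point features inside each occupied voxel of a regular grid.

    xyz:   (N, 3) float array of point coordinates.
    feats: (N, F) float array of point features.
    Returns (centroids, pooled): one centroid and one max-pooled feature
    vector per occupied voxel, ordered by sorted voxel coordinates.
    """
    keys = np.floor(xyz / voxel_size).astype(np.int64)  # integer voxel coordinates
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten; the returned shape differs across NumPy versions
    n_voxels = len(uniq)

    pooled = np.full((n_voxels, feats.shape[1]), -np.inf)
    np.maximum.at(pooled, inverse, feats)  # per-voxel, per-channel maximum

    centroids = np.zeros((n_voxels, 3))
    np.add.at(centroids, inverse, xyz)     # sum of the coordinates in each voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    centroids /= counts[:, None]           # mean position of the points in each voxel
    return centroids, pooled

xyz = np.array([[0.1, 0.1, 0.1], [0.4, 0.2, 0.3], [1.5, 0.1, 0.1]])
feats = np.array([[1.0, 5.0], [3.0, 2.0], [7.0, 7.0]])
centroids, pooled = voxel_max_pool(xyz, feats, voxel_size=1.0)
print(pooled.shape)  # (2, 2)
```

    The first two points fall into the same voxel, so their features are reduced to a single vector, much as a 2D pooling window reduces neighbouring pixels.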

    Detección de objetos en trayectoria de colisión


    Graph convolutional neural networks for 3D data analysis

    Doctorat en Teoria del Senyal i Comunicacions (Pla 2013)

    2D–3D geometric fusion network using multi-neighbourhood graph convolution for RGB-D indoor scene classification

    This work was supported by the Secretary of Universities and Research of the Generalitat de Catalunya and the European Social Fund via a PhD grant (FI2018) in the framework of project TEC2016-75976-R, financed by the Ministerio de Economía, Industria y Competitividad and the European Regional Development Fund (ERDF). Peer Reviewed. Postprint (author's final draft).

    Residual attention graph convolutional network for geometric 3D scene classification

    Geometric 3D scene classification is a very challenging task. Current methodologies extract the geometric information using only the depth channel provided by an RGB-D sensor. Such methodologies can introduce errors because local geometric context is missing from the depth channel. This work proposes a novel Residual Attention Graph Convolutional Network that exploits the intrinsic geometric context inside a 3D space without using any kind of point features, allowing the use of organized or unorganized 3D data. Experiments are conducted on the NYU Depth v1 and SUN-RGBD datasets to study the different configurations and to demonstrate the effectiveness of the proposed method. Experimental results show that the proposed method outperforms the current state of the art in geometric 3D scene classification tasks. This research was supported by the Secretary of Universities and Research of the Generalitat de Catalunya and the European Social Fund via a PhD grant (FI2019) in the framework of project TEC2016-75976-R, financed by the Ministerio de Economía, Industria y Competitividad and the European Regional Development Fund (ERDF). Peer Reviewed.