    Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models

    We present a new deep learning architecture (called Kd-network) that is designed for 3D model recognition tasks and works with unstructured point clouds. The new architecture performs multiplicative transformations and shares the parameters of these transformations according to the subdivisions of the point clouds imposed onto them by kd-trees. Unlike the currently dominant convolutional architectures, which usually require rasterization on uniform two-dimensional or three-dimensional grids, Kd-networks do not rely on such grids in any way and therefore avoid their poor scaling behaviour. In a series of experiments with popular shape recognition benchmarks, Kd-networks demonstrate competitive performance in a number of shape recognition tasks such as shape classification, shape retrieval and shape part segmentation. Comment: Spotlight at ICCV'17.
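    To make the parameter-sharing idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: a balanced kd-tree is built over a point cloud whose size is a power of two, and leaf features are combined bottom-up with affine maps indexed by each node's depth and split axis, loosely mirroring the sharing scheme described above. All names and sizes are illustrative.

```python
import numpy as np

def build_kd_tree(points, depth=0):
    # Recursively split the point cloud along the axis of largest spread.
    # Assumes len(points) is a power of two so the tree is balanced.
    if len(points) == 1:
        return {"leaf": True, "point": points[0]}
    axis = int(np.argmax(points.max(0) - points.min(0)))  # split dimension
    order = np.argsort(points[:, axis])
    mid = len(points) // 2
    return {"leaf": False, "axis": axis, "depth": depth,
            "left": build_kd_tree(points[order[:mid]], depth + 1),
            "right": build_kd_tree(points[order[mid:]], depth + 1)}

def kd_features(node, weights, dim=8):
    # Bottom-up feature computation: child features are concatenated and
    # passed through an affine map selected by the node's (depth, axis) key,
    # so parameters are shared across nodes with the same depth and axis.
    if node["leaf"]:
        return np.concatenate([node["point"], np.zeros(dim - 3)])  # lift xyz
    left = kd_features(node["left"], weights, dim)
    right = kd_features(node["right"], weights, dim)
    W, b = weights[(node["depth"], node["axis"])]   # shared parameters
    return np.maximum(W @ np.concatenate([left, right]) + b, 0.0)  # ReLU

# Illustrative usage: 8 random points, one (W, b) pair per (depth, axis).
rng = np.random.default_rng(0)
pts = rng.random((8, 3))
weights = {(d, a): (0.1 * rng.normal(size=(8, 16)), np.zeros(8))
           for d in range(3) for a in range(3)}
root_feature = kd_features(build_kd_tree(pts), weights)  # global descriptor
```

    The root feature then plays the role of a global shape descriptor that a classification or retrieval head can consume.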

    Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image

    Awarded the Best Science Paper Honourable Mention Award at BMVC'19. We study end-to-end learning strategies for 3D shape inference from images, in particular from a single image. Several approaches in this direction have been investigated that explore different shape representations and suitable learning architectures. We focus instead on the underlying probabilistic mechanisms involved and contribute a more principled probabilistic inference-based reconstruction framework, which we coin Probabilistic Reconstruction Networks. This framework expresses image-conditioned 3D shape inference through a family of latent variable models, and naturally decouples the choice of shape representation from the inference itself. Moreover, it suggests different options for the image conditioning and allows training in two regimes, using either a Monte Carlo or a variational approximation of the marginal likelihood. Using our Probabilistic Reconstruction Networks we obtain single-image 3D reconstruction results that set a new state of the art on the ShapeNet dataset in terms of the intersection-over-union and earth mover's distance evaluation metrics. Interestingly, we obtain these results using a basic voxel grid representation, improving over recent work based on finer point cloud or mesh representations.
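    The two training regimes mentioned above can be written down explicitly. With image x, shape y and a global latent variable z, a standard formulation (the exact PRN objectives may differ in detail) is:

```latex
% Marginal likelihood of shape y given image x, with global latent z:
p(y \mid x) = \int p(y \mid z, x)\, p(z \mid x)\, dz

% Monte Carlo regime: estimate the marginal with K samples from the prior.
\log p(y \mid x) \approx \log \frac{1}{K} \sum_{k=1}^{K} p(y \mid z_k, x),
\qquad z_k \sim p(z \mid x)

% Variational regime: maximize the evidence lower bound (ELBO).
\log p(y \mid x) \;\ge\; \mathbb{E}_{q(z \mid x, y)}\big[\log p(y \mid z, x)\big]
- \mathrm{KL}\big(q(z \mid x, y) \,\|\, p(z \mid x)\big)
```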

    Deep Learning for 3D Shape Modelling

    Application of deep learning to geometric 3D data poses various challenges for researchers. The complex nature of geometric 3D data allows it to be represented in different forms: occupancy grids, point clouds, meshes, implicit functions, etc. Each of these representations has already spawned streams of deep neural network models capable of processing the corresponding data samples and making predictions from them, for use in various data recognition, generation, and modification tasks. Modern deep learning models force researchers to make various design choices associated with their architectures, learning algorithms and other aspects specific to the chosen applications. Often, these choices are made with the help of heuristics and best-practice methods discovered through numerous costly experimental evaluations. Probabilistic modeling provides an alternative that allows machine learning tasks to be formalized in a meaningful manner and probability-based training objectives to be developed. This thesis explores combinations of deep learning based methods and probabilistic modeling in application to geometric 3D data.

    The first contribution explores how probabilistic modeling can be applied to the single-view 3D shape inference task. We propose a family of probabilistic models, Probabilistic Reconstruction Networks (PRNs), which treat the task as image-conditioned generation and introduce a global latent variable encoding shape geometry information. We explore different image conditioning options, and two training objectives based on Monte Carlo and variational approximations of the model likelihood. The parameters of every distribution are predicted from the input images by multi-layered convolutional and fully-connected neural networks. All the options in the family of models are evaluated on the single-view 3D occupancy grid inference task, using synthetic shapes and corresponding image renderings from randomized viewpoints. We show that conditioning the latent variable prior on the input images is sufficient to achieve competitive and state-of-the-art single-view 3D shape inference performance for point-cloud-based and voxel-based metrics, respectively. We additionally demonstrate that the probabilistic objective based on the variational approximation of the likelihood allows the model to obtain better results than the Monte Carlo based approximation.

    The second contribution proposes a probabilistic model for 3D point cloud generation. It treats point clouds as distributions over exchangeable variables and uses de Finetti's representation theorem to define a global latent variable model with conditionally independent distributions for the coordinates of each point. To model these point distributions, a novel type of conditional normalizing flow is proposed, based on discrete coupling of the point coordinate dimensions. These flows update the coordinates of each point sample multiple times by dividing them into two groups and inferring the updates for one group of coordinates from the other group and, additionally, from a sample of the global latent variable, by means of multi-layered fully-connected neural networks with parameters shared across all points. We also extend our Discrete Point Flow Networks (DPFNs) from generation to the single-view inference task by conditioning the global latent variable prior in a manner similar to the PRNs of the first contribution. The resulting generative performance demonstrates that DPFNs produce sets of samples of similar quality and diversity to the state of the art based on continuous normalizing flows, but are approximately 30 times faster in both training and sampling. Results on autoencoding and single-view inference tasks show competitive and state-of-the-art performance for the Chamfer distance, F-score and earth mover's distance similarity metrics for point clouds.
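    The discrete coupling mechanism described above lends itself to a compact sketch. Below is a hedged PyTorch illustration, not the thesis code: the class name, layer sizes and the tanh-bounded scale are assumptions. One coupling step lets a masked group of coordinates, together with the global latent code, predict scale and shift updates for the remaining coordinates, with the network shared across all points.

```python
import torch
import torch.nn as nn

class CoordCoupling(nn.Module):
    # One discrete coupling step over 3D point coordinates: the masked
    # coordinates and a global latent code predict an affine update for
    # the unmasked coordinates. Parameters are shared across all points.
    def __init__(self, mask, latent_dim=64, hidden=128):
        super().__init__()
        self.register_buffer("mask", mask)  # e.g. tensor([1., 1., 0.])
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 6))           # per-point scale and shift

    def forward(self, x, z):
        # x: (B, N, 3) point coordinates, z: (B, latent_dim) global latent
        z_exp = z[:, None, :].expand(-1, x.shape[1], -1)
        h = self.net(torch.cat([x * self.mask, z_exp], dim=-1))
        scale, shift = h.chunk(2, dim=-1)
        scale = torch.tanh(scale) * (1 - self.mask)  # masked coords unchanged
        shift = shift * (1 - self.mask)
        y = x * torch.exp(scale) + shift
        log_det = scale.sum(dim=(1, 2))              # change-of-variables term
        return y, log_det

# Illustrative usage: one step on a batch of 2 clouds with 1024 points each.
layer = CoordCoupling(torch.tensor([1., 1., 0.]))
y, log_det = layer(torch.randn(2, 1024, 3), torch.randn(2, 64))
```

    Stacking several such steps with alternating masks yields an invertible per-point map whose log-determinant terms accumulate into the flow likelihood.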

    Self-Supervised Dual Contouring

    Learning-based isosurface extraction methods have recently emerged as a robust and efficient alternative to axiomatic techniques. However, the vast majority of such approaches rely on supervised training with axiomatically computed ground truths, thus potentially inheriting the biases and data artifacts of the corresponding axiomatic methods. Steering away from such dependencies, we propose a self-supervised training scheme for the Neural Dual Contouring meshing framework, resulting in our method: Self-Supervised Dual Contouring (SDC). Instead of optimizing predicted mesh vertices with supervised training, we use two novel self-supervised loss functions that encourage consistency between distances to the generated mesh up to first order. Meshes reconstructed by SDC surpass existing data-driven methods in capturing intricate details while being more robust to possible irregularities in the input. Furthermore, we use the same self-supervised training objective linking the inferred mesh and the input SDF to regularize the training process of Deep Implicit Networks (DINs). We demonstrate that the resulting DINs produce higher-quality implicit functions, ultimately leading to more accurate and detail-preserving surfaces than prior baselines for different input modalities. Finally, we demonstrate that our self-supervised losses improve meshing performance in the single-view reconstruction task by enabling joint training of the predicted SDF and the resulting output mesh. We open-source our code at https://github.com/Sentient07/SD
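    The abstract does not spell out the two loss functions, so the sketch below is only a plausible reading of "consistency up to the first order", not the published SDC losses: a zeroth-order term pulls points sampled from the predicted mesh onto the zero level set of the input SDF, and a first-order term aligns mesh normals with the SDF gradient. The function name and signature are hypothetical.

```python
import torch
import torch.nn.functional as F

def mesh_sdf_consistency(sdf, mesh_pts, mesh_normals):
    # Hypothetical zeroth- and first-order consistency terms between an
    # SDF network `sdf` and points/normals sampled from a predicted mesh.
    mesh_pts = mesh_pts.clone().requires_grad_(True)
    d = sdf(mesh_pts)                                  # (M,) signed distances
    zeroth = d.abs().mean()                            # mesh on the zero set
    grad = torch.autograd.grad(d.sum(), mesh_pts, create_graph=True)[0]
    grad = F.normalize(grad, dim=-1)
    first = (1.0 - (grad * mesh_normals).sum(-1)).mean()  # normals follow grad
    return zeroth, first
```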

    VoroMesh: Learning Watertight Surface Meshes with Voronoi Diagrams

    In stark contrast to the case of images, finding a concise, learnable discrete representation of 3D surfaces remains a challenge. In particular, while polygon meshes are arguably the most common surface representation used in geometry processing, their irregular and combinatorial structure often makes them unsuitable for learning-based applications. In this work, we present VoroMesh, a novel and differentiable Voronoi-based representation of watertight 3D shape surfaces. From a set of 3D points (called generators) and their associated occupancy, we define our boundary representation through the Voronoi diagram of the generators as the subset of Voronoi faces whose two associated (equidistant) generators are of opposite occupancy: the resulting polygon mesh forms a watertight approximation of the target shape's boundary. To learn the positions of the generators, we propose a novel loss function, dubbed VoroLoss, that minimizes the distance from ground-truth surface samples to the closest faces of the Voronoi diagram without requiring an explicit construction of the entire Voronoi diagram. A direct optimization of the VoroLoss to obtain generators on the Thingi32 dataset demonstrates the geometric efficiency of our representation compared to axiomatic meshing algorithms and recent learning-based mesh representations. We further use VoroMesh in a learning-based mesh prediction task from input SDF grids on the ABC dataset, and show performance comparable to state-of-the-art methods while guaranteeing closed output surfaces free of self-intersections.
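    The trick the abstract alludes to, measuring distances to Voronoi faces without building the diagram, has a closed form: the distance from a sample q to the bisector plane of generators g_i and g_j is |‖q − g_i‖² − ‖q − g_j‖²| / (2‖g_i − g_j‖). Below is a hedged PyTorch sketch of that term alone, using only the two nearest generators per sample; the published VoroLoss additionally accounts for face extents and occupancy pairing.

```python
import torch

def voroloss_sketch(samples, generators):
    # Distance from each ground-truth surface sample to the bisector of its
    # two nearest generators, averaged; minimizing it pulls Voronoi faces
    # onto the target surface without constructing the diagram.
    # samples: (S, 3) surface points; generators: (G, 3) learnable positions
    d2 = torch.cdist(samples, generators) ** 2        # squared distances (S, G)
    nearest = d2.topk(2, largest=False).indices       # two closest generators
    gi, gj = generators[nearest[:, 0]], generators[nearest[:, 1]]
    di2 = ((samples - gi) ** 2).sum(-1)
    dj2 = ((samples - gj) ** 2).sum(-1)
    # point-to-bisector distance: |d_i^2 - d_j^2| / (2 * |g_i - g_j|)
    dist = (di2 - dj2).abs() / (2.0 * (gi - gj).norm(dim=-1).clamp_min(1e-8))
    return dist.mean()
```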

    Controllable Laser Reduction of Graphene Oxide Films for Photoelectronic Applications

    This article presents a new, simple method of creating a light-absorbing carbon material for optical devices such as bolometers. A simple laser microstructuring method is applied to graphene oxide in order to create such a material. Absorption values of more than 98% in the visible and more than 90% in the infrared range are achieved. Moreover, thermal properties of the films, such as the temperature dependence and the thermal response of the samples, are studied. The change in resistance with temperature is 13 Ohm K⁻¹, the temperature coefficient of resistance (TCR) is 0.3% K⁻¹, and the sensitivity is 0.17 V W⁻¹ at 300 K. The thermal conductivity is rather high at ∼104 W m⁻¹ K⁻¹ at 300 K. The designed bolometer operates at room temperature using an incandescent lamp as a light source. This technique suggests a new, inexpensive way to create a selective absorption coating and/or active layer for optical devices. The developed GO and rGO films have a large surface area and high conductivity. These properties make such carbon coatings a perfect candidate for creating new types of optoelectronic devices (gas sensors, detectors of biological objects, etc.).
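    As a consistency check on the quoted figures (assuming the standard definition TCR = (1/R)(dR/dT), which the abstract does not state explicitly), the reported dR/dT and TCR imply a baseline film resistance of roughly:

```latex
R \;\approx\; \frac{dR/dT}{\mathrm{TCR}}
  \;=\; \frac{13\ \Omega\,\mathrm{K}^{-1}}{0.003\ \mathrm{K}^{-1}}
  \;\approx\; 4.3\ \mathrm{k\Omega}
```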