
292 research outputs found

Green Function and Electromagnetic Potential for Computer Vision and Convolutional Neural Network Applications

Author: Beaini Dominique
Publication date: 01/06/2019

SUMMARY For advanced computer vision (CV) problems such as classification, scene segmentation, and salient object detection, as many features as possible must be extracted from images. One of the most widely used tools for feature extraction is the convolution kernel, with each kernel specialized to extract a given feature. This led to the recent development of convolutional neural networks (CNNs), which optimize thousands of kernels at once and have made the CNN the standard for image analysis. However, an important limitation of the CNN is that the kernels are small (generally 3x3 to 7x7), which limits long-range interaction between features. Another limitation is that features are merged through weighted additions and pooling operations (local averages and maxima). These operations cannot merge spatial-domain features with gradient-domain features, because those features lie at distant positions in the image. The objective of this thesis is to develop new convolution kernels based on electromagnetism (EM) and Green's functions (GF) for use in CV applications and in CNNs. These new kernels are at least as large as the image. They therefore avoid several limitations of standard CNNs, since they allow long-range interaction between the pixels of the image. They also make it possible to merge spatial-domain features with gradient-domain features. In addition, given any vector field, the new kernels find the conservative vector field closest to the initial field, which means that the new field becomes smooth, irrotational, and conservative (integrable by line integral). To meet this objective, we first developed symmetric and asymmetric convolutional kernels based on the properties of EM and GF, resulting in kernels that are resolution and rotation invariant. Next, we developed the first method for determining the probability of inclusion within partial contours, which makes it possible to extrapolate thin contours into continuous regions covering 2D space. Moreover, this thesis shows that GF-based kernels are the optimal solvers for the gradient and the Laplacian.
ABSTRACT For advanced computer vision (CV) tasks such as classification, scene segmentation, and salient object detection, extracting features from images is mandatory. One of the most used tools for feature extraction is the convolutional kernel, with each kernel being specialized for specific feature detection. In recent years, the convolutional neural network (CNN) became the standard method of feature detection since it allows thousands of kernels to be optimized at the same time. However, a limitation of the CNN is that all the kernels are small (usually between 3x3 and 7x7), which limits the receptive field. Another limitation is that feature merging is done via weighted additions and pooling, which cannot be used to merge spatial-domain features with gradient-domain features since they are not located at the same pixel coordinate. The objective of this thesis is to develop electromagnetic (EM) convolutions and Green’s functions (GF) convolutions to be used in computer vision and convolutional neural networks (CNN). 
These new kernels do not have the limitations of standard CNN kernels: because they are larger than the image, they allow an unlimited receptive field and interaction between any pixels in the image. They allow merging spatial-domain features with gradient-domain features by integrating any vector field. Additionally, they can transform any vector field of features into its least-error conservative field, meaning that the field of features becomes smooth, irrotational and conservative (line-integrable). First, we developed different symmetrical and asymmetrical convolutional kernels based on EM and GF that are both resolution and rotation invariant. Then we developed the first method of determining the probability of being inside partial edges, which allows extrapolating thin edge features into the full 2D space. Furthermore, this thesis proves that GF kernels are the least-error gradient and Laplacian solvers, and they are empirically demonstrated to be faster than the fastest competing method and easier to implement. Consequently, using the fast gradient solver, we developed the first method that directly combines edges with saliency maps in the gradient domain, then solves the gradient to go back to the saliency domain. On a selected dataset, the improvement in the F-measure of the saliency maps is on average 6.6 times that of the nearest competing algorithm. Then, to improve the saliency maps further, we developed the DSS-GIS model, which combines edges with salient regions deep inside the network.
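
The idea of solving a gradient by convolving it with a Green's function kernel as large as the signal can be illustrated in one dimension. The sketch below is a toy under stated assumptions, not the thesis's 2D kernels: it assumes a backward-difference gradient, for which the discrete Green's function is a unit step, and all function names are hypothetical.

```python
def backward_difference(signal):
    """Discrete gradient g[i] = f[i] - f[i-1], taking f[-1] as 0 (an assumption)."""
    return [signal[i] - (signal[i - 1] if i > 0 else 0.0) for i in range(len(signal))]


def step_kernel(length):
    """Green's function of the backward difference: a unit step as long as the signal."""
    return [1.0] * length


def convolve_causal(values, kernel):
    """Causal convolution: out[i] = sum over j <= i of values[j] * kernel[i - j]."""
    return [
        sum(values[j] * kernel[i - j] for j in range(i + 1))
        for i in range(len(values))
    ]


signal = [0.0, 1.0, 4.0, 9.0, 7.0, 2.0]
gradient = backward_difference(signal)

# The kernel spans the whole signal, so every output sample can
# depend on every earlier gradient sample (an unlimited receptive field in 1D).
recovered = convolve_causal(gradient, step_kernel(len(signal)))
```

Here the convolution reduces to a running sum, so `recovered` matches `signal`. In 2D the corresponding kernels are no longer a simple step, which is where the thesis's construction comes in.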

PolyPublie

Deep Bilateral Learning for Real-Time Image Enhancement

Author: Frédo Durand
Jain Vidit
Jiawen Chen
Jonathan T. Barron
Kingma Diederik
Michaël Gharbi
Samuel W. Hasinoff
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/07/2017

Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
Comment: 12 pages, 14 figures, Siggraph 201
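
The runtime step described above (slice a low-resolution grid of affine coefficients, then apply them at full resolution) can be sketched for a 1D grayscale signal. This is a minimal illustration under assumptions, not the paper's implementation: the grid is filled by hand rather than predicted by a network, the pixel's own intensity serves as the guide, and the names `slice_grid` and `apply_grid` are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def slice_grid(grid, x_pos, guide):
    """Bilinearly sample (gain, bias) from grid[cell][bin].

    x_pos is a fractional cell index; guide is an intensity in [0, 1]
    that selects the position along the grid's intensity axis.
    """
    n_cells, n_bins = len(grid), len(grid[0])
    gx = min(max(x_pos, 0.0), n_cells - 1.0)
    gz = min(max(guide, 0.0), 1.0) * (n_bins - 1)
    x0, z0 = int(gx), int(gz)
    x1, z1 = min(x0 + 1, n_cells - 1), min(z0 + 1, n_bins - 1)
    tx, tz = gx - x0, gz - z0
    coeffs = []
    for k in range(2):  # k = 0 is the gain, k = 1 is the bias
        near = lerp(grid[x0][z0][k], grid[x0][z1][k], tz)
        far = lerp(grid[x1][z0][k], grid[x1][z1][k], tz)
        coeffs.append(lerp(near, far, tx))
    return coeffs[0], coeffs[1]


def apply_grid(pixels, grid):
    """Apply the sliced per-pixel affine transform: out = gain * pixel + bias."""
    n, n_cells = len(pixels), len(grid)
    out = []
    for i, p in enumerate(pixels):
        x_pos = i * (n_cells - 1) / (n - 1) if n > 1 else 0.0
        gain, bias = slice_grid(grid, x_pos, p)
        out.append(gain * p + bias)
    return out


# Hand-made grid: 2 spatial cells x 2 intensity bins of (gain, bias).
# Dark pixels keep their value; bright pixels are halved.
grid = [
    [(1.0, 0.0), (0.5, 0.0)],
    [(1.0, 0.0), (0.5, 0.0)],
]
pixels = [0.0, 1.0, 0.5]
enhanced = apply_grid(pixels, grid)
```

Because the lookup depends on intensity as well as position, neighboring pixels on opposite sides of an edge can receive different coefficients, which is the sense in which slicing is edge-preserving.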

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

A Revisit of Shape Editing Techniques: from the Geometric to the Neural Viewpoint

Author: Gao Lin
Lai Yu-Kun
Liu Ligang
Wu Tong
Yuan Yu-Jie
Publication date: 02/03/2021

3D shape editing is widely used in applications such as movie production, computer games and computer-aided design. It is also a popular research topic in computer graphics and computer vision. In past decades, researchers have developed a series of editing methods to make the editing process faster, more robust, and more reliable. Traditionally, the deformed shape is determined by the optimal transformation and weights for an energy term. With the increasing availability of 3D shapes on the Internet, data-driven methods were proposed to improve the editing results. More recently, as deep neural networks became popular, many deep learning based editing methods, which are naturally data-driven, have been developed in this field. We mainly survey recent research, from the geometric viewpoint to emerging neural deformation techniques, and categorize the methods into organic shape editing methods and man-made model editing methods. Both traditional methods and recent neural network based methods are reviewed.

arXiv.org e-Print Archive

Online Research @ Cardiff

Iterative Solvers for Physics-based Simulations and Displays

Author: Mercier Olivier
Publication date: 01/02/2018

Generating realistic images and simulations requires complex models to capture every detail of a physical phenomenon. The mathematical equations that make up these models are complicated and cannot be solved analytically. Numerical procedures must therefore be used to obtain approximate solutions to these models. These procedures are often iterative algorithms, which compute a sequence converging to the desired solution from an initial guess. These methods are a practical and efficient way to compute solutions to complex systems, and are at the core of most modern simulation methods. In this thesis by articles, we present three projects in which iterative algorithms play a major role in a simulation or rendering method. First, we present a method to improve the visual quality of fluid simulations. By creating a high-resolution surface around an existing simulation, stabilized by an iterative method, we add further detail to the simulation. Second, we describe a fluid simulation method based on model reduction. By constructing a new vector field basis to represent the velocity of a fluid, we obtain a method specifically adapted to improve the iterative components of the simulation. Finally, we present an algorithm for generating high-quality images on multilayer displays in a virtual reality context. Presenting images on several layers requires costly additional computation, but we formulate the image decomposition problem so that it can be solved efficiently with a simple iterative method.
Realistic computer-generated images and simulations require complex models to properly capture the many subtle behaviors of each physical phenomenon. 
The mathematical equations underlying these models are complicated, and cannot be solved analytically. Numerical procedures must thus be used to obtain approximate solutions. These procedures are often iterative algorithms, where an initial guess is progressively improved to converge to a desired solution. Iterative methods are a convenient and efficient way to compute solutions to complex systems, and are at the core of most modern simulation methods. In this thesis by publication, we present three papers where iterative algorithms play a major role in a simulation or rendering method. First, we propose a method to improve the visual quality of fluid simulations. By creating a high-resolution surface representation around an input fluid simulation, stabilized with iterative methods, we introduce additional details on top of the simulation. Second, we describe a method to compute fluid simulations using model reduction. We design a novel vector field basis to represent fluid velocity, creating a method specifically tailored to improve all iterative components of the simulation. Finally, we present an algorithm to compute high-quality images for multifocal displays in a virtual reality context. Displaying images on multiple display layers incurs significant additional costs, but we formulate the image decomposition problem so as to allow an efficient solution using a simple iterative algorithm.
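
The pattern of progressively improving an initial guess until it converges can be sketched with Jacobi iteration on a small linear system. This is a generic textbook example chosen for illustration, not one of the solvers from the thesis. The matrix, right-hand side, and tolerance are assumptions, and the matrix is diagonally dominant so that the iteration converges.

```python
def jacobi(matrix, rhs, guess, tolerance=1e-10, max_iterations=1000):
    """Solve matrix @ x = rhs by Jacobi iteration, starting from guess.

    Returns the approximate solution and the number of iterations used.
    """
    x = list(guess)
    n = len(rhs)
    for iteration in range(1, max_iterations + 1):
        # Update each unknown using only the previous iterate.
        new_x = [
            (rhs[i] - sum(matrix[i][j] * x[j] for j in range(n) if j != i))
            / matrix[i][i]
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tolerance:  # stop once the iterate no longer moves
            return x, iteration
    return x, max_iterations


# Diagonally dominant system whose exact solution is [1, 1, 1].
matrix = [
    [4.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 4.0],
]
rhs = [5.0, 6.0, 5.0]
solution, iterations = jacobi(matrix, rhs, guess=[0.0, 0.0, 0.0])
```

Each pass costs only one sweep over the matrix, which is why schemes of this kind scale to the large systems that arise in simulation and rendering.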

Dépôt Institutionnel Numérique