
    Simultaneous Image Registration and Monocular Volumetric Reconstruction of a fluid flow

    We propose to combine image registration and volumetric reconstruction from a monocular video of the draining of a Hele-Shaw cell filled with water. A Hele-Shaw cell is a tank whose depth is small (e.g. 1 mm) compared to the other dimensions (e.g. 400 × 800 mm²). We use a technique known as molecular tagging, which consists in marking a pattern in the fluid by photobleaching and then tracking its deformations. The evolution of the pattern is filmed with a camera whose principal axis coincides with the depth of the cell. The velocity of the fluid along this direction is not constant. Consequently, tracking the pattern cannot be achieved with classical methods, because what is observed is the integration of the marked particles over the entire depth of the cell. The proposed approach is built on top of classical direct image registration, into which we incorporate a volumetric image formation model. It allows us to accurately measure the motion and the velocity profiles for the entire volume (including along the depth of the cell), which is usually hard to achieve. The results we obtain are consistent with the theoretical hydrodynamic behaviour for this flow, known as laminar Poiseuille flow.
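
    Since the reported behaviour is the laminar Poiseuille profile, a minimal numerical sketch may help fix ideas. The 1 mm gap follows the example above; the peak velocity is an arbitrary placeholder, so this is an illustration rather than the paper's setup:

```python
import numpy as np

# Poiseuille profile across the gap of a Hele-Shaw cell:
# u(z) = u_max * (1 - (2z/h)^2) for z in [-h/2, h/2].
h = 1e-3       # gap width (m); the 1 mm example from the abstract
u_max = 5e-3   # peak velocity (m/s); assumed for illustration only
z = np.linspace(-h / 2, h / 2, 101)
u = u_max * (1.0 - (2.0 * z / h) ** 2)

# A camera looking along z sees the depth-averaged motion; for
# Poiseuille flow the average is 2/3 of the peak velocity.
print(u.mean() / u_max)  # ~0.67
```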

    Image registration algorithm for molecular tagging velocimetry applied to unsteady flow in Hele-Shaw cell

    In order to develop velocimetry methods for confined geometries, we propose to combine image registration and volumetric reconstruction from a monocular video of the draining of a Hele-Shaw cell filled with water. The cell's thickness is small compared to the other two dimensions (e.g. 1 × 400 × 800 mm³). We use a technique known as molecular tagging, which consists in marking a pattern in the fluid by photobleaching and then tracking its deformations. The evolution of the pattern is filmed with a camera whose principal axis coincides with the cell's gap. The velocity of the fluid along this direction is not constant. Consequently, tracking the pattern cannot be achieved with classical methods, because what is observed is the integral of the marked molecules over the entire gap. The proposed approach is built on top of direct image registration, which we extend to explicitly model the volumetric image formation. It allows us to accurately measure the motion and the velocity profiles for the entire volume (including along the cell's gap), which is usually hard to achieve. The results we obtained are consistent with the theoretical hydrodynamic behaviour for this flow, known as Poiseuille flow.
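
    The central difficulty stated here, that the camera records the integral of the tagged molecules over the gap rather than any single layer, can be illustrated with a toy 1D model; every number below is an assumption for illustration, not a value from the paper:

```python
import numpy as np

# Each layer across the gap advects the tagged pattern at its own
# Poiseuille velocity; the camera observes the average over all layers,
# so a sharp photobleached stripe smears into an asymmetric profile.
def observed_profile(pattern, x, t, h=1e-3, u_max=5e-3, n_layers=51):
    z = np.linspace(-h / 2, h / 2, n_layers)
    u = u_max * (1.0 - (2.0 * z / h) ** 2)          # velocity per layer
    layers = [np.interp(x - ui * t, x, pattern) for ui in u]
    return np.mean(layers, axis=0)                  # integral over the gap

x = np.linspace(0.0, 0.02, 400)                     # 20 mm field of view
stripe = np.exp(-((x - 0.005) / 5e-4) ** 2)         # one tagged stripe
smeared = observed_profile(stripe, x, t=1.0)        # why 2D tracking fails
```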

    Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases

    The proliferation of technologies, such as extended reality (XR), has increased the demand for high-quality three-dimensional (3D) graphical representations. Industrial 3D applications encompass computer-aided design (CAD), finite element analysis (FEA), scanning, and robotics. However, current methods employed for industrial 3D representations suffer from high implementation costs and reliance on manual human input for accurate 3D modeling. To address these challenges, neural radiance fields (NeRFs) have emerged as a promising approach for learning 3D scene representations from provided training 2D images. Despite growing interest in NeRFs, their potential applications in various industrial subdomains are still unexplored. In this paper, we deliver a comprehensive examination of NeRF industrial applications while also providing direction for future research endeavors. We also present a series of proof-of-concept experiments that demonstrate the potential of NeRFs in the industrial domain. These experiments include NeRF-based video compression techniques and using NeRFs for 3D motion estimation in the context of collision avoidance. In the video compression experiment, our results show compression savings of up to 48% and 74% for resolutions of 1920x1080 and 300x168, respectively. The motion estimation experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF) and achieved an average peak signal-to-noise ratio (PSNR) of 23 dB and a structural similarity index measure (SSIM) of 0.97 on the disparity maps.
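
    As a point of reference for the PSNR figure quoted above, here is a minimal sketch of the standard metric; the two frames are synthetic stand-ins, not data from the experiments:

```python
import numpy as np

# Peak signal-to-noise ratio between a reference and a test frame,
# assuming 8-bit images with a peak value of 255.
def psnr(ref, test, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical usage on two 1920x1080 RGB frames:
ref = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, noisy):.1f} dB")
```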

    Learning to Interpret Fluid Type Phenomena via Images

    Learning to interpret fluid-type phenomena via images is a long-standing challenging problem in computer vision. The problem becomes even more challenging when the fluid medium is highly dynamic and refractive due to its transparent nature. Here, we consider imaging through such refractive fluid media as water and air. For water, we design novel supervised learning-based algorithms to recover its 3D surface as well as the highly distorted underwater patterns. For air, we design an unsupervised learning algorithm to predict the distortion-free image given a short sequence of turbulent images. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background pattern. To recover underwater images severely degraded by the refractive distortions caused by water surface fluctuations, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map to guide network training. The distortion map models the pixel displacement caused by water refraction. Furthermore, we present a novel unsupervised network to recover the latent distortion-free image. The key idea is to model non-rigid distortions as deformable grids. Our network consists of a grid deformer that estimates the distortion field and an image generator that outputs the distortion-free image. By leveraging the positional encoding operator, we can simplify the network structure while maintaining fine spatial details in the recovered images. We also develop a combinational deep neural network that can simultaneously recover the latent distortion-free image and reconstruct the 3D shape of the transparent, dynamic fluid surface. Through extensive experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural networks outperform the current state of the art on these tasks.
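
    The deformable-grid idea, a predicted displacement field that warps the image, can be sketched with a standard grid-sampling warp. This is a hedged PyTorch illustration in which a random field stands in for the output of the grid deformer described above:

```python
import torch
import torch.nn.functional as F

# Warp an image by a dense per-pixel displacement field, the basic
# operation behind modelling non-rigid refractive distortion as a
# deformable grid.
def warp(image, flow):
    # image: (B, C, H, W); flow: (B, H, W, 2) offsets in pixels (x, y).
    b, c, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().unsqueeze(0)  # (1, H, W, 2)
    grid = base + flow
    # Normalise coordinates to [-1, 1] as required by grid_sample.
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(image, grid, align_corners=True)

img = torch.rand(1, 3, 64, 64)
flow = 2.0 * torch.randn(1, 64, 64, 2)  # stand-in for a predicted field
distorted = warp(img, flow)
```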

    Differentiable world programs

    Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action. However, these advancements have undermined many "classical" techniques developed over the last few decades. We postulate that a blend of "classical" and "learned" methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents. "What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?" is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment.
    This dissertation, dubbed "differentiable world programs", unifies efforts from multiple closely related but currently disjoint fields including robotics, computer vision, computer graphics, and AI. Our first contribution---gradSLAM---is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning. Our second contribution---taskography---proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes. Our third and final contribution---gradSim---is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.
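
    To make the gradSLAM idea concrete, here is a toy sketch of a Gauss-Newton solver written entirely in PyTorch so that gradients flow through the solve; the residual is a deliberately simple 1D alignment problem, not the dissertation's actual SLAM cost:

```python
import torch

# Gauss-Newton iterations expressed as differentiable tensor ops, so the
# solution can be backpropagated through (the core trick behind making
# nonlinear least squares "differentiable").
def gauss_newton(residual_fn, x0, iters=10, damping=1e-6):
    x = x0
    for _ in range(iters):
        r = residual_fn(x)
        J = torch.autograd.functional.jacobian(residual_fn, x, create_graph=True)
        H = J.T @ J + damping * torch.eye(x.numel())
        x = x - torch.linalg.solve(H, J.T @ r)
    return x

# Toy residual: find the offset t that aligns points with targets.
points = torch.tensor([0.0, 1.0, 2.0])
targets = torch.tensor([0.5, 1.5, 2.5])
t = gauss_newton(lambda t: points + t - targets, torch.zeros(1))
print(t)  # ~0.5; differentiable w.r.t. upstream inputs of the residual
```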

    3D velocimetry by molecular tagging and image registration for the passage of an isolated bubble in a Hele-Shaw cell

    This article describes the application of computer vision techniques to velocity measurement for two flows generated in a Hele-Shaw cell (HSC). A laminar Poiseuille flow (LPF) is generated by draining the HSC, which is filled with an initially quiescent liquid. Figure 1 gives an overview of the experimental setup. We propose a new algorithm combining direct image registration (DIR) and monocular volumetric reconstruction to track the motion of a pattern marking the liquid at the molecular level. The method gives us an experimental measurement of the LPF in geometries with limited optical access. Previously, velocity measurements of this academic flow had mainly been obtained either for restricted regions of interest (1 mm³ by µPIV (Sinton [2004])) or without directly accounting for the motion across the depth of the HSC (by classical PIV (Roudet et al. [2011])). By comparison, our approach measures the development of the LPF over a substantial volume in a confined medium (here 147 × 147 × 1 mm³). In a second part, we turn to velocity measurement upstream of a bubble rising in an HSC, using numerically generated images. The observed phenomena are deformable in nature, and we seek to make a three-dimensional measurement from 2D observations only. We therefore propose a method resting on two elements: on the one hand, a 3D model of the liquid and its motion in the HSC and, on the other hand, general and flexible physical constraints. Garbe et al. [2008] propose a variant of classical optical-flow estimation that incorporates a volumetric model for a gaseous flow in a micro-channel; however, in Garbe et al. [2008] the LPF model is used as a very strong prior. The article is organised as follows. We describe the experimental setup and molecular tagging by photobleaching in §2. Our tracking and reconstruction algorithm is detailed in §3. Results are presented in §4. In §5, we conclude and discuss potential applications of our work.
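
    The direct image registration component can be illustrated in its simplest form: estimating a 1D shift by minimising a photometric error between a template and the current image. The pattern and shift below are synthetic assumptions, not data from the experiments:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.linspace(0.0, 1.0, 200)
template = np.exp(-((x - 0.40) / 0.05) ** 2)  # tagged pattern at time t0
image = np.exp(-((x - 0.43) / 0.05) ** 2)     # same pattern, shifted

# Sum-of-squared-differences photometric cost of a candidate shift.
def ssd(shift):
    warped = np.interp(x + shift, x, image)   # warp the image back
    return np.sum((warped - template) ** 2)

res = minimize_scalar(ssd, bounds=(-0.1, 0.1), method="bounded")
print(res.x)  # ~0.03, the shift between the two frames
```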

    Reconstruction and scalable detection and tracking of 3D objects

    The task of detecting objects in images is essential for an autonomous system to categorize, comprehend and eventually navigate or manipulate its environment. Since many applications demand not only the detection of objects but also the estimation of their exact poses, 3D CAD models can prove helpful, since they provide means for feature extraction and hypothesis refinement. This work therefore explores two paths: firstly, we look into methods to create richly textured and geometrically accurate models of real-life objects. Using these reconstructions as a basis, we investigate how to improve 3D object detection and pose estimation, focusing especially on scalability, i.e. the problem of dealing with multiple objects simultaneously.
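
    As one hedged illustration of hypothesis refinement against a CAD model (not the pipeline used in this work), a least-squares rigid fit between corresponding model and observed points can be written as follows:

```python
import numpy as np

# Rigid alignment of corresponding point sets via the Kabsch solution:
# the building block of ICP-style pose refinement.
def rigid_fit(src, dst):
    # src, dst: (N, 3) corresponding points.
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T          # rotation (det(R) = +1 guaranteed by S)
    return R, cd - R @ cs       # rotation and translation

model = np.random.rand(100, 3)  # points sampled from a CAD model
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
obs = model @ Rz.T + np.array([0.1, 0.0, 0.0])
R, t = rigid_fit(model, obs)    # recovers the 90-degree rotation and offset
```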

    Single View 3D Reconstruction using Deep Learning

    One of the major challenges in the field of Computer Vision has been the reconstruction of a 3D object or scene from a single 2D image. While there are many notable examples, traditional methods for single view reconstruction often fail to generalise due to the presence of many brittle hand-crafted engineering solutions, limiting their applicability to real world problems. Recently, deep learning has taken over the field of Computer Vision, and "learning to reconstruct" has become the dominant technique for addressing the limitations of traditional methods when performing single view 3D reconstruction. Deep learning allows our reconstruction methods to learn generalisable image features and monocular cues that would otherwise be difficult to engineer through ad-hoc hand-crafted approaches. However, it can often be difficult to efficiently integrate the various 3D shape representations within the deep learning framework. In particular, 3D volumetric representations can be adapted to work with Convolutional Neural Networks, but they are computationally expensive and memory inefficient when using local convolutional layers. Also, the successful learning of generalisable feature representations for 3D reconstruction requires large amounts of diverse training data. In practice, this is challenging for 3D training data, as it entails a costly and time-consuming manual data collection and annotation process. Researchers have attempted to address these issues by utilising self-supervised learning and generative modelling techniques; however, these approaches often produce suboptimal results when compared with models trained on larger datasets. This thesis addresses several key challenges incurred when using deep learning for "learning to reconstruct" 3D shapes from single view images. We observe that it is possible to learn a compressed representation for multiple categories of the 3D ShapeNet dataset, improving the computational and memory efficiency when working with 3D volumetric representations. To address the challenge of data acquisition, we leverage deep generative models to "hallucinate" hidden or latent novel viewpoints for a given input image. Combining these images with depths estimated by a self-supervised depth estimator and the known camera properties allowed us to reconstruct textured 3D point clouds without any ground truth 3D training data. Furthermore, we show that it is possible to improve upon the previous self-supervised monocular depth estimator by adding self-attention and a discrete volumetric representation, significantly improving accuracy on the KITTI 2015 dataset and enabling the estimation of uncertainty in depth predictions.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
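
    The cost argument against volumetric representations with local convolutional layers can be checked directly: activation memory for a single 3D convolution grows cubically with resolution. A generic sanity check, not code from the thesis:

```python
import torch
import torch.nn as nn

# One 3D convolution over voxel grids of increasing resolution; the
# activation alone grows cubically, before counting gradients.
conv = nn.Conv3d(1, 32, kernel_size=3, padding=1)
with torch.no_grad():
    for res in (32, 64, 128):
        voxels = torch.zeros(1, 1, res, res, res)
        out = conv(voxels)
        mb = out.numel() * out.element_size() / 1e6
        print(f"{res}^3 grid -> activation of {mb:.0f} MB")
```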