95 research outputs found

    Machine Vision - Applications and Systems

    F. Solari; M. Chessa; S.P. Sabatini. http://www.intechopen.com/books/machine-vision-applications-and-systems

    Robotic Manipulation under Transparency and Translucency from Light-field Sensing

    From frosted windows to plastic containers to refractive fluids, transparency and translucency are prevalent in human environments. The material properties of translucent objects challenge many of our assumptions in robotic perception. For example, the most common RGB-D sensors require the sensing of an infrared structured pattern from a Lambertian reflectance of surfaces. As such, transparent and translucent objects often remain invisible to robot perception. Thus, introducing methods that would enable robots to correctly perceive and then interact with the environment would be highly beneficial. Light-field (or plenoptic) cameras, for instance, which carry light direction and intensity, make it possible to perceive visual clues on transparent and translucent objects. In this dissertation, we explore the inference of transparent and translucent objects from plenoptic observations for robotic perception and manipulation. We propose a novel plenoptic descriptor, Depth Likelihood Volume (DLV), that incorporates plenoptic observations to represent depth of a pixel as a distribution rather than a single value. Building on the DLV, we present the Plenoptic Monte Carlo Localization algorithm, PMCL, as a generative method to infer 6-DoF poses of objects in settings with translucency. PMCL is able to localize both isolated transparent objects and opaque objects behind translucent objects using a DLV computed from a single view plenoptic observation. The uncertainty induced by transparency and translucency for pose estimation increases greatly as scenes become more cluttered. Under this scenario, we propose GlassLoc to localize feasible grasp poses directly from local DLV features. In GlassLoc, a convolutional neural network is introduced to learn DLV features for classifying grasp poses with grasping confidence. GlassLoc also suppresses the reflectance over multi-view plenoptic observations, which leads to more stable DLV representation. 
We evaluate GlassLoc in the context of a pick-and-place task for transparent tableware in a cluttered tabletop environment. We further observe that transparent and translucent objects generate distinguishable features in the light-field epipolar image plane. With this insight, we propose Light-field Inference of Transparency, LIT, as a two-stage generative-discriminative refractive object localization approach. In the discriminative stage, LIT uses convolutional neural networks to learn reflection and distortion features from photorealistic-rendered light-field images. The learned features guide generative object location inference through local depth estimation and particle optimization. We compare LIT with four state-of-the-art pose estimators to show its efficacy in the transparent object localization task. We perform a robot demonstration by building a champagne tower using the LIT pipeline.
    PhD, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169707/1/zhezhou_1.pd

    Challenges for Monocular 6D Object Pose Estimation in Robotics

    Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.
    Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182

    Data-driven approaches for interactive appearance editing

    This thesis proposes several techniques for interactive editing of digital content and fast rendering of virtual 3D scenes. Editing of digital content, such as images or 3D scenes, is difficult and requires artistic talent and technical expertise. To alleviate these difficulties, we exploit data-driven approaches that use easily accessible Internet data (e.g., images, videos, materials) to develop new tools for digital content manipulation. Our proposed techniques allow casual users to achieve high-quality editing by interactively exploring the manipulations without the need to understand the underlying physical models of appearance. First, the thesis presents a fast algorithm for realistic image synthesis of virtual 3D scenes. This serves as the core framework for a new method that allows artists to fine-tune the appearance of a rendered 3D scene. Here, artists directly paint the final appearance and the system automatically solves for the material parameters that best match the desired look. Along this line, an example-based material assignment approach is proposed, where the 3D models of a virtual scene can be "materialized" simply by giving a guidance source (image/video). Next, the thesis proposes shape and color subspaces of an object that are learned from a collection of exemplar images. These subspaces can be used to constrain image manipulations to valid shapes and colors, or to provide suggestions for manipulations. Finally, data-driven color manifolds that contain the colors of a specific context are proposed. Such color manifolds can be used to improve color picking performance, color stylization, compression, or white balancing.

    Inferring surface shape from specular reflections


    A Low-Dimensional Perceptual Space for Intuitive BRDF Editing

    Understanding and characterizing material appearance based on human perception is challenging because of the high dimensionality and nonlinearity of reflectance data. We refer to the process of identifying specific characteristics of material appearance within the same category as material estimation, in contrast to material categorization, which focuses on identifying inter-category differences [FNG15]. In this paper, we present a method to simulate the material estimation process based on human perception. We create a continuous perceptual space for measured tabulated data based on its underlying low-dimensional manifold. Unlike many previous works that only address individual perceptual attributes (such as gloss), we focus on extracting all possible dimensions that can explain the perceived differences between appearances. Additionally, we propose a new material editing interface that combines image navigation and sliders to visualize each perceptual dimension and facilitate the editing of tabulated BRDFs. We conduct a user study to evaluate the efficacy of the perceptual space and the interface in terms of appearance matching.

    Using AI and Robotics for EV battery cable detection.: Development and implementation of end-to-end model-free 3D instance segmentation for industrial purposes

    Master's thesis in Information- and communication technology (IKT590).
    This thesis describes a novel method for capturing point clouds and segmenting instances of cabling found on electric vehicle battery packs. It investigates the use of cutting-edge perception algorithm architectures, such as graph-based and voxel-based convolution, in industrial autonomous lithium-ion battery pack disassembly. The thesis focuses on the challenge of getting a desirable representation of any battery pack using an ABB robot in conjunction with a high-end structured light camera, with "end-to-end" and "model-free" as design constraints. The thesis employs self-captured datasets comprising several battery packs that have been captured and labeled. The datasets are then used to create a perception system. This thesis recommends using HDR functionality in an industrial application to capture the full dynamic range of the battery packs. To adequately depict 3D features, a three-point-of-view capture sequence is deemed necessary. A general capture process for an entire battery pack is also presented, but a next-best-scan algorithm is likely required to ensure a "close to complete" representation. Graph-based deep-learning algorithms have been shown to be capable of being scaled up to 50,000 inputs while still exhibiting strong performance in terms of accuracy and processing time. The results show that an instance segmenting system can be implemented in less than two seconds. Using off-the-shelf hardware, the results demonstrate that a 3D perception system is industrially viable and competitive with a 2D perception system.

    2D Photo Converter: Modeling 3D Objects from 2D Photos Using OpenGL

    Modeling a 3D object from 2D photos has been widely discussed and researched among computer vision professionals and virtual reality technologists. However, despite the considerable ongoing research and the rapid technological development in 3D modeling, the best method to render a 3D model that satisfies the requirements of minimal image pre-processing, maximal model realism, and minimal error percentage has yet to be established. This report lays out another technique for modeling 3D objects from a 2D photo by analyzing the feasibility and accuracy of light intensity evaluation for the model. The main objective of this study is to propose an alternative 3D modeling technique that uses the information from a 2D photo. It is hoped that applying the proposed solution could reduce the cost and time constraints of current 3D modeling systems. The research focuses on bitmap photos and applies the principles of light intensity and distance relativity to estimate the depth volume of the model. The application is built with Microsoft Visual C++ 6.0 and uses the OpenGL Application Programming Interface (API). However, the results of the experiments conducted in this study show that the formula used in the application might not be the best method for producing a 3D model from a 2D photo. Nonetheless, the idea of using light intensity evaluation to produce 3D models could be a new solution in 3D modeling technology. The framework design and ideas could serve as a basis for further research and development in 3D modeling.