95 research outputs found
Machine Vision - Applications and Systems
none3http://www.intechopen.com/books/machine-vision-applications-and-systemsF. Solari; M. Chessa; S.P. SabatiniSolari, Fabio; Chessa, Manuela; Sabatini, SILVIO PAOL
Robotic Manipulation under Transparency and Translucency from Light-field Sensing
From frosted windows to plastic containers to refractive fluids, transparency and translucency are prevalent in human environments. The material properties of translucent objects challenge many of our assumptions in robotic perception. For example, the most common RGB-D sensors require the sensing of an infrared structured pattern from a Lambertian reflectance of surfaces. As such, transparent and translucent objects often remain invisible to robot perception. Thus, introducing methods that would enable robots to correctly perceive and then interact with the environment would be highly beneficial. Light-field (or plenoptic) cameras, for instance, which carry light direction and intensity, make it possible to perceive visual clues on transparent and translucent objects. In this dissertation, we explore the inference of transparent and translucent objects from plenoptic observations for robotic perception and manipulation. We propose a novel plenoptic descriptor, Depth Likelihood Volume (DLV), that incorporates plenoptic observations to represent depth of a pixel as a distribution rather than a single value. Building on the DLV, we present the Plenoptic Monte Carlo Localization algorithm, PMCL, as a generative method to infer 6-DoF poses of objects in settings with translucency. PMCL is able to localize both isolated transparent objects and opaque objects behind translucent objects using a DLV computed from a single view plenoptic observation. The uncertainty induced by transparency and translucency for pose estimation increases greatly as scenes become more cluttered. Under this scenario, we propose GlassLoc to localize feasible grasp poses directly from local DLV features. In GlassLoc, a convolutional neural network is introduced to learn DLV features for classifying grasp poses with grasping confidence. GlassLoc also suppresses the reflectance over multi-view plenoptic observations, which leads to more stable DLV representation. We evaluate GlassLoc in the context of a pick-and-place task for transparent tableware in a cluttered tabletop environment. We further observe that the transparent and translucent objects will generate distinguishable features in the light-field epipolar image plane. With this insight, we propose Light-field Inference of Transparency, LIT, as a two-stage generative-discriminative refractive object localization approach. In the discriminative stage, LIT uses convolutional neural networks to learn reflection and distortion features from photorealistic-rendered light-field images. The learned features guide generative object location inference through local depth estimation and particle optimization. We compare LIT with four state-of-the-art pose estimators to show our efficacy in the transparent object localization task. We perform a robot demonstration by building a champagne tower using the LIT pipeline.PHDRoboticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169707/1/zhezhou_1.pd
Challenges for Monocular 6D Object Pose Estimation in Robotics
Object pose estimation is a core perception task that enables, for example,
object grasping and scene understanding. The widely available, inexpensive and
high-resolution RGB sensors and CNNs that allow for fast inference based on
this modality make monocular approaches especially well suited for robotics
applications. We observe that previous surveys on object pose estimation
establish the state of the art for varying modalities, single- and multi-view
settings, and datasets and metrics that consider a multitude of applications.
We argue, however, that those works' broad scope hinders the identification of
open challenges that are specific to monocular approaches and the derivation of
promising future challenges for their application in robotics. By providing a
unified view on recent publications from both robotics and computer vision, we
find that occlusion handling, novel pose representations, and formalizing and
improving category-level pose estimation are still fundamental challenges that
are highly relevant for robotics. Moreover, to further improve robotic
performance, large object sets, novel objects, refractive materials, and
uncertainty estimates are central, largely unsolved open challenges. In order
to address them, ontological reasoning, deformability handling, scene-level
reasoning, realistic datasets, and the ecological footprint of algorithms need
to be improved.Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182
Data-driven approaches for interactive appearance editing
This thesis proposes several techniques for interactive editing of digital content and fast rendering of virtual 3D scenes. Editing of digital content - such as images or 3D scenes - is difficult, requires artistic talent and technical expertise. To alleviate these difficulties, we exploit data-driven approaches that use the easily accessible Internet data (e. g., images, videos, materials) to develop new tools for digital content manipulation. Our proposed techniques allow casual users to achieve high-quality editing by interactively exploring the manipulations without the need to understand the underlying physical models of appearance.
First, the thesis presents a fast algorithm for realistic image synthesis of virtual 3D scenes. This serves as the core framework for a new method that allows artists to fine tune the appearance of a rendered 3D scene. Here, artists directly paint the final appearance and the system automatically solves for the material parameters that best match the desired look. Along this line, an example-based material assignment approach is proposed, where the 3D models of a virtual scene can be "materialized" simply by giving a guidance source (image/video). Next, the thesis proposes shape and color subspaces of an object that are learned from a collection of exemplar images. These subspaces can be used to constrain image manipulations to valid shapes and colors, or provide suggestions for manipulations. Finally, data-driven color manifolds which contain colors of a specific context are proposed. Such color manifolds can be used to improve color picking performance, color stylization, compression or white balancing.Diese Dissertation stellt Techniken zum interaktiven Editieren von digitalen Inhalten und zum schnellen Rendering von virtuellen 3D Szenen vor. Digitales Editieren - seien es Bilder oder dreidimensionale Szenen - ist kompliziert, benötigt künstlerisches Talent und technische Expertise. Um diese Schwierigkeiten zu relativieren, nutzen wir datengesteuerte Ansätze, die einfach zugängliche Internetdaten, wie Bilder, Videos und Materialeigenschaften, nutzen um neue Werkzeuge zur Manipulation von digitalen Inhalten zu entwickeln. Die von uns vorgestellten Techniken erlauben Gelegenheitsnutzern das Editieren in hoher Qualität, indem Manipulationsmöglichkeiten interaktiv exploriert werden können ohne die zugrundeliegenden physikalischen Modelle der Bildentstehung verstehen zu müssen.
Zunächst stellen wir einen effizienten Algorithmus zur realistischen Bildsynthese von virtuellen 3D Szenen vor. Dieser dient als Kerngerüst einer Methode, die Nutzern die Feinabstimmung des finalen Aussehens einer gerenderten dreidimensionalen Szene erlaubt. Hierbei malt der Künstler direkt das beabsichtigte Aussehen und das System errechnet automatisch die zugrundeliegenden Materialeigenschaften, die den beabsichtigten Eigenschaften am nahesten kommen. Zu diesem Zweck wird ein auf Beispielen basierender Materialzuordnungsansatz vorgestellt, für den das 3D Model einer virtuellen Szene durch das simple Anführen einer Leitquelle (Bild, Video) in Materialien aufgeteilt werden kann. Als Nächstes schlagen wir Form- und Farbunterräume von Objektklassen vor, die aus einer Sammlung von Beispielbildern gelernt werden. Diese Unterräume können genutzt werden um Bildmanipulationen auf valide Formen und Farben einzuschränken oder Manipulationsvorschläge zu liefern. Schließlich werden datenbasierte Farbmannigfaltigkeiten vorgestellt, die Farben eines spezifischen Kontexts enthalten. Diese Mannigfaltigkeiten ermöglichen eine Leistungssteigerung bei Farbauswahl, Farbstilisierung, Komprimierung und Weißabgleich
A Low-Dimensional Perceptual Space for Intuitive BRDF Editing
International audienceUnderstanding and characterizing material appearance based on human perception is challenging because of the highdimensionality and nonlinearity of reflectance data. We refer to the process of identifying specific characteristics of material appearance within the same category as material estimation, in contrast to material categorization which focuses on identifying inter-category differences [FNG15]. In this paper, we present a method to simulate the material estimation process based on human perception. We create a continuous perceptual space for measured tabulated data based on its underlying low-dimensional manifold. Unlike many previous works that only address individual perceptual attributes (such as gloss), we focus on extracting all possible dimensions that can explain the perceived differences between appearances. Additionally, we propose a new material editing interface that combines image navigation and sliders to visualize each perceptual dimension and facilitate the editing of tabulated BRDFs. We conduct a user study to evaluate the efficacy of the perceptual space and the interface in terms of appearance matching
Using AI and Robotics for EV battery cable detection.: Development and implementation of end-to-end model-free 3D instance segmentation for industrial purposes
Master's thesis in Information- and communication technology (IKT590)This thesis describes a novel method for capturing point clouds and segmenting instances of cabling found on electric vehicle battery packs. The use of cutting-edge perception algorithm architectures, such as graph-based and voxel-based convolution, in industrial autonomous lithium-ion battery pack disassembly is being investigated. The thesis focuses on the challenge of getting a desirable representation of any battery pack using an ABB robot in conjunction with a high-end structured light camera, with "end-to-end" and "model-free" as design constraints. The thesis employs self-captured datasets comprised of several battery packs that have been captured and labeled. Following that, the datasets are used to create a perception system. This thesis recommends using HDR functionality in an industrial application to capture the full dynamic range of the battery packs. To adequately depict 3D features, a three-point-of-view capture sequence is deemed necessary. A general capture process for an entire battery pack is also presented, but a next-best-scan algorithm is likely required to ensure a "close to complete" representation. Graph-based deep-learning algorithms have been shown to be capable of being scaled up to50,000inputs while still exhibiting strong performance in terms of accuracy and processing time. The results show that an instance segmenting system can be implemented in less than two seconds. Using off-the-shelf hardware, demonstrate that a 3D perception system is industrially viable and competitive with a 2D perception system
2D Photo Converter: Modeling 3D Objects from 2D Photos Using OpenGL
The concept of modeling a 3D object based on 2D photos has indeed been widely
discussed and researched among the computer vision professionals and virtual reality
technologists. However, regardless of the many researches going on and the rapid
technological development in 3D modeling world, the best method to render a 3D
model that satisfies the requirement of minimum image pre-processing, maximum
model-realism and minimum error percentage is yet to be studied. This report will lay
out another technique of modeling 3D objects using a 2D photo by analyzing the
possibility and accuracy of light intensity evaluation towards the model.
The main objective of this study is to propose an alternative solution to 3D modeling
techniques by using the information from a 2D photo. It is hoped that by applying the
proposed solution, the constraint of costs and time in the current 3D modeling system
could be reduced. The research focuses on bitmap photos and it applies the principles of
light intensity and distance relativity in estimating the depth volume of the model. The
application is built by using Microsoft Visual C++ 6.0 and utilizes OpenGL Application
Programming Interface (API) in its code.
However, the results of the experiments conducted in this research study shows that the
formula used in the application might not be the best method to produce a 3D model
from a 2D photo. Nonetheless, the idea of using light intensity valuations in producing
3D models could be the new solution in 3D modeling technology. The framework
design and the ideas could be the base research for further development in the 3D
modeling research and analysis study
- …