The MVP sensor planning system for robotic vision tasks

Author: Allen Peter K.
Tarabanis Konstantinos
Tsai Roger Y.
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/1995

The MVP (machine vision planner) model-based sensor planning system for robotic vision is presented. MVP automatically synthesizes desirable camera views of a scene based on geometric models of the environment, optical models of the vision sensors, and models of the task to be achieved. The generic task of feature detectability has been chosen since it is applicable to many robot-controlled vision systems. For such a task, features of interest in the environment are required to simultaneously be visible, inside the field of view, in focus, and magnified as required. In this paper, we present a technique that poses the vision sensor planning problem in an optimization setting and determines viewpoints that satisfy all of the preceding requirements simultaneously and with a margin. In addition, we present experimental results of this technique when applied to a robotic vision system that consists of a camera mounted on a robot manipulator in a hand-eye configuration.

Columbia University Academic Commons

Model-Based Environmental Visual Perception for Humanoid Robots

Author: Gonzalez Aguirre David Israel
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2013

The visual perception of a robot should answer two fundamental questions: What? and Where? In order to answer these questions properly and efficiently, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling.

KITopen

Visual Perception For Robotic Spatial Understanding

Author: Owens Jason Lawrence
Publication venue: ScholarlyCommons
Publication date: 01/01/2019

Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.

ScholarlyCommons@Penn

Enabling Viewpoint Learning through Dynamic Label Generation

Author: Blanz V.
Feixas M.
Freitag S.
Gao B.-B.
González Á.
Jacobs R. A.
Jordan M. I.
Lee C. H.
Lino C.
Liu Z.
Marchand E.
Meuschke M.
Monclús E.
Nguyen-Phuoc T. H.
Nimier-David M.
Secord A.
Shi N.
Song R.
Srivastava N.
Viola I.
Vázquez P.-P.
Yao W. Y. Z.
Zhang Y.
Publication date: 09/02/2021

Optimal viewpoint prediction is an essential task in many computer graphics applications. Unfortunately, common viewpoint qualities suffer from two major drawbacks: dependency on clean surface meshes, which are not always available, and the lack of closed-form expressions, which requires a costly search involving rendering. To overcome these limitations we propose to separate viewpoint selection from rendering through an end-to-end learning approach, whereby we reduce the influence of the mesh quality by predicting viewpoints from unstructured point clouds instead of polygonal meshes. While this makes our approach insensitive to the mesh discretization during evaluation, it only becomes possible when resolving label ambiguities that arise in this context. Therefore, we additionally propose to incorporate the label generation into the training procedure, making the label decision adaptive to the current network predictions. We show how our proposed approach allows for learning viewpoint predictions for models from different object categories and for different viewpoint qualities. Additionally, we show that prediction times are reduced from several minutes to a fraction of a second, as compared to state-of-the-art (SOTA) viewpoint quality evaluation. We will further release the code and training data, which, to our knowledge, will be the largest viewpoint quality dataset available.

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases

Author: Dopiriak Matúš
Gazda Juraj
Maksymyuk Taras
Pardo Enric
Šlapak Eugen
Publication date: 16/08/2023

The proliferation of technologies, such as extended reality (XR), has increased the demand for high-quality three-dimensional (3D) graphical representations. Industrial 3D applications encompass computer-aided design (CAD), finite element analysis (FEA), scanning, and robotics. However, current methods employed for industrial 3D representations suffer from high implementation costs and reliance on manual human input for accurate 3D modeling. To address these challenges, neural radiance fields (NeRFs) have emerged as a promising approach for learning 3D scene representations based on provided training 2D images. Despite a growing interest in NeRFs, their potential applications in various industrial subdomains are still unexplored. In this paper, we deliver a comprehensive examination of NeRF industrial applications while also providing direction for future research endeavors. We also present a series of proof-of-concept experiments that demonstrate the potential of NeRFs in the industrial domain. These experiments include NeRF-based video compression techniques and using NeRFs for 3D motion estimation in the context of collision avoidance. In the video compression experiment, our results show compression savings up to 48% and 74% for resolutions of 1920x1080 and 300x168, respectively. The motion estimation experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF) and achieved an average peak signal-to-noise ratio (PSNR) of 23 dB on the disparity map and a structural similarity index measure (SSIM) of 0.97.

arXiv.org e-Print Archive

A survey of real-time crowd rendering

Author: Andújar Gran Carlos Antonio
Beacco Porres Alejandro
Pelechano Gómez Núria
Publication venue: Wiley
Publication date: 01/01/2016

In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting performance and realism of crowds such as lighting, shadowing, clothing and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC