6 research outputs found

Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception

Authors: Nicholas Charron, Chen Kong, Richard Newcombe, Xiaqing Pan, Omkar Parkhi, Scott Peters, Carl Yuheng Ren, Thomas Whelan, Yongqian Yang
Publication date: 13/06/2023

We introduce the Aria Digital Twin (ADT), an egocentric dataset captured using Aria glasses with extensive object-, environment-, and human-level ground truth. This ADT release contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes with 398 object instances (324 stationary and 74 dynamic). Each sequence consists of: a) raw data from two monochrome camera streams, one RGB camera stream, and two IMU streams; b) complete sensor calibration; c) ground truth data including continuous 6-degree-of-freedom (6DoF) poses of the Aria devices, object 6DoF poses, 3D eye gaze vectors, 3D human poses, 2D image segmentations, and image depth maps; and d) photo-realistic synthetic renderings. To the best of our knowledge, there is no existing egocentric dataset with a level of accuracy, photo-realism and comprehensiveness comparable to ADT. By contributing ADT to the research community, our mission is to set a new standard for evaluation in the egocentric machine perception domain, which includes very challenging research problems such as 3D object detection and tracking, scene reconstruction and understanding, sim-to-real learning, and human pose prediction, while also inspiring new machine perception tasks for augmented reality (AR) applications. To kick-start exploration of the ADT research use cases, we evaluated several existing state-of-the-art methods for object detection, segmentation and image translation tasks that demonstrate the usefulness of ADT as a benchmarking dataset.

arXiv.org e-Print Archive

Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices

Authors: Carl Yuheng Ren, David Murray, Olaf Kahler, Philip Torr, Victor Adrian Prisacariu, Xin Sun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Robust Silhouette Extraction from Kinect

Authors: Carl Yuheng Ren, David W. Murray, Iuri Frosio, Michele Pirovano, N. Alberto Borghese, Pier Luca Lanzi, Victor Prisacariu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Natural User Interfaces allow users to interact with virtual environments with little intermediation. Immersion is vital for such interfaces to be successful, and it is achieved by making the interface invisible to the user. For cognitive rehabilitation, a mirror view is a good interface to the virtual world, but obtaining immersion is not straightforward. A player profile, or silhouette, accurately extracted from the real-world background increases both the visual quality and the immersion of the player in the virtual environment. The Kinect SDK provides raw data that can be used to extract a simple player profile. In this paper, we present our method for extracting a smooth player profile from the Kinect image streams.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

AIR Universita degli studi di Milano

SemanticPaint: interactive segmentation and learning of 3D worlds

Authors: Anurag Arnab, Ming-Ming Cheng, Stuart Golodetz, Stephen L. Hicks, Shahram Izadi, Olaf Kähler, David W. Murray, Victor A. Prisacariu, Carl Yuheng Ren, Michael Sapienza, Philip H. S. Torr, Julien P. C. Valentin, Vibhav Vineet
Publication venue: Association for Computing Machinery
Publication date: 31/07/2015

We present a real-time, interactive system for the geometric reconstruction, object-class segmentation and learning of 3D scenes [Valentin et al. 2015]. Using our system, a user can walk into a room wearing a depth camera and a virtual reality headset, and both densely reconstruct the 3D scene [Newcombe et al. 2011; Nießner et al. 2013; Prisacariu et al. 2014] and interactively segment the environment into object classes such as 'chair', 'floor' and 'table'. The user interacts physically with the real-world scene, touching objects and using voice commands to assign them appropriate labels. These user-generated labels are leveraged by an online random forest-based machine learning algorithm, which is used to predict labels for previously unseen parts of the scene. The predicted labels, together with those provided directly by the user, are incorporated into a dense 3D conditional random field model, over which we perform mean-field inference to filter out label inconsistencies. The entire pipeline runs in real time, and the user stays 'in the loop' throughout the process, receiving immediate feedback about the progress of the labelling and interacting with the scene as necessary to refine the predicted segmentation.

Crossref

Oxford University Research Archive