Ambient Intelligence for Next-Generation AR
Next-generation augmented reality (AR) promises a high degree of
context-awareness - a detailed knowledge of the environmental, user, social and
system conditions in which an AR experience takes place. This will facilitate
both the closer integration of the real and virtual worlds, and the provision
of context-specific content or adaptations. However, environmental awareness in
particular is challenging to achieve using AR devices alone; not only are these
mobile devices' views of an environment spatially and temporally limited, but
the data obtained by onboard sensors is frequently inaccurate and incomplete.
This, combined with the fact that many aspects of core AR functionality and
user experiences are impacted by properties of the real environment, motivates
the use of ambient IoT devices, wireless sensors and actuators placed in the
surrounding environment, for the measurement and optimization of environment
properties. In this book chapter we categorize and examine the wide variety of
ways in which these IoT sensors and actuators can support or enhance AR
experiences, including quantitative insights and proof-of-concept systems that
will inform the development of future solutions. We outline the challenges and
opportunities associated with several important research directions which must
be addressed to realize the full potential of next-generation AR.
Comment: This is a preprint of a book chapter which will appear in the
Springer Handbook of the Metaverse.
Fully Automatic Multi-Object Articulated Motion Tracking
Fully automatic, real-time tracking of articulated motion with a monocular RGB camera is a challenging problem that is essential for many virtual reality (VR) and human-computer interaction applications. In this paper, we present an algorithm for tracking multiple articulated objects from a monocular RGB image sequence. Our algorithm can be directly employed in practical applications as it is fully automatic, real-time, and temporally stable. It consists of the following stages: dynamic object counting, object-specific 3D skeleton generation, initial 3D pose estimation, and 3D skeleton fitting, which fits each 3D skeleton to the corresponding 2D body-part locations. In the skeleton fitting stage, the 3D pose of every object is estimated by maximizing an objective function that combines a skeleton fitting term with motion and pose priors. To illustrate the importance of our algorithm for practical applications, we present competitive results for real-time tracking of multiple humans. Our algorithm detects objects that enter or leave the scene, and dynamically generates or deletes their 3D skeletons. This makes our monocular RGB method well suited for real-time applications. We show that our algorithm is applicable to tracking multiple objects in outdoor scenes, community videos, and low-quality videos captured with mobile-phone cameras.
Keywords: Multi-object motion tracking, Articulated motion capture, Deep learning, Anthropometric data, 3D pose estimation.
DOI: 10.7176/CEIS/12-1-01
Publication date: March 31st 202
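The skeleton-fitting step described in this abstract, maximizing a fitting term plus motion and pose priors, can be illustrated with a toy sketch. This is not the paper's implementation: the orthographic projection, the weights, and all function names are illustrative assumptions, and the real method fits full articulated skeletons rather than free 3D joints.

```python
import numpy as np

def project(joints3d):
    """Toy orthographic projection: drop the depth coordinate."""
    return joints3d[:, :2]

def objective(pose, keypoints2d, prev_pose, rest_pose,
              w_motion=0.1, w_pose=0.01):
    """Objective to maximize: fitting term plus motion and pose priors.
    Weights are illustrative assumptions, not the paper's values."""
    fit = -np.sum((project(pose) - keypoints2d) ** 2)     # skeleton fitting term
    motion = -w_motion * np.sum((pose - prev_pose) ** 2)  # temporal smoothness prior
    prior = -w_pose * np.sum((pose - rest_pose) ** 2)     # pose prior
    return fit + motion + prior

def fit_frame(keypoints2d, prev_pose, rest_pose, lr=0.05, iters=200):
    """Maximize the objective by gradient ascent with finite differences."""
    pose = prev_pose.copy()
    eps = 1e-4
    for _ in range(iters):
        grad = np.zeros_like(pose)
        for idx in np.ndindex(pose.shape):
            p = pose.copy(); p[idx] += eps
            m = pose.copy(); m[idx] -= eps
            grad[idx] = (objective(p, keypoints2d, prev_pose, rest_pose)
                         - objective(m, keypoints2d, prev_pose, rest_pose)) / (2 * eps)
        pose += lr * grad
    return pose
```

Because the priors penalize deviation from the previous and rest poses, the maximizer is pulled toward the detected 2D keypoints while remaining temporally stable, which is the trade-off the abstract describes.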
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
This dissertation advances the state of the art in AR/VR tracking systems by
increasing the tracking frequency by orders of magnitude, and it proposes an
efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, where the physical and
digital worlds coexist. We are on the cusp of a radical change in how humans
perform and interact with computing. Humans are sensitive to small
misalignments between the real and the virtual world, and tracking at
kilo-Hertz frequencies becomes essential. Current vision-based systems fall
short, as their tracking frequency is implicitly limited by the frame-rate of
the camera. This thesis presents a prototype system which can track at rates
orders of magnitude higher than state-of-the-art methods using multiple commodity
cameras. The proposed system exploits characteristics of the camera
traditionally considered as flaws, namely rolling shutter and radial
distortion. The experimental evaluation shows the effectiveness of the method
for various degrees of motion.
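The rolling-shutter property the thesis exploits can be made concrete with a minimal timing model: each image row is exposed sequentially, so a single frame contains as many temporal samples as it has rows. The function name and parameters below are assumptions for illustration, not the thesis's actual formulation.

```python
def row_timestamp(t_frame_start, row, num_rows, readout_time):
    """Capture time of one image row under a rolling shutter.

    Rows are scanned top to bottom across the readout interval, so a
    480-row frame provides 480 observations per frame -- the basis for
    tracking far above the nominal camera frame rate.
    readout_time is the time to scan from the first to the last row.
    """
    return t_frame_start + (row / (num_rows - 1)) * readout_time
```

Under this model, a 30 Hz camera with a 30 ms readout already yields row-level observations at kilohertz rates, which is the regime the thesis targets.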
Furthermore, edge-aware optimization is an indispensable tool in the computer
vision arsenal for accurate filtering of depth-data and image-based rendering,
which is increasingly being used for content creation and geometry processing
for AR/VR. As applications increasingly demand higher resolution and speed,
there exists a need to develop methods that scale accordingly. This
dissertation proposes such an edge-aware optimization framework which is
efficient, accurate, and scales well, a combination of traits not found
jointly in the state of the art. The experiments show the effectiveness of
the framework in a multitude of computer vision tasks such as computational
photography and stereo.
Comment: PhD thesis.
Analysis of the hands in egocentric vision: A survey
Egocentric vision (a.k.a. first-person vision - FPV) applications have
thrived over the past few years, thanks to the availability of affordable
wearable cameras and large annotated datasets. The position of the wearable
camera (usually mounted on the head) allows recording exactly what the camera
wearers have in front of them, in particular hands and manipulated objects.
This intrinsic advantage enables the study of the hands from multiple
perspectives: localizing hands and their parts within the images; understanding
what actions and activities the hands are involved in; and developing
human-computer interfaces that rely on hand gestures. In this survey, we review
the literature that focuses on the hands using egocentric vision, categorizing
the existing approaches into: localization (where are the hands or parts of
them?); interpretation (what are the hands doing?); and application (e.g.,
systems that used egocentric hand cues for solving a specific problem).
Moreover, a list of the most prominent datasets with hand-based annotations is
provided.
Interaction Replica: Tracking human-object interaction and scene changes from human motion
Humans naturally change their environment through interactions, e.g., by
opening doors or moving furniture. To reproduce such interactions in virtual
spaces (e.g., metaverse), we need to capture and model them, including changes
in the scene geometry, ideally from egocentric input alone (head camera and
body-worn inertial sensors). While the head camera can be used to localize the
person in the scene, estimating dynamic object pose is much more challenging.
As the object is often not visible from the head camera (e.g., a human not
looking at a chair while sitting down), we cannot rely on visual object pose
estimation. Instead, our key observation is that human motion tells us a lot
about scene changes. Motivated by this, we present iReplica, the first
human-object interaction reasoning method which can track objects and scene
changes based solely on human motion. iReplica is an essential first step
towards advanced AR/VR applications in immersive virtual universes and can
provide human-centric training data to teach machines to interact with their
surroundings. Our code, data and model will be available on our project page at
http://virtualhumans.mpi-inf.mpg.de/ireplica
Application of augmented reality and robotic technology in broadcasting: A survey
As an innovative technique, Augmented Reality (AR) has been gradually deployed in the broadcast, videography and cinematography industries. Virtual graphics generated by AR are dynamic and overlaid on surfaces of the environment, so that the original appearance can be greatly enhanced in comparison with traditional broadcasting. In addition, AR enables broadcasters to interact with augmented virtual 3D models on a broadcasting scene in order to enhance the performance of broadcasting. Recently, advanced robotic technologies have been deployed in a camera shooting system to create a robotic cameraman so that the performance of AR broadcasting could be further improved, which is highlighted in this paper.
Learning to see and hear in 3D: Virtual reality as a platform for multisensory perceptual learning
Virtual reality (VR) is an emerging technology which allows for the presentation of immersive and realistic yet tightly controlled audiovisual scenes. In comparison to conventional displays, a VR system can include depth, 3D audio, and fully integrated eye, head, and hand tracking, all over a much larger field of view than a desktop monitor provides. These properties hold great potential for vision science experiments, especially those that can benefit from more naturalistic stimuli, particularly in the case of visual rehabilitation. Prior work using conventional displays has demonstrated that visual loss due to stroke can be partially rehabilitated through laboratory-based tasks designed to promote long-lasting changes to visual sensitivity. In this work, I will explore how VR can provide a platform for new, more complex training paradigms which leverage multisensory stimuli. In this dissertation, I will (I) provide context to motivate the use of multisensory perceptual training in the context of visual rehabilitation, (II) demonstrate best practices for the appropriate use of VR in a controlled psychophysics setting, (III) describe a prototype integrated hardware system for improved eye tracking in VR, and (IV, V) discuss results from two audiovisual perceptual training studies, one using multisensory stimuli and the other with cross-modal audiovisual stimuli. This dissertation provides the foundation for future work in rehabilitating visual deficits, by both improving the hardware and software systems used to present the training paradigm and validating new techniques which use multisensory training not previously accessible with conventional desktop displays.