Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less, multi-person, independent of background and does not make any
assumption on people appearance and initial pose. The system provides real-time
outcomes, thus being perfectly suited for applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open source project for RGB-D people tracking.
Comment: Submitted to the 2018 IEEE International Conference on Robotics and
Automatio
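The abstract above says each 2D skeleton is extended to 3D "by means of the sensor depth". A minimal sketch of that step for a single joint, assuming a standard pinhole camera model with known intrinsics; the function name and the parameters `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the OpenPTrack code:

```python
import math
from typing import Optional, Tuple


def backproject_keypoint(
    u: float, v: float, depth_m: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Optional[Tuple[float, float, float]]:
    """Lift a 2D joint at pixel (u, v) to a 3D point in the camera frame.

    Assumes a pinhole model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. depth_m is the depth reading in
    metres at that pixel. Returns None when the depth is missing or
    invalid, which is common with RGB-D sensors.
    """
    if not math.isfinite(depth_m) or depth_m <= 0.0:
        return None
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A joint detected at the principal point maps to a point on the optical axis, so `backproject_keypoint(320.0, 240.0, 2.0, 525.0, 525.0, 320.0, 240.0)` gives `(0.0, 0.0, 2.0)`. Fusing the per-camera results at the central node would additionally need each camera's extrinsic calibration, which this sketch leaves out.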
DigitalBeing: an Ambient Intelligent Dance Space.
DigitalBeing is an ambient intelligent system that aims to use stage lighting and lighting in projected imagery within a dance performance to portray dancer's arousal state. The dance space will be augmented with pressure sensors to track dancers' movements; dancers will also wear physiological sensors. Sensor data will be passed to a three-layered architecture. Layer 1 is composed of a system that analyzes sensor data. Layer 2 is composed of two intelligent lighting systems that use the analyzed sensor information to adapt onstage and virtual lighting to show dancer's arousal level. Layer 3 translates lighting changes to appropriate lighting board commands as well as rendering commands to render the projected imagery.
Towards a Smart Drone Cinematographer for Filming Human Motion
Affordable consumer drones have made capturing aerial footage more convenient and accessible. However, shooting cinematic motion videos using a drone is challenging because it requires users to analyze dynamic scenarios while operating the controller. In this thesis, our task is to develop an autonomous drone cinematography system to capture cinematic videos of human motion. We understand the system's filming performance to be influenced by three key components: 1) video quality metric, which measures the aesthetic quality (the angle, the distance, the image composition) of the captured video, 2) visual feature, which encapsulates the visual elements that influence the filming style, and 3) camera planning, which is a decision-making model that predicts the next best movement. By analyzing these three components, we designed two autonomous drone cinematography systems using both heuristic-based methods and learning-based methods. For the first system, we designed an Autonomous CinemaTography system, "ACT", by proposing a viewpoint quality metric focusing on the visibility of the 3D human skeleton of the subject. We expanded the application of human motion analysis and simplified manual control by assisting viewpoint selection using a through-the-lens method. For the second system, we designed an imitation-based system that learns the artistic intention of the cameramen through watching professional aerial videos. We designed a camera planner that analyzes the video contents and previous camera motion to predict future camera motion. Furthermore, we propose a planning framework, which can imitate a filming style by "seeing" only one single demonstration video of such style. We named it "one-shot imitation filming." To the best of our knowledge, this is the first work that extends imitation learning to autonomous filming.
Experimental results in both simulation and field tests exhibit significant improvements over existing techniques, and our approach managed to help inexperienced pilots capture cinematic videos.
MOVIN: Real-time Motion Capture using a Single LiDAR
Recent advancements in technology have brought forth new forms of interactive
applications, such as the social metaverse, where end users interact with each
other through their virtual avatars. In such applications, precise full-body
tracking is essential for an immersive experience and a sense of embodiment
with the virtual avatar. However, current motion capture systems are not easily
accessible to end users due to their high cost, the requirement for special
skills to operate them, or the discomfort associated with wearable devices. In
this paper, we present MOVIN, the data-driven generative method for real-time
motion capture with global tracking, using a single LiDAR sensor. Our
autoregressive conditional variational autoencoder (CVAE) model learns the
distribution of pose variations conditioned on the given 3D point cloud from
LiDAR. As a central factor for high-accuracy motion capture, we propose a novel
feature encoder to learn the correlation between the historical 3D point cloud
data and global, local pose features, resulting in effective learning of the
pose prior. Global pose features include root translation, rotation, and foot
contacts, while local features comprise joint positions and rotations.
Subsequently, a pose generator takes into account the sampled latent variable
along with the features from the previous frame to generate a plausible current
pose. Our framework accurately predicts the performer's 3D global information
and local joint details while effectively considering temporally coherent
movements across frames. We demonstrate the effectiveness of our architecture
through quantitative and qualitative evaluations, comparing it against
state-of-the-art methods. Additionally, we implement a real-time application to
showcase our method in real-world scenarios. The MOVIN dataset is available at
https://movin3d.github.io/movin_pg2023/
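The generation loop described in the abstract above (sample a latent variable conditioned on the point cloud, then decode it together with the previous frame) can be sketched as follows. This is a rough illustration under stated assumptions, not the MOVIN implementation: `prior` and `decoder` are hypothetical callables standing in for trained networks, and the sampling step is the standard Gaussian reparameterization used in variational autoencoders.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_latent(mu: Sequence[float], logvar: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterized Gaussian sample: z = mu + exp(0.5 * logvar) * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def rollout(prior: Callable[[object, object], Tuple[Sequence[float], Sequence[float]]],
            decoder: Callable[[List[float], object], object],
            first_pose: object,
            point_clouds: Sequence[object],
            rng: random.Random) -> List[object]:
    """Autoregressive generation: each pose depends on the previous pose.

    prior(cloud, prev_pose)  -> (mu, logvar) of the latent distribution
    decoder(z, prev_pose)    -> pose for the current frame
    Both are hypothetical stand-ins for learned models.
    """
    poses = [first_pose]
    for cloud in point_clouds:
        mu, logvar = prior(cloud, poses[-1])
        z = sample_latent(mu, logvar, rng)
        poses.append(decoder(z, poses[-1]))
    return poses
```

Because each decoded pose feeds the next step, errors can accumulate over long sequences; the abstract attributes the method's temporal coherence to its feature encoder and pose prior, which this sketch does not model.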
Source coding for transmission of reconstructed dynamic geometry: a rate-distortion-complexity analysis of different approaches
Live 3D reconstruction of a human as a 3D mesh with commodity electronics is becoming a reality. Immersive applications (e.g. cloud gaming, tele-presence) benefit from effective transmission of such content over a bandwidth-limited link. In this paper we outline different approaches for compressing live reconstructed mesh geometry based on distributing mesh reconstruction functions between sender and receiver. We evaluate rate-performance-complexity of different configurations. First, we investigate 3D mesh compression methods (i.e. dynamic/static) from MPEG-4. Second, we evaluate the option of using octree-based point cloud compression and receiver-side surface reconstruction.
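Octree-based point cloud coders typically describe each occupied node with one byte whose bits mark which of its eight children contain points. A minimal sketch of that single step, offered as an illustration of the general idea rather than the codec evaluated in the paper; the function names and the bit layout (x in bit 0, y in bit 1, z in bit 2) are assumptions:

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def child_index(p: Point, center: Point) -> int:
    """Index (0..7) of the octant of `center` that contains point p."""
    return (int(p[0] >= center[0])
            | (int(p[1] >= center[1]) << 1)
            | (int(p[2] >= center[2]) << 2))


def occupancy_byte(points: Iterable[Point], center: Point) -> int:
    """8-bit mask with bit i set when child octant i holds at least one point."""
    mask = 0
    for p in points:
        mask |= 1 << child_index(p, center)
    return mask
```

For a unit cube split at `(0.5, 0.5, 0.5)`, the points `(0.1, 0.1, 0.1)` and `(0.9, 0.9, 0.9)` fall in octants 0 and 7, so the mask is `0b10000001`. A full coder would recurse into each occupied child and entropy-code the resulting byte stream.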
3D (embodied) projection mapping and sensing bodies : a study in interactive dance performance
This dissertation identifies the synergies between physical and virtual environments when designing for immersive experiences in interactive dance performances. The integration of virtual information in physical space is transforming our interactions and experiences with the world. By using the body and creative expression as the interface between real and virtual worlds, dance performance creates a privileged framework to research and design interactive mixed reality environments and immersive augmented architectures. The research is primarily situated in the fields of visual art and interaction design. It combines performance with transdisciplinary fields and intertwines practice with theory. The theoretical and conceptual implications involved in designing and experiencing immersive hybrid environments are analyzed using the reality-virtuality continuum. These theories helped frame the ways augmented reality architectures are achieved through the integration of dance performance with digital software and reception displays. They also helped identify the main artistic affordances and restrictions in the design of augmented reality and augmented virtuality environments for live performance. These pervasive media architectures were materialized in three field experiments, the live dance performances. Each performance was created in three different stages of conception, design and production. The first stage was to "digitize" the performer's movement and brain activity to the virtual environment and our system. This was accomplished through the use of depth sensor cameras, 3D motion capture, and brain-computer interfaces. The second stage was the creation of the computational architecture and software that aggregates the connections and mapping between the physical body and the spatial dynamics of the virtual environment. This process created real-time interactions between the performer's behavior and motion and the real-time generative computer 3D graphics.
Finally, the third stage consisted of the output modality: 3D projector-based augmentation techniques were adopted in order to overlay the virtual environment onto physical space. This thesis proposes and lays out theoretical, technical, and artistic frameworks between 3D digital environments and moving bodies in dance performance. By sensing the body and the brain with the 3D virtual environments, new layers of augmentation and interactions are established, and ultimately this generates mixed reality environments for embodied improvisational self-expression.
Radio-Television-Fil