Linear-time Online Action Detection From 3D Skeletal Data Using Bags of Gesturelets
The sliding window approach is a direct way to extend a successful recognition system to the more challenging detection problem. While action recognition decides only whether or not an action is present in a pre-segmented video sequence, action detection identifies the time interval in which the action occurred in an unsegmented video stream. Sliding window approaches for action detection can, however, be slow, as they maximize a classifier score over all possible sub-intervals. Even though newer schemes use dynamic programming to speed up the search for the optimal sub-interval, they require offline processing of the whole video sequence. In this paper, we propose a novel approach for online action detection based on 3D skeleton sequences extracted from depth data. It identifies the sub-interval with the maximum classifier score in linear time. Furthermore, it is invariant to temporal scale variations and is suitable for real-time applications with low latency.
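As a rough illustration of how a maximum-scoring sub-interval can be found in linear time (this is a generic Kadane-style sketch, not the paper's exact algorithm; the function name and the assumption of signed per-frame classifier scores are mine):

```python
# Minimal sketch: given per-frame classifier scores that are positive for
# "action-like" frames and negative otherwise, a single Kadane-style pass
# finds the maximum-scoring sub-interval in O(n) time.
from typing import List, Tuple

def max_score_interval(frame_scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best-scoring sub-interval in one pass."""
    best_start, best_end, best_score = 0, 0, float("-inf")
    cur_start, cur_score = 0, 0.0
    for i, s in enumerate(frame_scores):
        if cur_score <= 0.0:           # restart the interval if the running sum is non-positive
            cur_start, cur_score = i, s
        else:
            cur_score += s
        if cur_score > best_score:     # keep the best interval seen so far
            best_start, best_end, best_score = cur_start, i, cur_score
    return best_start, best_end, best_score

# Example: frames 1..3 (0-indexed) carry most of the positive evidence.
print(max_score_interval([-1.0, 0.5, 2.0, 1.5, -0.5, -2.0]))  # -> (1, 3, 4.0)
```

Because each frame is inspected once and the running interval is updated in constant time, the detector can process frames as they arrive, which is what makes the online, low-latency setting feasible.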
Neural Rendering Techniques for Photo-realistic Image Generation and Novel View Synthesis
Recent advances in deep generative models have enabled computers to imagine and generate fictional images from any given distribution of images. Techniques like Generative Adversarial Networks (GANs) and image-to-image (I2I) translation can generate images by mapping random noise or an input image (e.g., a sketch or a semantic map) to photo-realistic images. However, there are still plenty of challenges in training such models and in improving their output quality and diversity. Furthermore, to harness this imaginative and generative power for real-world applications, we need to control different aspects of the rendering process; for example, the content and/or style of generated images, the camera pose, and the lighting.
One challenge in training image generation models is the multi-modal nature of image synthesis. An image with a specific content, such as a cat or a car, can be generated in countless different styles (e.g., colors, lighting, and local texture details). To give users control over the generated style, previous works train multi-modal I2I translation networks, but these suffer from complicated and slow training that is specific to a single target image domain. We address this limitation and propose a style pre-training strategy that generalizes across many image domains, improves training stability and speed, and improves output quality and diversity.
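To make the multi-modal setting concrete, the toy sketch below shows the general mechanism by which a sampled style code can modulate a generator so that one content input yields many styled outputs. It is illustrative only; the class name, layer sizes, and the AdaIN-like affine modulation are assumptions, not the thesis' architecture.

```python
# Toy multi-modal I2I sketch: a content encoder extracts features, and a sampled
# style code produces per-channel scale/shift that modulates the decoder, so
# different style samples give different renderings of the same content.
import torch
import torch.nn as nn

class TinyMultiModalI2I(nn.Module):
    def __init__(self, style_dim: int = 8):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # Maps a style code to per-channel scale and shift for the decoder features.
        self.to_affine = nn.Linear(style_dim, 64)
        self.decode = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        h = self.encode(content)                              # content features
        scale, shift = self.to_affine(style).chunk(2, dim=1)  # (B, 32) each
        h = h * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.decode(h)

net = TinyMultiModalI2I()
x = torch.randn(1, 3, 64, 64)                              # one content image
outputs = [net(x, torch.randn(1, 8)) for _ in range(3)]    # three style samples, three looks
```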
Another challenge for GANs and I2I translation is providing 3D control over the rendering process. For example, applications such as AR/VR, virtual tours, and telepresence require generating consistent images or videos of 3D environments. However, GANs and I2I translation mainly operate in 2D, which limits their use for such applications. To address this limitation, we propose to condition image synthesis on coarse geometric proxies (e.g., a point cloud, a coarse mesh, or a voxel grid), and we augment these rough proxies with machine-learned components that compensate for their artifacts and render photo-realistic images. We apply our proposal to the task of novel view synthesis under several challenging settings, and show photo-realistic novel views of complex scenes with multiple objects, of tourist landmarks under different appearances, and of human subjects under novel head poses and facial expressions.
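The general pipeline this describes, rendering a coarse proxy from the target viewpoint and letting a learned network turn that rough rendering into a photo-realistic view, can be sketched as follows. The class name, channel layout, and the stand-in rasterizer are hypothetical and only meant to show the data flow, not the thesis' models.

```python
# Sketch of neural re-rendering from a coarse proxy: a rasterized proxy view
# (rough RGB + depth) is refined by a learned network into the final image.
import torch
import torch.nn as nn

class ProxyRefiner(nn.Module):
    """Refines a rough proxy rendering (RGB + depth = 4 channels) into an RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, proxy_rgb: torch.Tensor, proxy_depth: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([proxy_rgb, proxy_depth], dim=1))

# In practice the inputs would come from rasterizing/splatting a point cloud,
# mesh, or voxel grid at the target camera; random tensors stand in here.
proxy_rgb, proxy_depth = torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128)
novel_view = ProxyRefiner()(proxy_rgb, proxy_depth)   # (1, 3, 128, 128)
```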
RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis
We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures. Because our dataset is too large for existing methods to process, we propose Sparse Voxel Light Field (SVLF), an efficient voxel-based light field approach for novel view synthesis that achieves comparable performance to NeRF on synthetic data, while being an order of magnitude faster to train and two orders of magnitude faster to render. SVLF achieves this speed by relying on a sparse voxel octree, careful voxel sampling (requiring only a handful of queries per ray), and a reduced network structure, as well as ground truth depth maps at training time. Our dataset is generated by NViSII, a Python-based ray tracing renderer, which is designed to be simple for non-experts to use and share, flexible and powerful through its use of scripting, and able to create high-quality, physically-based rendered images. Experiments with a subset of our dataset allow us to compare standard methods like NeRF and mip-NeRF for single-scene modeling, and pixelNeRF for category-level modeling, pointing toward the need for future improvements in this area.

Comment: Project page at http://www.cs.umd.edu/~mmeshry/projects/rtm
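To give a sense of why sampling only occupied voxels is so much cheaper than NeRF-style dense sampling, here is a schematic ray query in that spirit. It is not the released SVLF implementation; the MLP, the front-to-back compositing, and the precomputed list of occupied-voxel hit distances are all illustrative assumptions.

```python
# Schematic sparse-voxel ray query: only the few points where the ray crosses
# occupied voxels (e.g. from a sparse-voxel-octree traversal) are fed to a small
# network, then composited front to back with early termination.
import torch
import torch.nn as nn

class TinyRadianceHead(nn.Module):
    """Small MLP mapping (position, view direction) to (rgb, alpha)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(torch.cat([x, d], dim=-1)))  # rgb and alpha in [0,1]

def render_ray(origin, direction, occupied_ts, head):
    """Composite the handful of samples where the ray hits occupied voxels."""
    rgb, transmittance = torch.zeros(3), 1.0
    for t in occupied_ts:                       # hit distances from the octree traversal
        x = origin + t * direction
        out = head(x, direction)
        color, alpha = out[:3], out[3]
        rgb = rgb + transmittance * alpha * color
        transmittance = transmittance * (1.0 - alpha)
        if transmittance < 1e-3:                # stop once the ray is saturated
            break
    return rgb

head = TinyRadianceHead()
rgb = render_ray(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]), [0.8, 1.1, 1.4], head)
```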