13 research outputs found

    Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis

    Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing. While many recent methods have been proposed to tackle this problem, handling loose garments with complex textures and highly dynamic motion remains challenging. In this paper, we propose a video-based appearance synthesis method that tackles such challenges and demonstrates high-quality results for in-the-wild videos that have not been shown before. Specifically, we adopt a StyleGAN-based architecture for the task of person-specific, video-based motion retargeting. We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes, as well as to regularize the single-frame pose estimates to improve temporal coherency. We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.

    Improved content aware scene retargeting for retinitis pigmentosa patients

    Background: In this paper we present a novel scene retargeting technique to reduce the visual scene while maintaining the size of the key features. The algorithm is scalable to implementation on portable devices and thus has potential for augmented reality systems that provide visual support for those with tunnel vision. We therefore test the efficacy of our algorithm at shrinking the visual scene into the remaining field of view for those patients.
    Methods: Simple spatial compression of visual scenes makes objects appear further away. We have therefore developed an algorithm which removes low-importance information while maintaining the size of the significant features. Previous approaches in this field have included seam carving, which removes low-importance seams from the scene, and shrinkability, which dynamically shrinks the scene according to a generated importance map. The former method causes significant artifacts and the latter is inefficient. In this work we have developed a new algorithm that combines the best aspects of these two previous methods. In particular, our approach is to generate a shrinkability importance map using a seam-based approach. We then use it to dynamically shrink the scene in a similar fashion to the shrinkability method. Importantly, we have implemented it so that it can be used in real time without prior knowledge of future frames.
    Results: We have evaluated and compared our algorithm to the seam carving and image shrinkability approaches from a content preservation perspective and a compression quality perspective. Our technique has also been evaluated and tested in a trial that included 20 participants with simulated tunnel vision. Results show the robustness of our method at reducing scenes by up to 50% with minimal distortion. We also demonstrate efficacy in its use for those with simulated tunnel vision of 22 degrees of field of view or less.
    Conclusions: Our approach allows us to perform content-aware video resizing in real time, using only information from previous frames to avoid jitter. Our method also has a great benefit over the ordinary resizing method and even over other image retargeting methods. We show that the benefit derived from this algorithm is significant to patients with fields of view of 20° or less.
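The seam-based idea mentioned in the Methods can be illustrated with a minimal sketch. It assumes an importance map is already available as a 2D list of numbers and finds the top-to-bottom seam with the lowest total importance by dynamic programming, which is the classic seam carving step. The function name `lowest_importance_seam` and the input layout are hypothetical; this is not the authors' algorithm, which builds a shrinkability map from seams rather than deleting them.

```python
from typing import List


def lowest_importance_seam(importance: List[List[float]]) -> List[int]:
    """Return, for each row, the column of the vertical seam with the
    smallest summed importance (8-connected, one pixel per row)."""
    rows, cols = len(importance), len(importance[0])

    # cost[r][c] = cheapest total importance of any seam ending at (r, c)
    cost = [list(importance[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            row.append(importance[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)

    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


# Toy importance map: low values mark pixels that are cheap to remove.
demo = [
    [5, 1, 5],
    [5, 5, 1],
    [5, 1, 5],
]
print(lowest_importance_seam(demo))
```

In a real-time setting such as the one described, the importance map would have to be computed from the current and previous frames only; that part is outside this sketch.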

    MotionBERT: A Unified Perspective on Learning Human Motion Representations

    We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. Specifically, we propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations. The motion representations acquired in this way incorporate geometric, kinematic, and physical knowledge about human motion, which can be easily transferred to multiple downstream tasks. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network. It can capture long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, exemplified by the lowest 3D pose estimation error so far when trained from scratch. Furthermore, our proposed framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with a simple regression head (1-2 layers), which demonstrates the versatility of the learned motion representations. Code and models are available at https://motionbert.github.io/. Comment: ICCV 2023 Camera Ready

    Periscope: A Robotic Camera System to Support Remote Physical Collaboration

    We investigate how robotic camera systems can offer new capabilities to computer-supported cooperative work through the design, development, and evaluation of a prototype system called Periscope. With Periscope, a local worker completes manipulation tasks with guidance from a remote helper who observes the workspace through a camera mounted on a semi-autonomous robotic arm that is co-located with the worker. Our key insight is that the helper, the worker, and the robot should all share responsibility for the camera view--an approach we call shared camera control. Using this approach, we present a set of modes that distribute the control of the camera between the human collaborators and the autonomous robot depending on task needs. We demonstrate the system's utility and the promise of shared camera control through a preliminary study in which 12 dyads collaboratively worked on assembly tasks. Finally, we discuss design and research implications of our work for future robotic camera systems that facilitate remote collaboration. Comment: This is a pre-print of the article accepted for publication in PACM HCI and will be presented at CSCW 202

    Human Visual Perception, study and applications to understanding Images and Videos

    Ph.D., Doctor of Philosophy

    On-line Time Warping of Human Motion Sequences

    Some application areas require motions to be time warped on-line as a motion is captured, aligning a partially captured motion to a complete prerecorded motion. For example, movement training applications for dance and medical procedures require on-line time warping for analysing and visually feeding back the accuracy of human motions as they are being performed. Additionally, real-time production techniques such as virtual production, in-camera visual effects and the use of avatars in live stage performances require on-line time warping to align virtual character performances to a live performer. The work in this thesis first addresses a research gap in the measurement of the alignment of two motions, proposing approaches based on rank correlation and evaluating them against existing distance-based approaches to measuring motion similarity. The thesis then goes on to propose and evaluate novel methods for on-line time warping, which plot alignments in a forward direction and utilise forecasting and local continuity constraint techniques. Current studies into measuring the similarity of motions focus on distance-based metrics to support motion recognition applications, leaving a research gap regarding the effectiveness of similarity metrics based on correlation and the optimal metrics for measuring the alignment of two motions. This thesis addresses this research gap by comparing the performance of a variety of similarity metrics based on distance and correlation, including novel combinations of joint parameterisation and correlation methods. The ability of each metric to measure both the similarity and the alignment of two motions is independently assessed. This work provides a detailed evaluation of a variety of different approaches to using correlation within a similarity metric, testing their performance to determine which approach is optimal and comparing their performance against established distance-based metrics.
The results show that a correlation-based metric, in which joints are parameterised using displacement vectors and correlation is measured using Kendall Tau rank correlation, is the optimal approach for measuring the alignment between two motions. The study also showed that similarity metrics based on correlation are better at measuring the alignment of two motions, which is important in motion blending and style transfer applications as well as in evaluating the performance of time warping algorithms. It also showed that metrics based on distance are better at measuring the similarity of two motions, which is more relevant to motion recognition and classification applications. Existing research has proposed a number of approaches to on-line time warping that are based on plotting an alignment path backwards from a selected end-point within the complete motion. While these approaches work for discrete applications, such as recognising a motion, their lack of a monotonic constraint between the alignments of successive frames means they do not support applications that require an alignment to be maintained continuously over a number of frames, for example applications involving continuous real-time visualisation, feedback or interaction. To solve this problem, a number of novel on-line time warping algorithms, based on forward plotting, motion forecasting and local continuity constraints, are proposed and evaluated by applying them to human motions. Two benchmark standards for evaluating the performance of on-line time warping algorithms are established, based on UTW and on comparing the resulting alignment path with that produced by DTW. This work also proposes a novel approach to adapting existing local continuity constraints to a forward plotting approach.
The studies within this thesis demonstrate that these time warping approaches are able to produce alignments of sufficient quality to support applications that require an alignment to be maintained continuously. The on-line time warping algorithms proposed in this study can align a previously recorded motion to a user in real time, as they are performing the same action or an opposing action recorded at the same time as the motion being aligned. This solution has a variety of potential application areas, including visualisation applications, such as aligning a motion to a live performer to facilitate in-camera visual effects or a live stage performance with a virtual avatar; motion feedback applications, such as dance training or medical rehabilitation; and interaction applications, such as working with cobots.
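Two of the building blocks named in this abstract, a DTW alignment path used as a reference and Kendall Tau rank correlation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: motions are reduced to non-empty 1-D sequences of numbers rather than joint displacement vectors, the DTW shown is the classic off-line form with an absolute-difference cost, the correlation is the tau-a variant (ties count as neither concordant nor discordant), and the names `dtw_path` and `kendall_tau` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> List[Tuple[int, int]]:
    """Classic off-line DTW: return the alignment path as (index in a, index in b)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end of both sequences to their first frames.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (acc[i - 1][j], i - 1, j),          # advance in a only
            (acc[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


live = [0.0, 1.0, 2.0]            # toy "captured" motion
recorded = [0.0, 1.0, 1.0, 2.0]   # toy "prerecorded" motion
print(dtw_path(live, recorded))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

An on-line algorithm of the kind the abstract describes would have to produce its path frame by frame without seeing the rest of the live motion; the off-line path above only plays the role of the reference it is compared against.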

    Real-time 3D human body pose estimation from monocular RGB input

    Human motion capture finds extensive application in movies, games, sports and biomechanical analysis. However, existing motion capture solutions require cumbersome external and/or on-body instrumentation, or use active sensors with limits on the possible capture volume dictated by power consumption. The ubiquity and ease of deployment of RGB cameras make monocular RGB-based human motion capture an extremely useful problem to solve: doing so would lower the barrier to entry for content creators to employ motion capture tools and enable newer applications of human motion capture. This thesis demonstrates the first real-time monocular RGB-based motion capture solutions that work in general scene settings. They are based on developing neural-network-based approaches to address the ill-posed problem of estimating 3D human pose from a single RGB image, in combination with model-based fitting. In particular, the contributions of this work make advances towards three key aspects of real-time monocular RGB-based motion capture, namely speed, accuracy, and the ability to work for general scenes. New training datasets are proposed for single-person and multi-person scenarios, which, together with the proposed transfer-learning-based training pipeline, allow learning-based approaches to be appearance invariant. The training datasets are accompanied by evaluation benchmarks with multiple avenues of fine-grained evaluation. The evaluation benchmarks differ visually from the training datasets, so as to promote efforts towards solutions that generalize to in-the-wild scenes. The proposed task formulations for the single-person and multi-person cases allow higher accuracy and incorporate additional qualities, such as occlusion robustness, that are helpful in the context of a full motion capture solution.
The multi-person formulations are designed to have a nearly constant inference time regardless of the number of subjects in the scene and, combined with contributions towards fast neural network inference, enable real-time 3D pose estimation for multiple subjects. Combining the proposed learning-based approaches with a model-based kinematic skeleton fitting step provides temporally stable joint angle estimates, which can be readily employed for driving virtual characters.

    Flight Mechanics/Estimation Theory Symposium, 1990

    This conference publication includes 32 papers and abstracts presented at the Flight Mechanics/Estimation Theory Symposium on May 22-25, 1990. Sponsored by the Flight Dynamics Division of Goddard Space Flight Center, this symposium features technical papers on a wide range of issues related to orbit-attitude prediction, determination and control; attitude sensor calibration; attitude determination error analysis; attitude dynamics; and orbit decay and maneuver strategy. Government, industry, and the academic community participated in the preparation and presentation of these papers.