
    Recycle-GAN: Unsupervised Video Retargeting

    We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to the target domain, i.e., if the content of John Oliver's speech were transferred to Stephen Colbert, the generated content/speech should be in Stephen Colbert's style. Our approach combines spatial and temporal information with adversarial losses for content translation and style preservation. We first study the advantages of spatiotemporal constraints over purely spatial constraints for effective retargeting. We then demonstrate the proposed approach on problems where information in both space and time matters, such as face-to-face translation, flower-to-flower translation, wind and cloud synthesis, and sunrise and sunset.
    Comment: ECCV 2018; please refer to the project webpage for videos - http://www.cs.cmu.edu/~aayushb/Recycle-GA
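    The spatiotemporal constraint described above can be expressed as a "recycle" consistency loss: translate a short clip into the other domain, step it forward in time there, translate the prediction back, and compare against the true future frame. Below is a minimal PyTorch-style sketch under stated assumptions: G_XY and G_YX are hypothetical per-frame translators between the two domains, and P_Y is a hypothetical temporal predictor in the target domain; none of these names come from the abstract itself.

        import torch.nn.functional as F

        def recycle_loss(x_clip, G_XY, G_YX, P_Y):
            # x_clip: (T, C, H, W) ordered video frames from domain X.
            # All modules are hypothetical stand-ins for the paper's networks.
            y_fake = G_XY(x_clip[:-1])             # translate frames 0..T-2 into domain Y
            y_next = P_Y(y_fake)                   # temporal predictor guesses frame T-1 in Y
            x_back = G_YX(y_next)                  # map the predicted frame back to domain X
            return F.l1_loss(x_back, x_clip[-1:])  # penalise deviation from the true future frame

    In a full training loop this term would be combined with the adversarial losses the abstract mentions; the sketch isolates only the part that couples spatial translation with temporal coherence.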

    Digging Deeper into Egocentric Gaze Prediction

    This paper digs deeper into the factors that influence egocentric gaze. Instead of training deep models for this purpose blindly, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial-prior baselines. Task-specific cues such as the vanishing point, the manipulation point, and hand regions are analyzed as representatives of top-down information. We also investigate the contribution of these factors with a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) in contrast to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points, and, in particular, the manipulation point yields the best gaze prediction accuracy on egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) more emphasis should be placed on hand-object interaction and (2) the egocentric vision community should consider larger datasets with diverse stimuli and more subjects.
    Comment: presented at WACV 201
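    The recurrent backbone described in the abstract (per-frame deep features integrated by a gated recurrent unit) is straightforward to realise. The following is a minimal sketch, assuming PyTorch; the feature dimensionality, hidden size, and output head are hypothetical choices, not values from the paper.

        import torch.nn as nn

        class GazeGRU(nn.Module):
            # Integrate per-frame deep features over time, then predict the next fixation.
            def __init__(self, feat_dim=512, hidden=256):
                super().__init__()
                self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 2)    # (x, y) fixation coordinates

            def forward(self, frame_feats):
                # frame_feats: (B, T, feat_dim), one deep feature vector per video frame
                out, _ = self.gru(frame_feats)      # temporal integration over the clip
                return self.head(out[:, -1])        # predict the next fixation point

    The integrated model the abstract mentions would additionally feed top-down cues (e.g. manipulation point, vanishing point) alongside these features before the recurrence.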

    Learning Structured Representations of Spatial and Interactive Dynamics for Trajectory Prediction in Crowded Scenes

    Context plays a significant role in the generation of motion for dynamic agents in interactive environments. This work proposes a modular method that utilises a learned model of the environment for motion prediction. This modularity explicitly allows for unsupervised adaptation of trajectory prediction models to unseen environments and new tasks by relying on unlabelled image data only. We model both the spatial and dynamic aspects of a given environment alongside the per-agent motions. This results in more informed motion prediction and yields performance comparable to the state of the art. We highlight the model's prediction capability on a benchmark pedestrian prediction problem and a robot manipulation task, and show that the predictor can be transferred across these tasks in a completely unsupervised way. The proposed approach allows for robust and label-efficient forward modelling, and relaxes the need for full model re-training in new environments.
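    One way to realise this kind of modularity is to pair an environment-specific scene encoder, trainable on unlabelled images alone, with a reusable per-agent motion module. The sketch below is a hypothetical illustration in PyTorch, not the authors' architecture: all module names, layer choices, and sizes are assumptions.

        import torch
        import torch.nn as nn

        class ModularPredictor(nn.Module):
            def __init__(self, scene_dim=64, hidden=128):
                super().__init__()
                # Learned model of the environment; the only environment-specific part.
                self.scene_enc = nn.Sequential(
                    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(16, scene_dim))
                # Reusable motion module over past agent positions plus scene context.
                self.motion = nn.GRU(2 + scene_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 2)    # next (x, y) position

            def forward(self, past_xy, scene_img):
                # past_xy: (B, T, 2) observed positions; scene_img: (B, 3, H, W)
                ctx = self.scene_enc(scene_img)                        # (B, scene_dim)
                ctx = ctx.unsqueeze(1).expand(-1, past_xy.size(1), -1)
                out, _ = self.motion(torch.cat([past_xy, ctx], -1))
                return self.head(out[:, -1])

    Under this split, adapting to a new environment would mean re-fitting the scene encoder on unlabelled images while the motion module stays frozen, which is the label-efficient transfer the abstract describes.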

    Prediction of social dynamic agents and long-tailed learning challenges: a survey

    Autonomous robots that can perform common tasks like driving, surveillance, and chores have the biggest potential for impact due to frequency of usage, and the biggest potential for risk due to direct interaction with humans. These tasks take place in open-ended environments where humans socially interact and pursue their goals in complex and diverse ways. To operate in such environments, such systems must predict this behavior, especially when it is unexpected and potentially dangerous. Therefore, we summarize trends in various types of tasks, modeling methods, datasets, and social interaction modules aimed at predicting the future location of dynamic, socially interactive agents. Furthermore, we describe long-tailed learning techniques from classification and regression problems that can be applied to prediction problems. To our knowledge, this is the first work that reviews social interaction modeling within prediction and long-tailed learning techniques within regression and prediction.