10 research outputs found

    Learning Temporal Transformations From Time-Lapse Videos

    Based on life-long observations of physical, chemical, and biological phenomena in the natural world, humans can often easily picture in their minds what an object will look like in the future. But what about computers? In this paper, we learn computational models of object transformations from time-lapse videos. In particular, we explore the use of generative models to create depictions of objects at future times. These models explore several different prediction tasks: generating a future state given a single depiction of an object, generating a future state given two depictions of an object at different times, and generating future states recursively in a recurrent framework. We provide both qualitative and quantitative evaluations of the generated results, and also conduct a human evaluation to compare variations of our models. (ECCV 2016)
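    To make the first prediction task above concrete, here is a minimal sketch of a generative model that maps a single depiction of an object to a guessed future state. The convolutional encoder-decoder, layer sizes, and training-pair construction are assumptions for illustration only, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FuturePredictor(nn.Module):
    """Toy encoder-decoder: one current frame in, one predicted future frame out."""
    def __init__(self):
        super().__init__()
        # Encode the current frame into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Decode the features into an image of the predicted future state.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

# Training pairs would be (frame at time t, frame at time t + dt) cut from
# time-lapse videos; a pixel-wise reconstruction loss is one simple choice.
model = FuturePredictor()
future = model(torch.rand(1, 3, 64, 64))  # predicted future depiction
```

    The two-frame and recurrent variants mentioned in the abstract would feed two encoded depictions, or loop the predicted frame back in as the next input, respectively.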

    Functor state machines


    Learning, Moving, And Predicting With Global Motion Representations

    In order to effectively respond to and influence the world they inhabit, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These methods are useful, but they are difficult to estimate reliably and limited in their applicability to real-world settings, where agents frequently must reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions, but that is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences using learning rules derived from a formulation of image transformation in terms of its group properties; (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, using a view of motion as transformations of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action.
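    As a toy illustration of point (i), the sketch below fits a simple 6-parameter global (affine) motion model to a noisy flow field by least squares. It shows the general idea of exploiting a global, parametric relationship between optical flow and camera motion; the dissertation's actual estimator under projective geometry is more sophisticated, so treat this purely as an assumed, simplified stand-in.

```python
import numpy as np

def fit_affine_motion(xy, flow):
    """xy: (N, 2) pixel coordinates; flow: (N, 2) noisy flow vectors.
    Returns the affine parameters (a11, a12, a21, a22, tx, ty)."""
    n = xy.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = xy          # u = a11*x + a12*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = xy          # v = a21*x + a22*y + ty
    A[1::2, 5] = 1.0
    b = flow.reshape(-1)       # interleaved (u0, v0, u1, v1, ...)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params

# Synthetic example: a small rotation-like motion plus translation, with noise.
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(500, 2))
true = np.array([0.0, -0.05, 0.05, 0.0, 0.2, -0.1])
flow = np.c_[xy @ true[0:2] + true[4], xy @ true[2:4] + true[5]]
flow += 0.01 * rng.standard_normal(flow.shape)
print(fit_affine_motion(xy, flow))  # recovers parameters close to `true`
```

    Because every flow vector constrains the same handful of global parameters, the least-squares fit averages out per-pixel noise, which is the statistical leverage the abstract alludes to.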

    Acta Cybernetica : Tomus 4. Fasciculus 1.


    Acta Cybernetica : Tomus 5. Fasciculus 2.


    Learning Beyond-pixel Mappings from Internet Videos

    Recently, the computer vision community has made significant advances in algorithms that recognize or localize visual content in both images and videos, for instance in object recognition and detection. These algorithms infer information that is directly visible within the images or video frames (predicting what's in the frame). Human-level visual understanding, however, goes well beyond that: people also have insights about information 'beyond the frame'. In other words, people can reasonably infer information that is not visible in the current scene, such as possible future events. We expect computational models to acquire the same capabilities one day. Learning beyond-pixel mappings is a broad concept, and in this dissertation we carefully define and formulate it as specific, subdivided tasks viewed from different aspects. In this context, a beyond-pixel mapping infers information from a broader spatial or temporal context, or even from other modalities such as text or sound. We first present a computational framework that learns the mapping between short event video clips and their intrinsic temporal order (which one usually happens first). We then extend this direction by directly predicting the future: specifically, we use generative models to predict depictions of objects in their future state. Next, we explore a related generation task, producing video frames of a target person in unseen poses guided by a random person. Finally, we propose a framework that learns the mapping between input video frames and their counterpart in the sound domain. The main contribution of this dissertation lies in exploring beyond-pixel mappings from various directions to add relevant knowledge to next-generation AI platforms. (Doctor of Philosophy)
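    The first beyond-pixel task above, deciding which of two short event clips happens first, can be pictured as a pairwise classifier over clip features. The sketch below uses a shared (siamese) clip encoder feeding a binary classifier; the layer choices, sizes, and temporal average pooling are illustrative assumptions, not the dissertation's actual model.

```python
import torch
import torch.nn as nn

class TemporalOrderNet(nn.Module):
    """Toy siamese network for pairwise temporal ordering of two video clips."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # Shared per-frame encoder; clip features are averaged over time.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        # Logits: class 0 = clip_a happens first, class 1 = clip_b happens first.
        self.classifier = nn.Linear(2 * feat_dim, 2)

    def encode_clip(self, clip):            # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.frame_encoder(clip.flatten(0, 1)).view(b, t, -1)
        return feats.mean(dim=1)             # temporal average pooling

    def forward(self, clip_a, clip_b):
        return self.classifier(torch.cat(
            [self.encode_clip(clip_a), self.encode_clip(clip_b)], dim=1))

logits = TemporalOrderNet()(torch.rand(2, 8, 3, 64, 64),
                            torch.rand(2, 8, 3, 64, 64))
```

    Ground-truth order labels come for free from the source videos themselves, which is what makes this a natural self-supervised entry point to the later, harder generation tasks.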