953 research outputs found
Motion clouds: model-based stimulus synthesis of natural-like random textures for the study of motion perception
Choosing an appropriate set of stimuli is essential to characterize the
response of a sensory system to a particular functional dimension, such as the
eye movement following the motion of a visual scene. Here, we describe a
framework to generate random texture movies with controlled information
content, i.e., Motion Clouds. These stimuli are defined using a generative
model that is based on controlled experimental parametrization. We show that
Motion Clouds correspond to dense mixing of localized moving gratings with
random positions. Their global envelope is similar to natural-like stimulation
with an approximate full-field translation corresponding to a retinal slip. We
describe the construction of these stimuli mathematically and propose an
open-source Python-based implementation. Examples of the use of this framework
are shown. We also propose extensions to other modalities such as color vision,
touch, and audition
Globally-Coordinated Locally-Linear Modeling of Multi-Dimensional Data
This thesis considers the problem of modeling and analysis of continuous, locally-linear, multi-dimensional spatio-temporal data. Our work extends the previously reported theoretical work on the global coordination model to temporal analysis of continuous, multi-dimensional data. We have developed algorithms for time-varying data analysis and used them in full-scale, real-world applications. The applications demonstrated in this thesis include tracking, synthesis, recognitions and retrieval of dynamic objects based on their shape, appearance and motion. The proposed approach in this thesis has advantages over existing approaches to analyzing complex spatio-temporal data. Experiments show that the new modeling features of our approach improve the performance of existing approaches in many applications. In object tracking, our approach is the first one to track nonlinear appearance variations by using low-dimensional representation of the appearance change in globally-coordinated linear subspaces. In dynamic texture synthesis, we are able to model non-stationary dynamic textures, which cannot be handled by any of the existing approaches. In human motion synthesis, we show that realistic synthesis can be performed without using specific transition points, or key frames
Modeling and Recognizing Binary Human Interactions
Recognizing human activities from video is an important step forward towards the long-term goal of performing scene understanding fully automatically. Applications in this domain include, but are not limited to, the automated analysis of video surveillance footage for public and private monitoring, remote patient and elderly home monitoring, video archiving, search and retrieval, human-computer interaction, and robotics. While recent years have seen a concentration of works focusing on modeling and recognizing either group activities, or actions performed by people in isolation, modeling and recognizing binary human-human interactions is a fundamental building block that only recently has started to catalyze the attention of researchers.;This thesis introduces a new modeling framework for binary human-human interactions. The main idea is to describe interactions with spatio-temporal trajectories. Interaction trajectories can then be modeled as the output of dynamical systems, and recognizing interactions entails designing a suitable way for comparing them. This poses several challenges, starting from the type of information that should be captured by the trajectories, which defines the geometry structure of the output space of the systems. In addition, decision functions performing the recognition should account for the fact that the people interacting do not have a predefined ordering. This work addresses those challenges by carefully designing a kernel-based approach that combines non-linear dynamical system modeling with kernel PCA. Experimental results computed on three recently published datasets, clearly show the promise of this approach, where the classification accuracy, and the retrieval precision are comparable or better than the state-of-the-art
Building Machines That Learn and Think Like People
Recent progress in artificial intelligence (AI) has renewed interest in
building systems that learn and think like people. Many advances have come from
using deep neural networks trained end-to-end in tasks such as object
recognition, video games, and board games, achieving performance that equals or
even beats humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in
crucial ways. We review progress in cognitive science suggesting that truly
human-like learning and thinking machines will have to reach beyond current
engineering trends in both what they learn, and how they learn it.
Specifically, we argue that these machines should (a) build causal models of
the world that support explanation and understanding, rather than merely
solving pattern recognition problems; (b) ground learning in intuitive theories
of physics and psychology, to support and enrich the knowledge that is learned;
and (c) harness compositionality and learning-to-learn to rapidly acquire and
generalize knowledge to new tasks and situations. We suggest concrete
challenges and promising routes towards these goals that can combine the
strengths of recent neural network advances with more structured cognitive
models.Comment: In press at Behavioral and Brain Sciences. Open call for commentary
proposals (until Nov. 22, 2016).
https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar
Active inference, eye movements and oculomotor delays.
This paper considers the problem of sensorimotor delays in the optimal control of (smooth) eye movements under uncertainty. Specifically, we consider delays in the visuo-oculomotor loop and their implications for active inference. Active inference uses a generalisation of Kalman filtering to provide Bayes optimal estimates of hidden states and action in generalised coordinates of motion. Representing hidden states in generalised coordinates provides a simple way of compensating for both sensory and oculomotor delays. The efficacy of this scheme is illustrated using neuronal simulations of pursuit initiation responses, with and without compensation. We then consider an extension of the generative model to simulate smooth pursuit eye movements-in which the visuo-oculomotor system believes both the target and its centre of gaze are attracted to a (hidden) point moving in the visual field. Finally, the generative model is equipped with a hierarchical structure, so that it can recognise and remember unseen (occluded) trajectories and emit anticipatory responses. These simulations speak to a straightforward and neurobiologically plausible solution to the generic problem of integrating information from different sources with different temporal delays and the particular difficulties encountered when a system-like the oculomotor system-tries to control its environment with delayed signals
Human Interaction Recognition with Audio and Visual Cues
The automated recognition of human activities from video is a fundamental problem with applications in several areas, ranging from video surveillance, and robotics, to smart healthcare, and multimedia indexing and retrieval, just to mention a few. However, the pervasive diffusion of cameras capable of recording audio also makes available to those applications a complementary modality. Despite the sizable progress made in the area of modeling and recognizing group activities, and actions performed by people in isolation from video, the availability of audio cues has rarely being leveraged. This is even more so in the area of modeling and recognizing binary interactions between humans, where also the use of video has been limited.;This thesis introduces a modeling framework for binary human interactions based on audio and visual cues. The main idea is to describe an interaction with a spatio-temporal trajectory modeling the visual motion cues, and a temporal trajectory modeling the audio cues. This poses the problem of how to fuse temporal trajectories from multiple modalities for the purpose of recognition. We propose a solution whereby trajectories are modeled as the output of kernel state space models. Then, we developed kernel-based methods for the audio-visual fusion that act at the feature level, as well as at the kernel level, by exploiting multiple kernel learning techniques. The approaches have been extensively tested and evaluated with a dataset made of videos obtained from TV shows and Hollywood movies, containing five different interactions. The results show the promise of this approach by producing a significant improvement of the recognition rate when audio cues are exploited, clearly setting the state-of-the-art in this particular application
SEGMENTATION, RECOGNITION, AND ALIGNMENT OF COLLABORATIVE GROUP MOTION
Modeling and recognition of human motion in videos has broad applications in behavioral biometrics, content-based visual data analysis, security and surveillance, as well as designing interactive environments. Significant progress has been made in the past two decades by way of new models, methods, and implementations. In this dissertation, we focus our attention on a relatively less investigated sub-area called collaborative group motion analysis. Collaborative group motions are those that typically involve multiple objects, wherein the motion patterns of individual objects may vary significantly in both space and time, but the collective motion pattern of the ensemble allows characterization in terms of geometry and statistics. Therefore, the motions or activities of an individual object constitute local information. A framework to synthesize all local information into a holistic view, and to explicitly characterize interactions among objects, involves large scale global reasoning, and is of significant complexity. In this dissertation, we first review relevant previous contributions on human motion/activity modeling and recognition, and then propose several approaches to answer a sequence of traditional vision questions including 1) which of the motion elements among all are the ones relevant to a group motion pattern of interest (Segmentation); 2) what is the underlying motion pattern (Recognition); and 3) how two motion ensembles are similar and how we can 'optimally' transform one to match the other (Alignment). Our primary practical scenario is American football play, where the corresponding problems are 1) who are offensive players; 2) what are the offensive strategy they are using; and 3) whether two plays are using the same strategy and how we can remove the spatio-temporal misalignment between them due to internal or external factors. The proposed approaches discard traditional modeling paradigm but explore either concise descriptors, hierarchies, stochastic mechanism, or compact generative model to achieve both effectiveness and efficiency.
In particular, the intrinsic geometry of the spaces of the involved features/descriptors/quantities is exploited and statistical tools are established on these nonlinear manifolds. These initial attempts have identified new challenging problems in complex motion analysis, as well as in more general tasks in video dynamics. The insights gained from nonlinear geometric modeling and analysis in this dissertation may hopefully be useful toward a broader class of computer vision applications
- …