
    Towards automatic extraction of expressive elements from motion pictures: tempo

    This paper proposes a computational approach to extracting expressive elements from motion pictures in order to derive high-level semantics of the stories portrayed, thus enabling better video annotation and interpretation systems. The approach is motivated and directed by the existing cinematic conventions known as film grammar and, as a first step towards demonstrating its effectiveness, uses the attributes of motion and shot length to define and compute a novel measure of the tempo of a movie. Tempo flow plots are defined and derived for four full-length movies, and edge analysis is performed to extract dramatic story sections and events signaled by their unique tempo. The results confirm tempo as a useful attribute in its own right and as a promising component of semantic constructs such as the tone or mood of a film.
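
    As a rough illustration of the kind of tempo measure described above, the sketch below combines shot length and per-shot motion magnitude into a smoothed tempo flow and flags sharp changes as candidate dramatic events. The normalisation, equal weighting, smoothing window and edge threshold are assumptions made for illustration, not the paper's actual formulation.

        import numpy as np

        def tempo_flow(shot_lengths, shot_motion, alpha=0.5, smooth=5):
            """Illustrative per-shot tempo: short shots and high motion raise tempo.

            shot_lengths: shot durations in seconds; shot_motion: mean motion
            magnitude per shot. Weighting and smoothing are assumptions.
            """
            lengths = np.asarray(shot_lengths, dtype=float)
            motion = np.asarray(shot_motion, dtype=float)

            def z(x):  # zero-mean, unit-variance normalisation
                return (x - x.mean()) / (x.std() + 1e-8)

            raw = alpha * z(-lengths) + (1.0 - alpha) * z(motion)  # shorter shots -> higher tempo
            kernel = np.ones(smooth) / smooth                      # moving-average smoothing
            return np.convolve(raw, kernel, mode="same")

        def tempo_edges(tempo, threshold=0.8):
            """Flag candidate dramatic events where the tempo flow changes sharply."""
            return np.where(np.abs(np.gradient(tempo)) > threshold)[0]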

    Basic gestures as spatiotemporal reference frames for repetitive dance/music patterns in samba and charleston

    The goal of the present study is to gain better insight into how dancers establish, through dancing, a spatiotemporal reference frame in synchrony with musical cues. To achieve this, repetitive dance patterns of samba and Charleston were recorded using a three-dimensional motion capture system. Geometric patterns were then extracted from each joint of the dancer's body. The method uses a body-centered reference frame and decomposes the movement into non-orthogonal periodicities that match the periods of the musical meter. Musical cues (such as meter and loudness) as well as action-based cues (such as velocity) can be projected onto the patterns, thus providing spatiotemporal reference frames, or 'basic gestures,' for action-perception couplings. Conceptually speaking, the spatiotemporal reference frames control minimum-effort points in action-perception couplings. They reside as memory patterns in the mental and/or motor domains, ready to be dynamically transformed in dance movements. The present study raises a number of hypotheses related to spatial cognition that may serve as guiding principles for future dance/music studies.
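
    One way to approximate the decomposition into meter-matched periodicities is sketched below: a single body-centered joint trajectory is projected by least squares onto sine/cosine pairs whose periods are multiples of the beat. The chosen metrical levels and the least-squares projection are assumptions for illustration and not the study's actual method.

        import numpy as np

        def meter_components(trajectory, fs, beat_period, levels=(0.5, 1.0, 2.0, 4.0)):
            """Project a 1-D joint trajectory onto sinusoids at metrical periods.

            trajectory: one coordinate of a body-centered joint position over time;
            fs: motion-capture sampling rate (Hz); beat_period: beat duration (s).
            Metrical levels and the projection scheme are illustrative assumptions.
            """
            t = np.arange(len(trajectory)) / fs
            basis = []
            for level in levels:                       # half-beat up to four-beat patterns
                freq = 1.0 / (beat_period * level)
                basis.append(np.sin(2 * np.pi * freq * t))
                basis.append(np.cos(2 * np.pi * freq * t))
            B = np.column_stack(basis)
            coeffs, *_ = np.linalg.lstsq(B, trajectory, rcond=None)
            # Amplitude per metrical level from its sine/cosine coefficient pair.
            amps = np.hypot(coeffs[0::2], coeffs[1::2])
            return dict(zip(levels, amps))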

    Biologically Plausible Neural Model for the Recognition of Biological Motion and Actions

    The visual recognition of complex movements and actions is crucial for communication and survival in many species. Remarkable sensitivity and robustness of biological motion perception have been demonstrated in psychophysical experiments. In recent years, neurons and cortical areas involved in action recognition have been identified in neurophysiological and imaging studies. However, the detailed neural mechanisms that underlie the recognition of such complex movement patterns remain largely unknown. This paper reviews the experimental results and summarizes them in terms of a biologically plausible neural model. The model rests on the key assumption that action recognition is based on learned prototypical patterns and exploits information from both the ventral and the dorsal pathway. The model makes specific predictions that motivate new experiments.
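
    The published model is a hierarchy of neurally plausible detectors along the ventral and dorsal pathways; the sketch below only illustrates its core prototype idea, matching incoming frames against learned prototypical patterns with Gaussian (RBF-like) tuning and simple temporal summation. All names and parameters here are illustrative assumptions, not the model itself.

        import numpy as np

        def rbf_response(snapshot, prototypes, sigma=1.0):
            """Radial-basis responses of stored prototype 'snapshot' patterns to one frame.

            snapshot: feature vector for the current frame (e.g. posture or
            optic-flow features); prototypes: array of learned patterns (n x dim).
            """
            d2 = np.sum((prototypes - snapshot) ** 2, axis=1)
            return np.exp(-d2 / (2 * sigma ** 2))

        def recognise_sequence(frames, prototypes_per_action, sigma=1.0):
            """Sum prototype responses over time and pick the best-matching action."""
            scores = {}
            for action, prototypes in prototypes_per_action.items():
                scores[action] = sum(rbf_response(f, prototypes, sigma).max() for f in frames)
            return max(scores, key=scores.get)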

    Proceedings of the Salford Postgraduate Annual Research Conference (SPARC) 2011

    These proceedings bring together a selection of papers from the 2011 Salford Postgraduate Annual Research Conference (SPARC). They include papers from PhD students in the arts and social sciences, business, computing, science and engineering, education, environment, built environment and health sciences. Contributions from Salford researchers are published here alongside papers from students at the Universities of Anglia Ruskin, Birmingham City, Chester, De Montfort, Exeter, Leeds, Liverpool, Liverpool John Moores and Manchester.

    A robust and efficient video representation for action recognition

    This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body, as human motion is not constrained by the camera. Trajectories consistent with the homography are considered to be due to camera motion and are thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in a significant improvement of the motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding to the standard bag-of-words histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to bag-of-words encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results.
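
    A minimal sketch of the camera-motion compensation step is given below using OpenCV. The original pipeline matches SURF keypoints plus dense optical flow and removes matches inside human-detector boxes before RANSAC; this version uses ORB matching only, so it should be read as an assumption-laden approximation of the described step, not the authors' implementation.

        import cv2
        import numpy as np

        def camera_compensated_flow(prev_gray, curr_gray):
            """Estimate a frame-to-frame homography and cancel camera motion before
            computing dense optical flow on the stabilised frame pair."""
            orb = cv2.ORB_create(2000)
            kp1, des1 = orb.detectAndCompute(prev_gray, None)
            kp2, des2 = orb.detectAndCompute(curr_gray, None)
            matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
            src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
            dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
            # Warp the previous frame so camera-induced motion is removed; the
            # residual flow then mostly reflects independent object motion.
            h, w = curr_gray.shape
            stabilised = cv2.warpPerspective(prev_gray, H, (w, h))
            return cv2.calcOpticalFlowFarneback(stabilised, curr_gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)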

    Robot Learning Dual-Arm Manipulation Tasks by Trial-and-Error and Multiple Human Demonstrations

    In robotics, there is a need for an interactive and expeditious learning method, as experience is expensive. In this research, we propose two different methods for a humanoid robot to learn manipulation tasks: learning by trial-and-error, and learning from demonstrations. In learning by trial-and-error, the robot learns much like a child who tackles a newly assigned task by trying all possible alternatives and then learning from its mistakes. We used the Q-learning algorithm, in which the robot tries all possible ways to perform a task and builds a matrix of Q-values based on the rewards it receives for the actions performed. Using this method, the robot was made to learn dance moves based on a music track. Robot Learning from Demonstrations (RLfD) enables a human user to add new capabilities to a robot in an intuitive manner without explicitly reprogramming it. In this method, the robot learns a skill from demonstrations performed by a human teacher. The robot extracts features from each demonstration, called key-points, and learns a model of the demonstrated task or trajectory using a Hidden Markov Model (HMM). The learned model is then used to produce a generalized trajectory. In the end, we discuss the differences between the two developed systems and draw conclusions from the experiments performed.
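
    The trial-and-error component relies on the standard tabular Q-learning update, sketched below. The step(state, action) environment function standing in for the robot/dance setup is a hypothetical placeholder; the update rule itself is the textbook one.

        import numpy as np

        def q_learning(n_states, n_actions, episodes, step, alpha=0.1, gamma=0.9, eps=0.2):
            """Tabular Q-learning loop; `step(state, action) -> (next_state, reward, done)`
            is an assumed environment interface, not the robot's actual API."""
            Q = np.zeros((n_states, n_actions))
            for _ in range(episodes):
                state, done = 0, False
                while not done:
                    # epsilon-greedy choice between exploration and the best known action
                    action = (np.random.randint(n_actions) if np.random.rand() < eps
                              else int(Q[state].argmax()))
                    next_state, reward, done = step(state, action)
                    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
                    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                                 - Q[state, action])
                    state = next_state
            return Q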

    Surmising synchrony of sound and sight: Factors explaining variance of audiovisual integration in hurdling, tap dancing and drumming

    Auditory and visual percepts are integrated even when they are not perfectly temporally aligned with each other, especially when the visual signal precedes the auditory signal. This window of temporal integration for asynchronous audiovisual stimuli is relatively well examined in the case of speech, while other natural action-induced sounds have been widely neglected. Here, we studied the detection of audiovisual asynchrony in three different whole-body actions with natural action-induced sounds: hurdling, tap dancing and drumming. In Study 1, we examined whether audiovisual asynchrony detection, assessed by a simultaneity judgment task, differs as a function of sound production intentionality. Based on previous findings, we expected auditory and visual signals to be integrated over a wider temporal window for actions creating sounds intentionally (tap dancing) compared to actions creating sounds incidentally (hurdling). While the percentages of perceived synchrony differed in the expected way, we identified two further factors, namely high event density and low rhythmicity, that induced higher synchrony ratings as well. Therefore, in Study 2 we systematically varied event density and rhythmicity, this time using drumming stimuli to exert full control over these variables, with the same simultaneity judgment task. The results suggest that high event density leads to a bias to integrate rather than segregate auditory and visual signals, even at relatively large asynchronies. Rhythmicity had a similar, albeit weaker, effect when event density was low. Our findings demonstrate that shorter asynchronies and visual-first asynchronies lead to higher synchrony ratings of whole-body actions, pointing to clear parallels with audiovisual integration in speech perception. Overconfidence in the naturally expected outcome, that is, synchrony of sound and sight, was stronger for intentional (vs. incidental) sound production and for movements with high (vs. low) rhythmicity, presumably because both encourage predictive processes. In contrast, high event density appears to increase synchrony judgments simply because it makes the detection of audiovisual asynchrony more difficult. More studies using real-life audiovisual stimuli with varying event densities and rhythmicities are needed to fully uncover the general mechanisms of audiovisual integration.
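
    A simple way to summarise such simultaneity-judgment data is to compute the proportion of 'synchronous' responses as a function of audiovisual asynchrony, as in the sketch below. The binning and the sign convention (negative SOA meaning visual leads) are assumptions for illustration; the studies themselves used fixed SOA levels per trial.

        import numpy as np

        def synchrony_curve(soas_ms, responses, bin_width=50):
            """Proportion of 'synchronous' judgments per SOA bin.

            soas_ms: stimulus onset asynchronies in ms (negative = visual first);
            responses: 1 for 'synchronous', 0 for 'asynchronous'.
            """
            soas = np.asarray(soas_ms, dtype=float)
            resp = np.asarray(responses, dtype=float)
            edges = np.arange(soas.min(), soas.max() + bin_width, bin_width)
            centres, proportions = [], []
            for lo, hi in zip(edges[:-1], edges[1:]):
                mask = (soas >= lo) & (soas < hi)
                if mask.any():
                    centres.append((lo + hi) / 2)
                    proportions.append(resp[mask].mean())
            return np.array(centres), np.array(proportions)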

    Tri-level Unified Framework for Human Gait Analysis

    There are several applications related to multimedia content analysis. Considering video as one of the most prominent forms of multimedia content, this paper presents an analysis of human walking motion (gait) in video sequences using a promising strategy that integrates techniques from data fusion and computer vision. To address the challenges in human gait analysis, a unified framework is proposed comprising three levels: the data level, the feature descriptor level and the decision level, each performing a specific task. At the data level, features are extracted from input video sequences for a minimal representation. At the feature descriptor level, features from the minimal representation are rearranged to build a feature descriptor, and finally, at the decision level, meaningful interpretations are performed. For analysing human walking motion in video sequences, moving silhouettes are first extracted using background subtraction to obtain the minimal representation at the data level. The extracted silhouettes are then mapped into a common spatial representation, correlation analysis is applied, and a feature descriptor with a minimal set of interest points is built at the feature descriptor level. Finally, normal gait poses and transition poses are interpreted at the decision level.
    Keywords: multimedia content; data fusion; unified framework; background subtraction; correlation; feature descriptor; interpretation of gaits
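
    The data-level step, extracting moving silhouettes by background subtraction, could be prototyped as below with OpenCV's MOG2 subtractor and simple morphological cleaning. The choice of subtractor and post-processing is an assumption standing in for whatever background model the framework actually uses.

        import cv2

        def extract_silhouettes(video_path, history=200, var_threshold=25):
            """Extract moving-silhouette masks from a video (data-level step)."""
            capture = cv2.VideoCapture(video_path)
            subtractor = cv2.createBackgroundSubtractorMOG2(history=history,
                                                            varThreshold=var_threshold,
                                                            detectShadows=False)
            kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
            silhouettes = []
            while True:
                ok, frame = capture.read()
                if not ok:
                    break
                mask = subtractor.apply(frame)
                mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise
                silhouettes.append(mask)
            capture.release()
            return silhouettes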