Describing Videos by Exploiting Temporal Structure
Recent progress in using recurrent neural networks (RNNs) for image
description has motivated the exploration of their application for video
description. However, while images are static, working with videos requires
modeling their dynamic temporal structure and then properly integrating that
information into a natural language description. In this context, we propose an
approach that successfully takes into account both the local and global
temporal structure of videos to produce descriptions. First, our approach
incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN)
representation of the short temporal dynamics. The 3-D CNN representation is
trained on video action recognition tasks, so as to produce a representation
that is tuned to human motion and behavior. Second, we propose a temporal
attention mechanism that goes beyond local temporal modeling and learns to
automatically select the most relevant temporal segments given the
text-generating RNN. Our approach exceeds the current state of the art on both
BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on
a new, larger and more challenging dataset of paired video and natural language
descriptions. Comment: Accepted to ICCV15. This version comes with code release
and supplementary material.
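The temporal attention described above can be sketched as additive (Bahdanau-style) scoring of per-segment video features against the text-generating RNN's hidden state. The sketch below is illustrative only: the parameter names (W_f, W_h, w) are assumptions, and random weights stand in for parameters that would be learned jointly with the RNN.

```python
import numpy as np

def temporal_attention(frame_feats, decoder_state, W_f, W_h, w):
    """Score each temporal segment against the decoder state and return
    the attention weights plus the attention-weighted context vector.

    frame_feats:   (T, D) per-segment video features (e.g. 3-D CNN outputs)
    decoder_state: (H,)   current hidden state of the text-generating RNN
    W_f, W_h, w:   learned projections -- (A, D), (A, H), (A,)
    """
    # Additive relevance score for each of the T segments
    scores = np.tanh(frame_feats @ W_f.T + decoder_state @ W_h.T) @ w  # (T,)
    # Softmax turns scores into a distribution over temporal segments
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: expected feature under the attention distribution
    context = weights @ frame_feats  # (D,)
    return weights, context

# Toy dimensions and random stand-in parameters, purely for illustration
rng = np.random.default_rng(0)
T, D, H, A = 8, 16, 32, 24
weights, context = temporal_attention(
    rng.standard_normal((T, D)), rng.standard_normal(H),
    rng.standard_normal((A, D)), rng.standard_normal((A, H)),
    rng.standard_normal(A))
```

At generation time the context vector would be fed to the RNN at each word step, so the model can attend to different temporal segments for different words.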
Data Visualization and Animation Lab (DVAL) overview
The general capabilities of the Langley Research Center Data Visualization and Animation Laboratory are described. These capabilities include digital image processing, 3-D interactive computer graphics, data visualization and analysis, video-rate acquisition and processing of video images, photo-realistic modeling and animation, video report generation, and color hardcopies. A specialized video image processing system is also discussed.
Motion-Based Sign Language Video Summarization using Curvature and Torsion
An interesting problem in many video-based applications is the generation of
short synopses by selecting the most informative frames, a procedure which is
known as video summarization. For sign language videos, the benefits of using
the time-parameterized counterpart of the curvature of the signer's 2-D wrist
trajectory to identify keyframes have recently been reported in the
literature. In this paper we extend these ideas by modeling the 3-D hand motion
that is extracted from each frame of the video. To this end we propose a new
informative function based on the time-parameterized curvature and torsion of
the 3-D trajectory. The method to characterize video frames as keyframes
depends on whether the motion occurs in 2-D or 3-D space. Specifically, in the
case of 3-D motion we look for the maxima of the harmonic mean of the curvature
and torsion of the target's trajectory; in the planar motion case we seek
the maxima of the trajectory's curvature. The proposed 3-D feature is
experimentally evaluated in applications of sign language videos on (1)
objective measures using ground-truth keyframe annotations, (2) human-based
evaluation of understanding, and (3) gloss classification and the results
obtained are promising. Comment: This work has been submitted to the IEEE for
possible publication. Copyright may be transferred without notice, after which
this version may no longer be accessible.
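The quantities above have standard closed forms: curvature is kappa = |r' x r''| / |r'|^3 and torsion is tau = (r' x r'') . r''' / |r' x r''|^2. The sketch below estimates both by finite differences and scores frames by the harmonic mean, as the abstract describes; the function names and the helix sanity check are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def curvature_torsion(traj, dt=1.0):
    """Finite-difference curvature and torsion of a sampled 3-D trajectory.

    traj: (N, 3) array of hand/wrist positions, one row per video frame.
    Returns kappa (N,) and tau (N,).
    """
    d1 = np.gradient(traj, dt, axis=0)   # velocity     r'
    d2 = np.gradient(d1, dt, axis=0)     # acceleration r''
    d3 = np.gradient(d2, dt, axis=0)     # jerk         r'''
    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    speed = np.linalg.norm(d1, axis=1)
    eps = 1e-12                          # guards against division by zero
    kappa = cross_norm / (speed**3 + eps)
    tau = np.einsum('ij,ij->i', cross, d3) / (cross_norm**2 + eps)
    return kappa, tau

def keyframe_score(kappa, tau):
    """Harmonic mean of curvature and |torsion|; local maxima mark keyframes."""
    eps = 1e-12
    return 2 * kappa * np.abs(tau) / (kappa + np.abs(tau) + eps)

# Sanity check: a circular helix (cos t, sin t, 0.5 t) has constant
# curvature 1/(1 + 0.25) = 0.8 and torsion 0.5/(1 + 0.25) = 0.4.
t = np.linspace(0, 4 * np.pi, 200)
helix = np.stack([np.cos(t), np.sin(t), 0.5 * t], axis=1)
kappa, tau = curvature_torsion(helix, dt=t[1] - t[0])
```

Away from the endpoints (where one-sided differences are used), the estimates match the closed forms closely, which makes the helix a convenient test case before applying the score to real wrist trajectories.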
Effects of point-of-view modeling to teach life skills to students with cognitive impairments
The purposes of this study were to evaluate the effectiveness of point-of-view (POV) video modeling in teaching culinary skills to elementary students with developmental and cognitive disabilities, to compare videos containing narration versus sound indicators, and to evaluate students' maintenance of gained skills without watching the video. A total of 8 students in grades 3-5, with an age range of 8-11, participated in the study. A single-subject research design with ABCD phases was used (Phase A baseline, Phases B and C intervention, and Phase D maintenance). Results showed that students gained skills during the intervention and maintained them without viewing the video or practicing the target skills. It seems that video-based instruction has potential for teaching students with cognitive disabilities.
Special Section Guest Editorial: Image/Video Quality and System Performance
Rapid developments in display technologies, digital printing, imaging sensors, image processing and image transmission are providing new possibilities for creating and conveying visual content. In an age in which images and video are ubiquitous and where mobile, satellite, and three-dimensional (3-D) imaging have become ordinary experiences, quantification of the performance of modern imaging systems requires appropriate approaches. At the end of the imaging chain, a human observer must decide whether images and video are of a satisfactory visual quality. Hence the measurement and modeling of perceived image quality is of crucial importance, not only in visual arts and commercial applications but also in scientific and entertainment environments. Advances in our understanding of the human visual system offer new possibilities for creating visually superior imaging systems and promise more accurate modeling of image quality. As a result, there is a profusion of new research on imaging performance and perceived quality.
Boundary value for a nonlinear transport equation emerging from a stochastic coagulation-fragmentation type model
We investigate the connection between two classical models of phase
transition phenomena, the (discrete size) stochastic Becker-D\"oring, a
continuous time Markov chain model, and the (continuous size) deterministic
Lifshitz-Slyozov model, a nonlinear transport partial differential equation.
For general coefficients and initial data, we introduce a scaling parameter and
prove that the empirical measure associated to the stochastic Becker-D\"oring
system converges in law to the weak solution of the Lifshitz-Slyozov equation
when the parameter goes to 0. Contrary to previous studies, we use a weak
topology that includes the boundary of the state space (i.e., the size x = 0),
allowing us to rigorously derive a boundary value for the Lifshitz-Slyozov
model in the case of incoming characteristics. The boundary condition relates
the volume distribution function (the solution of the Lifshitz-Slyozov
equation), the aggregation and fragmentation rates, the concentration of free
particles, and a nucleation constant emerging from the microscopic model. It
is the main novelty of this work and answers a question that has been
conjectured or suggested by both mathematicians and physicists. We emphasize
that this boundary value depends on a particular scaling (as opposed to a
modeling choice) and is the result of a separation of time scale and an
averaging of fast (fluctuating) variables. Comment: 42 pages, 3 figures, video in supplementary materials at
http://yvinec.perso.math.cnrs.fr/video.htm
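The stochastic Becker-Döring model described above is a continuous-time Markov chain and can be simulated exactly with the standard Gillespie algorithm (SSA). The sketch below is a generic illustration under assumed constant rates and a size truncation K, not the paper's setup; its one structural invariant, conservation of total mass by both reaction types, is what survives in the Lifshitz-Slyozov scaling limit.

```python
import numpy as np

def becker_doring_ssa(n0, a, b, t_end, rng):
    """Gillespie (SSA) simulation of a size-truncated Becker-Doring chain.

    n0: cluster counts, n0[i] = number of clusters of size i + 1
        (so n0[0] is the free-monomer count); maximum size K = len(n0).
    a:  aggregation rates for C_{i+1} + C_1 -> C_{i+2}
    b:  fragmentation rates for C_{i+1} -> C_i + C_1 (b[0] unused)
    Every reaction conserves the total mass sum_i (i + 1) * n[i].
    """
    n = np.array(n0, dtype=float)
    K = len(n)
    t = 0.0
    while t < t_end:
        # Propensities of the two reaction families in the current state
        agg = a[:K - 1] * n[:K - 1] * n[0]
        agg[0] = a[0] * n[0] * (n[0] - 1) / 2.0   # dimerization: monomer pairs
        frag = b[1:] * n[1:]
        total = agg.sum() + frag.sum()
        if total <= 0:
            break                                  # absorbing state
        t += rng.exponential(1.0 / total)          # exponential waiting time
        r = rng.uniform(0.0, total)
        if r < agg.sum():
            i = int(np.searchsorted(np.cumsum(agg), r))
            n[i] -= 1          # for i == 0 this line and the next together
            n[0] -= 1          # remove the two monomers of a dimerization
            n[i + 1] += 1
        else:
            i = 1 + int(np.searchsorted(np.cumsum(frag), r - agg.sum()))
            n[i] -= 1
            n[i - 1] += 1      # for i == 1 this releases a second monomer
            n[0] += 1
    return n

# Illustrative run: 200 free monomers, constant assumed rates, truncation K = 20
rng = np.random.default_rng(1)
K = 20
n0 = np.zeros(K); n0[0] = 200
a = np.ones(K); b = 0.5 * np.ones(K)
n = becker_doring_ssa(n0, a, b, t_end=1.0, rng=rng)
mass0 = sum((i + 1) * n0[i] for i in range(K))
mass = sum((i + 1) * n[i] for i in range(K))
```

The empirical measure built from such trajectories is the object whose scaling limit the paper studies; near the x = 0 boundary, the balance between nucleation (dimerization) and dimer fragmentation is what produces the derived boundary value.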
Visual communication and entertainment through animation
Virtual animation is used today for everything from entertainment in motion pictures and video games, to advertising on television and the internet, to virtual animated videos used as industrial teaching aids, to project approvals for major building construction. Many modern companies now insist that new products be created using 3-D modeling, and occasionally animation, before approving funds for further development.
The research question in this work centers on whether thoughts and visions can be communicated effectively enough that others can comprehend and share the same perspective. This research shows the use of technology in answering that question: an exploration of the literature describes the early beginnings, the slow progression, and the movement into the current "high tech" era of virtual reality.
A Line-of-Sight Sensor Network for Wide Area Video Surveillance: Simulation and Evaluation
Substantial performance improvement of a wide area video surveillance network can be obtained with the addition of a Line-of-Sight sensor. The research described in this thesis shows that while the Line-of-Sight sensor cannot monitor areas with the ubiquity of video cameras alone, the combined network produces substantially fewer false alarms and superior location precision for numerous moving people than video alone. Recent progress in the fabrication of inexpensive, robust CMOS-based video cameras has triggered a new approach to wide area surveillance of busy areas, such as modeling an airport corridor as a distributed sensor network problem. Wireless communication between these cameras and other sensors makes it more practical to deploy them in an arbitrary spatial configuration to unobtrusively monitor cooperative and non-cooperative people. The computation and communication needed to establish image registration between the cameras grow rapidly as the number of cameras increases. Computation is required to detect people in each image, establish a correspondence between people in two or more images, compute exact 3-D positions from each corresponding pair, track targets in space and time, and assimilate the resulting data until thresholds are reached to either raise an alarm or abandon further monitoring of that person. Substantial improvement can be obtained by adding a Line-of-Sight sensor as a location detection system that decouples the detection, localization, and identification subtasks. That is, if the where can be answered by a location detection system, the what can be addressed most effectively by the video.
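The step of computing a 3-D position from a corresponding pair of camera detections can be sketched as midpoint triangulation of the two viewing rays; the function and geometry below are a generic illustration under assumed camera centers, not the thesis's implementation.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation: the 3-D point closest to two viewing rays.

    c1, c2: camera centers; d1, d2: viewing directions toward the target.
    Solves for ray parameters (s, t) minimizing |(c1 + s*d1) - (c2 + t*d2)|
    and returns the midpoint of the segment of closest approach.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2      # a = c = 1 for unit directions
    denom = a * c - b * b                    # approaches 0 for parallel rays
    s = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t = (a * (d2 @ w) - b * (d1 @ w)) / denom
    p1 = c1 + s * d1                         # closest point on ray 1
    p2 = c2 + t * d2                         # closest point on ray 2
    return 0.5 * (p1 + p2)

# Two cameras at assumed centers observe the same person at (1, 2, 5);
# rays pointed at the true position should triangulate back to it.
c1, c2 = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])
target = np.array([1.0, 2.0, 5.0])
p = triangulate_midpoint(c1, target - c1, c2, target - c2)
```

With noisy detections the two rays no longer intersect, and the length of the closest-approach segment gives a useful confidence measure for gating the correspondence before the temporal tracking stage.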