
    Describing Videos by Exploiting Temporal Structure

    Full text link
    Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art on both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired videos and natural language descriptions. Comment: Accepted to ICCV15. This version comes with code release and supplementary material.
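
    As a rough illustration of the temporal attention mechanism the abstract describes, the sketch below scores each temporal segment of 3-D CNN features against the current state of the text-generating RNN and returns an attention-weighted context vector. The projection matrices W_f, W_h, w and the feature shapes are hypothetical; the paper's actual parameterization may differ.

        import numpy as np

        def temporal_attention(features, decoder_state, W_f, W_h, w):
            """One soft temporal-attention step (hypothetical shapes).

            features:      (T, D) per-segment 3-D CNN features
            decoder_state: (H,)   current hidden state of the text RNN
            W_f, W_h, w:   learned projections mapping to scalar scores
            """
            # Relevance score for each of the T temporal segments
            scores = np.tanh(features @ W_f + decoder_state @ W_h) @ w  # (T,)
            # Softmax over time: attention weights sum to 1
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # Context vector: attention-weighted sum of segment features
            return weights @ features  # (D,)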

    Data Visualization and Animation Lab (DVAL) overview

    Get PDF
    The general capabilities of the Langley Research Center Data Visualization and Animation Laboratory are described. These capabilities include digital image processing, 3-D interactive computer graphics, data visualization and analysis, video-rate acquisition and processing of video images, photo-realistic modeling and animation, video report generation, and color hardcopies. A specialized video image processing system is also discussed.

    Motion-Based Sign Language Video Summarization using Curvature and Torsion

    Full text link
    An interesting problem in many video-based applications is the generation of short synopses by selecting the most informative frames, a procedure known as video summarization. For sign language videos, the benefits of using the $t$-parameterized counterpart of the curvature of the 2-D signer's wrist trajectory to identify keyframes have recently been reported in the literature. In this paper we extend these ideas by modeling the 3-D hand motion that is extracted from each frame of the video. To this end we propose a new informative function based on the $t$-parameterized curvature and torsion of the 3-D trajectory. The method to characterize video frames as keyframes depends on whether the motion occurs in 2-D or 3-D space. Specifically, in the case of 3-D motion we look for the maxima of the harmonic mean of the curvature and torsion of the target's trajectory; in the planar motion case we seek the maxima of the trajectory's curvature. The proposed 3-D feature is experimentally evaluated in applications of sign language videos on (1) objective measures using ground-truth keyframe annotations, (2) human-based evaluation of understanding, and (3) gloss classification, and the results obtained are promising. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
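
    To make the informative function concrete, here is a minimal numerical sketch assuming finite-difference derivatives and the standard Frenet formulas (curvature |r' x r''| / |r'|^3, torsion (r' x r'') . r''' / |r' x r''|^2); the function name and the use of the absolute torsion are choices made here, not necessarily the paper's.

        import numpy as np

        def keyframes_3d(traj, eps=1e-8):
            """Select keyframes as local maxima of the harmonic mean of
            curvature and torsion along a 3-D wrist trajectory (N, 3)."""
            d1 = np.gradient(traj, axis=0)   # r'
            d2 = np.gradient(d1, axis=0)     # r''
            d3 = np.gradient(d2, axis=0)     # r'''
            cross = np.cross(d1, d2)
            cross_norm = np.linalg.norm(cross, axis=1)
            speed = np.linalg.norm(d1, axis=1)
            curvature = cross_norm / (speed**3 + eps)
            torsion = np.abs(np.einsum('ij,ij->i', cross, d3)) / (cross_norm**2 + eps)
            # Harmonic mean of the two Frenet invariants
            score = 2 * curvature * torsion / (curvature + torsion + eps)
            # Local maxima of the informative function are the keyframes
            return [i for i in range(1, len(score) - 1)
                    if score[i] > score[i - 1] and score[i] > score[i + 1]]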

    Effects of point-of-view modeling to teach life skills to students with cognitive impairments

    Get PDF
    The purposes of this study were to evaluate the effectiveness of point-of-view (POV) video modeling in teaching culinary skills to elementary students with developmental and cognitive disabilities, to compare videos containing narration versus sound indicators, and to evaluate students' maintenance of gained skills without watching the video. A total of 8 students in grades 3-5, aged 8 to 11, participated in the study. A single-subject research design with ABCD phases was used (Phase A baseline, Phases B and C intervention, and Phase D maintenance). Results showed that students gained skills during the intervention and maintained them without viewing the video or practicing the target skills. It seems that video-based instruction has potential for teaching students with cognitive disabilities.

    Special Section Guest Editorial: Image/Video Quality and System Performance

    Get PDF
    Rapid developments in display technologies, digital printing, imaging sensors, image processing and image transmission are providing new possibilities for creating and conveying visual content. In an age in which images and video are ubiquitous and where mobile, satellite, and three-dimensional (3-D) imaging have become ordinary experiences, quantification of the performance of modern imaging systems requires appropriate approaches. At the end of the imaging chain, a human observer must decide whether images and video are of a satisfactory visual quality. Hence, the measurement and modeling of perceived image quality is of crucial importance, not only in visual arts and commercial applications but also in scientific and entertainment environments. Advances in our understanding of the human visual system offer new possibilities for creating visually superior imaging systems and promise more accurate modeling of image quality. As a result, there is a profusion of new research on imaging performance and perceived quality.

    Boundary value for a nonlinear transport equation emerging from a stochastic coagulation-fragmentation type model

    Full text link
    We investigate the connection between two classical models of phase transition phenomena, the (discrete size) stochastic Becker-Döring model, a continuous-time Markov chain, and the (continuous size) deterministic Lifshitz-Slyozov model, a nonlinear transport partial differential equation. For general coefficients and initial data, we introduce a scaling parameter and prove that the empirical measure associated to the stochastic Becker-Döring system converges in law to the weak solution of the Lifshitz-Slyozov equation when the parameter goes to 0. Contrary to previous studies, we use a weak topology that includes the boundary of the state space (i.e. the size $x=0$), allowing us to rigorously derive a boundary value for the Lifshitz-Slyozov model in the case of incoming characteristics. The condition reads $\lim_{x\to 0} (a(x)u(t)-b(x))f(t,x) = \alpha u(t)^2$, where $f$ is the volume distribution function, solution of the Lifshitz-Slyozov equation, $a$ and $b$ are the aggregation and fragmentation rates, $u$ is the concentration of free particles, and $\alpha$ is a nucleation constant emerging from the microscopic model. This is the main novelty of this work and it answers a question that has been conjectured or suggested by both mathematicians and physicists. We emphasize that this boundary value depends on a particular scaling (as opposed to a modeling choice) and is the result of a separation of time scales and an averaging of fast (fluctuating) variables. Comment: 42 pages, 3 figures, video in supplementary materials at http://yvinec.perso.math.cnrs.fr/video.htm
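
    Displayed together, the boundary condition from the abstract pairs with the Lifshitz-Slyozov transport equation; the first line below is the standard form of that equation, written here for context (it is implied rather than stated in the abstract):

        \partial_t f(t,x) + \partial_x\big[(a(x)\,u(t) - b(x))\,f(t,x)\big] = 0, \qquad x > 0,
        \lim_{x \to 0}\,(a(x)\,u(t) - b(x))\,f(t,x) = \alpha\,u(t)^2.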

    Visual communication and entertainment through animation

    Get PDF
    Virtual animation is used today for everything including entertainment in motion pictures and video games, advertising on television and the internet, virtual animated videos used as industrial teaching aids, and project approvals for major building construction. Many modern companies now insist that new products be created using 3-D modeling, and occasionally animation, before approving funds for further development. The research question in this work centers on how thoughts and visions can be effectively communicated so that others can comprehend and share the same perspective. This research shows the use of technology in answering this important question: an exploration of the literature describes the early beginnings and the slow progression of animation, followed by the movement into the current "high tech" era of virtual reality.

    A Line-Of-Sight Sensor Network for Wide Area Video Surveillance: Simulation and Evaluation

    Get PDF
    Substantial performance improvement of a wide area video surveillance network can be obtained with the addition of a Line-of-Sight sensor. The research described in this thesis shows that while the Line-of-Sight sensor cannot monitor areas with the ubiquity of video cameras alone, the combined network produces substantially fewer false alarms and superior location precision for numerous moving people than video alone. Recent progress in the fabrication of inexpensive, robust CMOS-based video cameras has triggered a new approach to wide area surveillance of busy areas, such as modeling an airport corridor as a distributed sensor network problem. Wireless communication between these cameras and other sensors makes it more practical to deploy them in an arbitrary spatial configuration to unobtrusively monitor cooperative and non-cooperative people. The computation and communication needed to establish image registration between the cameras grow rapidly as the number of cameras increases. Computation is required to detect people in each image, establish a correspondence between people in two or more images, compute exact 3-D positions from each corresponding pair, temporally track targets in space and time, and assimilate the resultant data until thresholds are reached to either raise an alarm or abandon further monitoring of that person. Substantial improvement can be obtained by adding a Line-of-Sight sensor as a location detection system that decouples the detection, localization, and identification subtasks. That is, if the where can be answered by a location detection system, the what can be addressed most effectively by the video.
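
    A minimal sketch of the decoupling the abstract argues for, with the Line-of-Sight sensor answering where and the video answering what; all names, thresholds, and the evidence-accumulation rule are illustrative assumptions, not the thesis's design.

        from dataclasses import dataclass

        ALARM_THRESHOLD = 1.0   # hypothetical evidence cutoff

        @dataclass
        class Track:
            position: tuple        # (x, y, z) from the Line-of-Sight sensor
            evidence: float = 0.0  # accumulated video-based evidence

        def surveillance_step(los_detections, video_score, tracks):
            """One fusion step: the location sensor localizes, the video
            classifies, so no cross-camera image registration is needed."""
            alarms = []
            for person_id, position in los_detections.items():
                track = tracks.setdefault(person_id, Track(position))
                track.position = position
                # Query only the camera covering this 3-D position;
                # video_score returns signed evidence for this person
                track.evidence += video_score(person_id, position)
                if track.evidence > ALARM_THRESHOLD:
                    alarms.append(person_id)      # raise an alarm
                elif track.evidence < -ALARM_THRESHOLD:
                    tracks.pop(person_id)         # abandon monitoring
            return alarms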