Masked Autoencoder for Unsupervised Video Summarization
Summarizing a video requires a diverse understanding of the video, ranging
from recognizing scenes to judging whether each frame is essential enough to
be selected for the summary. Self-supervised learning (SSL) is acknowledged
for its robustness and flexibility across downstream tasks, but video SSL has
not yet shown its value for dense understanding tasks such as video
summarization. We claim that an unsupervised autoencoder, given sufficient
self-supervised training, needs no extra downstream architecture design or
weight fine-tuning to serve as a video summarization model. The proposed
method evaluates each frame's importance using the reconstruction score of
the autoencoder's decoder. We evaluate the method on major unsupervised video
summarization benchmarks to show its effectiveness under various experimental
settings.
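The core scoring idea can be sketched in a few lines: treat each frame's reconstruction error as its importance signal and select the highest-scoring frames. This is a minimal illustration of the principle, not the authors' model; the function names and the simple MSE/top-k choices are assumptions for the sketch.

```python
import numpy as np

def frame_importance_scores(frames, reconstructed):
    """Per-frame reconstruction error as an importance signal.

    frames, reconstructed: arrays of shape (T, D) -- T frames, D features.
    Returns scores normalized to [0, 1]; a frame the decoder reconstructs
    poorly gets a high score (hypothetical scoring rule for illustration).
    """
    err = np.mean((frames - reconstructed) ** 2, axis=1)  # per-frame MSE
    lo, hi = err.min(), err.max()
    return (err - lo) / (hi - lo + 1e-8)

def top_k_summary(scores, k):
    """Indices of the k highest-scoring frames, returned in temporal order."""
    idx = np.argsort(scores)[-k:]
    return np.sort(idx)
```

A frame that the decoder fails to reconstruct is, under this heuristic, the one carrying information the rest of the video does not explain, which is why it is promoted into the summary.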
Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning
Recent self-supervised video representation learning methods focus on
maximizing the similarity between multiple augmented views from the same video
and largely rely on the quality of generated views. In this paper, we propose
frequency augmentation (FreqAug), a spatio-temporal data augmentation method in
the frequency domain for video representation learning. FreqAug stochastically
removes undesirable information from the video by filtering out specific
frequency components so that learned representation captures essential features
of the video for various downstream tasks. Specifically, FreqAug pushes the
model to focus more on dynamic features than on static features by dropping
spatial or temporal low-frequency components. In other words, learning
invariance between the remaining frequency components yields a
high-frequency-enhanced representation with less static bias. To verify the
generality of the proposed method, we experiment with FreqAug on multiple
self-supervised learning frameworks along with standard augmentations.
Transferring the improved representation to five video action recognition and
two temporal action localization downstream tasks shows consistent
improvements over baselines.
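The spatial half of this augmentation can be sketched as a high-pass filter applied per frame in the Fourier domain: zero out frequency components near DC, keep the rest. This is a simplified illustration of the idea, not the paper's implementation; the function name and the radial-cutoff filter design are assumptions for the sketch.

```python
import numpy as np

def drop_spatial_low_freq(video, cutoff=0.1):
    """High-pass filter each frame of a video in the frequency domain.

    video: array of shape (T, H, W), one grayscale frame per time step.
    cutoff: fraction of the maximum spectral radius below which frequency
    components are zeroed. Removing spatial low frequencies suppresses
    slowly-varying (static) content while preserving edges and texture
    (a simplified stand-in for the FreqAug idea, for illustration only).
    """
    fy = np.fft.fftfreq(video.shape[1])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(video.shape[2])[None, :]  # horizontal frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)           # distance from DC
    mask = radius >= cutoff * radius.max()        # keep high frequencies only
    spec = np.fft.fft2(video, axes=(1, 2))        # per-frame 2D FFT
    return np.real(np.fft.ifft2(spec * mask, axes=(1, 2)))
```

Applying the same recipe along the time axis (a 1-D FFT over frames per pixel) would give the temporal variant; stochastically choosing which axis and cutoff to use per view is what makes it an augmentation rather than a fixed preprocessing step.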
Comparison of gait before and after superficial trunk muscle exercise and deep trunk muscle exercise
Proceedings of the 2003 Winter Simulation Conference
We present a system called RUBE, which allows a modeler to customize model components and model structure in 2D and 3D. RUBE employs open-source tools to assist in model authoring, allowing the user to visualize models with different metaphors. For example, it is possible to visualize an event graph as a city block, or a Petri net as an organically-oriented 3D machine. We suggest that such flexibility in visualization will allow existing model types to take on forms that may be more recognizable to modeling subcommunities, while employing notation afforded by inexpensive graphics hardware. There is also the possibility of creating model types with entirely new notations.