510 research outputs found
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder
Existing video hash functions are built on three isolated stages: frame
pooling, relaxed learning, and binarization, which have not adequately explored
the temporal order of video frames in a joint binary optimization model,
resulting in severe information loss. In this paper, we propose a novel
unsupervised video hashing framework, dubbed Self-Supervised Video Hashing
(SSVH), which is able to capture the temporal nature of videos in an end-to-end
learning-to-hash fashion. We specifically address two central problems: 1) how
to design an encoder-decoder architecture to generate binary codes for videos;
and 2) how to equip the binary codes with the ability of accurate video
retrieval. We design a hierarchical binary autoencoder to model the temporal
dependencies in videos with multiple granularities, and embed the videos into
binary codes with fewer computations than a stacked architecture. Then, we
encourage the binary codes to simultaneously reconstruct the visual content and
neighborhood structure of the videos. Experiments on two real-world datasets
(FCVID and YFCC) show that our SSVH method significantly outperforms the
state-of-the-art methods, achieving the best performance to date on the task
of unsupervised video retrieval.
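The hierarchical pooling-plus-binarization idea can be illustrated with a toy numpy sketch. This is not the paper's learned autoencoder: the random projection `W`, the two-granularity mean pooling, and the sign-based binarization are illustrative assumptions standing in for trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

def hierarchical_binary_code(frames: np.ndarray, n_bits: int = 64) -> np.ndarray:
    """Toy sketch: pool frame features at two temporal granularities,
    project, and binarize with a sign threshold."""
    T, D = frames.shape
    fine = frames.reshape(2, T // 2, D).mean(axis=1).ravel()  # two-segment pooling
    coarse = frames.mean(axis=0)                              # whole-video pooling
    feat = np.concatenate([fine, coarse])
    W = rng.standard_normal((feat.size, n_bits))              # hypothetical projection
    return (feat @ W >= 0).astype(np.uint8)                   # binary code in {0,1}

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Retrieval compares codes by Hamming distance."""
    return int(np.count_nonzero(a != b))

video = rng.standard_normal((16, 128))  # 16 frames, 128-D features each
code = hierarchical_binary_code(video)
print(code.shape, hamming(code, code))  # → (64,) 0
```

In the actual method the projection is learned end-to-end so that the codes can reconstruct both the visual content and the neighborhood structure, rather than being a fixed random map.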
ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain
We present ARCHANGEL, a novel distributed-ledger-based system for assuring
the long-term integrity of digital video archives. First, we describe a novel
deep network architecture for computing compact temporal content hashes (TCHs)
from audio-visual streams with durations of minutes or hours. Our TCHs are
sensitive to accidental or malicious content modification (tampering) but
invariant to the codec used to encode the video. This is necessary due to the
curatorial requirement for archives to format shift video over time to ensure
future accessibility. Second, we describe how the TCHs (and the models used to
derive them) are secured via a proof-of-authority blockchain distributed across
multiple independent archives. We report on the efficacy of ARCHANGEL within
the context of a trial deployment in which the national government archives of
the United Kingdom, Estonia, and Norway participated. Comment: Accepted to CVPR Blockchain Workshop 201
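The intended behavior of a temporal content hash, sensitive to tampering but invariant to re-encoding, can be sketched in a few lines of numpy. This is a toy stand-in, not ARCHANGEL's learned deep network: the median-threshold hash, the `max_dist` tolerance, and the noise model for format shifting are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_content_hash(features: np.ndarray, n_bits: int = 128) -> np.ndarray:
    """Toy TCH: threshold the temporally averaged per-dimension features
    against their median, yielding a compact binary fingerprint."""
    pooled = features.mean(axis=0)
    return (pooled > np.median(pooled)).astype(np.uint8)[:n_bits]

def is_tampered(h_archived: np.ndarray, h_current: np.ndarray, max_dist: int = 8) -> bool:
    """Flag tampering when the Hamming distance exceeds a small tolerance;
    codec changes should stay within it, content edits should not."""
    return int(np.count_nonzero(h_archived != h_current)) > max_dist

features = rng.standard_normal((300, 128))        # e.g. one feature vector per second
h_archived = temporal_content_hash(features)
# Format shifting (re-encoding) perturbs the features only slightly:
h_reencoded = temporal_content_hash(features + 0.01 * rng.standard_normal(features.shape))
```

In the deployed system, `h_archived` (and the model that produced it) would be committed to the proof-of-authority blockchain at ingest time, so any later verification compares against a tamper-evident record rather than a locally stored hash.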
Contrastive Masked Autoencoders for Self-Supervised Video Hashing
Self-Supervised Video Hashing (SSVH) models learn to generate short binary
representations for videos without ground-truth supervision, facilitating
large-scale video retrieval efficiency and attracting increasing research
attention. The success of SSVH lies in the understanding of video content and
the ability to capture the semantic relation among unlabeled videos. Typically,
state-of-the-art SSVH methods consider these two points in a two-stage training
pipeline, where they first train an auxiliary network on instance-wise
mask-and-predict tasks and then train a hashing model to preserve the
pseudo-neighborhood structure transferred from the auxiliary network. This
consecutive training strategy is both inflexible and unnecessary. In this
paper, we propose a simple yet effective one-stage SSVH method called ConMH,
which incorporates video semantic information and video similarity relationship
understanding in a single stage. To capture video semantic information for
better hashing learning, we adopt an encoder-decoder structure to reconstruct
the video from its temporally masked frames. In particular, we find that a higher
masking ratio helps video understanding. Besides, we fully exploit the
similarity relationship between videos by maximizing agreement between two
augmented views of a video, which contributes to more discriminative and robust
hash codes. Extensive experiments on three large-scale video datasets (i.e.,
FCVID, ActivityNet and YFCC) indicate that ConMH achieves state-of-the-art
results. Code is available at https://github.com/huangmozhi9527/ConMH.Comment: This work is accepted by the AAAI 2023. 9 pages, 6 figures, 6 table
- …