7,824 research outputs found
DC-image for real time compressed video matching
This chapter presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. In addition, the relevant arguments and supporting evidence are discussed. Several local feature detectors are examined to select the best for matching using the DC-image. Two experiments are carried out to support the above. The first compares the DC-image and the I-frame in terms of matching performance and computational complexity. The second compares local features and global features for compressed video matching using the DC-image. The results confirmed that the DC-image, despite its highly reduced size, is promising, as it produces higher matching precision than the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features. On the other hand, its computational complexity is relatively higher, but it is still within the real-time margin, which leaves room for further optimizations to reduce this computational complexity.
Video matching using DC-image and local features
This paper presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without decompression. The relevant arguments and supporting evidence are discussed for developing video similarity techniques that work directly on compressed videos, without decompression, and especially utilising small size images. Two experiments are carried out to support the above. The first compares the DC-image and the I-frame in terms of matching performance and the corresponding computational complexity. The second compares local features and global features in video matching, especially in the compressed domain and with small size images. The results confirmed that the DC-image, despite its highly reduced size, is promising, as it produces at least similar (if not better) matching precision compared to the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features in precision. On the other hand, its computational complexity is relatively higher, but it is still within the real-time margin. There are also various optimisations that can be made to reduce this computational complexity.
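To illustrate why the DC-image is so much smaller than the I-frame, here is a minimal Python sketch that builds a DC-image-like thumbnail by averaging each 8x8 block of a grayscale frame. It assumes already-decoded pixels purely for illustration, whereas the framework described above takes the DC-image from the compressed stream without full decompression. The function name `dc_image` and the toy frame are hypothetical and do not come from either publication. The sketch relies on the fact that the DC coefficient of an 8x8 DCT block is proportional to the mean of that block.

```python
def dc_image(frame, block=8):
    """Return the block-mean image of a grayscale frame given as a list of rows.

    Assumption: pixels are already decoded. A real compressed-domain system
    would read the DC coefficients from the bitstream instead.
    """
    height, width = len(frame), len(frame[0])
    if height % block or width % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Sum all pixels of the current block, then take their mean.
            total = sum(
                frame[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        out.append(row)
    return out


# Hypothetical 16x16 frame: left half dark (0), right half bright (255).
frame = [[0] * 8 + [255] * 8 for _ in range(16)]
small = dc_image(frame)  # 2x2 thumbnail, one value per 8x8 block
```

With 8x8 blocks, each dimension shrinks by a factor of 8, so a 352x288 frame would yield a 44x36 thumbnail. Local features such as SIFT would then be extracted from this small image rather than from the full frame.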
ARCHANGEL: Tamper-proofing Video Archives using Temporal Content Hashes on the Blockchain
We present ARCHANGEL, a novel distributed-ledger-based system for assuring
the long-term integrity of digital video archives. First, we describe a novel
deep network architecture for computing compact temporal content hashes (TCHs)
from audio-visual streams with durations of minutes or hours. Our TCHs are
sensitive to accidental or malicious content modification (tampering) but
invariant to the codec used to encode the video. This is necessary due to the
curatorial requirement for archives to format shift video over time to ensure
future accessibility. Second, we describe how the TCHs (and the models used to
derive them) are secured via a proof-of-authority blockchain distributed across
multiple independent archives. We report on the efficacy of ARCHANGEL within
the context of a trial deployment in which the national government archives of
the United Kingdom, Estonia and Norway participated.
Comment: Accepted to CVPR Blockchain Workshop 201
Dense Piecewise Planar RGB-D SLAM for Indoor Environments
The paper exploits weak Manhattan constraints to parse the structure of
indoor environments from RGB-D video sequences in an online setting. We extend
the previous approach for single view parsing of indoor scenes to video
sequences and formulate the problem of recovering the floor plan of the
environment as an optimal labeling problem solved using dynamic programming.
The temporal continuity is enforced in a recursive setting, where labeling from
previous frames is used as a prior term in the objective function. In addition
to recovery of piecewise planar weak Manhattan structure of the extended
environment, the orthogonality constraints are also exploited by visual
odometry and pose graph optimization. This yields reliable estimates in the
presence of large motions and absence of distinctive features to track. We
evaluate our method on several challenging indoor sequences, demonstrating
accurate SLAM and dense mapping of low-texture environments. On the existing TUM
benchmark we achieve results competitive with the alternative approaches, which
fail in our environments.
Comment: International Conference on Intelligent Robots and Systems (IROS) 201
Automatic learning of gait signatures for people identification
This work targets people identification in video based on the way they walk
(i.e. gait). While classical methods typically derive gait signatures from
sequences of binary silhouettes, in this work we explore the use of
convolutional neural networks (CNN) for learning high-level descriptors from
low-level motion features (i.e. optical flow components). We carry out a
thorough experimental evaluation of the proposed CNN architecture on the
challenging TUM-GAID dataset. The experimental results indicate that using
spatio-temporal cuboids of optical flow as input data for the CNN makes it possible to obtain
state-of-the-art results on the gait task with an image resolution eight times
lower than the previously reported results (i.e. 80x60 pixels).
Comment: Proof of concept paper. Technical report on the use of ConvNets (CNN)
for gait recognition. Data and code:
http://www.uco.es/~in1majim/research/cnngaitof.htm
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sum-of-differences between the LFCs
and the CLFCs are aggregated to generate an extremely-compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Ital
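The aggregation step described in this abstract can be sketched as a sum of differences between each input vector and its nearest center. This is a minimal, hedged Python illustration of that general residual-aggregation idea, not the authors' implementation. The function name `aggregate_residuals`, the squared Euclidean nearest-center rule, and the toy vectors are assumptions made for the example. For VLAD the inputs would be SIFT descriptors and their centers. For VLAC they would be the LFCs and the CLFCs.

```python
def aggregate_residuals(vectors, centers):
    """Sum (vector - nearest center) per center and concatenate the sums.

    Assumption: nearest center is chosen by squared Euclidean distance.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    for v in vectors:
        # Index of the center closest to v.
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centers[k])),
        )
        # Accumulate the residual of v with respect to that center.
        for i in range(dim):
            sums[nearest][i] += v[i] - centers[nearest][i]
    # Flatten into one descriptor of length len(centers) * dim.
    return [x for block in sums for x in block]


# Hypothetical toy data: two 2-D centers and three vectors.
centers = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
descriptor = aggregate_residuals(vectors, centers)
```

The output length depends only on the number of centers and their dimension, not on how many vectors are aggregated, which is what keeps the representation compact.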
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher accuracy rate and lower computational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.
Comment: Published 201
Monitoring wild animal communities with arrays of motion sensitive camera traps
Studying animal movement and distribution is of critical importance to
addressing environmental challenges including invasive species, infectious
diseases, climate and land-use change. Motion sensitive camera traps offer a
visual sensor to record the presence of a broad range of species, providing
location-specific information on movement and behavior. Modern digital camera
traps that record video present new analytical opportunities, but also new data
management challenges. This paper describes our experience with a terrestrial
animal monitoring system at Barro Colorado Island, Panama. Our camera network
captured the spatio-temporal dynamics of terrestrial bird and mammal activity
at the site - data relevant to immediate science questions, and long-term
conservation issues. We believe that the experience gained and lessons learned
during our year-long deployment and testing of the camera traps, as well as the
developed solutions, are applicable to broader sensor network applications and
are valuable for the advancement of sensor network research. We suggest
that the continued development of these hardware, software, and analytical
tools, in concert, offers an exciting sensor-network solution to monitoring
animal populations, which could realistically scale over larger areas and time
spans.