A Deep Siamese Network for Scene Detection in Broadcast Videos
We present a model that automatically divides broadcast videos into coherent
scenes by learning a distance measure between shots. Experiments are performed
to demonstrate the effectiveness of our approach by comparing our algorithm
against recent proposals for automatic scene segmentation. We also propose an
improved performance measure that aims to reduce the gap between numerical
evaluation and expected results, and propose and release a new benchmark
dataset. Comment: ACM Multimedia 201
Algorithms for Video Structuring
Video structuring aims at automatically finding structure in a video sequence. Occupying a key position within video analysis, it is a fundamental step for quality indexing and browsing. As a low-level video analysis, video structuring can be seen as a serial process which includes (i) shot boundary detection, (ii) video shot feature extraction and (iii) video shot clustering. The resulting analysis serves as the basis for higher-level processing such as content-based image retrieval or semantic indexing. In this study, the whole process is examined and implemented. Two shot boundary detectors based on motion estimation and color distribution analysis are designed. Based on recent advances in machine learning, a novel technique for video shot clustering is presented. Typical approaches for segmenting and clustering shots use graph analysis, with split-and-merge algorithms for finding subgraphs corresponding to different scenes. In this work, the clustering algorithm is based on a spectral method which has proven its efficiency in still-image segmentation. This technique clusters points (in our case features extracted from video shots) using eigenvectors of matrices derived from the data. The relevance of the data depends on the quality of feature extraction. After stating the main problems of video structuring, solutions are proposed that define a heuristic distance metric for similarity between shots. We combine visual color features with time constraints. The entire process of video structuring is tested on a ten-hour home video database.
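The spectral clustering idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian affinity, the parameters `sigma_color` and `sigma_time`, and the two-way split on the sign of the Fiedler vector are all assumptions chosen to keep the example short. Each shot is assumed to be summarized by a color feature vector and a timestamp in seconds.

```python
import numpy as np

def shot_affinity(color_feats, times, sigma_color=0.5, sigma_time=60.0):
    """Affinity between shots: color similarity damped by temporal distance.

    The Gaussian form and both sigma parameters are illustrative assumptions.
    """
    color_feats = np.asarray(color_feats, dtype=float)
    times = np.asarray(times, dtype=float)
    # Pairwise Euclidean distance between color feature vectors.
    d_color = np.linalg.norm(color_feats[:, None, :] - color_feats[None, :, :], axis=2)
    # Pairwise absolute time difference (the "time constraint").
    d_time = np.abs(times[:, None] - times[None, :])
    return np.exp(-(d_color / sigma_color) ** 2) * np.exp(-(d_time / sigma_time) ** 2)

def spectral_bipartition(affinity):
    """Split shots into two groups using an eigenvector of the graph Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # second-smallest eigenvalue's vector
    labels = (fiedler > 0).astype(int)
    # Eigenvector sign is arbitrary, so fix the first shot to label 0.
    if labels[0] == 1:
        labels = 1 - labels
    return labels

# Six hypothetical shots: three bluish shots early on, three reddish shots later.
feats = [[0.10, 0.10, 0.80], [0.12, 0.10, 0.78], [0.10, 0.12, 0.80],
         [0.80, 0.10, 0.10], [0.78, 0.12, 0.10], [0.80, 0.10, 0.12]]
times = [0, 10, 20, 100, 110, 120]
labels = spectral_bipartition(shot_affinity(feats, times))
```

A full pipeline would typically keep several eigenvectors and run k-means on them to obtain more than two clusters; the two-way split is used here only because it is deterministic and easy to check.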
Hierarchical video summarisation in reference frame subspace
In this paper, a hierarchical video structure summarization approach using Laplacian Eigenmap is proposed, where a small set of reference frames is selected from the video sequence to form a reference subspace to measure the dissimilarity between two arbitrary frames. In the proposed summarization scheme, the shot-level key frames are first detected from the continuity of inter-frame dissimilarity, and the sub-shot-level and scene-level representative frames are then summarized using k-means clustering. Experiments are carried out on both test videos and movies, and the results show that, in comparison with a similar approach using latent semantic analysis, the proposed approach using Laplacian Eigenmap achieves a better recall rate in keyframe detection and gives an efficient hierarchical summarization at the sub-shot, shot and scene levels.
Zero Shot Learning with the Isoperimetric Loss
We introduce the isoperimetric loss as a regularization criterion for
learning the map from a visual representation to a semantic embedding, to be
used to transfer knowledge to unknown classes in a zero-shot learning setting.
We use a pre-trained deep neural network model as a visual representation of
image data, a Word2Vec embedding of class labels, and linear maps between the
visual and semantic embedding spaces. However, the spaces themselves are not
linear, and we postulate the sample embedding to be populated by noisy samples
near otherwise smooth manifolds. We exploit the graph structure defined by the
sample points to regularize the estimates of the manifolds by inferring the
graph connectivity using a generalization of the isoperimetric inequalities
from Riemannian geometry to graphs. Surprisingly, this regularization alone,
paired with the simplest baseline model, outperforms the state-of-the-art among
fully automated methods in zero-shot learning benchmarks such as AwA and CUB.
This improvement is achieved solely by learning the structure of the underlying
spaces by imposing regularity. Comment: Accepted to AAAI-2
Contextual cropping and scaling of TV productions
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright @ Springer Science+Business Media, LLC 2011. In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements for mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Integration of the tool into post-production and live workflows is envisaged.
Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation
This paper proposes a novel zero-shot edge detection with SCESAME, which
stands for Spectral Clustering-based Ensemble for Segment Anything Model
Estimation, based on the recently proposed Segment Anything Model (SAM). SAM is
a foundation model for segmentation tasks, and one of the interesting
applications of SAM is Automatic Mask Generation (AMG), which generates
zero-shot segmentation masks of an entire image. AMG can be applied to edge
detection, but suffers from the problem of overdetecting edges. Edge detection
with SCESAME overcomes this problem by three steps: (1) eliminating small
generated masks, (2) combining masks by spectral clustering, taking into
account mask positions and overlaps, and (3) removing artifacts after edge
detection. We performed edge detection experiments on two datasets, BSDS500 and
NYUDv2. Although our zero-shot approach is simple, the experimental results on
BSDS500 showed almost identical performance to human performance and CNN-based
methods from seven years ago. In the NYUDv2 experiments, it performed almost as
well as recent CNN-based methods. These results indicate that our method
effectively enhances the utility of SAM and can be a new direction in zero-shot
edge detection methods. Comment: 11 pages, accepted to WACV 2024 Worksho
Measuring scene detection performance
In this paper we evaluate the performance of scene detection techniques, starting from the classic precision/recall approach, moving to the better designed coverage/overflow measures, and finally proposing an improved metric that addresses frequently observed cases in which the numeric interpretation differs from the expected results. Numerical evaluation is performed on two recent proposals for automatic scene detection, which are compared with a simple but effective novel approach. Experiments are conducted to show how different measures may lead to different interpretations.
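As a rough illustration of the classic precision/recall approach mentioned in this abstract, the sketch below scores detected scene boundaries against ground-truth boundaries by exact match. The function name, the representation of boundaries as shot indices, and the example values are assumptions made for illustration; the paper's own definitions and its improved metric are not reproduced here.

```python
def boundary_precision_recall(detected, ground_truth):
    """Precision and recall of detected scene boundaries, using exact matches.

    Boundaries are given as shot indices at which a new scene starts
    (an assumed representation).
    """
    detected, ground_truth = set(detected), set(ground_truth)
    hits = len(detected & ground_truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical example: the detector is off by one shot on the first boundary
# and adds one spurious boundary at shot 55.
precision, recall = boundary_precision_recall(
    detected=[11, 25, 40, 55],
    ground_truth=[10, 25, 40],
)
```

With exact matching, the boundary detected at shot 11 counts as a complete miss even though it is one shot away from the true boundary at shot 10. This is one way in which a numeric score can diverge from what a viewer would consider a good segmentation.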
Constraining the Power Spectrum using Clusters
(Shortened Abstract). We analyze a redshift sample of Abell/ACO clusters and
compare them with numerical simulations based on the truncated Zel'dovich
approximation (TZA), for a list of eleven dark matter (DM) models. For each
model we run several realizations, on which we estimate cosmic variance
effects. We analyse correlation statistics, the probability density function,
and supercluster properties from percolation analysis. As a general result, we
find that the distribution of galaxy clusters provides a constraint only on the
shape of the power spectrum, but not on its amplitude: a shape parameter 0.18 <
\Gamma < 0.25 and an effective spectral index at 20Mpc/h in the range
[-1.1,-0.9] are required by the Abell/ACO data. In order to obtain
complementary constraints on the spectrum amplitude, we consider the cluster
abundance as estimated using the Press--Schechter approach, whose reliability
is explicitly tested against N--body simulations. We conclude that, of the
cosmological models considered here, the only viable models are either Cold+Hot
DM ones with \Omega_\nu = [0.2-0.3], better if shared between two massive
neutrinos, or flat low-density CDM models with \Omega_0 = [0.3-0.5]. Comment: 37 pages, LaTeX file, 9 figures; New Astronomy, in pres
Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection
The advent of modern approaches to education, like Massive Open Online Courses (MOOCs), has made video the basic medium for educating and transmitting knowledge. However, IT tools are still not adequate to allow video content re-use, tagging, annotation and personalization. In this paper we analyze the problem of identifying coherent sequences, called scenes, in order to provide users with a more manageable editing unit. A simple spectral clustering technique is proposed and compared with state-of-the-art results. We also discuss correct ways to evaluate the performance of automatic scene detection algorithms.