A Deep Siamese Network for Scene Detection in Broadcast Videos

Author: Apostolidis E.
Krizhevsky A.
Mikolov T.
Zhou B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against recent proposals for automatic scene segmentation. We also propose an improved performance measure that aims to reduce the gap between numerical evaluation and expected results, and propose and release a new benchmark dataset. Comment: ACM Multimedia 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Algorithms for Video Structuring

Author: Gatica-Perez Daniel
Guillemot Maël
Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006

Video structuring aims at automatically finding structure in a video sequence. Occupying a key position within video analysis, it is a fundamental step for quality indexing and browsing. As a low-level video analysis, video structuring can be seen as a serial process which includes (i) shot boundary detection, (ii) video shot feature extraction and (iii) video shot clustering. The resulting analysis serves as the base for higher-level processing such as content-based image retrieval or semantic indexing. In this study, the whole process is examined and implemented. Two shot boundary detectors based on motion estimation and color distribution analysis are designed. Based on recent advances in machine learning, a novel technique for video shot clustering is presented. Typical approaches for segmenting and clustering shots use graph analysis, with split and merge algorithms for finding subgraphs corresponding to different scenes. In this work, the clustering algorithm is based on a spectral method which has proven its efficiency in still-image segmentation. This technique clusters points (in our case features extracted from video shots) using eigenvectors of matrices derived from data. The relevance of the data depends on the quality of feature extraction. After stating the main problems of video structuring, solutions are proposed defining a heuristic distance metric for similarity between shots. We combine color visual features with time constraints. The entire process of video structuring is tested on a ten-hour home video database.
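As a rough illustration of a shot distance that combines color features with a time constraint, the sketch below uses histogram intersection and a hard temporal cutoff. The function names, the histogram-intersection choice and the `max_gap` parameter are assumptions made for this example; the abstract does not specify the metric actually used.

```python
def histogram_distance(h1, h2):
    """Return 1 - histogram intersection for two normalized color histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def shot_distance(shot_a, shot_b, max_gap=60.0):
    """Color distance between two shots, with a simple time constraint.

    Shots further apart than max_gap seconds (hypothetical parameter) are
    treated as maximally dissimilar, whatever their colors look like.
    """
    gap = abs(shot_a["time"] - shot_b["time"])
    if gap > max_gap:
        return 1.0
    return histogram_distance(shot_a["hist"], shot_b["hist"])


# Toy shots: a start time in seconds and a 3-bin normalized color histogram.
shots = [
    {"time": 0.0, "hist": [0.7, 0.2, 0.1]},
    {"time": 12.0, "hist": [0.6, 0.3, 0.1]},
    {"time": 300.0, "hist": [0.7, 0.2, 0.1]},
]

near = shot_distance(shots[0], shots[1])  # similar colors, close in time
far = shot_distance(shots[0], shots[2])   # identical colors, far apart in time
```

A matrix of such pairwise distances could then be turned into an affinity matrix and passed to a spectral clustering step.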

Infoscience - École polytechnique fédérale de Lausanne

Hierarchical video summarisation in reference frame subspace

Author: Crookes D
Jiang RM
Sadka AH
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

In this paper, a hierarchical video structure summarization approach using Laplacian Eigenmap is proposed, where a small set of reference frames is selected from the video sequence to form a reference subspace to measure the dissimilarity between two arbitrary frames. In the proposed summarization scheme, the shot-level key frames are first detected from the continuity of inter-frame dissimilarity, and the sub-shot-level and scene-level representative frames are then summarized by using k-means clustering. The experiment is carried out on both test videos and movies, and the results show that, in comparison with a similar approach using latent semantic analysis, the proposed approach using Laplacian Eigenmap can achieve a better recall rate in keyframe detection and gives an efficient hierarchical summarization at sub-shot, shot and scene levels.
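To make the idea of detecting breaks in the continuity of inter-frame dissimilarity concrete, here is a minimal sketch that flags frames where the dissimilarity to the previous frame jumps above a fixed threshold. The fixed threshold and the function name are assumptions for illustration; the paper measures dissimilarity in a reference-frame subspace, which is not reproduced here.

```python
def detect_discontinuities(dissimilarity, threshold=0.5):
    """Return the indices of frames that start a new segment.

    dissimilarity[i] is the dissimilarity between frame i and frame i + 1,
    so a value above the threshold marks frame i + 1 as a break.
    """
    return [i + 1 for i, d in enumerate(dissimilarity) if d > threshold]


# Toy inter-frame dissimilarity scores for a 7-frame clip.
scores = [0.1, 0.05, 0.9, 0.1, 0.08, 0.7]
breaks = detect_discontinuities(scores)
```

In this toy clip the two large jumps mark frames 3 and 6 as segment starts.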

Brunel University Research Archive

Zero Shot Learning with the Isoperimetric Loss

Author: Bertozzi Andrea
Deutsch Shay
Soatto Stefano
Publication date: 15/03/2019

We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pre-trained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure defined by the sample points to regularize the estimates of the manifolds by inferring the graph connectivity using a generalization of the isoperimetric inequalities from Riemannian geometry to graphs. Surprisingly, this regularization alone, paired with the simplest baseline model, outperforms the state-of-the-art among fully automated methods in zero-shot learning benchmarks such as AwA and CUB. This improvement is achieved solely by learning the structure of the underlying spaces by imposing regularity. Comment: Accepted to AAAI-2

arXiv.org e-Print Archive

eScholarship - University of California

Association for the Advancement of Artificial Intelligence: AAAI Publications

Contextual cropping and scaling of TV productions

Author: Gerhard Stoll
Joerg Deigmoeller
Norbert Just
Takebumi Itagaki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/05/2011

This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright © Springer Science+Business Media, LLC 2011. In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements for mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Integration of the tool into post-production and live workflows is envisaged.

Crossref

Brunel University Research Archive

Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation

Author: Kambe Hiroyuki
Nakamoto Ryosuke
Takase Yusuke
Yamagiwa Hiroaki
Publication date: 18/11/2023

This paper proposes a novel zero-shot edge detection method, SCESAME, which stands for Spectral Clustering-based Ensemble for Segment Anything Model Estimation and is based on the recently proposed Segment Anything Model (SAM). SAM is a foundation model for segmentation tasks, and one of the interesting applications of SAM is Automatic Mask Generation (AMG), which generates zero-shot segmentation masks of an entire image. AMG can be applied to edge detection, but suffers from the problem of overdetecting edges. Edge detection with SCESAME overcomes this problem in three steps: (1) eliminating small generated masks, (2) combining masks by spectral clustering, taking into account mask positions and overlaps, and (3) removing artifacts after edge detection. We performed edge detection experiments on two datasets, BSDS500 and NYUDv2. Although our zero-shot approach is simple, the experimental results on BSDS500 showed almost identical performance to human performance and CNN-based methods from seven years ago. In the NYUDv2 experiments, it performed almost as well as recent CNN-based methods. These results indicate that our method effectively enhances the utility of SAM and can be a new direction in zero-shot edge detection methods. Comment: 11 pages, accepted to WACV 2024 Workshop
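Step (1) above, eliminating small generated masks, might look roughly like the sketch below. The binary list-of-lists mask format, the `min_fraction` threshold and the function names are assumptions for illustration; the abstract does not give the actual criterion, and steps (2) and (3) are not shown.

```python
def mask_area(mask):
    """Count the foreground pixels of a binary mask given as rows of 0/1."""
    return sum(sum(row) for row in mask)


def drop_small_masks(masks, min_fraction=0.1):
    """Keep only masks covering at least min_fraction of the image area.

    min_fraction is a hypothetical threshold chosen for this example.
    """
    kept = []
    for mask in masks:
        total_pixels = len(mask) * len(mask[0])
        if mask_area(mask) / total_pixels >= min_fraction:
            kept.append(mask)
    return kept


# Two toy 4x4 masks: one covers half the image, one covers a single pixel.
large = [[1, 1, 0, 0]] * 4
tiny = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
kept = drop_small_masks([large, tiny])
```

Only the large mask survives here, since the single-pixel mask covers 1/16 of the image.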

arXiv.org e-Print Archive

Measuring scene detection performance

Author: Baraldi Lorenzo
Cucchiara Rita
Grana Costantino
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

In this paper we evaluate the performance of scene detection techniques, starting from the classic precision/recall approach, moving to the better-designed coverage/overflow measures, and finally proposing an improved metric that addresses frequently observed cases in which the numeric interpretation differs from the expected results. Numerical evaluation is performed on two recent proposals for automatic scene detection, which are compared with a simple but effective novel approach. Experiments are conducted to show how different measures may lead to different interpretations.
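For readers unfamiliar with coverage/overflow, the sketch below implements one common formulation of the two measures on scenes represented as sets of shot indices. The abstract does not give the definitions, so the formulas, the clipping of overflow at 1.0 and all function names here are assumptions; the improved metric proposed in the paper is not reproduced.

```python
def coverage(gt_scene, detected):
    """Largest fraction of a ground-truth scene covered by one detected scene."""
    return max(len(gt_scene & d) for d in detected) / len(gt_scene)


def overflow(index, ground_truth, detected):
    """Amount by which detected scenes overlapping ground_truth[index] spill
    into other scenes, normalized by the sizes of the two neighbouring
    ground-truth scenes and clipped at 1.0 (an assumption of this sketch).
    """
    gt_scene = ground_truth[index]
    spilled = sum(len(d - gt_scene) for d in detected if d & gt_scene)
    neighbours = 0
    if index > 0:
        neighbours += len(ground_truth[index - 1])
    if index + 1 < len(ground_truth):
        neighbours += len(ground_truth[index + 1])
    if neighbours == 0:
        return 0.0
    return min(1.0, spilled / neighbours)


# Ten shots: three ground-truth scenes versus two detected scenes.
ground_truth = [set(range(0, 4)), set(range(4, 8)), set(range(8, 10))]
detected = [set(range(0, 6)), set(range(6, 10))]

cov_first = coverage(ground_truth[0], detected)       # fully covered
cov_middle = coverage(ground_truth[1], detected)      # split across two scenes
ovf_first = overflow(0, ground_truth, detected)       # spills 2 shots into scene 1
```

High coverage with low overflow indicates a good match; the middle scene here is split in half, so its coverage drops even though every shot is assigned to some detected scene.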

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Constraining the Power Spectrum using Clusters

Author: A. Klypin
C.C. Smith
J. Holtzman
J.R. Primack
K.M. Górski
L. Moscardini
M. Plionis
R. Stompor
S. Borgani
Publication venue: 'Elsevier BV'
Publication date: 13/11/1996

(Shortened abstract.) We analyze a redshift sample of Abell/ACO clusters and compare it with numerical simulations based on the truncated Zel'dovich approximation (TZA), for a list of eleven dark matter (DM) models. For each model we run several realizations, on which we estimate cosmic variance effects. We analyze correlation statistics, the probability density function, and supercluster properties from percolation analysis. As a general result, we find that the distribution of galaxy clusters provides a constraint only on the shape of the power spectrum, but not on its amplitude: a shape parameter 0.18 < \Gamma < 0.25 and an effective spectral index at 20 Mpc/h in the range [-1.1,-0.9] are required by the Abell/ACO data. In order to obtain complementary constraints on the spectrum amplitude, we consider the cluster abundance as estimated using the Press-Schechter approach, whose reliability is explicitly tested against N-body simulations. We conclude that, of the cosmological models considered here, the only viable models are either Cold+Hot DM ones with \Omega_\nu = [0.2-0.3], better if shared between two massive neutrinos, or flat low-density CDM models with \Omega_0 = [0.3-0.5]. Comment: 37 pages, LaTeX file, 9 figures; New Astronomy, in press

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CERN Document Server

Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection

Author: Baraldi Lorenzo
Cucchiara Rita
Grana Costantino
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The advent of modern approaches to education, like Massive Open Online Courses (MOOCs), has made video the basic medium for educating and transmitting knowledge. However, IT tools are still not adequate to allow video content re-use, tagging, annotation and personalization. In this paper we analyze the problem of identifying coherent sequences, called scenes, in order to provide the users with a more manageable editing unit. A simple spectral clustering technique is proposed and compared with state-of-the-art results. We also discuss correct ways to evaluate the performance of automatic scene detection algorithms.

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Introduction to the special issue on semantic analysis for interactive multimedia services

Author: Avrithis Yannis
Kompatsiaris Ioannis
O'Connor Noel E.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

Crossref

Irish Universities

DSpace at NTUA

DCU Online Research Access Service