ELVIS: Entertainment-led video summaries
© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010), http://doi.acm.org/10.1145/1823746.1823751

Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
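The abstract describes combining five physiological response measures to locate the most entertaining subsegments. As a minimal illustrative sketch (not the published ELVIS algorithm, and using entirely hypothetical per-segment values), one could normalise each measure and rank segments by their combined response:

```python
# Illustrative sketch only: normalise heterogeneous physiological measures
# to a common scale, sum them per video segment, and return the segments
# with the strongest combined response.
import numpy as np

def normalise(signal):
    """Scale a signal to [0, 1] so measures with different units are comparable."""
    signal = np.asarray(signal, dtype=float)
    rng = signal.max() - signal.min()
    return (signal - signal.min()) / rng if rng > 0 else np.zeros_like(signal)

def top_subsegments(measures, k=2):
    """measures: dict of name -> per-segment response values (equal lengths).
    Returns the indices of the k segments with the highest combined response."""
    combined = sum(normalise(v) for v in measures.values())
    return sorted(np.argsort(combined)[-k:][::-1].tolist())

# Hypothetical per-segment responses for the five measures named in the abstract.
measures = {
    "EDR": [0.1, 0.9, 0.4, 0.8, 0.2],
    "HR":  [60, 95, 70, 90, 62],
    "BVP": [1.0, 1.6, 1.1, 1.5, 1.0],
    "RR":  [12, 18, 13, 17, 12],
    "RA":  [0.3, 0.7, 0.4, 0.6, 0.3],
}
print(top_subsegments(measures, k=2))  # → [1, 3]
```

The normalisation step matters because the raw measures live on very different scales (beats per minute versus breaths per minute); without it, one measure would dominate the combined score.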
Complexity of coordinated beamforming and scheduling for OFDMA based heterogeneous networks
Coordination is foreseen to be an important component of future mobile radio networks. It is especially relevant in heterogeneous networks, where high power base stations produce strong interference to an underlying layer of low power base stations. This work investigates in detail the achievable performance gains for one coordination technique, coordinated beamforming. It reveals the main factors that influence the throughput of the mobile stations. These findings are combined with an analysis of the computational complexity. As a result, a heuristic algorithm is presented that achieves results close to an exhaustive search with significantly fewer calculations. Detailed simulation analysis is presented on a realistic network layout.
The COST292 experimental framework for TRECVID 2007
In this paper, we give an overview of the four tasks submitted to TRECVID 2007 by COST292. In the shot boundary (SB) detection task, four SB detectors have been developed and the results are merged using two merging algorithms. The framework developed for the high-level feature extraction task comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a Bayesian classifier trained with a "bag of subregions". The third system uses a multi-modal classifier based on SVMs and several descriptors. The fourth system uses two image classifiers based on ant colony optimisation and particle swarm optimisation respectively. The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. Finally, the rushes task submission is based on a video summarisation and browsing system comprising two different interest curve algorithms and three features.
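The first system above maps low-level descriptors into a semantic space via Latent Semantic Analysis. As a hedged sketch of that general technique (an assumption about the approach, not the COST292 code), LSA projects descriptor vectors onto the top singular directions of the descriptor matrix:

```python
# Illustrative sketch only: Latent Semantic Analysis as a truncated SVD
# projection of descriptor vectors into a lower-dimensional semantic space.
import numpy as np

def lsa_project(X, k):
    """Project the rows of descriptor matrix X onto the top-k right singular
    directions, yielding k-dimensional semantic representations."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T  # equivalent to U[:, :k] * s[:k]

rng = np.random.default_rng(0)
X = rng.random((6, 10))   # six items described by ten low-level descriptors
Z = lsa_project(X, k=3)
print(Z.shape)            # → (6, 3)
```

The projected vectors can then feed a downstream classifier, such as the neural networks mentioned in the abstract, in place of the raw descriptors.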
COST292 experimental framework for TRECVID 2008
In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes based on the eigenvalues of the frame similarity matrix, and a redundancy removal strategy which depends on semantic feature extraction, such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.
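The rushes submission above uses spectral clustering of a frame similarity matrix to group near-duplicate scenes. As a minimal sketch of that general idea (an assumption about the method, with a hypothetical similarity matrix, not the COST292 implementation), the sign of the Fiedler vector of the normalised graph Laplacian splits the frames into two groups:

```python
# Illustrative sketch only: spectral bipartition of frames from the
# second-smallest eigenvector (Fiedler vector) of the normalised Laplacian
# built from a frame similarity matrix.
import numpy as np

def fiedler_split(similarity):
    """Return a boolean group label per frame from the Fiedler vector."""
    S = np.asarray(similarity, dtype=float)
    d = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(S)) - D_inv_sqrt @ S @ D_inv_sqrt  # normalised Laplacian
    _, vecs = np.linalg.eigh(L)                       # eigh sorts eigenvalues ascending
    return vecs[:, 1] > 0                             # sign of Fiedler vector partitions frames

# Hypothetical 4-frame similarity matrix: frames 0/1 and 2/3 are near-duplicates.
S = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.1, 0.1, 0.9, 1.0]]
labels = fiedler_split(S)
# Near-duplicate frames land in the same group; a summariser would then
# keep one representative frame per group.
```

A real system would typically use more eigenvectors and run k-means on the resulting embedding to obtain more than two scene groups; the two-way split is the smallest case that shows the mechanism.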
Different expressions of the same mode: a recent dialogue between archaeological and contemporary drawing practices
In this article we explore what we perceive as pertinent features of shared experience at the excavations of an Iron Age hillfort at Bodfari, North Wales, referencing artist, archaeologist and examples of seminal art works and archaeological records resulting from inter-disciplinary collaboration. We explore the ways in which archaeological and artistic practices of improvisation become entangled and productive through their different modes of mark-making. We contend that the marks and memories of artist and archaeologist alike emerge interactively, through the mutually constituting effects of the object of study, the tools of exploration, and the practitioners themselves, when they are enmeshed in cross-modally bound activities. These include, but are not limited to, remote sensing, surveying, mattocking, trowelling, drawing, photographing, videoing and sound recording. These marks represent the co-signatories: the gesture of the often anonymous practitioners, the voice of the deposits, as well as the imprint of the tools, and their interplay creates a multi-threaded narrative documenting their modes of intra-action, in short our practices. They occupy the conceptual space of paradata, and in the process of saturating the interstices of digital cognitive prosthetics they lend probity to their translations in both art form and archive.