
    COSMOS-7: Video-oriented MPEG-7 scheme for modelling and filtering of semantic content

    MPEG-7 prescribes a format for semantic content models for multimedia to ensure interoperability across a multitude of platforms and application domains. However, the standard leaves open how the models should be used and how their content should be filtered. Filtering is a technique used to retrieve only content relevant to user requirements, thereby reducing the content-sifting effort required of the user. This paper proposes an MPEG-7 scheme that can be deployed for semantic content modelling and filtering of digital video. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the user's preferred content requirements.
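    As a rough illustration of the filtering idea described above, the sketch below keeps only video segments whose semantic labels overlap with a user's preferred content requirements. The Segment structure and the label sets are simplified assumptions for illustration, not the MPEG-7 description schemes used by COSMOS-7.

```python
# Minimal content-based filtering sketch: retain only the segments whose
# semantic labels intersect the user's preferences. The data structures are
# simplified stand-ins for illustration, not MPEG-7 description schemes.
from dataclasses import dataclass


@dataclass
class Segment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    labels: set[str]    # semantic annotations attached to the segment


def filter_segments(segments: list[Segment], preferences: set[str]) -> list[Segment]:
    """Return only the segments relevant to the user's preferred content."""
    return [s for s in segments if s.labels & preferences]


if __name__ == "__main__":
    video = [
        Segment(0, 30, {"goal", "soccer"}),
        Segment(30, 90, {"crowd", "soccer"}),
        Segment(90, 120, {"interview"}),
    ]
    print(filter_segments(video, {"goal", "interview"}))  # first and third segments
```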

    Video summarisation: A conceptual framework and survey of the state of the art

    Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means of surveying that literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (the outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analysing information sourced directly from the video stream), external (analysing information not sourced directly from the video stream) and hybrid (analysing a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly unobtrusively sourced user-based information, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries with greater relevance to individual users.
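    The framework's main distinctions can be captured in a small data structure; the category names below follow the abstract, while the class layout itself is only an illustrative assumption.

```python
# Illustrative encoding of the surveyed framework: techniques are categorised by
# the information they analyse; summaries by the content they derive from and
# the functionality they offer. Names follow the abstract; layout is assumed.
from dataclasses import dataclass
from enum import Enum


class TechniqueCategory(Enum):
    INTERNAL = "analyses information sourced directly from the video stream"
    EXTERNAL = "analyses information not sourced directly from the video stream"
    HYBRID = "analyses a combination of internal and external information"


class ContentBasis(Enum):
    OBJECT = "object"
    EVENT = "event"
    PERCEPTION = "perception"
    FEATURE = "feature"


@dataclass
class VideoSummary:
    derived_from: ContentBasis
    interactive: bool     # interactive vs. static consumption
    personalised: bool    # personalised vs. generic
```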

    Combining audio-based similarity with web-based data to accelerate automatic music playlist generation

    We present a technique for combining audio signal-based music similarity with web-based musical artist similarity to accelerate the task of automatic playlist generation. We demonstrate the applicability of the proposed method by extending a recently published interface for music players that benefits from intelligent structuring of audio collections. While the original approach involves calculating similarities between every pair of songs in a collection, we incorporate web-based data to reduce the number of similarity calculations required. More precisely, we exploit artist similarity determined automatically by means of web retrieval to avoid similarity calculation between tracks by dissimilar or unrelated artists. We evaluate our acceleration technique on two audio collections with different characteristics. The proposed combination of audio- and text-based similarity not only reduces the number of required calculations considerably but also yields better results, in terms of musical quality, than the initial approach based on audio data alone. A small user study further confirms the quality of the resulting playlists.
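    A rough sketch of the acceleration step, under the assumption that an audio similarity function and a web-derived artist-similarity table are available (both are hypothetical stand-ins here): audio similarities are computed only for track pairs whose artists are related.

```python
# Sketch: skip audio-based similarity computation for pairs of tracks whose
# artists are unrelated according to web-based artist similarity.
# `audio_similarity` and `artist_sim` are hypothetical stand-ins.
from itertools import combinations


def audio_similarity(track_a: dict, track_b: dict) -> float:
    """Placeholder: plug in a real signal-based similarity measure here."""
    return 0.0


def pruned_similarities(tracks: list[dict],
                        artist_sim: dict[frozenset, float],
                        threshold: float = 0.3) -> dict[tuple, float]:
    """Compute audio similarities only for pairs of tracks by related artists."""
    result = {}
    for (i, a), (j, b) in combinations(enumerate(tracks), 2):
        pair = frozenset((a["artist"], b["artist"]))
        # Same-artist pairs are always compared; cross-artist pairs only when
        # the web-based artist similarity reaches the threshold.
        if len(pair) == 1 or artist_sim.get(pair, 0.0) >= threshold:
            result[(i, j)] = audio_similarity(a, b)
    return result


if __name__ == "__main__":
    tracks = [{"title": "A", "artist": "X"},
              {"title": "B", "artist": "Y"},
              {"title": "C", "artist": "X"}]
    artist_sim = {frozenset(("X", "Y")): 0.1}       # X and Y judged unrelated
    print(pruned_similarities(tracks, artist_sim))  # only tracks A and C are compared
```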

    Semantic analysis of field sports video using a Petri-net of audio-visual concepts

    The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets, which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be supported. A particular strength of this framework is that semantic detectors based on PCN-PNs can easily be built to search within sports videos and locate interesting events. Experimental results based on recorded video data from three types of sports game (soccer, basketball and rugby), each from multiple broadcasters, illustrate the potential of this framework.
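    To illustrate the Petri-net idea, the toy sketch below fires a transition once all of its input places (detected perception concepts) hold tokens; the "goal event" net is an invented example, not one of the paper's PCN-PN definitions.

```python
# Toy Petri-net event detector: places hold tokens for detected perception
# concepts; a transition fires when all of its input places are marked,
# depositing a token in its output place (a detected highlight).
# The concrete net below is an invented illustration, not a PCN-PN from the paper.

def fire(marking: dict[str, int], transitions: list[dict]) -> dict[str, int]:
    """Fire every enabled transition once and return the new marking."""
    marking = dict(marking)
    for t in transitions:
        if all(marking.get(p, 0) > 0 for p in t["inputs"]):
            for p in t["inputs"]:
                marking[p] -= 1
            for p in t["outputs"]:
                marking[p] = marking.get(p, 0) + 1
    return marking


if __name__ == "__main__":
    # Perception concepts detected by low-level analysis of a soccer clip.
    marking = {"crowd_cheer": 1, "replay": 1, "scoreboard_change": 1}
    transitions = [{"inputs": ["crowd_cheer", "replay", "scoreboard_change"],
                    "outputs": ["goal_event"]}]
    print(fire(marking, transitions))  # ..., 'goal_event': 1
```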

    Future-Viewer: An Efficient Framework for Navigating and Classifying Audio-Visual Documents

    In this paper we present an intuitive framework named Future-Viewer, introduced for the effective visualization of spatio-temporal low-level features in the context of browsing and retrieval of a multimedia document. The tool is used to facilitate access to the content and to improve understanding of the semantics associated with the considered multimedia document. The main visualization paradigm consists of representing a 2D feature space in which the shots of the video document are located. The features that characterize the axes of the 2D space can be selected by the user. Shots with similar content fall near each other, and the tool offers various functionalities for automatically finding and annotating shot clusters in the feature space. These annotations can also be stored in MPEG-7 format. The use of this application to browse the content of a few audio-visual sequences demonstrates very interesting capabilities.
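    A minimal sketch of the visualisation paradigm: shots are positioned in a 2D space by two user-selected features and grouped into clusters that could then be annotated. The feature names, values and the use of k-means are assumptions for illustration, not the tool's actual implementation.

```python
# Sketch: place shots in a 2D space spanned by two user-selected low-level
# features, then group nearby shots into clusters for annotation.
# Feature names/values and the use of k-means are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

shots = {  # shot id -> low-level features (made-up values)
    "shot_01": {"motion": 0.8, "brightness": 0.2, "audio_energy": 0.7},
    "shot_02": {"motion": 0.7, "brightness": 0.3, "audio_energy": 0.6},
    "shot_03": {"motion": 0.1, "brightness": 0.9, "audio_energy": 0.2},
    "shot_04": {"motion": 0.2, "brightness": 0.8, "audio_energy": 0.1},
}

x_axis, y_axis = "motion", "brightness"  # axes selected by the user
points = np.array([[f[x_axis], f[y_axis]] for f in shots.values()])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
for shot_id, cluster in zip(shots, labels):
    print(shot_id, "-> cluster", int(cluster))
```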

    ELVIS: Entertainment-led video summaries

    Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to, or emotionally engaging for, an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video sub-segments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video sub-segments for content in the comedy, horror/comedy and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable and informative.
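    A rough sketch of the selection step implied by the abstract: per-sub-segment scores derived from the five physiological measures are combined, and the top-scoring sub-segments are returned. Equal weighting and the example data are assumptions, not the actual ELVIS analysis phases.

```python
# Sketch: combine per-sub-segment scores from the five physiological measures
# (EDR, HR, BVP, RR, RA) and select the highest-scoring sub-segments.
# Equal weighting and the example data are illustrative assumptions only.

MEASURES = ("edr", "hr", "bvp", "rr", "ra")


def select_segments(segments: list[dict], k: int = 2) -> list[dict]:
    """Return the k sub-segments with the highest combined response score."""
    def combined(seg: dict) -> float:
        return sum(seg[m] for m in MEASURES) / len(MEASURES)
    return sorted(segments, key=combined, reverse=True)[:k]


if __name__ == "__main__":
    segments = [
        {"start": 0,  "end": 10, "edr": 0.2, "hr": 0.3, "bvp": 0.1, "rr": 0.2, "ra": 0.3},
        {"start": 10, "end": 20, "edr": 0.9, "hr": 0.8, "bvp": 0.7, "rr": 0.6, "ra": 0.8},
        {"start": 20, "end": 30, "edr": 0.4, "hr": 0.5, "bvp": 0.3, "rr": 0.4, "ra": 0.5},
    ]
    print(select_segments(segments, k=1))  # the 10-20s sub-segment
```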

    Analysing user physiological responses for affective video summarisation

    Video summarisation techniques aim to abstract the most significant content from a video stream. This is typically achieved by processing low-level image, audio and text features, which are still quite disparate from the high-level semantics that end users identify with (the ‘semantic gap’). Physiological responses are potentially rich indicators of memorable or emotionally engaging video content for a given user. Consequently, we investigate whether they may serve as a suitable basis for a video summarisation technique by analysing a range of user physiological response measures, specifically electro-dermal response (EDR), respiration amplitude (RA), respiration rate (RR), blood volume pulse (BVP) and heart rate (HR), in response to video content across a variety of genres including horror, comedy, drama, sci-fi and action. We present an analysis framework for processing user responses to specific sub-segments within a video stream based on percent rank value normalisation. Application of the framework reveals that users respond significantly to the most entertaining video sub-segments across a range of content domains. Specifically, horror content seems to elicit significant EDR, RA, RR and BVP responses; comedy content elicits comparatively lower levels of EDR but does seem to elicit significant RA, RR, BVP and HR responses; drama content seems to elicit less significant physiological responses in general; and both sci-fi and action content seem to elicit significant EDR responses. We discuss the implications this may have for future affective video summarisation approaches.
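    The analysis framework rests on percent rank value normalisation of the responses observed for each sub-segment; the sketch below shows one common form of percent-rank scoring over a single response signal. The fixed-length windowing into sub-segments is an illustrative assumption.

```python
# Sketch of percent rank value normalisation: each sub-segment's mean response
# is mapped to the fraction of sub-segments with a smaller value, making scores
# comparable across users and measures. Windowing is an illustrative choice.

def percent_rank(values: list[float]) -> list[float]:
    """Percent rank (0..1) of each value within the list."""
    n = len(values)
    return [sum(v < x for v in values) / (n - 1) if n > 1 else 0.0 for x in values]


def segment_means(signal: list[float], window: int) -> list[float]:
    """Mean response per fixed-length sub-segment."""
    return [sum(signal[i:i + window]) / window
            for i in range(0, len(signal) - window + 1, window)]


if __name__ == "__main__":
    edr = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8, 0.4, 0.5, 0.3]  # made-up EDR samples
    print(percent_rank(segment_means(edr, window=3)))     # [0.0, 1.0, 0.5]
```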

    On the Design and Exploitation of User's Personal and Public Information for Semantic Personal Digital Photograph Annotation

    Automating the process of semantic annotation of personal digital photographs is a crucial step towards efficient and effective management of this increasingly high volume of content. However, it remains a highly challenging task for the research community. This paper proposes a novel solution that integrates all the contextual information available to and from users, such as their daily emails, schedules, chat archives, web browsing histories, documents, online news and Wikipedia data. We then analyze this information and extract important semantic terms, using them as semantic keyword suggestions for the users' photos. These keywords take the form of named entities, such as names of people, organizations, locations and dates/times, as well as high-frequency terms. Experiments conducted with 10 subjects and a total of 313 photos showed that the proposed approach can significantly help users with the annotation process: we achieved a 33% improvement in annotation time compared to manual annotation, along with very positive results for the accuracy of the suggested keywords.
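    A minimal sketch of the keyword-suggestion idea: named-entity-like terms and high-frequency terms are pulled from a user's textual context and offered as annotation candidates. The capitalised-word heuristic stands in for a proper named-entity recogniser, and the context shown covers only a small subset of the sources the paper describes.

```python
# Sketch: suggest annotation keywords from a user's textual context (emails,
# schedules, chat logs, ...). A capitalised-word heuristic stands in for a real
# named-entity recogniser; frequent terms come from a simple counter.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at",
             "with", "about", "will"}


def suggest_keywords(context_texts: list[str], top_n: int = 5) -> list[str]:
    tokens = [t for text in context_texts
              for t in re.findall(r"[A-Za-z][A-Za-z'-]+", text)]
    # Stand-in for NER: treat capitalised tokens as candidate people/places/organisations.
    entities = {t for t in tokens if t[0].isupper()}
    # High-frequency terms, excluding stopwords and already-found entities.
    counts = Counter(t.lower() for t in tokens
                     if t.lower() not in STOPWORDS and t not in entities)
    frequent = [term for term, _ in counts.most_common(top_n)]
    return sorted(entities) + frequent


if __name__ == "__main__":
    context = ["Meeting with Alice in Barcelona on Friday about the conference",
               "Flight to Barcelona booked; Alice will send the conference schedule"]
    print(suggest_keywords(context))
```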