7,694 research outputs found
Semantic analysis of field sports video using a petri-net of audio-visual concepts
The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network-Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be facilitated and queries on high-level semantics can be achieved. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results based on recorded sports
video data across three types of sports games (soccer, basketball and rugby), and each from multiple broadcasters, are used to illustrate the potential of this framework
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that Music Information
Retrieval and Sound and Music Computing research communities should focus in
the next years
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature
Evaluating Content-centric vs User-centric Ad Affect Recognition
Despite the fact that advertisements (ads) often include strongly emotional
content, very little work has been devoted to affect recognition (AR) from ads.
This work explicitly compares content-centric and user-centric ad AR
methodologies, and evaluates the impact of enhanced AR on computational
advertising via a user study. Specifically, we (1) compile an affective ad
dataset capable of evoking coherent emotions across users; (2) explore the
efficacy of content-centric convolutional neural network (CNN) features for
encoding emotions, and show that CNN features outperform low-level emotion
descriptors; (3) examine user-centered ad AR by analyzing Electroencephalogram
(EEG) responses acquired from eleven viewers, and find that EEG signals encode
emotional information better than content descriptors; (4) investigate the
relationship between objective AR and subjective viewer experience while
watching an ad-embedded online video stream based on a study involving 12
users. To our knowledge, this is the first work to (a) expressly compare user
vs content-centered AR for ads, and (b) study the relationship between modeling
of ad emotions and its impact on a real-life advertising application.Comment: Accepted at the ACM International Conference on Multimodal Interation
(ICMI) 201
Can small be beautiful? assessing image resolution requirements for mobile TV
Mobile TV services are now being offered in several countries, but for cost reasons, most of these services offer material directly recoded for mobile consumption (i.e. without additional editing). The experiment reported in this paper, aims to assess the image resolution and bitrate requirements for displaying this type of material on mobile devices. The study, with 128 participants, examined responses to four different image resolutions, seven video encoding bitrates, two audio bitrates and four content types. The results show that acceptability is significantly lower for images smaller than 168×126, regardless of content type. The effect is more pronounced when bandwidth is abundant, and is due to important detail being lost in the smaller screens. In contrast to previous studies, participants are more likely to rate image quality as unacceptable when the audio quality is high
CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference
The third and last CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the domain. An exhibition of 13 stands presented 16 research projects currently ongoing around the
world
- …