903 research outputs found
Short user-generated videos classification using accompanied audio categories
This paper investigates the classification of short user-generated videos (UGVs) using the accompanied audio data since short UGVs accounts for a great proportion of the Internet UGVs and many short UGVs are accompanied by singlecategory soundtracks. We define seven types of UGVs corresponding to seven audio categories respectively. We also investigate three modeling approaches for audio feature representation, namely, single Gaussian (1G), Gaussian mixture (GMM) and Bag-of-Audio-Word (BoAW) models. Then using Support Vector Machine (SVM) with three different distance measurements corresponding to three feature representations, classifiers are trained to categorize the UGVs. The accompanying evaluation results show that these approaches are effective for categorizing the short UGVs based on their audio track. Experimental results show that a GMM representation with approximated Bhattacharyya distance (ABD) measurement produces the best performance, and BoAW representation with chi-square kernel also reports comparable results
Baseline analysis of a conventional and virtual reality lifelog retrieval system
Continuous media capture via a wearable devices is currently one of the most popular methods to establish a comprehensive record of the entirety of an individual's life experience, referred to in the research community as a lifelog. These vast multimodal corpora include visual and other sensor data and are enriched by content analysis, to generate as extensive a record of an individual's life experience. However, interfacing with such datasets remains an active area of research, and despite the advent of new technology and a plethora of competing mediums for processing digital information, there has been little focus on newly emerging platforms such as virtual reality. In this work, we suggest that the increase in immersion and spatial dimensions provided by virtual reality could provide significant benefits to users when compared to more conventional access methodologies. Hence, we motivate virtual reality as a viable method of exploring multimedia archives (specifically lifelogs) by performing a baseline comparative analysis using a novel application prototype built for the HTC Vive and a conventional prototype built for a standard personal computer
Towards developing a collaborative video platform for learning
The work presented in this paper outlines issues relating to the development of a collaborative video platform for learning. Student adoption of collaborative and video technology is increasing dramatically, becoming part of their everyday lives. The aim of this paper is to propose a system and framework for the successful integration of these technologies into teaching and learning. At the outset we assess current trends and previous research, using these findings to inform the development of a new platform. System specifications are then presented with specific needs identified for students and educators. Finally our tentative framework for a integrating a collaborative video platform for learning is presented
Improving the evaluation of web search systems
Linkage analysis as an aid to web search has been assumed to be of significant benefit and we know that it is being implemented by many major Search Engines. Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis over the past three years? In this paper we put forward reasons why disappointing results have been found and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage analysis. We also report a series of linkage-based retrieval experiments on a more densely linked dataset culled from the TREC web documents
Beyond single-shot text queries: bridging the gap(s) between research communities
This workshop brings together researchers from different streams and communities that deal with information access in the widest sense. The general goal is to foster collaboration between the different communities and to showcase research that sits at the border between different areas of research
iForgot: a model of forgetting in robotic memories
Much effort has focused in recent years on developing more life-like robots. In this paper we propose a model of memory for robots, based on human digital memories, though our model incorporates an element of forgetting to ensure that the robotic memory appears more human and therefore can address some of the challenges for human-robot interaction
EMIR: A novel emotion-based music retrieval system
Music is inherently expressive of emotion meaning and affects the mood of people. In this paper, we present a novel EMIR (Emotional Music Information Retrieval) System that uses latent emotion elements both in music and non-descriptive queries (NDQs) to detect implicit emotional association between users and music to enhance Music Information Retrieval (MIR). We try to understand the latent emotional intent of queries via machine learning for emotion classification and compare the performance of emotion detection approaches on different feature sets. For this purpose, we extract music emotion features from lyrics and social tags crawled from the Internet, label some for training and model them in high-dimensional emotion space and recognize latent emotion of users by query emotion analysis. The similarity between queries and music is computed by verified BM25 model
Interactive searching and browsing of video archives: using text and using image matching
Over the last number of decades much research work has been done in the general area of video and audio analysis. Initially the applications driving this included capturing video in digital form and then being able to store, transmit
and render it, which involved a large effort to develop compression and encoding standards. The technology needed to do all this is now easily available and cheap, with applications of digital video processing now commonplace,
ranging from CCTV (Closed Circuit TV) for security, to home capture of broadcast TV on home DVRs for personal viewing.
One consequence of the development in technology for creating, storing and distributing digital video is that there has been a huge increase in the volume of digital video, and this in turn has created a need for techniques to allow effective management of this video, and by that we mean content management. In the BBC, for example, the archives department receives approximately 500,000 queries per year and has over 350,000 hours of content in its library. Having huge archives of video information is hardly any benefit if we have no effective means of being able to locate video clips which are of relevance to whatever our information needs may be. In this chapter we report our work on developing two specific retrieval and browsing tools for digital video information. Both of these are based on an analysis of the captured video for the purpose of automatically structuring into shots or higher level semantic units like TV news stories. Some also include analysis of the video for the automatic detection of features such as the presence or absence of faces. Both include some elements of searching, where a user specifies a query or information need, and browsing, where a user is allowed to browse through sets of retrieved video shots. We support the presentation of these tools with illustrations of actual video retrieval systems developed and working on hundreds of hours of video content
Semantic concept detection in imbalanced datasets based on different under-sampling strategies
Semantic concept detection is a very useful technique for developing powerful retrieval or filtering systems for multimedia data. To date, the methods for concept detection have been converging on generic classification schemes. However, there is often imbalanced dataset or rare class problems in classification algorithms, which deteriorate the performance of many classifiers. In this paper, we adopt three “under-sampling” strategies to handle this imbalanced dataset issue in a SVM classification framework and evaluate their performances
on the TRECVid 2007 dataset and additional positive
samples from TRECVid 2010 development set. Experimental
results show that our well-designed “under-sampling” methods
(method SAK) increase the performance of concept detection
about 9.6% overall. In cases of extreme imbalance in
the collection the proposed methods worsen the performance
than a baseline sampling method (method SI), however in the
majority of cases, our proposed methods increase the performance of concept detection substantially. We also conclude that method SAK is a promising solution to address the SVM classification with not extremely imbalanced datasets
- …