
    User requirements for multimedia indexing and retrieval of unedited audio-visual footage - RUSHES

    Multimedia analysis and reuse of raw, unedited audio-visual content, known as rushes, is gaining acceptance among a large number of research labs and companies. Several European-funded research projects address multimedia indexing, annotation, search and retrieval, but only the FP6 project RUSHES focuses on automatic semantic annotation, indexing and retrieval of raw, unedited audio-visual content. Professional content creators and providers as well as home users deal with this type of content, so novel technologies for semantic search and retrieval are required. As a first result of the project, this paper presents the user requirements and possible user scenarios. These results lay the foundation for the research and development of a multimedia search engine dedicated to the specific needs of the users and the content.

    Short user-generated videos classification using accompanied audio categories

    This paper investigates the classification of short user-generated videos (UGVs) using their accompanying audio data, since short UGVs account for a large proportion of UGVs on the Internet and many of them carry single-category soundtracks. We define seven types of UGVs, each corresponding to one of seven audio categories. We also investigate three modeling approaches for audio feature representation, namely single Gaussian (1G), Gaussian mixture model (GMM) and Bag-of-Audio-Words (BoAW) models. Support Vector Machine (SVM) classifiers, each using a distance measure matched to its feature representation, are then trained to categorize the UGVs. Evaluation results show that these approaches effectively categorize short UGVs based on their audio track: a GMM representation with the approximated Bhattacharyya distance (ABD) produces the best performance, and a BoAW representation with a chi-square kernel reports comparable results.
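    A minimal sketch of the BoAW/chi-square variant described in the abstract, assuming MFCC frame features, a k-means codebook and a precomputed chi-square kernel SVM (librosa and scikit-learn). The feature choice, codebook size, file paths and labels are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
import librosa
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def mfcc_frames(path, sr=22050, n_mfcc=13):
    """Frame-level MFCCs for one soundtrack, shape (n_frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def boaw_histogram(frames, codebook):
    """Quantize frames against the codebook and return an L1-normalized histogram."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_and_predict(train_paths, train_labels, test_paths, n_words=512):
    """Fit a BoAW codebook and a chi-square kernel SVM, then classify test UGVs.

    train_paths / train_labels / test_paths are hypothetical lists of audio
    file paths and category labels for the UGV soundtracks.
    """
    train_frames = [mfcc_frames(p) for p in train_paths]
    codebook = MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(np.vstack(train_frames))
    X_train = np.array([boaw_histogram(f, codebook) for f in train_frames])
    X_test = np.array([boaw_histogram(mfcc_frames(p), codebook) for p in test_paths])
    # The chi-square kernel is passed to the SVM as a precomputed Gram matrix.
    clf = SVC(kernel="precomputed")
    clf.fit(chi2_kernel(X_train, X_train), train_labels)
    return clf.predict(chi2_kernel(X_test, X_train))
```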

    A Video Library System Using Scene Detection and Automatic Tagging

    We present a novel video browsing and retrieval system for edited videos, in which videos are automatically decomposed into meaningful, storytelling parts (i.e. scenes) and tagged according to their transcript. The system relies on a Triplet Deep Neural Network that exploits multimodal features, and has been implemented as a set of extensions to the eXo Platform Enterprise Content Management System (ECMS). These extensions enable the interactive visualization of a video, its automatic and semi-automatic annotation, and keyword-based search inside the video collection. The platform also allows natural integration with third-party add-ons, so that the automatic annotations can be exploited outside the proposed platform.
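    A minimal sketch of how a triplet network over multimodal shot features might be trained (PyTorch). The feature dimensions, network layers and triplet sampling below are assumptions for illustration, not the system's actual architecture.

```python
import torch
import torch.nn as nn

class ShotEmbedder(nn.Module):
    """Maps a concatenated visual + transcript shot descriptor to a compact embedding."""
    def __init__(self, visual_dim=2048, text_dim=300, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(visual_dim + text_dim, 512),
            nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, visual, text):
        # L2-normalized embeddings so Euclidean distance behaves like cosine distance.
        return nn.functional.normalize(self.net(torch.cat([visual, text], dim=-1)), dim=-1)

model = ShotEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Anchor and positive shots come from the same scene, the negative from a
# different one; random tensors stand in for real CNN and transcript features.
v_a, t_a = torch.randn(32, 2048), torch.randn(32, 300)
v_p, t_p = torch.randn(32, 2048), torch.randn(32, 300)
v_n, t_n = torch.randn(32, 2048), torch.randn(32, 300)

loss = criterion(model(v_a, t_a), model(v_p, t_p), model(v_n, t_n))
optimizer.zero_grad()
loss.backward()
optimizer.step()
# At inference time, scene boundaries could be placed where the distance between
# consecutive shot embeddings exceeds a threshold.
```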

    A Survey of AI Music Generation Tools and Models

    In this work, we provide a comprehensive survey of AI music generation tools, including both research projects and commercial applications. To conduct our analysis, we classify music generation approaches into three categories: parameter-based, text-based, and visual-based. Our survey highlights the diverse capabilities and functional features of these tools, which cater to a wide range of users, from casual listeners to professional musicians. We observe that each tool has its own advantages and limitations and, as a result, compile a list of the factors that should be considered during the tool selection process. Moreover, our survey offers critical insights into the underlying mechanisms and challenges of AI music generation.

    Audio Event Detection in Movies using Multiple Audio Words and Contextual Bayesian Networks

    This article investigates a novel use of the well-known audio-word representation to detect specific audio events, namely gunshots and explosions, in order to gain robustness against soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments, and each segment is described by one or several audio words obtained by applying product quantization to standard features. Such a representation using multiple audio words constructed via product quantization is one of the novelties of this work. Based on this representation, Bayesian networks are used to exploit contextual information in order to detect the audio events. Experiments are performed on a comprehensive set of 15 movies, made publicly available. Results are comparable to the state of the art on the same dataset but show increased robustness to decision thresholds, although this limits the range of possible operating points in some conditions; late fusion provides a solution to this issue.
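    A minimal sketch of product quantization for building multiple audio words per segment, assuming per-segment feature vectors that are split into sub-blocks with one k-means codebook each (scikit-learn). The block count, codebook size and feature dimensionality are illustrative, and the contextual Bayesian network stage is only indicated in a comment; this is not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_product_quantizer(features, n_subvectors=4, n_words=64):
    """Fit one codebook per feature sub-block. features: (n_segments, dim)."""
    blocks = np.array_split(features, n_subvectors, axis=1)
    return [KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(b) for b in blocks]

def encode(features, codebooks):
    """Return (n_segments, n_subvectors) integer audio-word indices."""
    blocks = np.array_split(features, len(codebooks), axis=1)
    return np.stack([cb.predict(b) for cb, b in zip(codebooks, blocks)], axis=1)

# Random stand-ins for per-segment descriptors (e.g. statistics of standard
# features computed over each stationary segment).
segments = np.random.randn(1000, 64)
codebooks = train_product_quantizer(segments)
audio_words = encode(segments, codebooks)   # each row: several audio words per segment
# These discrete word sequences would then feed the contextual Bayesian network
# that decides whether a segment belongs to a gunshot or explosion event.
```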