12,729 research outputs found

Collecting ground truth annotations for drum detection in polyphonic music

Authors: De Baets Bernard, Degroeve Sven, Leman Marc, Lesaffre Micheline, Martens Jean-Pierre, Tanghe Koen
Publication date: 01/01/2005

To train and test algorithms that automatically detect drum events in polyphonic music, ground truth data is needed. This paper describes a setup used for gathering manual annotations for 49 real-world music fragments containing different drum event types. Apart from the drum events, the beat was also annotated. The annotators were experienced drummers or percussionists. This paper is aimed primarily at other drum detection researchers, but might also be of interest to others dealing with automatic music analysis, manual annotation and data gathering. Its purpose is threefold: providing annotation data for algorithm training and evaluation, describing a practical way of setting up a drum annotation task, and reporting issues that came up during the annotation sessions, along with some thoughts on points to take into account when setting up similar tasks in the future.

CiteSeerX

Ghent University Academic Bibliography

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Learning Multimodal Latent Attributes

Authors: Fu Y, Gong S, Hospedales TM, Xiang T
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2014

The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity by transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and their complex, unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce a concept of semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces requirements for an exhaustive, accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse data learning tasks, including multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.

CiteSeerX

Queen Mary Research Online

MUSCLE movie-database: a multimodal corpus with rich annotation for dialogue and saliency detection

Authors: Antonopoulos P., Benetos E., Kotropoulos C., Kotti M., Maragos P., Moschou V., Nikolaidis N., Pitas I., Spachos D., Tzimouli K., Zlatintsi A.
Publication date: 01/01/2008

City Research Online

Spiral - Imperial College Digital Repository

The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents

Authors: Adda Gilles, Barras Claude, Bredin Herve, Budnik Mateusz, Hernando Pericás Francisco Javier, Mariani Joseph, Morros Rubió Josep Ramon, Poignant Johann
Publication venue: European Language Resources Association
Publication date: 01/01/2016

In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analysis that can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored specifically to the task at hand can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry run experiment, the manual annotation of 716 speech segments was thus propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source. Peer Reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC