
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

Author: Sturm Bob L.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013

The GTZAN dataset appears in at least 100 published works and is the most-used public dataset for evaluation in machine listening research on music genre recognition (MGR). Our recent work, however, shows that GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove two claims: that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems on GTZAN are still meaningfully comparable because they all face the same faults. We identify and analyze the contents of GTZAN and provide a catalog of its faults. We review how GTZAN has been used in MGR research and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with careful consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references

arXiv.org e-Print Archive

VBN

A Comparison of Human, Automatic and Collaborative Music Genre Classification and User Centric Evaluation of Genre Classification Systems

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Classification Accuracy Is Not Enough: On the Evaluation of Music Genre Recognition Systems

Author: Sturm Bob L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Springer - Publisher Connector

VBN

Information extraction from the web using a search engine

Author: Geleijnse G.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2008

Repository TU/e

Audiovisual annotation procedure for multi-view field recordings

Author: AA Liu, D Turnbull, H McGurk, I Lefter, JC Pereira, L Aroyo, M Chion, O Russakovsky, R Iedema, S Bird, X Wang, Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2018

The audio and video parts of an audiovisual document interact to produce an audiovisual, or multi-modal, perception. Yet automatic analysis of these documents is usually based on separate audio and video annotations, which may be incomplete or irrelevant with respect to the audiovisual content. Moreover, the expanding possibilities for creating audiovisual documents make it necessary to consider different kinds of content, including videos filmed in uncontrolled conditions (i.e., field recordings) and scenes filmed from different points of view (multi-view). In this paper, we propose an original procedure for producing manual annotations in different contexts, including multi-modal and multi-view documents. This procedure, which uses both audio and video annotations, ensures consistency with audio-only and video-only annotations while additionally providing audiovisual information at a richer level. Finally, such annotated data make various applications possible. In particular, we present an example application on a network of recordings, in which our annotations enable multi-source retrieval using mono-modal or multi-modal queries.