A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data
Acoustic event detection becomes a difficult task, even for a small number of events, in scenarios where events are produced rather spontaneously and often overlap in time. In this work, we aim to improve the detection rate by means of feature selection. Using a one-against-all detection approach, we develop a new fast one-pass-training algorithm and an associated highly precise metric. Choosing a different subset of multimodal features for each acoustic event class, the results obtained from audiovisual data collected in the UPC multimodal room show an improvement in average detection rate with respect to using the whole set of features.
Peer Reviewed. Preprint
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use
The GTZAN dataset appears in at least 100 published works, and is the
most-used public dataset for evaluation in machine listening research for music
genre recognition (MGR). Our recent work, however, shows GTZAN has several
faults (repetitions, mislabelings, and distortions), which challenge the
interpretability of any result derived using it. In this article, we disprove
the claims that all MGR systems are affected in the same ways by these faults,
and that the performances of MGR systems in GTZAN are still meaningfully
comparable since they all face the same faults. We identify and analyze the
contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has
been used in MGR research, and find few indications that its faults have been
known and considered. Finally, we rigorously study the effects of its faults on
evaluating five different MGR systems. The lesson is not to banish GTZAN, but
to use it with consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references
Harmonicity and dynamics-based features for audio
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4, IV-321, IPRO