The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

Author: Sturm Bob L.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013

The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.

Comment: 29 pages, 7 figures, 6 tables, 128 references

arXiv.org e-Print Archive

VBN

Utility of the International Classification of Functioning, Disability and Health (ICF) for educational psychologists’ work

Authors: Aljunied M; Frederickson N
Publication date: 24/09/2014

Despite embracing a bio-psycho-social perspective, the World Health Organization’s International Classification of Functioning, Disability and Health (ICF) assessment framework has had limited application to date with children who have special educational needs (SEN). This study examines its utility for educational psychologists’ work with children who have Autism Spectrum Disorders (ASD). Mothers of 40 children with ASD aged eight to 12 years were interviewed using a structured protocol based on the ICF framework. The Diagnostic Interview for Social and Communication Disorder (DISCO) was completed with a subset of 19 mothers. Internal consistency and inter-rater reliability of the interview assessments were found to be acceptable and there was evidence for concurrent and discriminant validity. Despite some limitations, initial support for the utility of the ICF model suggests its potential value across educational, health and care fields. Further consideration of its relevance to educational psychologists in new areas of multi-agency working is warranted.

UCL Discovery

PubMed Central

TDL--- A Type Description Language for Constraint-Based Grammars

Authors: Krieger Hans-Ulrich; Schäfer Ulrich
Publication date: 01/01/1994

This paper presents TDL, a typed feature-based representation language and inference system. Type definitions in TDL consist of type and feature constraints over the boolean connectives. TDL supports open- and closed-world reasoning over types and allows for partitions and incompatible types. Working with partially as well as with fully expanded types is possible. Efficient reasoning in TDL is accomplished through specialized modules.

Comment: Will Appear in Proc. COLING-9

arXiv.org e-Print Archive

CiteSeerX

Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions

Authors: Dou Dejing; Gao Junyu; Hu Di; Hua Yuansheng; Mou Lichao; Wang Qingzhong; Zhu Xiao Xiang
Publication date: 16/05/2020

Visual crowd counting has been recently studied as a way to enable people counting in crowd scenes from images. Albeit successful, vision-based crowd counting approaches could fail to capture informative features in extreme conditions, e.g., imaging at night and occlusion. In this work, we introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes. We collect a large-scale benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of 1,935 images and the corresponding audio clips, and 170,270 annotated instances. In order to fuse the two modalities, we make use of a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features. Finally, we conduct extensive experiments using the proposed dataset and approach. Experimental results show that introducing auditory information can benefit crowd counting under different illumination, noise, and occlusion conditions. The dataset and code will be released. Code and data have been made available.
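
As a rough illustration of what a feature-wise affine fusion of the two modalities could look like, here is a minimal pure-Python sketch. It assumes the audio embedding is mapped by two linear layers to one scale (`gamma`) and one shift (`beta`) per visual channel. That wiring, the function names `linear` and `fuse`, and all toy weights are assumptions made for illustration. The abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Plain dense layer: returns weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse(visual: Matrix, audio: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Scale and shift each visual channel with values predicted from audio.

    `visual` has one row per channel; each row holds that channel's
    activations. This layout is an assumption of the sketch.
    """
    gamma = linear(w_gamma, b_gamma, audio)  # one scale per channel
    beta = linear(w_beta, b_beta, audio)     # one shift per channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy example: 2 visual channels, 3-dimensional audio embedding.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [1.0, 0.0, 2.0]
w_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
b_gamma = [0.0, 0.0]
w_beta = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
b_beta = [1.0, 0.0]
fused = fuse(visual, audio, w_gamma, b_gamma, w_beta, b_beta)
print(fused)
```

In a trained model the weights would be learned jointly with the counting network. Here they are fixed by hand only so the arithmetic is easy to follow.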

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Non-Negative Tensor Factorization Applied to Music Genre Classification

Authors: Benetos E.; Kotropoulos C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2010

Music genre classification techniques are typically applied to the data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted using a texture window of one second, which enables the representation of any 30-second music recording as a time sequence of feature vectors, thus yielding a feature matrix. Consequently, by stacking the feature matrices associated with any dataset recordings, a tensor is created, a fact which necessitates studying music genre classification using tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends the non-negative matrix factorization. Several variants of the NTF algorithm emerge by employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors is extracted from 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The NTF classifier performance is compared against that of the multilayer perceptron and the support vector machines by applying a stratified 10-fold cross validation. A genre classification accuracy of 78.9% is reported for the NTF classifier, demonstrating the superiority of the aforementioned multilinear classifier over several data matrix-based state-of-the-art classifiers.
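
The stratified 10-fold cross validation mentioned in this abstract can be sketched in a few lines of standard-library Python. This is a generic illustration of the splitting scheme, not the authors' evaluation code. The function name `stratified_folds`, the `genre0` to `genre9` labels, and the assumption of exactly 100 recordings per genre are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Split item indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's items round-robin so every fold
        # receives a near-equal share of that class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical balanced labels: 10 genres x 100 recordings.
labels = [f"genre{g}" for g in range(10) for _ in range(100)]
folds = stratified_folds(labels, k=10)

# Each fold serves once as the test set; the other nine form the training set.
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    assert len(train) + len(test_fold) == len(labels)
```

Stratifying by label does not by itself guard against the duplicated recordings that the GTZAN fault analysis listed earlier on this page describes, so a split like this is only a starting point for that dataset.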

City Research Online

Spartan Daily, October 11, 1978

Author: San Jose State University School of Journalism and Mass Communications
Publication venue: SJSU ScholarWorks
Publication date: 11/10/1978

Volume 71, Issue 27

https://scholarworks.sjsu.edu/spartandaily/6386/thumbnail.jp

SJSU ScholarWorks