57,977 research outputs found
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use
The GTZAN dataset appears in at least 100 published works, and is the
most-used public dataset for evaluation in machine listening research for music
genre recognition (MGR). Our recent work, however, shows GTZAN has several
faults (repetitions, mislabelings, and distortions), which challenge the
interpretability of any result derived using it. In this article, we disprove
the claims that all MGR systems are affected in the same ways by these faults,
and that the performances of MGR systems in GTZAN are still meaningfully
comparable since they all face the same faults. We identify and analyze the
contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has
been used in MGR research, and find few indications that its faults have been
known and considered. Finally, we rigorously study the effects of its faults on
evaluating five different MGR systems. The lesson is not to banish GTZAN, but
to use it with consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references
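One way to "use it with consideration of its contents" is to keep known replicas on the same side of a train/test partition. The sketch below is only an illustration of that idea, not the procedure used in the article; the function name `group_aware_split`, the `replica_group` labels, and the example excerpts are all hypothetical.

```python
import random
from collections import defaultdict


def group_aware_split(items, group_of, test_fraction=0.3, seed=0):
    """Split items so every member of a group lands on the same side.

    `group_of` maps an item to a group key (for example, an identifier
    shared by excerpts that are known repetitions of one another).
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Shuffle group keys (not individual items) with a fixed seed.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Fill the test side first, then send remaining groups to train.
        side = test if len(test) < target else train
        side.extend(groups[key])
    return train, test


# Hypothetical data: (excerpt_id, replica_group).
excerpts = [("a", 1), ("b", 1), ("c", 2), ("d", 3), ("e", 3), ("f", 4)]
train, test = group_aware_split(excerpts, group_of=lambda e: e[1])
train_groups = {g for _, g in train}
test_groups = {g for _, g in test}
```

Because whole groups are assigned at once, a repeated excerpt cannot appear in both partitions, at the cost of a test set whose size only approximates `test_fraction`.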
Utility of the International Classification of Functioning, Disability and Health (ICF) for educational psychologists’ work
Despite embracing a bio-psycho-social perspective, the World Health Organization’s International Classification of Functioning, Disability and Health (ICF) assessment framework has had limited application to date with children who have special educational needs (SEN). This study examines its utility for educational psychologists’ work with children who have Autism Spectrum Disorders (ASD). Mothers of 40 children with ASD aged 8 to 12 years were interviewed using a structured protocol based on the ICF framework. The Diagnostic Interview for Social and Communication Disorder (DISCO) was completed with a subset of 19 mothers. Internal consistency and inter-rater reliability of the interview assessments were found to be acceptable, and there was evidence for concurrent and discriminant validity. Despite some limitations, initial support for the utility of the ICF model suggests its potential value across educational, health and care fields. Further consideration of its relevance to educational psychologists in new areas of multi-agency working is warranted.
TDL: A Type Description Language for Constraint-Based Grammars
This paper presents TDL, a typed feature-based representation language and
inference system. Type definitions in TDL consist of type and feature
constraints over the boolean connectives. TDL supports open- and closed-world
reasoning over types and allows for partitions and incompatible types. Working
with partially as well as with fully expanded types is possible. Efficient
reasoning in TDL is accomplished through specialized modules.
Comment: Will appear in Proc. COLING-9
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has been recently studied as a way to enable people
counting in crowd scenes from images. Albeit successful, vision-based crowd
counting approaches could fail to capture informative features in extreme
conditions, e.g., imaging at night and occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The dataset and code will be released. Code
and data have been made available.
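The feature-wise fusion described above can be pictured as an affine transformation of each visual channel whose scale and shift are predicted linearly from the audio features. The following is a minimal sketch under that reading, not the authors' released code; the function names, parameter names (`w_gamma`, `b_gamma`, `w_beta`, `b_beta`), and toy values are assumptions made for illustration.

```python
def linear(weights, bias, x):
    """Plain linear layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def featurewise_fusion(visual, audio, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each visual channel using audio-derived parameters.

    visual: list of channels, each a flat list of spatial activations.
    audio:  flat audio feature vector.
    """
    gamma = linear(w_gamma, b_gamma, audio)  # per-channel scale
    beta = linear(w_beta, b_beta, audio)     # per-channel shift
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy example: 2 visual channels with 2 positions each, 2 audio features.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [0.5, -1.0]
fused = featurewise_fusion(
    visual, audio,
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[1.0, 1.0],
    w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.0, 0.0],
)
# gamma = [2.0, 0.0], beta = [0.0, 0.5]
# fused = [[2.0, 4.0], [0.5, 0.5]]
```

In this toy case the audio input suppresses the second visual channel entirely (scale 0.0) and replaces it with a constant shift, which shows how one modality can modulate the other channel by channel.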
Non-Negative Tensor Factorization Applied to Music Genre Classification
Music genre classification techniques are typically applied to a data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted using a texture window of one second, which enables the representation of any 30-second music recording as a time sequence of feature vectors, thus yielding a feature matrix. Consequently, stacking the feature matrices associated with the recordings of a dataset creates a tensor, which necessitates studying music genre classification using tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends non-negative matrix factorization. Several variants of the NTF algorithm emerge by employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors is extracted from 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The NTF classifier performance is compared against that of the multilayer perceptron and the support vector machines by applying stratified 10-fold cross-validation. A genre classification accuracy of 78.9% is reported for the NTF classifier, demonstrating the superiority of the aforementioned multilinear classifier over several data matrix-based state-of-the-art classifiers.
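For orientation, the matrix case that NTF extends can be sketched with the standard multiplicative updates for non-negative matrix factorization under the squared Frobenius cost. This is not the paper's NTF algorithm or its Bregman-divergence variants, only the familiar two-factor baseline; the function names and the toy matrix are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy rank-1 non-negative matrix (outer product of [1, 2, 3] with itself).
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
w0, h0 = nmf(v, rank=1, iterations=0)
w, h = nmf(v, rank=1, iterations=200)
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative without any explicit projection step, which is the property the tensor generalization also relies on.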
Spartan Daily, October 11, 1978
Volume 71, Issue 27
https://scholarworks.sjsu.edu/spartandaily/6386/thumbnail.jp