Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
When convolutional neural networks are used to tackle learning problems based
on music or, more generally, time series data, raw one-dimensional data are
commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients,
which are then used as input to the actual neural network. In this
contribution, we investigate, both theoretically and experimentally, the
influence of this pre-processing step on the network's performance and ask
whether replacing it with adaptive or learned filters applied directly to the
raw data can improve learning success. The theoretical results show that
mel-spectrogram coefficients can, in principle, be approximately reproduced by
applying adaptive filters followed by time-averaging. We also conducted
extensive experiments on the task of singing voice detection in music. The
results of these experiments show that, for classification based on
convolutional neural networks, features obtained from adaptive filter banks
followed by time-averaging perform better than the canonical
Fourier-transform-based mel-spectrogram coefficients. Alternative adaptive
approaches with center frequencies or time-averaging lengths learned from
training data perform equally well.
Comment: Completely revised version; 21 pages, 4 figures
Time-frequency representation of earthquake accelerograms and inelastic structural response records using the adaptive chirplet decomposition and empirical mode decomposition
In this paper, the adaptive chirplet decomposition combined with the Wigner-Ville transform and the empirical mode decomposition combined with the Hilbert transform are employed to process various non-stationary signals (strong ground motions and structural responses). The efficacy of these two adaptive techniques in capturing the temporal evolution of the frequency content of specific seismic signals is assessed. To this end, two near-field and two far-field seismic accelerograms are analyzed. Further, a similar analysis is performed on the response records of a 20-story steel frame benchmark building excited by one of the four accelerograms, scaled by appropriate factors to simulate undamaged and severely damaged conditions of the structure. It is shown that the derived joint time–frequency representations of the response time histories capture quite effectively the influence of non-linearity on the variation of the structure's effective natural frequencies during a seismic event; in this context, the mean instantaneous frequency of critical structural response records is traced.
Overall, the study suggests that the aforementioned techniques are viable tools for detecting and monitoring damage to constructed facilities exposed to seismic excitations.
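The Hilbert-transform step of the second approach, and the tracing of a mean instantaneous frequency, can be illustrated with a short NumPy sketch. It assumes the record has already been split into components, for example intrinsic mode functions from empirical mode decomposition; the sifting procedure itself is not shown. The sketch then builds each component's analytic signal via the FFT and differentiates the unwrapped phase to get instantaneous frequency in Hz. The abstract does not define "mean instantaneous frequency". The squared-amplitude-weighted average across components used here is an assumption, as are all function names.

```python
import numpy as np


def analytic_signal(x):
    # FFT-based analytic signal: zero the negative frequencies and double the
    # positive ones (the same construction used by scipy.signal.hilbert)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def instantaneous_frequency(x, fs):
    # Derivative of the unwrapped phase, in Hz; returns n - 1 samples
    z = analytic_signal(x)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2.0 * np.pi)
    amp = np.abs(z[:-1])
    return freq, amp


def mean_instantaneous_frequency(components, fs):
    # Assumed definition: at each time step, average the components'
    # instantaneous frequencies weighted by their squared amplitudes
    pairs = [instantaneous_frequency(c, fs) for c in components]
    freqs = np.array([f for f, _ in pairs])
    weights = np.array([a for _, a in pairs]) ** 2
    return (weights * freqs).sum(axis=0) / weights.sum(axis=0)


if __name__ == "__main__":
    fs = 100.0
    t = np.arange(1000) / fs  # 10 s, an integer number of cycles below
    c1 = np.sin(2 * np.pi * 2.0 * t)   # 2 Hz component
    c2 = np.sin(2 * np.pi * 10.0 * t)  # 10 Hz component
    f1, _ = instantaneous_frequency(c1, fs)
    print(np.allclose(f1, 2.0))  # True
    print(np.allclose(mean_instantaneous_frequency([c1, c2], fs), 6.0))  # True
```

Applied to a structural response record, a downward drift of this curve over the course of the excitation is the kind of signature the abstract associates with non-linearity reducing the effective natural frequencies.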