210 research outputs found

    Large scale musical instrument identification

    Get PDF
    In this paper, automatic musical instrument identification using a variety of classifiers is addressed. Experiments are performed on a large set of recordings that stem from 20 instrument classes. Several features from general audio data classification applications as well as MPEG-7 descriptors are measured for 1000 recordings. Branch-and-bound feature selection is applied in order to select the most discriminating features for instrument classification. The first classifier is based on non-negative matrix factorization (NMF) techniques, where training is performed for each audio class individually. A novel NMF testing method is proposed, where each recording is projected onto several training matrices, which have been Gram-Schmidt orthogonalized. Several NMF variants are utilized besides the standard NMF method, such as the local NMF and the sparse NMF. In addition, 3-layered multilayer perceptrons, normalized Gaussian radial basis function networks, and support vector machines employing a polynomial kernel have also been tested as classifiers. The classification accuracy is high, ranging between 88.7% to 95.3%, outperforming the state-of-the-art techniques tested in the aforementioned experiment

    Musical instrument classification using non-negative matrix factorization algorithms

    No full text
    In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in general sound classification applications were measured for 300 sound recordings consisting of 6 different musical instrument classes (piano, violin, cello, flute, bassoon and soprano saxophone). In addition, MPEG-7 basic spectral and spectral basis descriptors were considered, providing an effective combination for accurately describing the spectral and timbrai audio characteristics. The audio flies were split using 70% of the available data for training and the remaining 30% for testing. A classifier was developed based on non-negative matrix factorization (NMF) techniques, thus introducing a novel application of NMF. The standard NMF method was examined, as well as its modifications: the local, the sparse, and the discriminant NMF. Experimental results are presented to compare MPEG-7 spectral basis representations with MPEG-7 basic spectral features alongside the various NMF algorithms. The results indicate that the use of the spectrum projection coefficients for feature extraction and the standard NMF classifier yields an accuracy exceeding 95%. Ā©2006 IEEE

    Computationally Efficient and Robust BIC-Based Speaker Segmentation

    Get PDF
    An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as previously, but when a speaker change is most probable to occur. This is done by estimating the next probable change point thanks to a model of utterance durations. It is found that the inverse Gaussian fits best the distribution of utterance durations. As a result, less BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC, when the covariance matrices are estimated by other estimators than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches

    Speaker-independent negative emotion recognition

    Get PDF
    This work aims to provide a method able to distinguish between negative and non-negative emotions in vocal interaction. A large pool of 1418 features is extracted for that purpose. Several of those features are tested in emotion recognition for the first time. Next, feature selection is applied separately to male and female utterances. In particular, a bidirectional Best First search with backtracking is applied. The first contribution is the demonstration that a significant number of features, first tested here, are retained after feature selection. The selected features are then fed as input to support vector machines with various kernel functions as well as to the K nearest neighbors classifier. The second contribution is in the speaker-independent experiments conducted in order to cope with the limited number of speakers present in the commonly used emotion speech corpora. Speaker-independent systems are known to be more robust and present a better generalization ability than the speaker-dependent ones. Experimental results are reported for the Berlin emotional speech database. The best performing classifier is found to be the support vector machine with the Gaussian radial basis function kernel. Correctly classified utterances are 86.73%Ā±3.95% for male subjects and 91.73%Ā±4.18% for female subjects. The last contribution is in the statistical analysis of the performance of the support vector machine classifier against the K nearest neighbors classifier as well as the statistical analysis of the various support vector machine kernels impact. Ā© 2010 IEEE
    • ā€¦
    corecore