Using non-speech sounds to provide navigation cues
This article describes three experiments that investigate the possibility of using structured non-speech audio messages called earcons to provide navigational cues in a menu hierarchy. A hierarchy of 27 nodes and 4 levels was created, with an earcon for each node. Rules were defined for the creation of hierarchical earcons at each node. Participants had to identify their location in the hierarchy by listening to an earcon. Results of the first experiment showed that participants could identify their location with 81.5% accuracy, indicating that earcons are a powerful method of communicating hierarchy information. One proposed use for such navigation cues is in telephone-based interfaces (TBIs), where navigation is a problem. The first experiment did not address the particular problems of earcons in TBIs, such as "does the lower quality of sound over the telephone lower recall rates?", "can users remember earcons over a period of time?" and "what effect does training type have on recall?" A second experiment showed that sound quality did lower the recall of earcons. However, a redesign of the earcons overcame this problem, with 73% recalled correctly. Participants could still recall earcons at this level after a week had passed. Training type also affected recall: with personal training participants recalled 73% of the earcons, but with purely textual training results were significantly lower. These results show that earcons can provide good navigation cues for TBIs. The final experiment used compound, rather than hierarchical, earcons to represent the hierarchy from the first experiment. Results showed that with sounds constructed in this way participants could recall 97% of the earcons. These experiments have developed our general understanding of earcons: a hierarchy three times larger than any previously created was tested, and this was also the first test of the recall of earcons over time.
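The abstract contrasts hierarchical earcons (each node's sound layers one new attribute onto its parent's) with the compound earcons of the final experiment (one short cue concatenated per menu choice). The Python sketch below illustrates that distinction only; the per-level attributes and the example menu names are illustrative assumptions, not the rule set defined in the paper.

```python
# Illustrative only: the attribute used at each level (timbre, rhythm,
# pitch, tempo) and the example menu names are assumptions, not the
# design rules defined in the paper.
LEVEL_ATTRIBUTE = {1: "timbre", 2: "rhythm", 3: "pitch", 4: "tempo"}

def hierarchical_earcon(path):
    """A hierarchical earcon inherits its parent's sound and fixes one new
    attribute per level, so the sound itself encodes the node's position.
    `path` is the chain of menu choices leading to the node; in a real
    design each choice would select a concrete value of the attribute
    (e.g. a particular instrument for timbre)."""
    return {LEVEL_ATTRIBUTE[level]: choice
            for level, choice in enumerate(path, start=1)}

def compound_earcon(path, cue_for):
    """A compound earcon instead concatenates one short, independently
    learned cue per menu choice along the path (`cue_for` is a hypothetical
    lookup from a choice to its cue)."""
    return [cue_for(choice) for choice in path]

# The node reached via main -> messages -> voicemail:
print(hierarchical_earcon(("main", "messages", "voicemail")))
# {'timbre': 'main', 'rhythm': 'messages', 'pitch': 'voicemail'}
```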
Towards efficient music genre classification using FastMap
Automatic genre classification aims to correctly categorize an unknown recording with a music genre. Recent studies use the Kullback-Leibler (KL) divergence to estimate music similarity and then perform classification using k-nearest neighbours (k-NN). However, this approach is not practical for large databases. We propose an efficient genre classifier that addresses the scalability problem. It uses a combination of a modified FastMap algorithm and the KL divergence to return the nearest neighbours, then uses 1-NN for classification. Our experiments showed that high accuracies are obtained while performing classification in less than 1/20 of a second per track.
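As a rough illustration of how FastMap makes KL-based similarity scale, here is a minimal Python sketch of the classic FastMap embedding (Faloutsos & Lin, 1995) driven by an arbitrary pairwise distance function. The paper's modified variant and its parameters are not described in the abstract, so this is a hedged stand-in rather than the authors' implementation.

```python
import numpy as np

def choose_pivots(dist, n):
    # Standard heuristic: start anywhere, jump to the farthest object,
    # then jump once more; the last two objects become the pivots.
    a = 0
    b = max(range(n), key=lambda j: dist(a, j))
    a = max(range(n), key=lambda j: dist(b, j))
    return a, b

def fastmap(dist, n, k):
    """Embed n objects into R^k given only a pairwise distance dist(i, j).
    Classic FastMap; treat this as a generic stand-in for the paper's
    modified variant."""
    coords = np.zeros((n, k))

    def residual(i, j, level):
        # Distance left over after projecting out the first `level` axes.
        d2 = dist(i, j) ** 2 - np.sum((coords[i, :level] - coords[j, :level]) ** 2)
        return np.sqrt(max(d2, 0.0))

    for level in range(k):
        d = lambda i, j, lv=level: residual(i, j, lv)
        a, b = choose_pivots(d, n)
        d_ab = d(a, b)
        if d_ab == 0.0:
            break  # remaining distances are already fully explained
        for i in range(n):
            # Cosine-law projection of object i onto the pivot line (a, b).
            coords[i, level] = (d(a, i) ** 2 + d_ab ** 2 - d(b, i) ** 2) / (2.0 * d_ab)
    return coords
```

With the collection embedded this way, a new track only needs distances to the pivot objects to obtain coordinates, and 1-NN classification can then run on cheap Euclidean distances instead of computing the KL divergence against every track in the database, which is the scalability gain the abstract describes.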
Enhancing timbre model using MFCC and its time derivatives for music similarity estimation
One of the popular methods for content-based music similarity estimation is to model timbre with MFCCs as a single multivariate Gaussian with a full covariance matrix, then use the symmetric Kullback-Leibler divergence. Borrowing from the field of speech recognition, we propose to apply the same approach to the MFCCs' time derivatives to enhance the timbre model. The Gaussian models for the delta and acceleration coefficients are used to create their respective distance matrices, which are then combined linearly to form a full distance matrix for music similarity estimation. In our experiments on two datasets, our novel approach performs better than using MFCCs alone. Moreover, genre classification using k-NN showed that the accuracies obtained are already close to the state of the art.
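To make the pipeline concrete, below is a hedged Python sketch of the standard single-Gaussian timbre model and the closed-form symmetric KL divergence, extended to delta and acceleration features and combined linearly. The frame-difference deltas and equal weights are simplifying assumptions; the paper presumably uses regression-based derivatives and its own combination weights.

```python
import numpy as np

def fit_gaussian(frames):
    """Single multivariate Gaussian (mean, full covariance) over a
    (n_frames, n_coeffs) matrix of MFCC-like features."""
    return frames.mean(axis=0), np.cov(frames, rowvar=False)

def symmetric_kl(p, q):
    """Closed-form symmetric Kullback-Leibler divergence between two
    Gaussians given as (mean, covariance) pairs."""
    def kl(a, b):
        mu_a, cov_a = a
        mu_b, cov_b = b
        d = mu_a.shape[0]
        inv_b = np.linalg.inv(cov_b)
        diff = mu_b - mu_a
        _, logdet_a = np.linalg.slogdet(cov_a)
        _, logdet_b = np.linalg.slogdet(cov_b)
        return 0.5 * (np.trace(inv_b @ cov_a) + diff @ inv_b @ diff
                      - d + logdet_b - logdet_a)
    return kl(p, q) + kl(q, p)

def combined_distance(tracks, weights=(1.0, 1.0, 1.0)):
    """Distance matrix from static MFCCs plus first/second time differences.
    `tracks` is a list of (n_frames, n_coeffs) MFCC matrices; np.diff is a
    crude stand-in for regression-based deltas, and the equal weights are
    placeholders rather than a tuned combination."""
    models = []
    for m in tracks:
        delta = np.diff(m, n=1, axis=0)
        accel = np.diff(m, n=2, axis=0)
        models.append([fit_gaussian(x) for x in (m, delta, accel)])

    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = sum(
                w * symmetric_kl(models[i][f], models[j][f])
                for f, w in enumerate(weights))
    return dist
```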
Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech
Whistled speech is a little-studied local use of language, shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice by means of a simple modulated pitch; it is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. Whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the languages concerned. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions, either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument such as a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing voice to reach the audience or to render the phonetic information carried by the vowel (tone, identity) and some aesthetic effects such as ornamentation.