Using non-speech sounds to provide navigation cues
This article describes three experiments that investigate the possibility of using structured non-speech audio messages called earcons to provide navigational cues in a menu hierarchy. A hierarchy of 27 nodes and 4 levels was created, with an earcon for each node. Rules were defined for the creation of hierarchical earcons at each node. Participants had to identify their location in the hierarchy by listening to an earcon. Results of the first experiment showed that participants could identify their location with 81.5% accuracy, indicating that earcons are a powerful method of communicating hierarchy information. One proposed use for such navigation cues is in telephone-based interfaces (TBIs), where navigation is a problem. The first experiment did not address the particular problems of earcons in TBIs, such as "does the lower quality of sound over the telephone lower recall rates?", "can users remember earcons over a period of time?" and "what effect does training type have on recall?" A second experiment showed that sound quality did lower the recall of earcons. However, a redesign of the earcons overcame this problem, with 73% recalled correctly. Participants could still recall earcons at this level after a week had passed. Training type also affected recall: with personal training participants recalled 73% of the earcons, but with purely textual training results were significantly lower. These results show that earcons can provide good navigation cues for TBIs. The final experiment used compound, rather than hierarchical, earcons to represent the hierarchy from the first experiment. Results showed that with sounds constructed in this way participants could recall 97% of the earcons. These experiments have developed our general understanding of earcons: a hierarchy three times larger than any previously created was tested, and this was also the first test of the recall of earcons over time.
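The abstract contrasts hierarchical earcons (each node's sound layers one new attribute onto its parent's) with the compound earcons of the final experiment (one short cue concatenated per menu choice). The Python sketch below illustrates that distinction only; the per-level attributes and the example menu names are illustrative assumptions, not the rule set defined in the paper.

```python
# Illustrative only: the attribute used at each level (timbre, rhythm,
# pitch, tempo) and the example menu names are assumptions, not the
# design rules defined in the paper.
LEVEL_ATTRIBUTE = {1: "timbre", 2: "rhythm", 3: "pitch", 4: "tempo"}

def hierarchical_earcon(path):
    """A hierarchical earcon inherits its parent's sound and fixes one new
    attribute per level, so the sound itself encodes the node's position.
    `path` is the chain of menu choices leading to the node; in a real
    design each choice would select a concrete value of the attribute
    (e.g. a particular instrument for timbre)."""
    return {LEVEL_ATTRIBUTE[level]: choice
            for level, choice in enumerate(path, start=1)}

def compound_earcon(path, cue_for):
    """A compound earcon instead concatenates one short, independently
    learned cue per menu choice along the path (`cue_for` is a hypothetical
    lookup from a choice to its cue)."""
    return [cue_for(choice) for choice in path]

# The node reached via main -> messages -> voicemail:
print(hierarchical_earcon(("main", "messages", "voicemail")))
# {'timbre': 'main', 'rhythm': 'messages', 'pitch': 'voicemail'}
```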
Towards efficient music genre classification using FastMap
Automatic genre classification aims to correctly categorize an unknown recording with a music genre. Recent studies use the Kullback-Leibler (KL) divergence to estimate music similarity and then perform classification using k-nearest neighbours (k-NN). However, this approach is not practical for large databases. We propose an efficient genre classifier that addresses the scalability problem. It uses a combination of a modified FastMap algorithm and the KL divergence to return the nearest neighbours, then uses 1-NN for classification. Our experiments showed that high accuracies are obtained while performing classification in less than 1/20 of a second per track.
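As a rough illustration of how FastMap makes KL-based similarity scale, here is a minimal Python sketch of the classic FastMap embedding (Faloutsos & Lin, 1995) driven by an arbitrary pairwise distance function. The paper's modified variant and its parameters are not described in the abstract, so this is a hedged stand-in rather than the authors' implementation.

```python
import numpy as np

def choose_pivots(dist, n):
    # Standard heuristic: start anywhere, jump to the farthest object,
    # then jump once more; the last two objects become the pivots.
    a = 0
    b = max(range(n), key=lambda j: dist(a, j))
    a = max(range(n), key=lambda j: dist(b, j))
    return a, b

def fastmap(dist, n, k):
    """Embed n objects into R^k given only a pairwise distance dist(i, j).
    Classic FastMap; treat this as a generic stand-in for the paper's
    modified variant."""
    coords = np.zeros((n, k))

    def residual(i, j, level):
        # Distance left over after projecting out the first `level` axes.
        d2 = dist(i, j) ** 2 - np.sum((coords[i, :level] - coords[j, :level]) ** 2)
        return np.sqrt(max(d2, 0.0))

    for level in range(k):
        d = lambda i, j, lv=level: residual(i, j, lv)
        a, b = choose_pivots(d, n)
        d_ab = d(a, b)
        if d_ab == 0.0:
            break  # remaining distances are already fully explained
        for i in range(n):
            # Cosine-law projection of object i onto the pivot line (a, b).
            coords[i, level] = (d(a, i) ** 2 + d_ab ** 2 - d(b, i) ** 2) / (2.0 * d_ab)
    return coords
```

With the collection embedded this way, a new track only needs distances to the pivot objects to obtain coordinates, and 1-NN classification can then run on cheap Euclidean distances instead of computing the KL divergence against every track in the database, which is the scalability gain the abstract describes.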
Enhancing timbre model using MFCC and its time derivatives for music similarity estimation
One of the popular methods for content-based music similarity estimation is to model timbre with MFCCs as a single multivariate Gaussian with a full covariance matrix, then use the symmetric Kullback-Leibler divergence. Borrowing from the field of speech recognition, we propose to apply the same approach to the MFCCs' time derivatives to enhance the timbre model. The Gaussian models for the delta and acceleration coefficients are used to create their respective distance matrices, which are then combined linearly to form a full distance matrix for music similarity estimation. In our experiments on two datasets, our novel approach performs better than using MFCCs alone. Moreover, genre classification using k-NN showed that the accuracies obtained are already close to the state of the art.
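To make the pipeline concrete, below is a hedged Python sketch of the standard single-Gaussian timbre model and the closed-form symmetric KL divergence, extended to delta and acceleration features and combined linearly. The frame-difference deltas and equal weights are simplifying assumptions; the paper presumably uses regression-based derivatives and its own combination weights.

```python
import numpy as np

def fit_gaussian(frames):
    """Single multivariate Gaussian (mean, full covariance) over a
    (n_frames, n_coeffs) matrix of MFCC-like features."""
    return frames.mean(axis=0), np.cov(frames, rowvar=False)

def symmetric_kl(p, q):
    """Closed-form symmetric Kullback-Leibler divergence between two
    Gaussians given as (mean, covariance) pairs."""
    def kl(a, b):
        mu_a, cov_a = a
        mu_b, cov_b = b
        d = mu_a.shape[0]
        inv_b = np.linalg.inv(cov_b)
        diff = mu_b - mu_a
        _, logdet_a = np.linalg.slogdet(cov_a)
        _, logdet_b = np.linalg.slogdet(cov_b)
        return 0.5 * (np.trace(inv_b @ cov_a) + diff @ inv_b @ diff
                      - d + logdet_b - logdet_a)
    return kl(p, q) + kl(q, p)

def combined_distance(tracks, weights=(1.0, 1.0, 1.0)):
    """Distance matrix from static MFCCs plus first/second time differences.
    `tracks` is a list of (n_frames, n_coeffs) MFCC matrices; np.diff is a
    crude stand-in for regression-based deltas, and the equal weights are
    placeholders rather than a tuned combination."""
    models = []
    for m in tracks:
        delta = np.diff(m, n=1, axis=0)
        accel = np.diff(m, n=2, axis=0)
        models.append([fit_gaussian(x) for x in (m, delta, accel)])

    n = len(models)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = sum(
                w * symmetric_kl(models[i][f], models[j][f])
                for f, w in enumerate(weights))
    return dist
```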
Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech
Whistled speech is a little-studied local use of language, shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice by means of a simple modulated pitch; it is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. Whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the languages concerned. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions, either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument such as a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing voice to reach the audience or to render the phonetic information carried by the vowel (tone, identity) and some aesthetic effects such as ornamentation.