Automatic Detectors for Underwater Soundscape Measurements
Environmental impact regulations require that marine industrial operators quantify their contribution to underwater noise scenes. Automation of such assessments becomes feasible with the successful categorisation of sounds into broader classes based on source types – biological, anthropogenic and physical. Previous approaches to passive acoustic monitoring have mostly been limited to a few specific sources of interest. In this study, source-independent signal detectors are developed and a framework is presented for the automatic categorisation of underwater sounds into the aforementioned classes.
Individual differences in the discrimination of novel speech sounds: effects of sex, temporal processing, musical and cognitive abilities
This study examined whether rapid temporal auditory processing, verbal working memory capacity, non-verbal intelligence, executive functioning, musical ability and prior foreign language experience predicted how well native English speakers (N = 120) discriminated Norwegian tonal and vowel contrasts as well as a non-speech analogue of the tonal contrast and a native vowel contrast presented over noise. Results confirmed a male advantage for temporal and tonal processing, and also revealed that temporal processing was associated with both non-verbal intelligence and speech processing. In contrast, effects of musical ability on non-native speech-sound processing and of inhibitory control on vowel discrimination were not mediated by temporal processing. These results suggest that individual differences in non-native speech-sound processing are to some extent determined by temporal auditory processing ability, in which males perform better, but are also determined by a host of other abilities that are deployed flexibly depending on the characteristics of the target sounds.
Audio Cartography: Visual Encoding of Acoustic Parameters
Our sonic environment is the subject matter of multiple domains, each of which has developed its own means of describing it. As a result, it lacks an established visual language through which knowledge can be connected and insights shared. We provide a visual communication framework for the systematic and coherent documentation of sound in large-scale environments. This consists of visual encodings and mappings of acoustic parameters into distinct graphic variables that present plausible solutions for the visualization of sound. These candidate encodings are assembled into an application-independent, multifunctional, and extensible design guide. We apply the guidelines and show example maps that act as a basis for the exploration of audio cartography.
Supervised Classification of Baboon Vocalizations
This paper addresses automatic classification of baboon vocalizations. We considered six classes of sounds emitted by "Papio papio" baboons, and report the results of supervised classification carried out with different signal representations (audio features), classifiers, combinations and settings. Results show that up to 94.1% correct recognition of pre-segmented elementary segments of vocalizations can be obtained using a Mel-Frequency Cepstral Coefficient representation and Support Vector Machine classifiers. Results for other configurations are also presented and discussed, and a possible extension to the "sound-spotting" problem, i.e. online joint detection and classification of a vocalization from a continuous audio stream, is illustrated and discussed.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping
By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio.
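As a rough illustration of the kind of multi-dimensional dynamic time warping the abstract above refers to, the following sketch aligns two sequences of per-frame feature vectors (for instance lip height, width and their ratio) and labels a query by its nearest template. It is plain textbook DTW with a Euclidean frame distance and nearest-template voting; it does not reproduce the paper's enhanced or template probability technique. The names `dtw_distance` and `classify`, and the sample feature values, are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-length feature vectors.

    Assumption: the per-frame cost is the Euclidean distance across all
    dimensions at once (the paper's exact cost function is not given here).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sequence, templates):
    """Return the label whose closest template has the lowest DTW cost.

    `templates` maps a label to a list of example sequences (hypothetical layout).
    """
    return min(
        templates,
        key=lambda label: min(dtw_distance(sequence, t) for t in templates[label]),
    )


# Hypothetical (height, width, ratio) frames for two spoken digits.
templates = {
    "one": [[(1.0, 2.0, 0.5), (1.5, 2.0, 0.75)]],
    "two": [[(3.0, 1.0, 3.0), (3.0, 1.5, 2.0)]],
}
query = [(1.0, 2.0, 0.5), (1.0, 2.0, 0.5), (1.5, 2.0, 0.75)]
predicted = classify(query, templates)
```

Because the warping path may repeat frames, the slower query above still aligns with the "one" template at zero cost, which is the property that makes DTW tolerant of differences in speaking rate.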
Multimodal music information processing and retrieval: survey and future challenges
To improve performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.