    Visualizing sound emission of elephant vocalizations: evidence for two rumble production types

    Recent comparative data reveal that formant frequencies are cues to body size in animals, due to a close relationship between formant frequency spacing, vocal tract length, and overall body size. Accordingly, several species show intriguing morphological adaptations that elongate the vocal tract in order to lower formants, most of which have been explained by the size exaggeration hypothesis. While the elephant trunk is strongly implicated in the low formants of elephant rumbles, it is unknown whether elephants emit these vocalizations exclusively through the trunk, or whether the mouth is also involved in rumble production. In this study we used a sound visualization method (an acoustic camera) to record rumbles of five captive African elephants during spatial separation and subsequent bonding situations. Our results showed that the female elephants in our analysis produced two distinct types of rumble vocalization, distinguished by vocal path: a nasally and an orally emitted rumble. Interestingly, nasal rumbles predominated during contact calling, whereas oral rumbles were mainly produced in bonding situations. In addition, nasal and oral rumbles varied considerably in their acoustic structure. In particular, the values of the first two formants reflected the estimated lengths of the vocal paths, corresponding to a vocal tract length of around 2 meters for nasal and around 0.7 meters for oral rumbles. These results suggest that African elephants may switch vocal paths to actively vary vocal tract length (with considerable variation in formants) according to context, and they call for further research investigating the function of formant modulation in elephant vocalizations. Furthermore, by confirming the use of the elephant trunk in long-distance rumble production, our findings provide an explanation for the extremely low formants in these calls, and may also indicate that formant lowering functions to increase call propagation distance in this species.
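
    The length estimates above follow from the standard uniform-tube approximation, in which the spacing between adjacent formants is ΔF = c / (2L). A minimal sketch of that back-of-the-envelope calculation; the formant values below are illustrative assumptions, not measurements from the paper:

```python
# Minimal sketch: estimating vocal tract length (VTL) from formant spacing,
# assuming a uniform tube closed at the glottis and open at the lips, where
# adjacent formants are spaced by delta_F = c / (2 * L).

SPEED_OF_SOUND = 350.0  # m/s, approximate speed of sound in warm, humid air

def formant_spacing(f1: float, f2: float) -> float:
    """Spacing between the first two formants (Hz)."""
    return f2 - f1

def vocal_tract_length(delta_f: float) -> float:
    """VTL in metres from formant spacing: L = c / (2 * delta_F)."""
    return SPEED_OF_SOUND / (2.0 * delta_f)

# Hypothetical formant measurements (Hz) for the two rumble types:
nasal_delta_f = formant_spacing(f1=70.0, f2=160.0)  # ~90 Hz spacing
oral_delta_f = formant_spacing(f1=130.0, f2=380.0)  # ~250 Hz spacing

print(f"Nasal rumble VTL: {vocal_tract_length(nasal_delta_f):.2f} m")  # ~1.9 m
print(f"Oral rumble VTL:  {vocal_tract_length(oral_delta_f):.2f} m")   # ~0.7 m
```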

    Using a novel visualization tool for rapid survey of long-duration acoustic recordings for ecological studies of frog chorusing

    Continuous recording of environmental sounds could allow long-term monitoring of vocal wildlife and the scaling of ecological studies to large temporal and spatial scales. However, such opportunities are currently limited by constraints in the analysis of large acoustic data sets. Computational methods and automation of call detection require specialist expertise and are time-consuming to develop, so most biological researchers continue to use manual listening and inspection of spectrograms to analyze their sound recordings. False-color spectrograms were recently developed as a tool for visualizing long-duration sound recordings, intended to aid ecologists in navigating their audio data and detecting species of interest. This paper explores the efficacy of using this visualization method to identify multiple frog species in a large set of continuous sound recordings and to gather data on the chorusing activity of the frog community. We found that, after a phase of observer training, frog choruses could be visually identified to species with high accuracy. We present a method to analyze such data, including a simple R routine to interactively select short segments on the false-color spectrogram for rapid manual checking of visually identified sounds. We propose that these methods could fruitfully be applied to large acoustic data sets to analyze calling patterns in other chorusing species.
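
    Although the paper's interactive routine is in R, the core idea of a false-color index spectrogram is easy to illustrate: compute several per-frequency acoustic indices for each time block of a long recording and map three of them to the red, green, and blue channels of an image. A minimal Python sketch under assumed index choices (acoustic complexity, temporal evenness, spectral cover) and a simple min-max normalization; the paper's actual indices and scaling may differ:

```python
# Minimal sketch of a false-colour index spectrogram: three per-frequency
# acoustic indices per one-minute block, mapped to R, G and B channels.
import numpy as np
from scipy.signal import spectrogram

def acoustic_indices(segment, sr, nfft=512):
    """Three per-frequency-bin indices for one block of audio."""
    _, _, sxx = spectrogram(segment, fs=sr, nperseg=nfft)
    # Acoustic complexity: normalized frame-to-frame intensity change per bin.
    aci = np.sum(np.abs(np.diff(sxx, axis=1)), axis=1) / (np.sum(sxx, axis=1) + 1e-12)
    # Temporal evenness: 1 - normalized entropy of each bin's energy over time.
    p = sxx / (np.sum(sxx, axis=1, keepdims=True) + 1e-12)
    evenness = 1.0 + np.sum(p * np.log2(p + 1e-12), axis=1) / np.log2(sxx.shape[1])
    # Spectral cover: fraction of frames in which the bin is clearly active.
    cover = np.mean(sxx > 3.0 * np.median(sxx), axis=1)
    return aci, evenness, cover

def false_colour_spectrogram(audio, sr, seconds_per_column=60):
    """One RGB column per block of audio; rows are frequency bins."""
    hop = sr * seconds_per_column
    cols = [acoustic_indices(audio[i:i + hop], sr)
            for i in range(0, len(audio) - hop + 1, hop)]
    img = np.stack([np.array(c) for c in zip(*cols)], axis=-1)  # (time, freq, 3)
    img = np.transpose(img, (1, 0, 2))[::-1]  # freq x time x RGB, low freq at bottom
    lo, hi = img.min(axis=(0, 1)), img.max(axis=(0, 1))
    return (img - lo) / (hi - lo + 1e-12)  # per-channel scaling to [0, 1]
```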

    Influence of experimental conditions on sound pleasantness evaluations

    ICA 2016, 22nd International Congress on Acoustics, Buenos Aires, Argentina, 05/09/2016-09/09/2016.
    Being able to characterize and estimate urban sound perception is key to improving city dwellers' environmental quality. In the past decade, various studies have focused on collecting perceived global sound pleasantness at specific locations. Some were carried out in the field in order to evaluate participants' soundscape perception directly in context. Others were conducted in the laboratory to better control the stimuli and to increase the number of participants exposed to the same sound environment. Most laboratory experiments are run in large or semi-anechoic chambers with calibrated and highly realistic audio reproduction in order to preserve the ecological validity of the experiment. On the one hand, even with a high level of immersion, the laboratory context is not as rich as the field context, and the two types of experiment could lead to different results. On the other hand, few studies have examined the influence of decreasing ecological validity on the same experiment. This work presents a short statistical analysis of perceptual evaluations of ten urban locations under four different test conditions. First, evaluations are carried out in situ in the city of Paris. Then audio-visual recordings of these locations are evaluated in three different experimental conditions: (i) in a well-controlled acoustic laboratory in the Paris region with French participants, (ii) in an acoustic laboratory in Buenos Aires with Argentinean participants and a lower level of immersion, and (iii) in an ordinary domestic room with Argentinean participants and subjective calibration. The study reveals that neither the 'country' factor nor the laboratory experimental conditions have a significant impact on perceived sound pleasantness and perceived loudness assessments.
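
    The reported analysis amounts to testing whether pleasantness ratings differ across the country and condition factors. A minimal sketch of one way to run such a test in Python with statsmodels; the data file and column names are hypothetical, and the authors' actual analysis may differ:

```python
# Minimal sketch: does 'country' or 'condition' affect pleasantness ratings?
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format table: one row per participant x location rating,
# with columns 'pleasantness', 'country', 'condition', 'location'.
ratings = pd.read_csv("pleasantness_ratings.csv")

model = ols("pleasantness ~ C(country) + C(condition) + C(location)",
            data=ratings).fit()
print(sm.stats.anova_lm(model, typ=2))
# Non-significant country/condition rows would match the reported result.
```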

    Robust sound event detection in bioacoustic sensor networks

    Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNNs) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60 milliseconds) and long-term (30 minutes) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Second, we replace the last dense layer in the network with a context-adaptive neural network (CA-NN) layer. Combining them yields state-of-the-art results that are unmatched by artificial data augmentation alone. We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
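
    PCEN is available off the shelf in librosa, so the short-term adaptation step described above can be sketched in a few lines. The file name and mel parameters below are illustrative assumptions; only the 60 ms time constant is taken from the abstract:

```python
# Minimal sketch of per-channel energy normalization (PCEN) on a mel
# spectrogram. PCEN divides each subband by a low-pass filtered version of
# itself (automatic gain control), then applies root compression:
#   PCEN = (E / (eps + M)**gain + bias)**power - bias**power
import librosa

y, sr = librosa.load("field_recording.wav", sr=22050)  # hypothetical file
hop_length = 512

# Mel spectrogram; magnitude (power=1.0) gives an energy-like input for PCEN.
S = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length,
                                   n_mels=128, power=1.0)

# Rescaling by 2**31 mimics 32-bit integer audio, as in the librosa docs.
S_pcen = librosa.pcen(S * (2 ** 31), sr=sr, hop_length=hop_length,
                      time_constant=0.060,  # ~60 ms short-term context
                      gain=0.98, bias=2.0, power=0.5, eps=1e-6)
```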

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues a coherent research agenda is proposed.

    Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

    This paper presents a novel approach to indoor acoustic source localization using microphone arrays, based on a convolutional neural network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three-dimensional position of an acoustic source from the raw audio signal, avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network on semi-synthetic data, generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network on a small amount of real data. Our experimental results show that this strategy produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method is more robust to varying speaker gender and to different window sizes than the other methods.
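
    A minimal sketch of the end-to-end idea, not the authors' architecture: a 1-D CNN that maps a raw multichannel microphone frame directly to (x, y, z) source coordinates. Channel count, window length, and layer sizes are illustrative assumptions:

```python
# Minimal sketch of end-to-end localization: raw waveform in, coordinates out.
import torch
import torch.nn as nn

class RawAudioLocalizer(nn.Module):
    def __init__(self, n_mics: int = 12, n_samples: int = 5120):
        super().__init__()
        # Strided 1-D convolutions over the raw multichannel waveform.
        self.features = nn.Sequential(
            nn.Conv1d(n_mics, 96, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(96, 96, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(96, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 32, 512), nn.ReLU(),
            nn.Linear(512, 3),  # (x, y, z) in metres
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mics, n_samples) raw audio window
        return self.regressor(self.features(x))

model = RawAudioLocalizer()
frame = torch.randn(8, 12, 5120)  # a batch of raw audio windows
# Regression loss against known source positions (zeros here as a stand-in);
# the two-step strategy would train on semi-synthetic data first, then
# fine-tune on a small amount of real data.
loss = nn.functional.mse_loss(model(frame), torch.zeros(8, 3))
```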

    Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech

    Speakers adjust their voice when talking in noise, a phenomenon known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech (NiCLS) produce more pronounced amplitude modulations in noise than in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; and (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute to intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception, with reference to neural oscillators that phase-lock to the amplitude modulations in speech and thereby guide speech processing.
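
    The transplantation manipulation in (3) can be sketched as an envelope swap: flatten the plain utterance's slow amplitude envelope and impose the Lombard utterance's envelope instead. A minimal Python sketch; the 10 Hz cutoff, the file names, and the assumption of mono, time-aligned recordings are illustrative, not the authors' exact procedure:

```python
# Minimal sketch of an amplitude-modulation "transplant": impose the slow
# amplitude envelope of a Lombard utterance onto a plain-speech utterance.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt
import soundfile as sf

def envelope(x: np.ndarray, sr: int, cutoff_hz: float = 10.0) -> np.ndarray:
    """Slow amplitude envelope: Hilbert magnitude, low-pass filtered."""
    env = np.abs(hilbert(x))
    b, a = butter(4, cutoff_hz / (sr / 2), btype="low")
    return np.maximum(filtfilt(b, a, env), 1e-8)  # floor avoids division by ~0

plain, sr = sf.read("plain_utterance.wav")      # assumed mono
lombard, _ = sf.read("lombard_utterance.wav")   # assumed mono, same sr
n = min(len(plain), len(lombard))               # assumes time-aligned recordings

# Flatten the plain envelope, then impose the Lombard envelope.
hybrid = plain[:n] / envelope(plain[:n], sr) * envelope(lombard[:n], sr)
hybrid /= np.max(np.abs(hybrid))                # normalize to avoid clipping
sf.write("plain_with_lombard_am.wav", hybrid, sr)
```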