
    Reorganization of the auditory-perceptual space across the human vocal range

    We analyzed the auditory-perceptual space across a substantial portion of the human vocal range (220–1046 Hz) using multidimensional scaling of cochlea-scaled spectra from 250-ms vowel segments, initially studied in Friedrichs et al. (2017), J. Acoust. Soc. Am. 142, 1025–1033. The dataset comprised the vowels /i y e ø ɛ a o u/ (N = 240) produced by three native German female speakers, encompassing a broad range of their respective voice frequency ranges. The initial study demonstrated that, in a closed-set identification task with 21 listeners, the point vowels /i a u/ were recognized significantly above chance at fundamental frequencies (fo) nearing 1 kHz, whereas recognition of the other vowels decreased at higher pitches. Building on these findings, our study revealed systematic spectral shifts associated with vowel height and frontness as fo increased, with a notable clustering around /i a u/ above 523 Hz. These observations underscore the pivotal role of spectral shape in vowel perception, illustrating the reliance on acoustic anchors at higher pitches. Furthermore, this study sheds light on the quantal nature of these vowels and their potential impact on language evolution, offering a plausible explanation for their widespread presence in the world's languages.
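
    To make the analysis pipeline concrete, the following is a minimal sketch of projecting spectra into a low-dimensional perceptual space with multidimensional scaling. It uses scikit-learn with random placeholder data standing in for the cochlea-scaled spectra; the distance metric and MDS settings are assumptions, not the study's exact procedure.

        # Sketch: MDS of (placeholder) cochlea-scaled spectra into a 2-D space.
        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from sklearn.manifold import MDS

        rng = np.random.default_rng(0)
        spectra = rng.random((240, 40))            # placeholder: 240 tokens x 40 auditory bands
        labels = np.repeat(list("iyeøɛaou"), 30)   # placeholder labels: 8 vowels x 30 tokens

        # Pairwise spectral distances (Euclidean is an assumption).
        dissim = squareform(pdist(spectra, metric="euclidean"))

        # Metric MDS on the precomputed dissimilarity matrix.
        mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
        coords = mds.fit_transform(dissim)         # one 2-D point per vowel token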

    Vowel recognition at fundamental frequencies up to 1 kHz reveals point vowels as acoustic landmarks

    The phonological function of vowels can be maintained at fundamental frequencies (fo) up to 880 Hz [Friedrichs, Maurer, and Dellwo (2015). J. Acoust. Soc. Am. 138, EL36–EL42]. Here, the influence of talker variability and multiple response options on vowel recognition at high fos is assessed. The stimuli (n = 264) consisted of eight isolated vowels (/i y e ø ɛ a o u/) produced by three female native German talkers at 11 fos within a range of 220–1046 Hz. In a closed-set identification task, 21 listeners were presented excised 700-ms vowel nuclei with quasi-flat fo contours and resonance trajectories. The results show that listeners can identify the point vowels /i a u/ at fos up to almost 1 kHz, with a significant decrease for the vowels /y ɛ/ and a drop to chance level for the vowels /e ø o/ toward the upper fos. Auditory excitation patterns reveal highly differentiable representations for /i a u/ that can be used as landmarks for vowel category perception at high fos. These results suggest that theories of vowel perception based on overall spectral shape will provide a fuller account of vowel perception than those based solely on formant frequency patterns.
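
    As a rough illustration of the auditory excitation patterns referred to above, the sketch below sums FFT power in ERB-spaced bands (Glasberg and Moore's ERB-number scale). This is a deliberate simplification using only NumPy; it is not the excitation-pattern model used in the study, and the test signal is a synthetic stand-in for a vowel segment.

        # Crude stand-in for an auditory excitation pattern: FFT power per ERB-spaced band.
        import numpy as np

        def erb_band_energies(signal, fs, n_bands=40, fmin=50.0, fmax=8000.0):
            spec = np.abs(np.fft.rfft(signal)) ** 2
            freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

            # ERB-number (Cam) scale and its inverse.
            hz_to_cam = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
            cam_to_hz = lambda c: (10 ** (c / 21.4) - 1.0) * 1000.0 / 4.37

            edges = cam_to_hz(np.linspace(hz_to_cam(fmin), hz_to_cam(fmax), n_bands + 1))
            return np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                             for lo, hi in zip(edges[:-1], edges[1:])])

        # Example with a synthetic 440 Hz tone standing in for a 700-ms vowel nucleus.
        fs = 16000
        t = np.arange(int(0.7 * fs)) / fs
        pattern = erb_band_energies(np.sin(2 * np.pi * 440 * t), fs)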

    The Zurich Corpus of Vowel and Voice Quality, Version 1.0

    Existing databases of isolated vowel sounds or vowel sounds embedded in consonantal context generally document only limited variation of basic production parameters. Thus, concerning the possible variation range of vowel and voice quality-related sound characteristics, there is a lack of broad phenomenological and descriptive references that allow for a comprehensive understanding of vowel acoustics and for an evaluation of the extent to which corresponding existing approaches and models can be generalised. In order to contribute to the building of such references, a novel database of vowel sounds that exceeds any existing collection in size and diversity of vocalic characteristics is presented here, comprising c. 34 600 utterances of 70 speakers (46 nonprofessional speakers, children, women and men, and 24 professional actors/actresses and singers of straight theatre, contemporary singing, and European classical singing). The database focuses on sounds of the long Standard German vowels /i-y-e-ø-a-o-u/ produced with varying basic production parameters such as phonation type, vocal effort, fundamental frequency, vowel context, and speaking or singing style. In addition, a read text and, for the professionals, songs are also included. The database is accessible for scientific use, and further extensions are in progress.

    Acoustics of the Vowel - Preliminaries

    It seems as if the fundamentals of how we produce vowels and how they are acoustically represented have been clarified: we phonate and articulate. Using our vocal cords, we produce a vocal sound or noise, which is then shaped into a specific vowel sound by the resonances of the pharyngeal, oral, and nasal cavities, that is, the vocal tract. Accordingly, the acoustic description of vowels relates to vowel-specific patterns of relative energy maxima in the sound spectra, known as patterns of formants. The intellectual and empirical reasoning presented in this treatise, however, gives rise to scepticism with respect to this understanding of the sound of the vowel. The reflections and materials presented provide reason to argue that, up to now, a comprehensible theory of the acoustics of the voice and of voiced speech sounds is lacking, and consequently, no satisfying understanding of vowels as an achievement and particular formal accomplishment of the voice exists. Thus, the question of the acoustics of the vowel, and with it the question of the acoustics of the voice itself, proves to be an unresolved fundamental problem.
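
    The source-filter account summarized at the start of this abstract can be illustrated with a minimal sketch: an impulse-train "phonation" source passed through a cascade of second-order formant resonators. The formant frequencies and bandwidths below are illustrative values for an /a/-like vowel, not figures taken from the treatise.

        # Sketch: source-filter synthesis of a vowel-like sound.
        import numpy as np
        from scipy.signal import lfilter

        fs = 16000
        f0 = 220.0
        dur = 0.5

        # Impulse-train "phonation" source.
        n = int(dur * fs)
        source = np.zeros(n)
        source[::int(fs / f0)] = 1.0

        def resonator(x, freq, bw, fs):
            # Two-pole resonator: pole angle sets the formant frequency, radius the bandwidth.
            r = np.exp(-np.pi * bw / fs)
            theta = 2 * np.pi * freq / fs
            a = [1.0, -2.0 * r * np.cos(theta), r * r]
            b = [sum(a)]                      # unity gain at DC
            return lfilter(b, a, x)

        vowel = source
        for freq, bw in [(800, 80), (1200, 90), (2500, 120)]:   # rough /a/-like formants
            vowel = resonator(vowel, freq, bw, fs)
        vowel /= np.abs(vowel).max()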

    Eros, Beauty, and Phon-Aesthetic Judgements of Language Sound. We Like It Flat and Fast, but Not Melodious. Comparing Phonetic and Acoustic Features of 16 European Languages

    This paper concerns sound-aesthetic preferences for European foreign languages. We investigated the phonetic-acoustic dimension of linguistic aesthetic pleasure to describe the “music” found in European languages. The Romance languages French, Italian, and Spanish take the lead when people talk about melodious language – the music-like effects in language (a.k.a. phonetic chill). At the other end of the melodiousness spectrum are German and Arabic, which are often considered to sound harsh and unattractive. Despite the public interest, limited research has been conducted on phonaesthetics, i.e., the subfield of phonetics concerned with the aesthetic properties of speech sounds (Crystal, 2008). Our goal is to fill this research gap by identifying the acoustic features that drive the auditory perception of language sound beauty. What is so music-like in a language that makes people say “it is music in my ears”? Forty-five central European participants listened to 16 auditorily presented European languages and rated each language on 22 binary characteristics (e.g., beautiful – ugly, funny – boring), and additionally indicated their language familiarity, L2 background, liking of the speaker's voice, demographics, and musicality. Findings revealed that all of these factors, in complex interplay, explain a certain percentage of the variance: familiarity and expertise in foreign languages, speaker voice characteristics, phonetic complexity, musical acoustic properties, and finally the musical expertise of the listener. The most important discovery was the trade-off between speech tempo and so-called linguistic melody (pitch variance): the faster the language, the flatter/more atonal it is in terms of pitch (speech melody), making it highly appealing acoustically (sounding beautiful and sexy), but not so melodious in a “musical” sense.
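
    As a hedged sketch of one of the two acoustic measures behind the reported trade-off, the code below estimates pitch variability in semitones from an audio sample using librosa's pYIN tracker. The file name is a placeholder, and speech tempo (syllables per second) would normally require syllable annotation, which this sketch does not attempt; the study's exact feature definitions may differ.

        # Sketch: pitch variability ("linguistic melody") of a language sample, in semitones.
        import numpy as np
        import librosa

        y, sr = librosa.load("language_sample.wav", sr=None)   # placeholder path

        # F0 track via pYIN; unvoiced frames come back as NaN and are dropped.
        f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
        f0 = f0[np.isfinite(f0)]

        # Standard deviation of F0 around its median, expressed in semitones.
        pitch_sd_semitones = np.std(12 * np.log2(f0 / np.median(f0)))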

    Singing Voice Recognition for Music Information Retrieval

    This thesis proposes signal processing methods for the analysis of singing voice audio signals, with the objective of obtaining information about the identity and lyrics content of the singing. Two main topics are presented: singer identification in monophonic and polyphonic music, and lyrics transcription and alignment. The information automatically extracted from the singing voice is meant to be used for applications such as music classification, sorting and organizing music databases, music information retrieval, etc. For singer identification, the thesis introduces methods from general audio classification as well as specific methods for dealing with the presence of accompaniment. The emphasis is on singer identification in polyphonic audio, where the singing voice is present along with musical accompaniment. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The study of singer identification is centered around the degradation of classification performance in the presence of instruments, and the separation of the vocal line for improving performance. For the study, monophonic singing was mixed with instrumental accompaniment at different signal-to-noise (singing-to-accompaniment) ratios, and classification was performed both on the polyphonic mixture and on the vocal line separated from it. The classification method that includes the vocal separation step significantly improves performance compared with classifying the polyphonic mixtures, although it does not reach the performance obtained on the monophonic singing itself. Nevertheless, the results show that singing voices can be classified robustly in polyphonic music when source separation is used.

    For lyrics transcription, the thesis introduces the general speech recognition framework and the various adjustments that can be made before applying these methods to the singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The thesis proposes using phoneme models trained on speech data and adapted to singing voice characteristics for the recognition of phonemes and words from a singing voice signal. Language models and adaptation techniques are an important aspect of the recognition process. There are two different ways of recognizing the phonemes in the audio: alignment, where the true transcription is known and the phonemes only have to be located, and recognition, where both the transcription and the locations of the phonemes have to be found. Alignment is, obviously, a simplified form of the recognition task. Alignment of textual lyrics to music audio is performed by aligning the phonetic transcription of the lyrics with the vocal line separated from the polyphonic mixture, using a collection of commercial songs. Word recognition is tested for the transcription of lyrics from monophonic singing. The performance of the proposed system for automatic alignment of lyrics and audio is sufficient to facilitate applications such as automatic karaoke annotation or song browsing. The word recognition accuracy of lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application, for performing a textual search based on the words recognized from the query. When some key words in the query are recognized, the song can be reliably identified.
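
    The mixing step described above, combining monophonic singing with accompaniment at controlled singing-to-accompaniment ratios, can be sketched as follows; the arrays are synthetic placeholders and the peak normalization is an assumption rather than the thesis's exact procedure.

        # Sketch: mix vocals and accompaniment at a target singing-to-accompaniment ratio (dB).
        import numpy as np

        def mix_at_snr(vocals, accompaniment, snr_db):
            """Scale the accompaniment so the vocal/accompaniment power ratio equals snr_db."""
            n = min(len(vocals), len(accompaniment))
            v, a = vocals[:n], accompaniment[:n]
            p_v = np.mean(v ** 2)
            p_a = np.mean(a ** 2)
            gain = np.sqrt(p_v / (p_a * 10 ** (snr_db / 10.0)))
            mix = v + gain * a
            return mix / np.max(np.abs(mix))    # peak-normalize to avoid clipping

        # Example with noise signals standing in for singing and accompaniment.
        rng = np.random.default_rng(0)
        vocals = rng.standard_normal(16000)
        accomp = rng.standard_normal(16000)
        mixture = mix_at_snr(vocals, accomp, snr_db=0.0)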

    The music industry and popular song in 1930s and 1940s Shanghai, a historical and stylistic analysis

    In 1930s and 1940s Shanghai, musicians and artists from different cultures and varied backgrounds came together and created the golden age of Shanghai popular song, which marks the beginnings of Chinese popular music in modern times. However, Shanghai popular song has long been neglected in most works on the modern history of Chinese music and remains an unexplored area in Shanghai studies. This study aims to reconstruct a historical view of the Shanghai popular music industry and to make a stylistic analysis of its musical products. The research is undertaken at two levels: first, understanding the operating mechanism of the ‘platform’ and, second, investigating the components of the ‘products’. Working from a hypothetical flowchart of the Shanghai popular music industry, details of the production, selling, and consumption processes are retrieved from various historical sources to reconstruct the industry platform. Through the first level of research, it is found that the rising new media and the flourishing entertainment industry profoundly influenced the development of Shanghai popular song. In addition, social and political changes, changes in business practices, and the organisational structure of foreign record companies also contributed to the vast production, popularity, and commercial success of Shanghai popular song. From the composition-performance view of song creation, the second level of research reveals that Chinese and Western musical elements both existed in the musical products. Chinese vocal technique, Western bel canto, and instruments from both musical traditions were all found in the historical recordings. When the distinctive nature of pentatonicism is ignored and Chinese melodies are treated as melodies on Western scales, Chinese-style tunes can easily be accompanied by chordal harmony; however, the Chinese heterophonic feature was lost in the Western accompaniment texture. Moreover, it is also found that the traditional rules governing the relationship between words and melody were dismissed in Shanghai popular songwriting. The findings of this study fill in a neglected part of the modern history of Chinese music and add to the literature on this under-explored musical area in Shanghai studies. Moreover, this study also demonstrates that, against a map illustrating how musical products moved from record companies to consumers along with all the other participants involved, the history of popular music can be rediscovered systematically by using songs as evidence, treating media material carefully, and tracking down archives and surviving participants.

    The Musical Semiotics of Timbre in the Human Voice and Static Takes Love's Body

    In exploring the semiotics of vocal timbre as a general phenomenon within music, theoretical engagement with the history of timbre and of musical meaning bolsters my illustrative analyses of Laurie Anderson and Louis Armstrong. I first outline vocal timbre's reliance on subtractive filtering imparted physically by the performer's vocal tract, demonstrating that its signification is itself a subtractive process in which meaning lies in the silent space between spectral formants. Citing Merleau-Ponty's phenomenology and placing the body's perceptual experience as the basis of existential reality, I then argue that the human voice offers self-actualization in a way that other sensory categories cannot, because the voice gives us control over what and how we hear in a way that we cannot control, through our own bodies alone, our sight, touch, taste, and smell. This idea combines with a listener's imagined performance of vocal music, in which I propose that, because of our familiarity with the articulations of human sound, as we hear a voice we are able to imagine and mimic the choreography of the vocal tract, engaging a physical and bodily listening, thereby making not only performance but also listening a self-affirming bodily reflection on being. Finally, I consider vocal timbre as internally lexical and externally bound by a linguistic context. Citing Peirce and Derrida, and incorporating the previous points, I show vocal timbre as a canvas on which a linguistic and musical foreground is painted, all interpreted by the body. Accompanying the theoretical discussions is a concerto addressing relevant compositional issues.