52 research outputs found
Investigating the Singing Voice: Quantitative and Qualitative Approaches to Studying Cross-Cultural Vocal Production
This thesis was motivated by an experiment carried out in the 1960s that used statistical analysis to study the relationship between vocal performance practice and society. Using a comprehensive corpus of audio recordings of singing from around the world, collected over several decades, the ethnomusicologist Alan Lomax devised the Cantometrics project, the largest comparative study of music, in which 36 performance practice characteristics were rated for each recording. With a particular interest in vocal production, we set out to formalise knowledge of vocal production so as to enable statistical and computational approaches in the spirit of Cantometrics.
Three models of vocal production were investigated: the perceptual model from Cantometrics, a physical model from voice science and a physiological model from singing education. We used Johan Sundberg's vocal source parameters and Jo Estill's physiological building blocks as the basis for developing an ontology of vocal production.
Two approaches to the automated characterisation of the ontological descriptors were considered. For the incremental approach, we presented a proof-of-concept experiment on automatic labelling of phonation modes, based on reconstructing the vocal source waveform by means of inverse filtering. We created a dataset of sustained sung vowels annotated with pitch, vowel and phonation mode, on which our model was trained. We outlined steps to generalise this experiment to more complex data and discussed the challenges of such generalisation.
The integrated approach addressed the full variance in the data, turning to the methodology of expert knowledge elicitation in order to annotate the original Cantometrics dataset with our descriptors. We performed an investigative mixed-methods study in which 13 vocal physiology experts from different professional backgrounds were interviewed; they used our ontology to analyse vocal production in the Cantometrics dataset. The goals of the study were to: a) validate the acceptance of our ontological terms, b) verify the consensus between experts on the values of the descriptors, and c) collect reliable annotations. While the acceptance of the ontology was good for most terms, quantitative analysis showed good agreement between experts for only two out of 11 descriptors (larynx height and aryepiglottic sphincter). A detailed qualitative analysis of the interview data (over 33 hours) was followed by a meta-analysis that extracted common themes and confounding issues pointing to probable reasons for the disagreement. For aryepiglottic sphincter and larynx height we collected the average ratings, which constitute the first set of reliable annotations on vocal production. A strong correlation was found between larynx height and the Cantometrics vocal width parameter; larynx height is therefore a good candidate to replace vocal width as a more objective descriptor.
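The abstract does not state which correlation statistic or data layout was used. As a minimal sketch, assuming ratings are stored as one list per expert with one value per recording, the code below averages the expert ratings per recording and then computes Spearman's rank correlation (a common choice for ordinal rating scales) against a second descriptor such as vocal width. All function names and the toy values are hypothetical.

```python
from statistics import mean


def average_ratings(ratings_by_expert):
    """Average each recording's rating across experts.

    Hypothetical layout: a list of per-expert lists, one rating per recording.
    """
    return [mean(column) for column in zip(*ratings_by_expert)]


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy usage with invented ratings (three experts, four recordings).
larynx_height = average_ratings([[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 4, 4]])
vocal_width = [1, 2, 4, 5]
print(spearman(larynx_height, vocal_width))
```

Rank-based correlation is used here only because rating scales are ordinal; a Pearson correlation on the averaged ratings would be an equally plausible reading of the abstract.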
The current work was based on knowledge from a number of research disciplines, and its results are discussed from the viewpoint of several fields for which they have significant implications: music information retrieval (MIR), vocal pedagogy and Cantometrics. Future research is suggested for each of these fields. Based on the meta-analysis, we account for the reasons for disagreement between experts on the subject of vocal production from MIR and singing education perspectives. We further explain the various kinds of bias that affect raters.
We conclude that vocal physiology, though offering a more objective language than perceptual descriptors, is not well-suited as an ontological middle layer for statistical approaches to singing given the current state of knowledge. A mixed perceptual-objective path to ontology building is suggested and ways to collect reliable annotations are outlined.
In the domain of vocal pedagogy, we touch on the issue of communication about vocal physiology between experts and between teacher and student; we consider the future of teaching vocal technique and suggest new experiments in the field.
A plan is presented for revising and scaling up Cantometrics as an interdisciplinary collaboration. Possible contributions of MIR researchers, ethnomusicologists and vocal production specialists are specified.
Ontological description of vocal production in world's music cultures – a physiological approach
We present our investigative study into a vocal production ontology intended for comparative cross-cultural analysis of singing style. Such an ontology should provide a baseline vocabulary to explicitly define and compare vocal production, helping to formalise the discourse on vocal quality and singing style within and across disciplines, including ethnomusicology, voice science, singing education and music informatics. Our study examines the viability of using physiological and functional descriptors for modelling vocal production.

Vocal quality is usually described in perceptual terms such as bright or dark sound, metallic, heavy, brassy, lyrical or round. Such descriptions are not only tradition-specific but, more often than not, highly subjective. While several disciplines have approached vocal production (Johan Sundberg in voice science, Jo Estill in singing education, Alan Lomax in ethnomusicology), these approaches still have limited, discipline-specific applications, and some of them display methodological weaknesses.

Our study is based on interviews with 13 world-class experts in vocal physiology: otolaryngologists, speech-language therapists and singing teachers. They performed perceptual and physiological analysis of 19 singing fragments from 11 cultures. The physiological analysis was conducted using our preliminary ontology of vocal production, based on state-of-the-art concepts in voice science and singing education. The aim of our study is to verify the consistency of the experts' ratings and the inter-rater agreement, with strong agreement indicating the general validity of the physiological approach.

Our study design combines quantitative and qualitative research methods. We present the results of a detailed statistical analysis of inter-participant agreement, triangulated via qualitative analysis of the interviews. We also examine the relationship between the experts' perceptual and physiological ratings.

We discuss the implications of our results for further ontological work in the field of vocal production.
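The abstract does not name the agreement statistic. One widely used option when several raters score the same items is the intraclass correlation coefficient. The sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss), assuming a complete table with one row per singing fragment and one column per expert. It illustrates the idea and is not the study's own analysis; `icc_2_1` is a hypothetical name.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) for an n x k table: rows are fragments, columns are experts.

    Assumes no missing ratings, n >= 2 and k >= 2.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    # Two-way ANOVA sums of squares: fragments (rows), experts (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic differences between experts lower the score.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Identical ratings from three experts on three fragments give perfect agreement.
print(icc_2_1([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Values near 1 indicate that experts rate fragments interchangeably; values near 0 or below indicate that between-expert variation swamps the differences between fragments.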
Breathy, Resonant, Pressed - Automatic Detection Of Phonation Mode From Audio Recordings of Singing
In this paper we present an experiment on automatic detection of phonation modes from recordings of sustained sung vowels. We created an open dataset specifically for this experiment, containing recordings of nine vowels from multiple languages, sung by a female singer on all pitches in her vocal range in four phonation modes: breathy, neutral, flow (resonant) and pressed. The dataset is available under a Creative Commons license at .
First, the glottal flow waveform is estimated from the audio recordings via inverse filtering (Iterative Adaptive Inverse Filtering, IAIF). Then six parameters of the glottal flow waveform are calculated. A 4-class Support Vector Machine (SVM) classifier is constructed to separate these features into phonation mode classes. We automated the IAIF approach by determining the values of its input arguments (lip radiation and formant count) that lead to the best-performing SVM classifiers (average classification accuracy over 60%), yielding a physical model for the articulation of the vowels.
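The argument selection described above can be pictured as an exhaustive grid search. In this sketch, `evaluate` is a hypothetical callable standing in for the whole pipeline (IAIF with the given arguments, extraction of the glottal flow parameters, training and scoring of the 4-class SVM); the candidate values and the toy scoring function are illustrative assumptions, not those used in the paper.

```python
from itertools import product


def select_iaif_arguments(evaluate, lip_radiation_values, formant_counts):
    """Try every (lip radiation, formant count) pair and keep the best one.

    `evaluate(lip_radiation, formant_count)` is assumed to return a
    classification accuracy. Returns (best_accuracy, lip_radiation, formant_count);
    on ties, the first pair tried wins.
    """
    best = None
    for lip, formants in product(lip_radiation_values, formant_counts):
        accuracy = evaluate(lip, formants)
        if best is None or accuracy > best[0]:
            best = (accuracy, lip, formants)
    return best


def toy_evaluate(lip, formants):
    # Invented scoring surface that peaks at lip=0.99, formants=10.
    return 0.6 - abs(lip - 0.99) - 0.01 * abs(formants - 10)


print(select_iaif_arguments(toy_evaluate, [0.97, 0.98, 0.99], range(8, 13)))
```

A grid search is the simplest way to realise "the values leading to the best-performing classifiers"; if the search were run separately per vowel (an assumption), the same function would simply be called once per vowel subset.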
We examine the steps needed to generalise and extend the experimental work presented in this paper in order to apply this method in ethnomusicological investigations.
Breathy or Resonant - A Controlled and Curated Dataset for Phonation Mode Detection in Singing
This paper presents a new reference dataset of sustained, sung vowels with attached labels indicating the phonation mode. The dataset is intended for training computational models for automated phonation mode detection.
Four phonation modes are distinguished by Johan Sundberg: breathy, neutral, flow (or resonant) and pressed. The presented dataset consists of ca. 700 recordings of nine vowels from several languages, sung at various pitches in various phonation modes. The recorded sounds were produced by one female singer under controlled conditions, following recommendations by voice acoustics researchers.
While datasets on phonation modes in speech exist, such resources for singing are not available. Our dataset closes this gap and offers researchers in various disciplines a reference and a training set. It will be made available online under a Creative Commons license. The format of the dataset is also extensible. Further content additions and future support for the dataset are planned.
From music ontology towards ethno-music-ontology
This paper presents exploratory work investigating the suitability of the Music Ontology [33] - the most widely used formal specification of the music domain - for modelling non-Western musical traditions. Four contrasting case studies from a variety of musical cultures are analysed: Dutch folk song research, reconstructive performance of rural Russian traditions, contemporary performance and composition of Persian classical music, and recreational use of a personal world music collection. We propose semantic models describing the respective domains and examine the applications of the Music Ontology for these case studies: which concepts can be successfully reused, where they need adjustments, and which parts of the reality in these case studies are not covered by the Music Ontology. The variety of traditions, contexts and modelling goals covered by our case studies sheds light on the generality of the Music Ontology and on the limits of generalisation “for all musics” that could be aspired to on the Semantic Web.
The VocalNotes Dataset
The VocalNotes dataset is a collection of audio and annotations for excerpts of vocal performances from five musical traditions - Japanese Minyo, Chinese Hebei Bangzi opera, Russian traditional singing, Alpine yodel and Jewish Romaniote chant. For each tradition the dataset contains: about 10 minutes of audio; documentation for the songs from which the annotated fragments originate; and f0, independent onset, offset and note pitch annotations created by two or three experts. The dataset was created as part of the VocalNotes project [1]. It is released under a CC-BY-NC-SA license and can be accessed by filling out a request form.
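Because each tradition carries onset annotations from two or three experts, a natural first check is how often two annotators' onsets coincide. The sketch below is an assumption, not part of the dataset's documentation: it matches two onset lists one-to-one within a tolerance window (50 ms here, which is also the default window in mir_eval's onset evaluation) using a simple greedy pass over sorted times, and reports an F-measure. The function names are hypothetical.

```python
def match_onsets(reference, estimate, tolerance=0.05):
    """Count one-to-one onset pairs (times in seconds) within `tolerance`."""
    ref, est = sorted(reference), sorted(estimate)
    i = j = matches = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            matches += 1  # pair these two onsets and move past both
            i += 1
            j += 1
        elif ref[i] < est[j]:
            i += 1  # reference onset lies too early to match any remaining onset
        else:
            j += 1  # other annotator's onset lies too early to match
    return matches


def onset_f_measure(reference, estimate, tolerance=0.05):
    """Harmonic mean of precision and recall of the matched onsets."""
    if not reference or not estimate:
        return 0.0
    matches = match_onsets(reference, estimate, tolerance)
    if matches == 0:
        return 0.0
    precision = matches / len(estimate)
    recall = matches / len(reference)
    return 2 * precision * recall / (precision + recall)


# Two annotators' onsets for one invented fragment.
print(onset_f_measure([0.10, 0.50, 1.00], [0.12, 0.58, 1.01]))
```

Treating one annotator as the "reference" is arbitrary here; the F-measure is symmetric in the two lists, so the choice does not affect the score.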
Globally, songs and instrumental melodies are slower, higher, and use more stable pitches than speech: a registered report
Both music and language are found in all known human societies, yet no studies have compared similarities and differences between song, speech, and instrumental music on a global scale. In this Registered Report, we analyzed two global datasets: (i) 300 annotated audio recordings representing matched sets of traditional songs, recited lyrics, conversational speech, and instrumental melodies from our 75 coauthors speaking 55 languages; and (ii) 418 previously published adult-directed song and speech recordings from 209 individuals speaking 16 languages. Of our six preregistered predictions, five were strongly supported: relative to speech, songs use (i) higher pitch, (ii) slower temporal rate, and (iii) more stable pitches, while songs and speech use similar (iv) pitch interval size and (v) timbral brightness. Exploratory analyses suggest that features vary along a “musi-linguistic” continuum when including instrumental melodies and recited lyrics. Our study provides strong empirical evidence of cross-cultural regularities in music and speech.
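Pitch height and pitch interval size are usually compared on a logarithmic scale. As a hedged illustration of that arithmetic, not the registered report's analysis pipeline, the sketch converts frequencies to cents (1200 cents per octave, relative to an assumed 440 Hz reference) and computes the mean absolute interval between successive note pitches. The function names are hypothetical.

```python
import math


def hz_to_cents(f0_hz, reference_hz=440.0):
    """Convert a frequency in Hz to cents relative to `reference_hz`."""
    return 1200.0 * math.log2(f0_hz / reference_hz)


def mean_interval_size(note_pitches_hz):
    """Mean absolute interval, in cents, between successive notes (needs >= 2 notes)."""
    cents = [hz_to_cents(f) for f in note_pitches_hz]
    intervals = [abs(b - a) for a, b in zip(cents, cents[1:])]
    return sum(intervals) / len(intervals)


# An octave leap up and back down: both intervals are 1200 cents.
print(mean_interval_size([220.0, 440.0, 220.0]))
```

Because cents are a ratio measure, interval sizes computed this way do not depend on the reference frequency, which is what makes them comparable across voices with different pitch levels.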