3 research outputs found

    The psychoacoustics and synthesis of singing harmony

    The human singing voice is a remarkable instrument that compounds an immense amount of expressivity onto a single dimension. Apart from semantics and melody (pitch, duration and dynamics), accent, age, gender and emotion are all carried in the singing voice. While a single singing voice on its own is aesthetically pleasing to the ear, the addition of concurrent voices of different pitch is commonly known to be capable of producing a pleasing effect far greater than the sum of that produced by each contributing voice. This motivates the use of harmony in singing. Unfortunately, accompaniment voices are difficult to sing, even for professional singers. Thankfully, singing synthesis has made it viable for this task to be undertaken by machines. The overall objective of this thesis is to advance today’s understanding of singing harmony and ultimately develop novel techniques for its synthetic reproduction. This is broken down into three parts. The first focuses on a psychophysical basis of harmony, the second focuses on the synthesis of the singing voice, while the third combines the first two to focus on the synthesis of harmonized singing. The first contribution, presented in chapter 2, is an attempt to find a psychoacoustic basis of harmony. Apart from stationary harmony (chords, or sonorities: the aesthetics of a group of concurrent notes at one point in time), this also includes transitional harmony (chord progression, or resolution: the aesthetics of a similar group of notes progressing to another). In order to explain both stationary and transitional harmony, it introduces a theory of harmony based on the notions of interharmonic and subharmonic modulations. Acoustic measures of stationary and transitional harmony are proposed and the answers to five fundamental questions of psychoacoustic harmony are presented, both based on this theory.
Correlations with existing music theory and perception statistics support this contribution for both stationary and transitional harmony. The second contribution is in the synthesis of the singing voice and is presented in chapter 3. Modern singing synthesis methods are at best capable of word-level runtime synthesis, with only two known methods dedicated to realtime synthesis. This means that they are applicable only to offline music production. A large part of the art of music and singing, however, is in realtime performance. With both of the existing realtime singing synthesis methods bounded by a phone-coverage to realtime-capability trade-off, a need for one that overcomes it remains. A novel realtime singing synthesis system, SERAPHIM, is proposed as an answer to this. Apart from overcoming this phone-coverage to realtime-capability trade-off, subjective listening tests also showed that listeners preferred voices synthesized by SERAPHIM to those of other realtime systems. The third contribution is in the synthesis of singing harmony and is presented in chapter 4. With this contribution, a novel method for singing harmony synthesis is proposed. Current implementations can be classified into pitch-inaccurate rule-based systems, timing-inaccurate inference-based systems, and hybrid systems that trade off between pitch inaccuracies and timing inaccuracies. This means that existing systems are vulnerable to either pitch errors, timing errors or both in different degrees of compromise. The challenge in the task was to overcome this compromise to develop a robust technique that is simultaneously resilient to both pitch and timing errors while producing harmonious accompaniment. Our strategy was to leverage the pitch-accurate inference-based method while eliminating timing inaccuracies by use of machine-synchronization. Spectrograms revealed that harmonized voices produced by this method contain fewer dissonances than those produced by existing methods.
Subjective listening tests also showed that harmonized voices produced by this method are perceived to be the best sounding, both by vocal experts and by casual listeners. All in all, the work presented in this thesis contributes to the advancement of the psychoacoustic understanding and machine synthesis of singing harmony across one journal paper, three conference papers and three patents. Doctor of Philosophy

    The Science of Harmony: A Psychophysical Basis for Perceptual Tensions and Resolutions in Music

    This paper attempts to establish a psychophysical basis for both stationary (tension in chord sonorities) and transitional (resolution in chord progressions) harmony. Harmony studies the phenomenon of combining notes in music to produce a pleasing effect greater than the sum of its parts. Being both aesthetic and mathematical in nature, it has baffled some of the brightest minds in physics and mathematics for centuries. In stationary harmony acoustics, the widely accepted traditional theories explaining consonances and dissonances are centred around two schools: rational relationships (commonly credited to Pythagoras) and Helmholtz’s beating frequencies. The first is more of an attribution than a psychoacoustic explanation, while electrophysiological (amongst other) discrepancies with the second remain disputed. Transitional harmony, on the other hand, is a more complex problem that has remained largely elusive to acoustic science even today. In order to address both stationary and transitional harmony, we first propose the notion of interharmonic and subharmonic modulations to address the summation of adjacent and distant sinusoids in a chord. Based on this, earlier parts of this paper then bridge the two schools and show how they stem from a single equation. Later parts of the paper focus on subharmonic modulations to explain aspects of harmony that interharmonic modulations cannot. Introducing the concept of stationary and transitional subharmonic tensions, we show how they can explain perceptual concepts such as tension in stationary harmony and resolution in transitional harmony, by which we also address the five fundamental questions of psychoacoustic harmony, such as why the pleasing effect of harmony is greater than that of the sum of its parts. Finally, strong correlations with traditional music theory and perception statistics affirm our theory for both stationary and transitional harmony.

    Opportunities and Challenges of Designing Assistive Technologies for Aphasia Patients in Singapore: The Case of a Speech Evaluation Prototype

    Aphasia is a language disorder caused by brain damage, resulting in difficulties with speaking, understanding, reading, and writing. This study focuses on addressing challenges faced by local therapists in aphasia treatment. Through an ethnographic study involving observations and interviews, critical issues in current technological solutions for aphasia treatment were identified. These issues include the lack of feedback during patient training, limited localized content, repetitive materials, and a lack of options for conversational speech training. In our study, we tried to address one of these issues: the automatic evaluation of patient speech pronunciation. By utilizing public datasets and local patient data, we developed a system that provides accurate pronunciation scores and personalized feedback, assisting therapists in guiding patient progress. The system supports customized pronunciations, including local accents and dialects. The system is designed for multiple platforms, ensuring accessibility, and can be extended to involve speech therapists to enhance its capabilities. This study emphasizes the importance of integrating research insights with clinical practice, empowering therapists, and enhancing the quality of aphasia treatment.