    Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables

    Full text link
    This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and efficient approach. We show it is competitive with state-of-the-art singing voice vocoders, requiring fewer synthesis parameters and less memory to train, and runs an order of magnitude faster for inference. Additionally, we demonstrate that GOLF can model the phase components of the human voice, which has immense potential for rendering and analysing singing voices in a differentiable manner. Our results highlight the effectiveness of incorporating the physical properties of the human voice mechanism into SVS and underscore the advantages of signal-processing-based approaches, which offer greater interpretability and efficiency in synthesis. Audio samples are available at https://yoyololicon.github.io/golf-demo/. (Comment: 9 pages, 4 figures. Accepted at ISMIR 2023.)
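
    The source-filter idea at the heart of GOLF can be illustrated with a minimal, non-differentiable sketch: a glottal-flow-inspired harmonic source is shaped by an all-pole (LPC) filter standing in for the vocal tract. The sample rate, pitch, and filter coefficients below are illustrative assumptions, not values from the paper.

```python
# Minimal source-filter sketch (not the GOLF implementation).
import numpy as np
from scipy.signal import lfilter

sr = 16000                       # sample rate in Hz (assumed)
f0 = 220.0                       # fundamental frequency in Hz (assumed)
t = np.arange(sr) / sr           # one second of samples

# Crude glottal-flow-like source: decaying harmonics summed in phase,
# standing in for GOLF's wavetable of glottal-flow pulses.
source = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 20))

# Hypothetical all-pole (LPC) vocal-tract filter: two complex-conjugate
# poles near the unit circle create a single formant-like resonance.
a = [1.0, -1.8, 0.97]
voiced = lfilter([1.0], a, source)
voiced /= np.abs(voiced).max()   # normalise to [-1, 1] for playback
```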

    Voice Feature Extraction for Gender and Emotion Recognition

    Get PDF
    Voice recognition plays a key role in spoken communication, helping to identify the emotions reflected in a person's voice. Gender classification from speech is widely used in Human-Computer Interaction (HCI), as gender is not easy for a computer to identify. This motivated the development of a model for "Voice Feature Extraction for Emotion and Gender Recognition". The speech signal carries semantic information and speaker information (gender, age, emotional state), accompanied by noise. Female and male voices differ in their acoustic and perceptual characteristics, and the variety of emotions they express convey their own unique perceptions. Feature extraction therefore requires pre-processing of the data, which is necessary for increasing accuracy. The proposed model follows these steps: data extraction, pre-processing using a Voice Activity Detector (VAD), feature extraction using Mel-Frequency Cepstral Coefficients (MFCC), feature reduction by Principal Component Analysis (PCA), and classification with a Support Vector Machine (SVM). The proposed combination of techniques produced better results, which can be useful in the healthcare sector, virtual assistants, security, and other fields in the Human-Machine Interaction domain.
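
    As a rough illustration, the pipeline described above (VAD, then MFCC features, then PCA reduction, then an SVM) could be sketched as follows, assuming librosa and scikit-learn; the energy-based splitter stands in for the paper's VAD, and all dimensions are illustrative.

```python
# Hedged sketch of the VAD -> MFCC -> PCA -> SVM pipeline.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def extract_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    # Energy-based voice activity detection: keep only non-silent intervals.
    intervals = librosa.effects.split(y, top_db=30)
    voiced = np.concatenate([y[start:end] for start, end in intervals])
    mfcc = librosa.feature.mfcc(y=voiced, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)     # one fixed-length vector per clip

# X: rows of extract_features(...) outputs; y: gender or emotion labels
# (hypothetical data, not the paper's corpus).
# clf = make_pipeline(PCA(n_components=8), SVC(kernel="rbf"))
# clf.fit(X, y)
```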

    Tracking the Sound of Human Affection: EEG Signals Reveal Online Decoding of Socio-Emotional Expression in Human Speech and Voice

    Get PDF
    This chapter offers a perspective on the latest EEG evidence for how brain signals illuminate the neurophysiological and neurocognitive mechanisms underlying the recognition of socio-emotional expression conveyed in human speech and voice, drawing upon event-related potential (ERP) studies. Human sound can encode emotional meaning through different vocal parameters in words, real vs. pseudo-speech, and vocalizations. Based on the ERP findings, recent development of the three-stage model of vocal processing has highlighted initial- and late-stage processing of vocal emotional stimuli. These processes, depending on which ERP components they map onto, can be divided into acoustic analysis, relevance and motivational processing, fine-grained meaning analysis/integration/access, and higher-level social inference, as they unfold over time. ERP studies on vocal socio-emotions such as happiness, anger, fear, sadness, neutrality, sincerity, confidence, and sarcasm in the human voice and speech have employed different experimental paradigms, such as cross-splicing, cross-modality priming, oddball, and Stroop. Moreover, task demands and listener characteristics affect the neural responses underlying the decoding processes, revealing the role of attention deployment and interpersonal sensitivity in the neural decoding of vocal emotional stimuli. Cultural orientation also affects our ability to decode emotional meaning in the voice. Neurophysiological patterns have been compared between normal and abnormal emotional processing of vocal expressions, especially in schizophrenia and in congenital amusia. Future directions highlight studying human vocal expression in alignment with other nonverbal cues, such as facial and body language, and the need to synchronize listeners' brain potentials with other peripheral measures.

    Speaker identification based on hybrid feature extraction techniques

    Get PDF
    Speech processing is one of the most exciting areas of signal processing; speech contains many features that can discriminate a person's identity, and the human voice is considered an important biometric characteristic for person identification. This work studies the effect on speaker identification of features extracted from various levels of the discrete wavelet transform (DWT), of concatenating two techniques (the discrete wavelet and curvelet transforms), and of reducing the number of features using principal component analysis (PCA). A backpropagation (BP) neural network was introduced as the classifier.
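
    A minimal sketch of the hybrid feature idea, assuming PyWavelets for the DWT and scikit-learn's MLPClassifier as the backpropagation-trained network; the curvelet branch is omitted here, and the wavelet, decomposition level, and statistics are illustrative choices rather than the paper's settings.

```python
# Hedged sketch: DWT-level statistics -> PCA -> backpropagation network.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

def dwt_features(signal, wavelet="db4", level=4):
    # Simple statistics of each decomposition level act as speaker features.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([stat(c) for c in coeffs for stat in (np.mean, np.std)])

# X: stacked dwt_features(...) vectors; y: speaker labels (hypothetical data).
# clf = make_pipeline(PCA(n_components=6),
#                     MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000))
# clf.fit(X, y)
```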

    Blood Pressure Estimation from Speech Recordings: Exploring the Role of Voice-over Artists

    Get PDF
    Hypertension, a prevalent global health concern, is associated with cardiovascular disease and significant morbidity and mortality. Accurate and prompt blood pressure monitoring is crucial for early detection and successful management. Traditional cuff-based methods can be inconvenient, motivating the exploration of non-invasive, continuous estimation methods. This research aims to bridge the gap between speech processing and health monitoring by investigating the relationship between speech recordings and blood pressure estimation. Speech recordings offer promise for non-invasive blood pressure estimation because of the potential link between vocal characteristics and physiological responses. In this study, we focus on the role of voice-over artists, known for their ability to convey emotions through voice. By exploring voice-over artists' expertise in controlling speech and expressing emotion, we seek insights into the potential correlation between speech characteristics and blood pressure. The study presents an innovative and convenient approach to health assessment and, by unraveling the specific role of voice-over artists in this process, lays a foundation for future advances in healthcare and human-robot interaction. Leveraging their expertise in conveying emotion through voice enriches our understanding of the intricate relationship between speech recordings and physiological responses, opening new avenues for integrating voice-related factors into healthcare technologies.

    The acoustics of concentric sources and receivers – human voice and hearing applications

    Get PDF
    One of the most common ways we experience environments acoustically is by listening to the reflections of our own voice in a space; by listening to our own voice, we adjust its characteristics to suit the task and audience. This is particularly important in critical voice tasks, such as actors or singers performing on a stage without electroacoustic or other amplification (e.g., in-ear monitors or loudspeakers). Despite how common this situation is, there are very few acoustic measurements aimed at quantifying it, and even fewer that address the problem of a source and receiver located very close together. The aim of this thesis is to introduce new measurement transducers and methods that correctly quantify this situation. This is achieved by analysing the characteristics of the human as a source, as a receiver, and their interaction in close proximity when placed in acoustical environments. The characteristics of the human voice and human ear are analysed in the same manner as a loudspeaker or microphone would be, making them analogous to measurement transducers and providing the basis for further analysis. These results are then used to explore the consequences of a closely located source and receiver using acoustic room simulation. Different techniques for processing data from directional transducers in real rooms are introduced; the majority of the data used in this thesis was obtained in rooms used for performance. The final chapters detail the design and construction of a concentric directional transducer, in which an array of microphones and loudspeakers occupies the same structure. Finally, sample measurements with this transducer are presented.

    Breaking voice identity perception: Expressive voices are more confusable for listeners.

    Get PDF
    The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of (1) low-expressiveness and (2) high-expressiveness speech into perceived identities. We predicted that increased expressiveness (e.g., shouting, a strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups were equally good at telling voices apart for low-expressiveness stimuli. However, high vocal expressiveness significantly impaired telling apart in both groups, leading to increased misidentifications in which sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception: changes in the acoustic properties of vocal signals introduced by expressiveness produce effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.

    Poetry in Pandemic: A Multimodal Neuroaesthetic Study on the Emotional Reaction to the Divina Commedia Poem

    Get PDF
    Poetry elicits emotions, and emotion is a fundamental component of human ontogeny. Although neuroaesthetics is a rapidly developing field of research, few studies focus on poetry, and none address the different modalities of fruition (MOF) of universal cultural-heritage works such as the Divina Commedia (DC). Moreover, alexithymia (AX) emerged as a psychological risk factor during the COVID-19 pandemic. The present study investigates the emotional response to poetry excerpts from the different cantiche (Inferno, Purgatorio, Paradiso) of the DC, with the dual objective of assessing the impact of the poem's structure and MOF, and that of the characteristics of the acting voice, in experts and non-experts, also considering AX. Online emotional facial-coding biosignal (BS) techniques and self-reported and psychometric measures were applied to 131 literary (LS) and scientific (SS) university students. The BS results show that LS globally manifest more JOY than SS in both the reading and listening MOF, and more FEAR towards Inferno. Furthermore, LS and SS differ in NEUTRAL emotion with respect to the acting voice. AX influences listening in NEUTRAL and SURPRISE expressions. The DC's structure affects DISGUST and SADNESS during listening, regardless of participant characteristics. PLEASANTNESS varies according to the DC's structure and the acting voice, as does AROUSAL, which also correlates with AX. Results are discussed in light of recent findings in affective neuroscience and neuroaesthetics, suggesting the critical role of poetry and listening in supporting human emotional processing.

    Hey Dona! Can you help me with student course registration?

    Full text link
    In this paper, we present a demo of an intelligent personal agent called Hey Dona (or just Dona) that provides virtual voice assistance for student course registration. It is a deployed project in the theme of AI for education. In this digital age, with its myriad smart devices, users often delegate tasks to agents; just as pointing and clicking superseded command-typing, modern devices let users speak commands for agents to execute, enhancing speed and convenience. In line with this progress, Dona is an intelligent agent that caters to student needs through automated, voice-operated course registration, spanning a multitude of accents, with task-planning optimization and some language translation as needed. Dona accepts voice input by microphone (Bluetooth or wired), converts human voice to computer-understandable language, performs query processing according to user commands, connects to the Web to search for answers, models task dependencies, incorporates quality control, and conveys output by speaking to users as well as displaying text, thus enabling human-AI interaction by speech and text. It is meant to work seamlessly on desktops, smartphones, etc., in indoor as well as outdoor settings. To the best of our knowledge, Dona is among the first intelligent personal agents for voice assistance in student course registration. Due to its ubiquitous access for educational needs, Dona directly impacts AI for education. It makes a broader impact on the smart-city characteristics of smart living and smart people through its contributions to new ways of living and to assisting 21st-century education, respectively.
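
    The listen-transcribe-respond loop that the abstract outlines can be approximated with off-the-shelf libraries; the sketch below is not the deployed Dona system, and both speech_recognition and pyttsx3 are assumed stand-ins for its actual components.

```python
# Illustrative voice-command loop: listen -> transcribe -> process -> reply
# by speech and text. Not the deployed Dona pipeline.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts = pyttsx3.init()

def handle_command(text):
    # Placeholder for query processing, web search, and task planning.
    return f"You asked to: {text}"

with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)
    audio = recognizer.listen(mic)

command = recognizer.recognize_google(audio)   # speech -> text (needs internet)
reply = handle_command(command)
print(reply)                                   # text output
tts.say(reply)                                 # spoken output
tts.runAndWait()
```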