28,340 research outputs found

Reconstructing Voices within the Multiple-Average-Voice-Model framework

Author: Gales Mark J F
King Simon
Lanchantin Pierre
Veaux Christophe
Yamagishi Junichi
Publication venue
Publication date: 01/09/2015
Field of study

Speech Processing in Computer Vision Applications

Author: Waterworth Nicholas
Publication venue: ScholarWorks@UARK
Publication date: 01/05/2020
Field of study

Deep learning has recently been shown to be a viable asset in determining features in the field of Speech Analysis. Deep learning methods like Convolutional Neural Networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature-dense representations of data. Our work attempts to address the problem of re-creating a face given a speaker's voice and speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker audio sample to an image of their predicted face. This framework is composed of several chained networks, each performing an essential step in the conversion process. These include audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face, that with a speaker's voice DNNs can create their face, and that a GUI could be used in conjunction to display a speaker recognition network's data

ScholarWorks@UARK

UARK (University of Arkansas)

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

Author: Huang Wen-Chin
Hwang Hsin-Te
Peng Yu-Huai
Tsao Yu
Wang Hsin-Min
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/08/2018
Field of study

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has confirmed the effectiveness of VAE using the STRAIGHT spectra for VC. However, VAE using other types of spectral features such as mel-cepstral coefficients (MCCs), which are related to human perception and have been widely used in VC, have not been properly investigated. Instead of using one specific type of spectral feature, it is expected that VAE may benefit from using multiple types of spectral features simultaneously, thereby improving the capability of VAE for VC. To this end, we propose a novel VAE framework (called cross-domain VAE, CDVAE) for VC. Specifically, the proposed framework utilizes both STRAIGHT spectra and MCCs by explicitly regularizing multiple objectives in order to constrain the behavior of the learned encoder and decoder. Experimental results demonstrate that the proposed CDVAE framework outperforms the conventional VAE framework in terms of subjective tests. Comment: Accepted to ISCSLP 201

arXiv.org e-Print Archive

Crossref

Community Foundations: Learning from a Collective Experience: Process of Systematization

Author: Elena Luengas
Laura Sarvide
Magdalena Rubio
Vivian Blair
Publication venue: Government of Uganda Office of the Prime Minister
Publication date: 07/07/2004
Field of study

The report of a community foundation strengthening program involving eight Mexican community foundations: Tecate CF, Frontera Norte CF, Matamoros CF, Oaxaca CF, Puebla CF, Fundación Comunidad, Fundación del Empresariado Chihuahuense (FECHAC), and Fundación Internacional de la Comunidad (FIC). The report is also available in Spanish

IssueLab

Speaker-normalized sound representations in the human auditory cortex

Author: Chang E.
Fox N.
Johnson K.
Sjerps M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers

eScholarship - University of California

Radboud Repository

MPG.PuRe

Exploring the Margins of Kotha Culture: Reconstructing a Courtesan’s life in Neelum Saran Gour’s Requiem in Raga Janki

Author: Das Chhandita
Tripathi Priyanka
Publication venue: 'Purdue University (bepress)'
Publication date: 11/02/2022
Field of study

In their article, “Exploring the Margins of Kotha Culture: Reconstructing a Courtesan’s life in Neelum Saran Gour’s Requiem in Raga Janki,” Chhandita Das and Priyanka Tripathi discuss the invisible challenges in the life of the famous courtesan Janki Bai Ilahabadi through close analysis of Neelum Saran Gour’s 2018 novel, Requiem in Raga Janki. In this novel, Janki belongs to the infamous kotha, but she never fails to seek her subjectivity. This marginal place of Janki’s belonging will be discussed by appropriating the theoretical framework of Indian feminist Lata Singh (2007), for whom courtesans have been represented as “‘other’ in history” (1677). In addition to Singh, bell hooks’ ‘margin as a space of radical openness’ (Yearning 228) and Veena Oldenburg’s spectacular scholarship on courtesans in ‘Lifestyle as Resistance’ (1990) will be synthesized to deconstruct the social hierarchy. Although baijis or tawaifs in India possess a rich artistic heritage, surprisingly they have often occupied a questionable space wherein their individual and social integrity has been compromised. Gour attempts to rewrite the life of a courtesan from Allahabad and in the process creates an alternative discourse or understanding of a courtesan’s life through Janki, matron, yes! not patron of Indian classical music and tradition

Purdue E-Pubs

Tensions in Creating Possibilities for Youth Voice in School Choice: An Ethnographer’s Reflexive Story of Research

Author: Cotnam-Kappel Megan
Publication venue: Canadian Society for the Study of Education / Société canadienne pour l'étude de l'éducation
Publication date: 18/07/2014
Field of study

The following article relates a reflexive ethnographic research project that focuses on youth voice in relation to the process of choosing a high school and a language of instruction in Ontario, Canada. The purpose of this methodological article is to relate a story of research and explore the tensions between theory and practice experienced by a young researcher during and after fieldwork. To do so, I explore the theoretical and epistemological underpinnings of the relevance and importance of youth-centred research and uncover some of the complexities of conducting participant observation, interviews, and co-analysis activities with youth participants

Canadian Journal of Education (CJE) / Revue canadienne de l'éducation

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

Author: Beigi Homayoon
Hamid Reza Sharifzadeh
Ian V. Mcloughlin
Jingjie Li
Joliveau Elodie
McLoughlin Ian Vince
Netsell Ronald
Rothenberg Martin
Sharifzadeh Hamid Reza
Su Lim Tan
Sundberg Johan
Toda Tomoki
Yan Song
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/05/2015
Field of study

Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives

Crossref

Kent Academic Repository