28,340 research outputs found

    Speech Processing in Computer Vision Applications

    Deep learning has recently proven to be a viable asset for determining features in the field of speech analysis. Deep learning methods such as convolutional neural networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature-dense representations of data. Our work addresses two problems using deep learning methods: re-creating a face from a speaker's voice, and speaker identification. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker audio sample to an image of their predicted face. This framework is composed of several networks chained together, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face, that DNNs can create a speaker's face from their voice, and that a GUI can be used in conjunction to display a speaker recognition network's data

    Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

    An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has confirmed the effectiveness of VAE using the STRAIGHT spectra for VC. However, VAE using other types of spectral features such as mel-cepstral coefficients (MCCs), which are related to human perception and have been widely used in VC, have not been properly investigated. Instead of using one specific type of spectral feature, it is expected that VAE may benefit from using multiple types of spectral features simultaneously, thereby improving the capability of VAE for VC. To this end, we propose a novel VAE framework (called cross-domain VAE, CDVAE) for VC. Specifically, the proposed framework utilizes both STRAIGHT spectra and MCCs by explicitly regularizing multiple objectives in order to constrain the behavior of the learned encoder and decoder. Experimental results demonstrate that the proposed CDVAE framework outperforms the conventional VAE framework in terms of subjective tests. Comment: Accepted to ISCSLP 201

    Community Foundations: Learning from a Collective Experience: Process of Systematization

    The report of a community foundation strengthening program involving eight Mexican community foundations: Tecate CF, Frontera Norte CF, Matamoros CF, Oaxaca CF, Puebla CF, Fundación Comunidad, Fundación del Empresariado Chihuahuense (FECHAC), and Fundación Internacional de la Comunidad (FIC). The report is also available in Spanish

    Speaker-normalized sound representations in the human auditory cortex

    The acoustic dimensions that distinguish speech sounds (like the vowel differences in “boot” and “boat”) also differentiate speakers’ voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners’ perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener’s perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers

    Exploring the Margins of Kotha Culture: Reconstructing a Courtesan’s life in Neelum Saran Gour’s Requiem in Raga Janki

    In their article, “Exploring the Margins of Kotha Culture: Reconstructing a Courtesan’s life in Neelum Saran Gour’s Requiem in Raga Janki,” Chhandita Das and Priyanka Tripathi discuss the invisible challenges in the life of the famous courtesan Janki Bai Ilahabadi through close analysis of Neelum Saran Gour’s 2018 novel, Requiem in Raga Janki. In this novel, Janki belongs to the infamous kotha, but she never fails to seek her subjectivity. This marginal place of Janki’s belonging is discussed by appropriating the theoretical framework of Indian feminist Lata Singh (2007), for whom courtesans have been represented as “‘other’ in history” (1677). Alongside Singh, bell hooks’ ‘margin as a space of radical openness’ (Yearning 228) and Veena Oldenburg’s spectacular scholarship on courtesans in ‘Lifestyle as Resistance’ (1990) are synthesized to deconstruct the social hierarchy. Although baijis or tawaifs in India possess a rich artistic heritage, they have surprisingly often occupied a questionable space wherein their individual and social integrity has been compromised. Gour attempts to rewrite the life of a courtesan from Allahabad and in the process creates an alternative discourse or understanding of a courtesan’s life through Janki, matron, yes! not patron of Indian classical music and tradition

    Tensions in Creating Possibilities for Youth Voice in School Choice: An Ethnographer’s Reflexive Story of Research

    The following article relates a reflexive ethnographic research project that focuses on youth voice in relation to the process of choosing a high school and a language of instruction in Ontario, Canada. The purpose of this methodological article is to relate a story of research and explore the tensions between theory and practice experienced by a young researcher during and after fieldwork. To do so, I explore the theoretical and epistemological underpinnings of the relevance and importance of youth-centred research and uncover some of the complexities of conducting participant observation, interviews, and co-analysis activities with youth participants. Résumé: Cet article présente un projet de recherche ethnographique réflexif qui se concentre sur les perspectives des jeunes dans leur processus de sélection d’une école secondaire et d’une langue d’enseignement dans la province Canadienne de l’Ontario. Le but de cet article méthodologique est de raconter l’histoire d’un projet de recherche et d’explorer les tensions entre la théorie et la pratique vécue par une jeune chercheure pendant et après le travail de terrain. Pour ce faire, j’explore les fondements théoriques et épistémologiques de la pertinence et de l’importance de la recherche centrée sur les enfants, et de mettre en lumière les complexités de l’observation participante, des entretiens et des activités de co-analyse de données avec les jeunes participants

    Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

    Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives