Search CORE

195 research outputs found

Adapting Prosody in a Text-to-Speech System

Author: Caglayan Erdem
Janez Stergar
Publication venue: 'IntechOpen'
Publication date: 02/11/2010
Field of study

IntechOpen

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

Author: King Simon
Takaki Shinji
Tokuda Keiichi
Wang Xin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Edinburgh Research Explorer

Multiple Feed-forward Deep Neural Networks for Statistical Parametric Speech Synthesis

Author: Kim JongJin
Kim SangJin
Takaki Shinji
Yamagishi Junichi
Publication venue
Publication date: 01/01/2015
Field of study

Edinburgh Research Explorer

Autoregressive neural F0 model for statistical parametric speech synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/04/2018
Field of study

Crossref

Edinburgh Research Explorer

An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Author: Dey Spandan
Saha Goutam
Sahidullah Md
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2022
Field of study

International audienceAutomatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. Several essential aspects of the Indian LID research, such as the detailed description of the available speech corpora, the major research contributions, including the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures, and the future research trends are discussed. This review work will help assess the state of the present Indian LID research by any active researcher or any research enthusiasts from related fields

INRIA a CCSD electronic archive server

Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology

Author
Publication venue: ASSTA
Publication date: 31/12/2016
Field of study

UCL Discovery

Temporal contextual descriptors and applications to emotion analysis.

Author: Balti Haythem
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/12/2014
Field of study

The current trends in technology suggest that the next generation of services and devices allows smarter customization and automatic context recognition. Computers learn the behavior of the users and can offer them customized services depending on the context, location, and preferences. One of the most important challenges in human-machine interaction is the proper understanding of human emotions by machines and automated systems. In the recent years, the progress made in machine learning and pattern recognition led to the development of algorithms that are able to learn the detection and identification of human emotions from experience. These algorithms use different modalities such as image, speech, and physiological signals to analyze and learn human emotions. In many settings, the vocal information might be more available than other modalities due to widespread of voice sensors in phones, cars, and computer systems in general. In emotion analysis from speech, an audio utterance is represented by an ordered (in time) sequence of features or a multivariate time series. Typically, the sequence is further mapped into a global descriptor representative of the entire utterance/sequence. This descriptor is used for classification and analysis. In classic approaches, statistics are computed over the entire sequence and used as a global descriptor. This often results in the loss of temporal ordering from the original sequence. Emotion is a succession of acoustic events. By discarding the temporal ordering of these events in the mapping, the classic approaches cannot detect acoustic patterns that lead to a certain emotion. In this dissertation, we propose a novel feature mapping framework. The proposed framework maps temporally ordered sequence of acoustic features into data-driven global descriptors that integrate the temporal information from the original sequence. The framework contains three mapping algorithms. These algorithms integrate the temporal information implicitly and explicitly in the descriptor\u27s representation. In the rst algorithm, the Temporal Averaging Algorithm, we average the data temporally using leaky integrators to produce a global descriptor that implicitly integrates the temporal information from the original sequence. In order to integrate the discrimination between classes in the mapping, we propose the Temporal Response Averaging Algorithm which combines the temporal averaging step of the previous algorithm and unsupervised learning to produce data driven temporal contextual descriptors. In the third algorithm, we use the topology preserving property of the Self-Organizing Maps and the continuous nature of speech to map a temporal sequence into an ordered trajectory representing the behavior over time of the input utterance on a 2-D map of emotions. The temporal information is integrated explicitly in the descriptor which makes it easier to monitor emotions in long speeches. The proposed mapping framework maps speech data of different length to the same equivalent representation which alleviates the problem of dealing with variable length temporal sequences. This is advantageous in real time setting where the size of the analysis window can be variable. Using the proposed feature mapping framework, we build a novel data-driven speech emotion detection and recognition system that indexes speech databases to facilitate the classification and retrieval of emotions. We test the proposed system using two datasets. The first corpus is acted. We showed that the proposed mapping framework outperforms the classic approaches while providing descriptors that are suitable for the analysis and visualization of humans’ emotions in speech data. The second corpus is an authentic dataset. In this dissertation, we evaluate the performances of our system using a collection of debates. For that purpose, we propose a novel debate collection that is one of the first initiatives in the literature. We show that the proposed system is able to learn human emotions from debates

University of Louisville

Products and Services

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Todayâ€™s global economy offers more opportunities, but is also more complex and competitive than ever before. This fact leads to a wide range of research activity in different fields of interest, especially in the so-called high-tech sectors. This book is a result of widespread research and development activity from many researchers worldwide, covering the aspects of development activities in general, as well as various aspects of the practical application of knowledge

Directory of Open Access Books (DOAB)

The hearing hippocampus

Author: Billig AJ
Griffiths TD
Lad M
Sedley W
Publication venue: 'Elsevier BV'
Publication date: 01/11/2022
Field of study

The hippocampus has a well-established role in spatial and episodic memory but a broader function has been proposed including aspects of perception and relational processing. Neural bases of sound analysis have been described in the pathway to auditory cortex, but wider networks supporting auditory cognition are still being established. We review what is known about the role of the hippocampus in processing auditory information, and how the hippocampus itself is shaped by sound. In examining imaging, recording, and lesion studies in species from rodents to humans, we uncover a hierarchy of hippocampal responses to sound including during passive exposure, active listening, and the learning of associations between sounds and other stimuli. We describe how the hippocampus' connectivity and computational architecture allow it to track and manipulate auditory information – whether in the form of speech, music, or environmental, emotional, or phantom sounds. Functional and structural correlates of auditory experience are also identified. The extent of auditory-hippocampal interactions is consistent with the view that the hippocampus makes broad contributions to perception and cognition, beyond spatial and episodic memory. More deeply understanding these interactions may unlock applications including entraining hippocampal rhythms to support cognition, and intervening in links between hearing loss and dementia

UCL Discovery