
    Deep Relational Model: A Joint Probabilistic Model with a Hierarchical Structure for Bidirectional Estimation of Image and Labels

    Two different types of representations, such as an image and its manually assigned labels, generally have complex and strong relationships to each other. In this paper, we represent such deep relationships between two different types of visible variables using an energy-based probabilistic model, called a deep relational model (DRM), to improve prediction accuracy. A DRM stacks several layers from one visible layer to another, sandwiching several hidden layers between them. As with restricted Boltzmann machines (RBMs) and deep Boltzmann machines (DBMs), all connections (weights) between two adjacent layers are undirected. During maximum likelihood (ML) training, the network attempts to capture the latent complex relationships between the two visible variables with its deep architecture. Unlike deep neural networks (DNNs), 1) the DRM is a fully generative model that allows us to generate one visible variable given the other, and 2) its parameters can be optimized in a probabilistic manner. The DRM can also be fine-tuned as a DNN, in the same way as deep belief net (DBN) or DBM pre-training. This paper presents experiments conducted to evaluate the performance of the DRM in image recognition and generation tasks using the MNIST data set. In the image recognition experiments, we observed that the DRM outperformed DNNs even without fine-tuning. In the image generation experiments, the images generated by the DRM were much more realistic than those produced by the other generative models.
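    To make the bidirectional idea concrete, below is a minimal NumPy sketch of the kind of energy function and mean-field label inference a two-hidden-layer DRM implies; the layer sizes, bias terms, function names, and the inference scheme are illustrative assumptions rather than the paper's exact specification.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    def drm_energy(x, h1, h2, y, W1, W2, W3, bx, b1, b2, by):
        """Energy of one configuration of a two-hidden-layer DRM joining an
        image layer x and a label layer y through hidden layers h1, h2, with
        undirected weights W1, W2, W3 as in RBMs/DBMs. The number of hidden
        layers and the bias terms are illustrative assumptions."""
        return -(x @ W1 @ h1 + h1 @ W2 @ h2 + h2 @ W3 @ y
                 + bx @ x + b1 @ h1 + b2 @ h2 + by @ y)

    def predict_labels(x, W1, W2, W3, b1, b2, by, n_iters=20):
        """Mean-field-style label inference p(y | x): clamp the image layer
        and iterate the hidden activations until the label layer settles.
        This stands in for whatever inference scheme the paper actually uses."""
        h2 = np.zeros(W2.shape[1])
        y = np.full(W3.shape[1], 1.0 / W3.shape[1])      # uniform initial label guess
        for _ in range(n_iters):
            h1 = sigmoid(x @ W1 + W2 @ h2 + b1)          # messages from x and h2
            h2 = sigmoid(h1 @ W2 + W3 @ y + b2)          # messages from h1 and y
            y = softmax(h2 @ W3 + by)                    # label layer (e.g. ten digit classes)
        return y

    Given an MNIST image vector x, predict_labels would return a distribution over the digit classes; generating an image from a label would run the analogous sweep with the label layer clamped instead.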

    Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine

    In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique in which only the speaker-specific information in source speech is converted while the phonological information is kept unchanged. Most existing VC methods rely on parallel data: pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the data used for training are limited to the predefined sentences, 2) the trained model applies only to the speaker pair used in training, and 3) mismatches in alignment may occur. Although it is therefore clearly preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we achieve non-parallel training by combining a speaker adaptation technique with the capture of latent phonological information. This approach assumes that speech signals are produced from a restricted Boltzmann machine-based probabilistic model in which phonological information and speaker-related information are defined explicitly. Speaker-independent and speaker-dependent parameters are trained simultaneously under speaker-adaptive training. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed another non-parallel approach and produced results similar to those of the popular conventional Gaussian mixture model-based method that uses parallel data, under both subjective and objective criteria.
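    The conversion stage lends itself to a short sketch. The NumPy fragment below is a hedged illustration of the decompose/replace/recombine step, assuming an adaptive-RBM-style model with a per-speaker linear transform; the names A_src, A_tgt, W, b, and c are hypothetical placeholders, not the paper's notation.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def convert_frame(v_src, W, b, c, A_src, A_tgt):
        """Illustrative conversion of one spectral frame, assuming a model in
        which a speaker-dependent transform maps speaker-normalised features
        into each speaker's space and the hidden layer carries the
        speaker-independent phonological code. All parameter names here are
        assumptions for the sketch."""
        # 1) Undo the source-speaker transform to reach the canonical space.
        v_canon = np.linalg.solve(A_src, v_src)
        # 2) Infer the latent phonological code from the canonical features.
        h = sigmoid(v_canon @ W + c)
        # 3) Reconstruct canonical features from the phonological code ...
        v_recon = W @ h + b
        # 4) ... and re-apply the target speaker's transform.
        return A_tgt @ v_recon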

    STFT Spectral Loss for Training a Neural Speech Waveform Model

    This paper proposes a new loss based on short-time Fourier transform (STFT) spectra for training a high-performance neural speech waveform model that predicts raw continuous speech waveform samples directly. Not only the amplitude spectra but also the phase spectra obtained from the generated speech waveforms are used to calculate the proposed loss. We also show mathematically that training the waveform model with the proposed loss can be interpreted as maximum likelihood training that assumes the amplitude and phase spectra of the generated speech waveforms follow Gaussian and von Mises distributions, respectively. Furthermore, this paper presents a simple network architecture for the speech waveform model, composed of uni-directional long short-term memories (LSTMs) and an auto-regressive structure. Experimental results showed that the proposed neural model synthesized high-quality speech waveforms. Comment: Submitted to the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
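    As a rough illustration of such a loss, the NumPy sketch below combines a squared-error amplitude term (a Gaussian negative log-likelihood up to constants) with a 1 - cos phase term (a von Mises negative log-likelihood up to constants); the frame length, hop size, window, and weighting are placeholder choices, not the paper's settings.

    import numpy as np

    def stft_spectral_loss(y_gen, y_ref, frame_len=512, hop=128, w_amp=1.0, w_phase=1.0):
        """Illustrative STFT-domain loss over a generated waveform y_gen and a
        reference waveform y_ref (1-D arrays of equal length). The amplitude
        term is a squared error; the phase term is 1 - cos of the phase
        difference. Hyperparameters are assumptions for the sketch."""
        window = np.hanning(frame_len)
        n_frames = 1 + (len(y_gen) - frame_len) // hop
        loss = 0.0
        for t in range(n_frames):
            seg_gen = y_gen[t * hop:t * hop + frame_len] * window
            seg_ref = y_ref[t * hop:t * hop + frame_len] * window
            spec_gen = np.fft.rfft(seg_gen)
            spec_ref = np.fft.rfft(seg_ref)
            # Gaussian-style amplitude term.
            amp_term = np.sum((np.abs(spec_gen) - np.abs(spec_ref)) ** 2)
            # von Mises-style phase term.
            phase_term = np.sum(1.0 - np.cos(np.angle(spec_gen) - np.angle(spec_ref)))
            loss += w_amp * amp_term + w_phase * phase_term
        return loss / max(n_frames, 1)

    In practice such a loss would be implemented with a differentiable STFT in the training framework so that gradients flow back to the waveform model; the version above only shows the functional form.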

    Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion

    In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique in which only the speaker-specific information in the source speech is converted while the phonological information is kept unchanged. Most existing VC methods rely on parallel data, i.e., pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for training are limited to the pre-defined sentences, (2) the trained model applies only to the speaker pair used in training, and (3) mismatches in alignment may occur. Although it is generally preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are trained simultaneously using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach in terms of both objective and subjective criteria.
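    The joint SI/SD estimation can be outlined as a simple alternating loop. The sketch below is schematic only: the reconstruction-error update rules stand in for the Boltzmann-machine likelihood (e.g. contrastive-divergence) updates the actual model would use, and every variable name is an assumption.

    import numpy as np

    def speaker_adaptive_training(data_by_speaker, n_hidden=64, n_epochs=10, lr=1e-4):
        """Schematic speaker-adaptive training (SAT): one shared set of
        speaker-independent (SI) weights W is trained on speaker-normalised
        frames, while a speaker-dependent (SD) linear transform A[spk] per
        speaker is refined in the same pass. data_by_speaker maps a speaker id
        to an (n_frames, dim) array of feature frames."""
        dim = next(iter(data_by_speaker.values())).shape[1]
        W = 0.01 * np.random.randn(dim, n_hidden)        # SI weights, shared by all speakers
        A = {s: np.eye(dim) for s in data_by_speaker}    # one SD transform per speaker
        for _ in range(n_epochs):
            for spk, frames in data_by_speaker.items():
                canon = frames @ np.linalg.inv(A[spk]).T        # strip speaker colouring
                h = 1.0 / (1.0 + np.exp(-canon @ W))            # latent phonological code
                recon = h @ W.T                                 # canonical-space reconstruction
                W += lr * (canon - recon).T @ h                 # SI update (placeholder rule)
                # SD update: keep A[spk] mapping the model's reconstruction back
                # onto this speaker's observed frames.
                A[spk] += lr * (frames - recon @ A[spk].T).T @ recon
        return W, A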

    Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech
