Flow-based Intrinsic Curiosity Module
In this paper, we focus on a prediction-based novelty estimation strategy
within the deep reinforcement learning (DRL) framework, and present a flow-based
intrinsic curiosity module (FICM) to exploit the prediction errors from optical
flow estimation as exploration bonuses. We propose the concept of leveraging
motion features captured between consecutive observations to evaluate the
novelty of observations in an environment. FICM encourages a DRL agent to
explore observations with unfamiliar motion features, and requires only two
consecutive frames to obtain sufficient information when estimating the
novelty. We evaluate our method and compare it with a number of existing
methods on multiple benchmark environments, including Atari games, Super Mario
Bros., and ViZDoom. We demonstrate that FICM is well suited to tasks or
environments featuring moving objects, which allow FICM to utilize the motion
features between consecutive observations. We further analyze the encoding
efficiency of FICM through ablation studies, and discuss its applicable domains
comprehensively.
Comment: The SOLE copyright holder is IJCAI (International Joint Conferences
on Artificial Intelligence), all rights reserved. The link is provided as
follows: https://www.ijcai.org/Proceedings/2020/28
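The idea of turning flow prediction error into an exploration bonus can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the error is measured, so the sketch assumes a warping-based reconstruction error, and the names `warp_nearest`, `flow_curiosity_bonus`, and `scale` are hypothetical. The predicted flow would come from a learned flow estimator in practice; here it is passed in as a plain array.

```python
import numpy as np

def warp_nearest(frame, flow):
    """Backward-warp `frame` with `flow` using nearest-neighbor sampling.

    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) displacements.
    Output pixel (y, x) is read from frame[y + dy, x + dx], clipped to bounds.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def flow_curiosity_bonus(frame_t, frame_t1, predicted_flow, scale=1.0):
    """Assumed bonus: MSE between frame_t and frame_t1 warped by the predicted flow."""
    reconstructed = warp_nearest(frame_t1, predicted_flow)
    return scale * float(np.mean((frame_t - reconstructed) ** 2))

# Toy example: a single bright pixel moves one step to the right.
frame_t = np.zeros((4, 4))
frame_t[1, 1] = 1.0
frame_t1 = np.zeros((4, 4))
frame_t1[1, 2] = 1.0

accurate_flow = np.zeros((4, 4, 2))
accurate_flow[..., 1] = 1.0          # dx = +1 everywhere
inaccurate_flow = np.zeros((4, 4, 2))  # predicts no motion at all

familiar_bonus = flow_curiosity_bonus(frame_t, frame_t1, accurate_flow)
novel_bonus = flow_curiosity_bonus(frame_t, frame_t1, inaccurate_flow)
```

In this toy case the accurate flow reconstructs `frame_t` exactly, so the bonus is zero, while the inaccurate flow leaves a nonzero error and therefore a larger bonus. Under the assumption above, motion the estimator has not yet learned to predict is what gets rewarded, and only two consecutive frames are needed to compute it.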
Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences
Speaking rate refers to the average number of phonemes within some unit time,
while the rhythmic patterns refer to duration distributions for realizations of
different phonemes within different phonetic structures. Both are key
components of prosody in speech, which differs from speaker to speaker.
Models such as the cycle-consistent adversarial network (Cycle-GAN) and the
variational auto-encoder (VAE) have been successfully applied to voice conversion tasks
without parallel data. However, due to the neural network architectures and
feature vectors chosen for these approaches, the length of the predicted
utterance has to be fixed to that of the input utterance, which limits the
flexibility in mimicking the speaking rates and rhythmic patterns for the
target speaker. On the other hand, sequence-to-sequence learning models have been used
to remove the above length constraint, but they require parallel training data.
In this paper, we propose an approach that uses a sequence-to-sequence model
trained with an unsupervised Cycle-GAN to perform the transformation between the
phoneme posteriorgram sequences for different speakers. In this way, the length
constraint mentioned above is removed to offer rhythm-flexible voice conversion
without requiring parallel data. Preliminary evaluation on two datasets showed
very encouraging results.
Comment: 8 pages, 6 figures, Submitted to SLT 201
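The two prosodic quantities defined at the start of this abstract can be made concrete with a small sketch. It assumes a hypothetical time alignment given as `(phoneme, start_seconds, end_seconds)` tuples, and it simplifies the rhythmic pattern to a mean duration per phoneme, ignoring the phonetic context that the abstract mentions. The function names are illustrative and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

def speaking_rate(alignment):
    """Average number of phonemes per second over the aligned span."""
    total_seconds = alignment[-1][2] - alignment[0][1]
    return len(alignment) / total_seconds

def mean_durations(alignment):
    """Mean duration in seconds for each phoneme label (context ignored)."""
    durations = defaultdict(list)
    for label, start, end in alignment:
        durations[label].append(end - start)
    return {label: mean(values) for label, values in durations.items()}

# Hypothetical alignment for one utterance.
alignment = [
    ("AH", 0.0, 0.25),
    ("B", 0.25, 0.5),
    ("AH", 0.5, 1.0),
    ("T", 1.0, 2.0),
]

rate = speaking_rate(alignment)        # 4 phonemes over 2.0 seconds
per_phoneme = mean_durations(alignment)
```

A conversion model whose output length is tied to its input length cannot change either quantity, which is the constraint the abstract describes.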