Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

Authors: Huang Wen-Chin, Hwang Hsin-Te, Peng Yu-Huai, Tsao Yu, Wang Hsin-Min
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 29/08/2018

An effective approach to non-parallel voice conversion (VC) is to use deep neural networks (DNNs), specifically variational autoencoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study confirmed the effectiveness of VAEs using STRAIGHT spectra for VC. However, VAEs using other types of spectral features, such as mel-cepstral coefficients (MCCs), which are related to human perception and have been widely used in VC, have not been properly investigated. Instead of relying on one specific type of spectral feature, a VAE is expected to benefit from using multiple types of spectral features simultaneously, thereby improving its capability for VC. To this end, we propose a novel VAE framework for VC, called the cross-domain VAE (CDVAE). Specifically, the proposed framework uses both STRAIGHT spectra and MCCs and explicitly regularizes multiple objectives in order to constrain the behavior of the learned encoder and decoder. Experimental results from subjective tests demonstrate that the proposed CDVAE framework outperforms the conventional VAE framework.
Comment: Accepted to ISCSLP 201
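
As a rough illustration of how within-domain and cross-domain reconstruction terms can be combined into one objective, here is a minimal Python sketch. It assumes one encoder and one decoder per feature domain ("sp" for STRAIGHT spectra, "mcc" for MCCs), a standard-normal prior, mean-squared reconstruction error, and equal weighting of all terms. The function names and this exact loss are hypothetical and not taken from the paper.

```python
import math

def gaussian_kl(mu, logvar):
    # KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_domain_objective(features, encoders, decoders, speaker):
    """Hypothetical CDVAE-style loss (an assumption, not the paper's exact form).

    features: {"sp": [...], "mcc": [...]} for one frame
    encoders[d](x) -> (mu, logvar) for domain d
    decoders[d](z, speaker) -> reconstruction in domain d
    """
    total = 0.0
    for src, x in features.items():
        mu, logvar = encoders[src](x)
        z = mu  # deterministic for the sketch; training would sample z = mu + sigma * eps
        total += gaussian_kl(mu, logvar)
        for tgt, y in features.items():
            # src == tgt: within-domain reconstruction; src != tgt: cross-domain reconstruction
            total += mse(decoders[tgt](z, speaker), y)
    return total
```

Decoding every latent into every domain is what ties the two feature types together in this sketch: a latent code inferred from the spectra must also explain the MCCs, and vice versa.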

arXiv.org e-Print Archive

Crossref

Poly[diaqua-μ4-biphenyl-4,4′-dicarboxylato-magnesium(II)]

Authors: Chia-Her Lin, Hsin-Kuan Liu, Kitagawa, Sheldrick, Xiang-Wen Peng
Publication venue: International Union of Crystallography
Publication date: 01/02/2009

The solvothermal reaction of magnesium nitrate with biphenyl-4,4′-dicarboxylic acid in N,N-dimethylformamide and water leads to the formation of crystals of the title complex, [Mg(C₁₄H₈O₄)(H₂O)₂]ₙ. In the crystal structure, each Mg cation is coordinated by six O atoms, two from water molecules and one from each of four symmetry-related biphenyl-4,4′-dicarboxylate anions, in a slightly distorted octahedral geometry. The Mg cations are located on centers of inversion, the biphenyl-4,4′-dicarboxylate anions around twofold rotation axes, and the water molecule in a general position. The anions link the Mg cations into a three-dimensional framework.

Crossref

Directory of Open Access Journals

PubMed Central

Transformer-based Image Compression with Variable Image Quality Objectives

Authors: Chen Yi-Hsin, Chien Cheng, Chiu Wei-Chen, Kao Chia-Hao, Peng Wen-Hsiao
Publication date: 22/09/2023

This paper presents a Transformer-based image compression system that allows a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed images with different visual characteristics. Our method lets the user choose a trade-off between two image quality objectives using a single, shared model. Motivated by the success of prompt-tuning techniques, we introduce prompt tokens to condition our Transformer-based autoencoder. These prompt tokens are generated adaptively, based on the user's preference and the input image, by a learned prompt generation network. Extensive experiments on commonly used quality metrics demonstrate the effectiveness of our method in adapting the encoding and/or decoding processes to a variable quality objective. While offering this additional flexibility, our method achieves rate-distortion performance comparable to that of single-objective methods.
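
The abstract does not spell out the training objective. One common way to let a single model cover a range of trade-offs, assumed here rather than taken from the paper, is to sample a preference weight omega at each training step and blend two distortion terms inside a rate-distortion loss. In this minimal sketch, rate_bpp, dist_a, dist_b, omega, lam and both function names are hypothetical.

```python
import random

def preference_weighted_loss(rate_bpp, dist_a, dist_b, omega, lam):
    """Rate plus lam times a convex blend of two distortions (hypothetical form).

    omega = 0 optimizes only dist_a (e.g. a pixel-fidelity term);
    omega = 1 optimizes only dist_b (e.g. a perceptual term).
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return rate_bpp + lam * ((1.0 - omega) * dist_a + omega * dist_b)

def sample_preference(rng):
    # Hypothetical training-time sampling: one preference per step, uniform on [0, 1].
    return rng.uniform(0.0, 1.0)

# Example step: draw a preference, then score one (rate, distortion) pair under it.
rng = random.Random(0)
omega = sample_preference(rng)
loss = preference_weighted_loss(0.5, 0.02, 0.3, omega, lam=10.0)
```

In a setup like this, the same omega would also be given to the prompt generation network, so the generated prompt tokens tell the shared codec which trade-off to target.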

arXiv.org e-Print Archive

Transformer-based Variable-rate Image Compression with Region-of-interest Control

Authors: Chen Yi-Hsin, Chiu Wei-Chen, Kao Chia-Hao, Peng Wen-Hsiao, Weng Ying-Chieh
Publication date: 14/07/2023

This paper proposes a Transformer-based learned image compression system. It achieves variable-rate compression with a single model while supporting region-of-interest (ROI) functionality. Inspired by prompt tuning, we introduce prompt generation networks to condition the Transformer-based autoencoder used for compression. Our prompt generation networks generate content-adaptive tokens according to the input image, an ROI mask, and a rate parameter. Separating the ROI mask from the rate parameter provides an intuitive way to achieve variable-rate and ROI coding simultaneously. Extensive experiments validate the effectiveness of our proposed method and confirm its superiority over competing methods.
Comment: Accepted to IEEE ICIP 202
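
To illustrate why keeping the ROI mask and the rate parameter separate is convenient, here is a minimal sketch that assumes the mask enters training as per-pixel weights on the distortion and the rate parameter acts as the Lagrange multiplier lam. The background_weight value, the function names, and this loss are hypothetical, not the paper's formulation.

```python
def roi_weighted_mse(original, recon, roi_mask, background_weight=0.1):
    """Weighted MSE: pixels inside the ROI (mask value 1) get weight 1,
    background pixels get background_weight (a hypothetical choice)."""
    weights = [1.0 if m else background_weight for m in roi_mask]
    err = sum(w * (x - y) ** 2 for w, x, y in zip(weights, original, recon))
    return err / sum(weights)

def rd_loss(rate_bpp, distortion, lam):
    # Larger lam puts more weight on distortion relative to bitrate.
    # Varying lam across training steps is one assumed way to cover several rates.
    return rate_bpp + lam * distortion
```

Because the mask only changes where distortion is counted and lam only changes how much it is counted, the two controls can be varied independently.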

arXiv.org e-Print Archive

TransTIC: Transferring Transformer-based Image Compression from Human Visualization to Machine Perception

Authors: Chen Yi-Hsin, Chien Cheng, Chiu Wei-Chen, Kao Chia-Hao, Peng Wen-Hsiao, Weng Ying-Chieh
Publication date: 08/06/2023

This work aims to transfer a Transformer-based image compression codec from human vision to machine perception without fine-tuning the codec. We propose a transferable Transformer-based image compression framework, termed TransTIC. Inspired by visual prompt tuning, we propose an instance-specific prompt generator that injects instance-specific prompts into the encoder, and we inject task-specific prompts into the decoder. Extensive experiments show that our method can transfer the codec to various machine tasks, significantly outperforming competing methods. To the best of our knowledge, this work is the first attempt to use prompting for the low-level task of image compression.
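
Prompt tuning typically prepends a few extra tokens to the input sequence of a frozen Transformer, so that attention over those tokens steers its outputs without changing any weights. The pure-Python sketch below shows that mechanism for a single self-attention step. It assumes no learned query/key/value projections and takes the prompt tokens as given. In the paper's setting, encoder prompts would come from the instance-specific prompt generator, which is not modeled here. All names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention on raw tokens (no projections, single head).
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

def prompted_layer(image_tokens, prompt_tokens):
    """Prepend prompt tokens, attend over the joint sequence, and return
    only the outputs at the image-token positions."""
    seq = prompt_tokens + image_tokens
    out = attention(seq, seq, seq)
    return out[len(prompt_tokens):]
```

With an empty prompt list the layer reduces to plain self-attention. Adding prompts changes the image-token outputs while leaving the attention computation itself untouched, which is the property that lets a frozen codec be redirected toward a new task.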

arXiv.org e-Print Archive

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Authors: Bi Alice Wen-Hsin, Chang Kalvin, Chang Winnie, Chen Bryan Y., Chen Noel, Chiang Jo-Peng, Chou Yi-Hui, Cui Chenxuan, Ou Winston, Pai Rong-Wei, Phoann Iu-Tshian, Shi Jiatong, Wu Meng-Ju, Yang Carol, Yeh Po-Yen
Publication date: 05/12/2023

Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it remains a low-resource language in NLP and speech research today. To ensure that state-of-the-art speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance; in fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between the pretraining data and the target language plays a crucial role.
Comment: Accepted to ASRU 202

arXiv.org e-Print Archive