73 research outputs found

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

Authors: Du Chenpeng, Guo Yiwei, Shen Feiyu, Yu Kai
Publication date: 25/04/2023

In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, focusing mainly on the system that won on naturalness in track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. Each language has one male and one female speaker in the given dataset. In track 1, only 5 hours of data from each speaker can be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which uses VQ acoustic features rather than mel-spectrograms. We add speaker embeddings and language embeddings to VQTTS to control the speaker and language information. In the cross-lingual evaluations, where we need to synthesize speech in a cross-lingual speaker's voice, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves 4.77, which ranks first.

Comment: Accepted by ICASSP 2023 Special Session for Grand Challenge

arXiv.org e-Print Archive

Acoustic BPE for Speech Generation with Discrete Tokens

Authors: Chen Xie, Du Chenpeng, Guo Yiwei, Shen Feiyu, Yu Kai
Publication date: 15/01/2024

Discrete audio tokens derived from self-supervised learning models have gained widespread usage in speech generation. However, the current practice of directly utilizing audio tokens poses challenges for sequence modeling due to the length of the token sequence. Additionally, this approach places the burden on the model to establish correlations between tokens, further complicating the modeling process. To address this issue, we propose acoustic BPE, which encodes frequent audio token patterns by utilizing byte-pair encoding. Acoustic BPE effectively reduces the sequence length and leverages the prior morphological information present in the token sequence, which alleviates the modeling challenges of token correlation. Through comprehensive investigations on a speech language model trained with acoustic BPE, we confirm the notable advantages it offers, including faster inference and improved syntax capturing capabilities. In addition, we propose a novel rescore method to select the optimal synthetic speech among multiple candidates generated by a rich-diversity TTS system. Experiments prove that rescore selection aligns closely with human preference, which highlights acoustic BPE's potential for other speech generation tasks.

Comment: 5 pages, 2 figures; accepted to ICASSP 202

arXiv.org e-Print Archive
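
The acoustic BPE abstract above describes applying byte-pair encoding to discrete audio token sequences. The following is a minimal sketch of generic BPE merge learning over integer token IDs, not the authors' implementation; the function names, the `first_new_id` parameter, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return ((a, b), count) for the most common adjacent pair, or None."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0]


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_acoustic_bpe(sequences, num_merges, first_new_id):
    """Greedily learn up to `num_merges` pair merges over token-ID sequences.

    `first_new_id` (an assumption of this sketch) must be larger than any
    existing token ID so merged units do not collide with base tokens.
    """
    sequences = [list(s) for s in sequences]
    merges = []
    next_id = first_new_id
    for _ in range(num_merges):
        best = most_frequent_pair(sequences)
        # Stop when no pair repeats; merging a pair seen once saves nothing.
        if best is None or best[1] < 2:
            break
        pair, _count = best
        merges.append((pair, next_id))
        sequences = [merge_pair(s, pair, next_id) for s in sequences]
        next_id += 1
    return merges, sequences


# Toy token sequences (made up for illustration, not real audio tokens).
toy = [[3, 3, 7, 7, 1], [3, 3, 7, 7, 2], [5, 3, 3, 9]]
merges, encoded = learn_acoustic_bpe(toy, num_merges=2, first_new_id=100)
```

Each merge replaces a frequent pair with a single new ID, so the encoded sequences get shorter while the vocabulary grows, which is the length reduction the abstract refers to.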

Effects of Climate Change and Human Activities on Surface Runoff in the Luan River Basin

Authors: Chesheng Zhan, Feiyu Wang, Fubao Sun, Hong Du, Sidong Zeng
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Quantifying the effects of climate change and human activities on runoff changes is a focus of climate change and hydrological research. This paper presents an integrated method employing the Budyko-based Fu model, hydrological modeling, and climate elasticity approaches to separate the effects of the two driving factors on surface runoff in the Luan River basin, China. The Budyko-based Fu model and the double mass curve method are used to analyze runoff changes during the period 1958–2009. Then two types of hydrological models (the distributed Soil and Water Assessment Tool model and the lumped SIMHYD model) and seven climate elasticity methods (including a nonparametric method and six Budyko-based methods) are applied to estimate the contributions of climate change and human activities to runoff change. The results show that all quantification methods are effective, and the results obtained by the nine methods are generally consistent. During the study period, climate change accounted for 28.3–46.8% of the runoff change, while human activities contributed 53.2–71.7%, indicating that both factors have significant effects on the runoff decline in the basin, and that the effects of human activities are relatively stronger than those of climate change.

Crossref

Directory of Open Access Journals
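
The Luan River abstract above names the Budyko-based Fu model and climate elasticity approaches without giving formulas. As a reading aid, the commonly cited forms are sketched below. The symbols are assumptions not taken from this listing ($P$ precipitation, $E_0$ potential evapotranspiration, $E$ actual evapotranspiration, $R$ runoff, $\omega$ the Fu parameter, $\varepsilon$ elasticities), and the paper's exact formulation may differ.

```latex
% Fu equation (long-term water balance, R = P - E)
\frac{E}{P} = 1 + \frac{E_0}{P} - \left[ 1 + \left( \frac{E_0}{P} \right)^{\omega} \right]^{1/\omega},
\qquad R = P - E

% Climate elasticity attribution of a runoff change
\Delta R_{\mathrm{clim}} = \varepsilon_{P} \, \frac{R}{P} \, \Delta P
  + \varepsilon_{E_0} \, \frac{R}{E_0} \, \Delta E_0,
\qquad
\Delta R_{\mathrm{hum}} = \Delta R_{\mathrm{obs}} - \Delta R_{\mathrm{clim}}
```

Under this decomposition, the percentage contributions reported in the abstract correspond to $\Delta R_{\mathrm{clim}} / \Delta R_{\mathrm{obs}}$ and $\Delta R_{\mathrm{hum}} / \Delta R_{\mathrm{obs}}$.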

Research on The Offensive Characteristics of La Liga Team Based on Social Network Analysis

Authors: Du Wei, Li Feiyu, Li Jingwen
Publication venue: EDP Sciences
Publication date: 01/01/2023

To explore the difference in social network parameters between the network of passes before scoring and the network of passes before missing the goal, and to explore the correlation between social network parameters and team performance, this paper establishes the offensive pass networks of 20 La Liga teams from 2017 to 2018 and calculates 11 social network parameters. The Pearson correlation test is used to explore the linear correlation between the 11 social network parameters and team performance. The results show that the linear correlation with team performance is stronger for the network parameters of passing before scoring than for those of passing before missing the goal. Based on these results, we can provide reliable and effective information to football coaches to help improve performance in football matches.

Directory of Open Access Journals
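
The La Liga abstract above relies on the Pearson correlation between a network parameter and team performance. The following is a minimal sketch of the Pearson coefficient in plain Python; the function name and the input values are hypothetical placeholders, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two standard-deviation numerators;
    # the shared 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


# Placeholder values only: one network parameter per team vs. points won.
network_density = [0.2, 0.4, 0.6, 0.8]
team_points = [10, 20, 30, 40]
r = pearson_r(network_density, team_points)
```

In the study's setting this would be computed once per network parameter, separately for the two kinds of pass network, and the resulting coefficients compared.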

Rapid detection of sulfamethoxazole in plasma and food samples with in-syringe membrane SPE coupled with solid-phase fluorescence spectrometry

Authors: Du Yiping, Iqbal Jibran, Li Hui, Li Long, Wu Ting, Zhang Feiyu, Zhu Ying
Publication venue: Elsevier BV
Publication date: 01/08/2020

In this work, an in-syringe membrane solid-phase extraction (MSPE) device was fabricated for the on-site sampling of sulfamethoxazole (SMX) in food samples, followed by solid-phase fluorescence spectral analysis. The samples and fluorescamine (FA) were added to a syringe for derivatization. Then the SMX derivative was extracted by a membrane in the syringe SPE device. Subsequently, the derivative on the membrane was measured immediately, without an additional elution procedure. The method was successfully applied to trace SMX detection in plasma, milk, and egg samples, with recoveries of 98%–102% and RSDs from 1% to 6%. Compared with liquid chromatography, direct detection of the concentrated analyte significantly improved the sensitivity. Moreover, fluorescamine made it unnecessary to separate SMX from the interference. Consequently, it was a time-saving, low-cost, and easy-to-operate method, which demonstrated the potential of in-syringe SPE as a promising candidate for on-site analysis.

ZU Scholars (Zayed University)

Crossref

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Authors: Chen Xie, Du Chenpeng, Ma Ziyang, Povey Daniel, Shen Feiyu, Yang Yifan, Yu Kai
Publication date: 13/09/2023

The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques. However, these studies, mainly single-task focused, faced challenges like overfitting and performance degradation in speech recognition tasks, often at the cost of sacrificing performance in multi-task scenarios. This study presents a comprehensive comparison and optimization of discrete tokens generated by various leading SSL models in speech recognition and synthesis tasks. We aim to explore the universality of speech discrete tokens across multiple speech tasks. Experimental results demonstrate that discrete tokens achieve results comparable to systems trained on FBank features in speech recognition tasks and outperform mel-spectrogram features in speech synthesis on subjective and objective metrics. These findings suggest that universal discrete tokens have enormous potential in various speech-related tasks. Our work is open-source and publicly available to facilitate research in this direction.

arXiv.org e-Print Archive

Chitosan/Al2O3-HA nanocomposite beads for efficient removal of estradiol and chrysoidin from aqueous solution

Authors: Chen Wanchao, Du Yiping, Iqbal Jibran, Li Long, Wang Fang, Wu Ting, Zhang Feiyu, Zhu Ying
Publication venue: Elsevier BV
Publication date: 15/02/2020

Alumina, as a support material, was loaded together with chitosan and hydroxyapatite to form chitosan/Al2O3-HA composite beads, which were used for estradiol and chrysoidin removal from aqueous solution in the present work. The physicochemical properties of the beads were studied with Scanning Electron Microscopy (SEM), Fourier Transform Infrared Spectrometry (FTIR), thermogravimetric analysis (TGA) and Brunauer-Emmett-Teller (BET) surface area analysis. FTIR spectra confirmed that the chitosan was loaded successfully on Al2O3-HA, and functional groups were immobilized onto the surface of the beads after the synthesis. The adsorption conditions, including pH, the amount of adsorbent, initial concentration and time, were evaluated during the batch experiments. Isotherm data best matched the Langmuir model, and the pseudo-second-order model best described the adsorption kinetics. The maximum adsorption capacity was found to be 39.78 mg/g and 23.26 mg/g for estradiol and chrysoidin, respectively. The adsorbed estradiol and chrysoidin were completely eluted from the composite beads with an eluent of 0.1 M H2SO4/MeOH, and the regenerated material was used in several cycles without deterioration in its initial performance. This study suggests that the developed composite beads have high potential for the efficient removal of estradiol and chrysoidin from aqueous solution.

ZU Scholars (Zayed University)
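
The chitosan/Al2O3-HA abstract above reports that the Langmuir isotherm and the pseudo-second-order kinetic model gave the best fits, without stating either equation. The standard textbook forms are sketched below for reference. The symbols are assumptions not taken from this listing: $q_e$ is the equilibrium uptake, $q_{\max}$ the maximum adsorption capacity, $K_L$ the Langmuir constant, $C_e$ the equilibrium concentration, $q_t$ the uptake at time $t$, and $k_2$ the rate constant.

```latex
% Langmuir isotherm
q_e = \frac{q_{\max} K_L C_e}{1 + K_L C_e}

% Pseudo-second-order kinetics (linearized form)
\frac{t}{q_t} = \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e}
```

In this notation, the capacities of 39.78 mg/g and 23.26 mg/g quoted in the abstract would play the role of $q_{\max}$ for estradiol and chrysoidin, respectively.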

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Authors: Chen Xie, Du Chenpeng, Guo Yiwei, Liang Zheng, Liu Zhijun, Shen Feiyu, Wang Shuai, Yu Kai, Zhang Hui
Publication date: 13/06/2023

The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens, has been proven superior to traditional mel-spectrogram acoustic features in terms of naturalness and robustness for text-to-speech (TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow zero-shot speaker adaptation through auto-regressive (AR) continuation of acoustic tokens extracted from a short speech prompt. However, these AR models are restricted to generating speech only in a left-to-right direction, making them unsuitable for speech editing, where both preceding and following contexts are provided. Furthermore, these models rely on acoustic tokens, which have audio quality limitations imposed by the performance of audio codec models. In this study, we propose a unified context-aware TTS framework called UniCATS, which is capable of both speech continuation and editing. UniCATS comprises two components, an acoustic model CTX-txt2vec and a vocoder CTX-vec2wav. CTX-txt2vec employs contextual VQ-diffusion to predict semantic tokens from the input text, enabling it to incorporate the semantic context and maintain seamless concatenation with the surrounding context. Following that, CTX-vec2wav utilizes contextual vocoding to convert these semantic tokens into waveforms, taking into consideration the acoustic context. Our experimental results demonstrate that CTX-vec2wav outperforms HifiGAN and AudioLM in terms of speech resynthesis from semantic tokens. Moreover, we show that UniCATS achieves state-of-the-art performance in both speech continuation and editing.

arXiv.org e-Print Archive

Genome sequencing as an alternative to cytogenetic analysis in myeloid cancers

Authors: Baty Jack D., Bohannon Andrew, Christopher Matthew J., DiPersio John F., Du Feiyu, Duncavage Eric J., Fulton Robert S., Garza John, Heath Sharon E., Hughes Andrew E.O., Hughes Emma, Jacoby Meagan A., Kruchowski Scott, Ley Timothy J., Link Daniel C., MacMillan Sandra, Miller Christopher A., Neidich Julie, O'Laughlin Michele, Payton Jacqueline E., Robinson Josh, Schroeder Molly C., Spencer David H., Uy Geoffrey L., Walter Matthew J., Westervelt Peter, Wilson Roxanne
Publication venue: Digital Commons@Becker
Publication date: 01/01/2021

Digital Commons@Becker