    PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network

    In everyday spoken communication, we commonly look at a talker's turning head while listening to his/her voice. Humans see the talker to listen better, and so do machines. However, previous studies on audio-visual speaker extraction have not effectively handled the varying talking face. This paper studies how to take full advantage of it. We propose a Pose-Invariant Audio-Visual Speaker Extraction Network (PIAVE) that incorporates an additional pose-invariant view to improve audio-visual speaker extraction. Specifically, we generate a pose-invariant view from each original pose orientation, which gives the model a consistent frontal view of the talker regardless of his/her head pose; together, the original and pose-invariant views form a multi-view visual input for the speaker. Experiments on the multi-view MEAD and in-the-wild LRS3 datasets demonstrate that PIAVE outperforms state-of-the-art methods and is more robust to pose variations.
    Comment: Interspeech 202
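The abstract does not say how the original and pose-invariant views are combined. As a minimal sketch, assuming both views have already been encoded into per-frame embedding vectors (the frontalization step itself is out of scope here), a multi-view visual input could be formed by frame-wise concatenation. The function name `multi_view_input` and the concatenation strategy are hypothetical, not PIAVE's published fusion mechanism.

```python
from typing import List

Vector = List[float]


def multi_view_input(original: List[Vector], frontal: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: for each video frame, concatenate the embedding of
    the original (posed) view with the embedding of its pose-invariant
    (frontal) view. Both sequences must be aligned frame by frame."""
    if len(original) != len(frontal):
        raise ValueError("both views must have the same number of frames")
    return [o + f for o, f in zip(original, frontal)]


# Two frames, 2-d embeddings per view -> two frames of 4-d multi-view features.
fused = multi_view_input([[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]])
print(fused)
```

Concatenation keeps both views intact for a downstream speaker-extraction network; attention-based or gated fusion would be equally plausible alternatives.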

    Controllable Accented Text-to-Speech Synthesis

    Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1). Accented TTS synthesis is challenging because L2 differs from L1 in terms of both phonetic rendering and prosody pattern. Furthermore, there is no easy way to control the accent intensity in an utterance. In this work, we propose a neural TTS architecture that allows us to control the accent and its intensity during inference. This is achieved through three novel mechanisms: 1) an accent variance adaptor that models the complex accent variance with three prosody-controlling factors, namely pitch, energy and duration; 2) an accent intensity modeling strategy that quantifies the accent intensity; and 3) a consistency constraint module that encourages the TTS system to render the expected accent intensity at a fine level. Experiments show that the proposed system outperforms the baseline models in terms of accent rendering and intensity control. To the best of our knowledge, this is the first study of accented TTS synthesis with explicit intensity control.
    Comment: To be submitted for possible journal publication
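To illustrate what controlling accent intensity over the three prosody factors (pitch, energy, duration) can mean at inference time, here is a minimal sketch that assumes intensity is a scalar in [0, 1] used to linearly interpolate between L1 and L2 prosody predictions for one phoneme. The `Prosody` type, `blend_prosody` function, and the linear-interpolation rule are hypothetical stand-ins; the paper's actual intensity modeling strategy is not specified in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    pitch: float     # e.g. mean F0 (Hz) for one phoneme
    energy: float    # e.g. mean frame energy for that phoneme
    duration: float  # e.g. number of frames for that phoneme


def blend_prosody(l1: Prosody, l2: Prosody, intensity: float) -> Prosody:
    """Hypothetical intensity control: linearly interpolate each prosody
    factor from the standard (L1) value toward the accented (L2) value.
    intensity=0 reproduces L1; intensity=1 reproduces L2."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")

    def mix(a: float, b: float) -> float:
        return a + intensity * (b - a)

    return Prosody(
        pitch=mix(l1.pitch, l2.pitch),
        energy=mix(l1.energy, l2.energy),
        duration=mix(l1.duration, l2.duration),
    )


standard = Prosody(pitch=100.0, energy=1.0, duration=10.0)
accented = Prosody(pitch=200.0, energy=3.0, duration=20.0)
print(blend_prosody(standard, accented, 0.5))
```

A single scalar knob is the simplest form of explicit intensity control; a learned, phoneme-level intensity would allow the finer-grained rendering the abstract describes.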

    FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

    Conversational text-to-speech (TTS) aims to synthesize an utterance with the right linguistic and affective prosody in a conversational context. Prior work used the correlation between the current utterance and the dialogue history at the utterance level to improve the expressiveness of synthesized speech. However, fine-grained, word-level information in the dialogue history also has an important impact on the prosodic expression of an utterance, and it has not been well studied in prior work. We therefore propose a novel expressive conversational TTS model, termed FCTalker, that learns fine- and coarse-grained context dependencies simultaneously during speech generation. Specifically, FCTalker includes fine- and coarse-grained encoders that exploit word- and utterance-level context dependencies. To model the word-level dependencies between an utterance and its dialogue history, the fine-grained dialogue encoder is built on top of a dialogue BERT model. The experimental results show that the proposed method outperforms all baselines and generates more expressive speech that is contextually appropriate. We release the source code at: https://github.com/walker-hyf/FCTalker.
    Comment: 5 pages, 4 figures, 1 table. Submitted to ICASSP 2023.
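The distinction between word-level (fine) and utterance-level (coarse) context can be sketched with plain scaled dot-product attention. This minimal example assumes the dialogue history is already encoded as word vectors (FCTalker uses a dialogue BERT for this; here the vectors are simply given), represents each past utterance by the mean of its word vectors, and concatenates the two attended contexts. The function names, the mean pooling, and the concatenation are hypothetical simplifications, not FCTalker's architecture.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over memory vectors,
    which serve as both keys and values."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memory]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(d)]


def context_vector(query: Vector, history_words: List[List[Vector]]) -> Vector:
    """Hypothetical fine + coarse context for the current utterance.
    history_words[k] holds the word vectors of the k-th past utterance."""
    # Coarse: one vector per past utterance (mean of its word vectors).
    utterances = [[sum(col) / len(words) for col in zip(*words)] for words in history_words]
    coarse = attend(query, utterances)
    # Fine: attend directly over every word of every past utterance.
    fine = attend(query, [w for words in history_words for w in words])
    return coarse + fine  # concatenated context, length 2 * d


history = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
print(context_vector([1.0, 0.0], history))
```

The fine branch lets a single salient word in the history influence the current prosody, which mean-pooled utterance vectors in the coarse branch would dilute.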

    FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

    Text-based speech editing (TSE) techniques enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, current techniques focus on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency within the context and the original utterance. To maintain speech fluency, we propose a fluent speech editing model, termed FluentEditor, that incorporates fluency-aware training criteria into TSE training. Specifically, the acoustic consistency constraint aims to make the transition between the edited region and its neighboring acoustic segments smooth and consistent with the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited region remain consistent with the overall style of the original utterance. Subjective and objective experimental results on VCTK demonstrate that FluentEditor outperforms all advanced baselines in terms of naturalness and fluency. The audio samples and code are available at https://github.com/Ai-S2-Lab/FluentEditor.
    Comment: Submitted to ICASSP'202
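The two constraints can be made concrete with a toy version. This minimal sketch assumes one scalar acoustic feature per frame (real systems use mel-spectrogram frames), an edited region given as the half-open frame range [start, end), and L1 distances. The acoustic term compares the feature jumps at the region's two boundaries against the ground truth; the prosody term compares mean pitch inside the edited region with mean pitch over the original utterance. All function names and formulas are hypothetical illustrations, not FluentEditor's published losses.

```python
from typing import List, Tuple


def boundary_deltas(frames: List[float], start: int, end: int) -> Tuple[float, float]:
    """Feature jumps entering (start-1 -> start) and leaving (end-1 -> end)
    the edited region [start, end). Requires 1 <= start < end < len(frames)."""
    if not 1 <= start < end < len(frames):
        raise ValueError("edited region must have a neighbouring frame on both sides")
    return frames[start] - frames[start - 1], frames[end] - frames[end - 1]


def acoustic_consistency_loss(generated: List[float], target: List[float],
                              start: int, end: int) -> float:
    """L1 distance between the boundary transitions of the generated and
    ground-truth feature sequences."""
    g_in, g_out = boundary_deltas(generated, start, end)
    t_in, t_out = boundary_deltas(target, start, end)
    return abs(g_in - t_in) + abs(g_out - t_out)


def prosody_consistency_loss(generated_pitch: List[float], original_pitch: List[float],
                             start: int, end: int) -> float:
    """Distance between the mean pitch inside the edited region and the
    mean pitch of the whole original utterance."""
    region = generated_pitch[start:end]
    return abs(sum(region) / len(region) - sum(original_pitch) / len(original_pitch))


# Edited frames 2-3 jump abruptly compared with the ground truth.
print(acoustic_consistency_loss([0.0, 1.0, 5.0, 5.0, 1.0], [0.0, 1.0, 2.0, 2.0, 1.0], 2, 4))
print(prosody_consistency_loss([100.0, 100.0, 120.0, 120.0, 100.0], [100.0] * 5, 2, 4))
```

Both terms would be added to the usual reconstruction loss during training; matching boundary deltas rather than absolute values penalizes audible discontinuities without forcing the edited content to copy the target.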

    CI431, an Aqueous Compound from Ciona intestinalis L., Induces Apoptosis through a Mitochondria-Mediated Pathway in Human Hepatocellular Carcinoma Cells

    In the present study, a novel compound with potent anti-tumor activity was purified from Ciona intestinalis L. by acetone fractionation, ultrafiltration, gel chromatography and high-performance liquid chromatography (HPLC). The molecular weight of the highly purified compound, designated CI431, was 431 Da, as determined by HPLC-MS analysis. CI431 exhibited significant cytotoxicity to several cancer cell types, but had only a slight inhibitory effect on the benign human liver cell line BEL-7702. To explore its mechanism against hepatocellular carcinoma, BEL-7402 cells were treated with CI431 in vitro. We found that CI431 induced apoptotic death in BEL-7402 cells in a dose- and time-dependent manner. Cell cycle analysis demonstrated that CI431 caused cell cycle arrest at the G2/M phase, and a sub-G1 peak appeared after 24 h. The mitochondria-mediated pathway was implicated in CI431-induced apoptosis, as evidenced by the disruption of mitochondrial membrane potential. The results suggest that CI431 induces apoptosis in BEL-7402 human hepatoma cells via the intrinsic mitochondrial pathway.