35 research outputs found

    Rep2wav: Noise Robust text-to-speech Using self-supervised representations

    Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by speech enhancement models. Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background noise that affect the quality of the synthesized speech. Meanwhile, it was shown that self-supervised pre-trained models exhibit excellent noise robustness on many speech tasks, implying that the learned representation has a better tolerance for noise perturbations. In this work, we therefore explore pre-trained models to improve the noise robustness of TTS models. Based on HiFi-GAN, we first propose a representation-to-waveform vocoder, which aims to learn to map the representation of pre-trained models to the waveform. We then propose a text-to-representation FastSpeech2 model, which aims to learn to map text to pre-trained model representations. Experimental results on the LJSpeech and LibriTTS datasets show that our method outperforms those using speech enhancement methods in both subjective and objective metrics. Audio samples are available at: https://zqs01.github.io/rep2wav. Comment: 5 pages, 2 figures

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision and text. How to design a unified framework that integrates different modal information and leverages different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech representation learning has not been well explored. In this paper, we propose a unified cross-modal representation learning framework, VATLM (Visual-Audio-Text Language Model). The proposed VATLM employs a unified backbone network to model the modality-independent information and utilizes three simple modality-dependent modules to preprocess visual, speech, and text inputs. In order to integrate these three modalities into one shared semantic space, VATLM is optimized with a masked prediction task of unified tokens, given by our proposed unified tokenizer. We evaluate the pre-trained VATLM on audio-visual related downstream tasks, including audio-visual speech recognition (AVSR) and visual speech recognition (VSR) tasks. Results show that the proposed VATLM outperforms previous state-of-the-art models, such as the audio-visual pre-trained AV-HuBERT model, and analysis also demonstrates that VATLM is capable of aligning different modalities into the same space. To facilitate future research, we release the code and pre-trained models at https://aka.ms/vatlm. Comment: 10 pages

    In situ Observation of Sodium Dendrite Growth and Concurrent Mechanical Property Measurements Using an Environmental Transmission Electron Microscopy–Atomic Force Microscopy (ETEM-AFM) Platform

    Akin to Li, Na deposits in a dendritic form to cause a short circuit in Na metal batteries. However, the growth mechanisms and related mechanical properties of Na dendrites remain largely unknown. Here we report real-time characterizations of Na dendrite growth with concurrent mechanical property measurements using an environmental transmission electron microscopy–atomic force microscopy (ETEM-AFM) platform. In situ electrochemical plating produces Na deposits stabilized with a thin Na2CO3 surface layer (referred to as Na dendrites). These Na dendrites have characteristic dimensions of a few hundred nanometers and exhibit different morphologies, including nanorods, polyhedral nanocrystals, and nanospheres. In situ mechanical measurements show that the compressive and tensile strengths of Na dendrites with a Na2CO3 surface layer vary from 36 to >203 MPa, which are much larger than those of bulk Na. In situ growth of Na dendrites under the combined overpotential and mechanical confinement can generate high stress in these Na deposits. These results provide new baseline data on the electrochemical and mechanical behavior of Na dendrites, which have implications for the development of Na metal batteries toward practical energy-storage applications.

    In Situ Measurements of the Mechanical Properties of Electrochemically Deposited Li₂CO₃ and Li₂O Nanorods

    Solid-electrolyte interface (SEI) is “the most important but least understood (component) in rechargeable Li-ion batteries”. The ideal SEI requires high elastic strength and can resist the penetration of a Li dendrite mechanically, which is vital for inhibiting the dendrite growth in lithium batteries. Even though Li₂CO₃ and Li₂O are identified as the major components of SEI, their mechanical properties are not well understood. Herein, SEI-related materials such as Li₂CO₃ and Li₂O were electrochemically deposited using environmental transmission electron microscopy (ETEM), and their mechanical properties were assessed by in situ atomic force microscopy (AFM) and inverse finite element simulations. Both Li₂CO₃ and Li₂O exhibit nanocrystalline structures and good plasticity. The ultimate strength of Li₂CO₃ ranges from 192 to 330 MPa, while that of Li₂O is less than 100 MPa. These results provide a new understanding of the SEI and its related dendritic problems in lithium batteries.

    DPHL: A DIA Pan-human Protein Mass Spectrometry Library for Robust Biomarker Discovery

    To address the increasing need for detecting and validating protein biomarkers in clinical specimens, mass spectrometry (MS)-based targeted proteomic techniques, including selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and massively parallel data-independent acquisition (DIA), have been developed. For optimal performance, they require the fragment ion spectra of targeted peptides as prior knowledge. In this report, we describe an MS pipeline and spectral resource to support targeted proteomics studies for human tissue samples. To build the spectral resource, we integrated common open-source MS computational tools to assemble a freely accessible computational workflow based on Docker. We then applied the workflow to generate DPHL, a comprehensive DIA pan-human library, from 1096 data-dependent acquisition (DDA) MS raw files for 16 types of cancer samples. This extensive spectral resource was then applied to a proteomic study of 17 prostate cancer (PCa) patients. Thereafter, PRM validation was applied to a larger study of 57 PCa patients, and the differential expression of three proteins in prostate tumors was validated. As a second application, the DPHL spectral resource was applied to a study consisting of plasma samples from 19 diffuse large B cell lymphoma (DLBCL) patients and 18 healthy control subjects. Differentially expressed proteins between DLBCL patients and healthy control subjects were detected by DIA-MS and confirmed by PRM. These data demonstrate that the DPHL supports DIA and PRM MS pipelines for robust protein biomarker discovery. DPHL is freely accessible at https://www.iprox.org/page/project.html?id=IPX0001400000

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    In-vivo full depth of eye imaging spectral domain optical coherence tomography

    For practical medical application, spectral-domain optical coherence tomography (SD-OCT) needs to image the whole eye segment, but the imaging depth of SD-OCT is limited by the spectral resolution of the spectrometer. To date, no results on this problem have been reported. In our study, a new dual channel dual focus OCT system is adopted to image the whole eye segment. The cornea and the crystalline lens are simultaneously imaged by using full range complex spectral-domain OCT in one channel, while the retina is imaged by the other. The new system was successfully tested in imaging a volunteer's eye in vivo. The preliminary results presented in this paper demonstrate the feasibility of this approach. © 2011 Copyright Society of Photo-Optical Instrumentation Engineers (SPIE).

    Dual band dual focus optical coherence tomography for imaging the whole eye segment

    We developed an improved dual band dual focus spectral domain optical coherence tomography (SD-OCT) system for in vivo 2D/3D imaging of the whole eye segment, including the whole anterior segment and retina. The system featured two OCT channels with two different bands centered at 840 nm and 1050 nm, which were designed to image the retina and the anterior segments of the eye, respectively. By combining the two probe light beams for co-axial scanning and separating them for focusing at different segments of the eye with a combination of three dichroic mirrors, we not only minimized the loss of the backscattered light from the sample but also improved the imaging depth, scan range and resolution. The full resolved complex (FRC) method was applied to double the imaging depth for the whole anterior segment imaging, with which an imaging depth of 36.71 mm in air was achieved. We demonstrated that this system was capable of measuring the dynamic changes of ocular dimensions, including the asphericity of the cornea and lens, during accommodation. © 2015 Optical Society of America. National Basic Research Program of China [2011CB707504]; National Natural Science Foundation of China [81171377].