    An Enhanced Drought-Tolerant Method Using SA-Loaded PAMPS Polymer Materials Applied on Tobacco Pelleted Seeds

    Drought is one of the most important stress factors limiting the seed industry and crop production. The present study was undertaken to create novel drought-resistant pelleted seeds by combining a superabsorbent polymer, poly(2-acrylamido-2-methylpropane sulfonic acid) (PAMPS) hydrogel, with a drought-resistance agent, salicylic acid (SA). The optimized PAMPS hydrogel was obtained when the molar ratio of 2-acrylamido-2-methylpropane sulfonic acid (AMPS) to potassium peroxydisulfate (KPS) and N,N′-methylene-bis-acrylamide (MBA) was 1 : 0.00046 : 0.00134. After swelling in deionized water for 24 h, the hydrogel reached 4306 times its own dry weight. The water retention ratio (RR) of PAMPS was significantly higher than that of the control: it retained as much as 85.3% of its original weight after 30 min at 110°C, and even after 40 d at 25°C the PAMPS still kept an RR of 33.67%. The PAMPS disintegration ratio increased gradually, reaching around 30% after 60 d of embedding in soil or activated sludge. In addition, under drought stress, the pelleted treatments with SA-loaded PAMPS hydrogel showed better seed germination and seedling growth than the control. These results suggest that SA-loaded PAMPS hydrogel, a nontoxic superabsorbent polymer, could be used as an effective drought-resistance material for tobacco pelleted seeds.
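    As an illustration of the two quantities reported above, the swelling ratio and water retention ratio can be computed from simple weight measurements. The sketch below is a generic calculation in Python with hypothetical example weights, not data or code from the study.

```python
def swelling_ratio(swollen_weight_g: float, dry_weight_g: float) -> float:
    """Swollen weight expressed as a multiple of the dry polymer weight."""
    return swollen_weight_g / dry_weight_g

def water_retention_ratio(current_weight_g: float, initial_swollen_weight_g: float) -> float:
    """Percentage of the initial swollen weight still retained after drying/storage."""
    return 100.0 * current_weight_g / initial_swollen_weight_g

# Hypothetical example: 0.010 g of dry PAMPS swelling to ~43 g of hydrogel
dry, swollen = 0.010, 43.06
print(f"swelling ratio: {swelling_ratio(swollen, dry):.0f}x")             # ~4306x
print(f"RR after drying: {water_retention_ratio(36.73, swollen):.1f}%")   # ~85.3%
```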

    Learning Speech Representation From Contrastive Token-Acoustic Pretraining

    For fine-grained generation and recognition tasks such as minimally-supervised text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), the intermediate representations extracted from speech should serve as a "bridge" between text and acoustic information, containing information from both modalities. The semantic content should be emphasized, while paralinguistic information such as speaker identity and acoustic details should be de-emphasized. However, existing methods for extracting fine-grained intermediate representations from speech suffer from excessive redundancy and dimension explosion. Contrastive learning is a good method for modeling intermediate representations from two modalities. However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks. To address these issues, we propose a method named "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phonemes and speech into a joint multimodal space, learning how to connect phonemes and speech at the frame level. The CTAP model is trained on 210k speech and phoneme text pairs, achieving minimally-supervised TTS, VC, and ASR. The proposed CTAP method offers a promising solution for fine-grained generation and recognition downstream tasks in speech processing.
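    To make the frame-level contrastive idea concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive loss applied per frame, assuming phoneme and speech encoders that output time-aligned embeddings of the same shape. The actual CTAP loss, encoder architectures, and batching are not specified in this abstract.

```python
import torch
import torch.nn.functional as F

def frame_level_contrastive_loss(phoneme_emb, speech_emb, temperature=0.07):
    """Symmetric InfoNCE over aligned frames from two encoders.

    phoneme_emb, speech_emb: tensors of shape (batch, frames, dim),
    where frame t of one modality is the positive pair of frame t of the other.
    """
    b, t, d = phoneme_emb.shape
    p = F.normalize(phoneme_emb.reshape(b * t, d), dim=-1)
    s = F.normalize(speech_emb.reshape(b * t, d), dim=-1)
    logits = p @ s.t() / temperature                    # all-pairs frame similarity
    targets = torch.arange(b * t, device=logits.device)
    loss_p2s = F.cross_entropy(logits, targets)         # phoneme -> speech direction
    loss_s2p = F.cross_entropy(logits.t(), targets)     # speech -> phoneme direction
    return 0.5 * (loss_p2s + loss_s2p)

# Toy usage with random embeddings standing in for encoder outputs
phone = torch.randn(4, 50, 256)
speech = torch.randn(4, 50, 256)
print(frame_level_contrastive_loss(phone, speech).item())
```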

    High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

    Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations (semantic & acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from information redundancy and dimension explosion in the semantic representation, and from high-frequency waveform distortion in the discrete acoustic representation. Autoregressive frameworks exhibit typical instability and uncontrollability issues, while non-autoregressive frameworks suffer from prosodic averaging caused by duration prediction models. To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method in which all modules are constructed on diffusion models. The non-autoregressive framework enhances controllability, and the duration diffusion model enables diversified prosodic expression. Contrastive Token-Acoustic Pretraining (CTAP) is used as an intermediate semantic representation to solve the problems of information redundancy and dimension explosion in existing semantic coding methods. The mel-spectrogram is used as the acoustic representation. Both semantic and acoustic representations are predicted by continuous variable regression tasks to solve the problem of high-frequency fine-grained waveform distortion. Experimental results show that our proposed method outperforms the baseline method. We provide audio samples on our website. Comment: Accepted by ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2307.15484; text overlap with arXiv:2309.0042.
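    As a rough illustration of predicting a continuous acoustic representation with a diffusion model, the sketch below shows a generic denoising-diffusion training step that regresses the noise added to mel-spectrogram frames, conditioned on frame-level semantic features. The module architecture, noise schedule, and conditioning here are simplifying assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class MelDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to mel frames, conditioned on semantic features."""
    def __init__(self, mel_dim=80, cond_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mel_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, mel_dim),
        )

    def forward(self, noisy_mel, cond, t):
        # noisy_mel: (B, T, mel_dim), cond: (B, T, cond_dim), t: (B,) noise level in [0, 1]
        t = t[:, None, None].expand(-1, noisy_mel.size(1), 1)
        return self.net(torch.cat([noisy_mel, cond, t], dim=-1))

def diffusion_training_step(model, mel, cond):
    """One training step: corrupt the mel at a random noise level and regress the noise (MSE)."""
    t = torch.rand(mel.size(0))
    alpha = (1.0 - t)[:, None, None]                 # illustrative linear schedule
    noise = torch.randn_like(mel)
    noisy_mel = alpha.sqrt() * mel + (1.0 - alpha).sqrt() * noise
    pred = model(noisy_mel, cond, t)
    return torch.mean((pred - noise) ** 2)           # continuous-variable regression objective

# Toy usage with random data standing in for mel targets and semantic conditioning
model = MelDenoiser()
loss = diffusion_training_step(model, torch.randn(2, 120, 80), torch.randn(2, 120, 256))
print(loss.item())
```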

    Semi-supervised Cardiac Image Segmentation via Label Propagation and Style Transfer

    Accurate segmentation of cardiac structures can assist doctors in diagnosing diseases and improving treatment planning, which is in high demand in clinical practice. However, the shortage of annotations and the variance of data across different vendors and medical centers restrict the performance of advanced deep learning methods. In this work, we present a fully automatic method to segment cardiac structures, including the left ventricle (LV) and right ventricle (RV) blood pools as well as the left ventricular myocardium (MYO), in MRI volumes. Specifically, we design a semi-supervised learning method that leverages unlabelled MRI sequence timeframes by label propagation. We then exploit style transfer to reduce the variance among different centers and vendors for more robust cardiac image segmentation. We evaluate our method in the M&Ms challenge, ranking 2nd among 14 competitive teams.
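    As a loose illustration of label propagation across cardiac MRI timeframes, the sketch below copies an annotated mask to the unlabelled frames of a cine sequence and refines it with the model's own confident predictions (a simple pseudo-labelling variant). The paper's actual propagation and style-transfer procedures are not detailed in this abstract, so the threshold, helper names, and toy model are assumptions.

```python
import torch

def propagate_labels(model, frames, labelled_idx, labelled_mask, conf_thresh=0.9):
    """Propagate a segmentation mask from one annotated timeframe to the rest of a cine sequence.

    frames: (T, C, H, W) MRI timeframes; labelled_mask: (H, W) integer class map for frames[labelled_idx].
    Returns {frame_index: pseudo_mask}, keeping the model's prediction only where it is confident.
    """
    pseudo_labels = {labelled_idx: labelled_mask}
    model.eval()
    with torch.no_grad():
        for t in range(frames.size(0)):
            if t == labelled_idx:
                continue
            probs = torch.softmax(model(frames[t : t + 1]), dim=1)[0]   # (num_classes, H, W)
            conf, pred = probs.max(dim=0)
            # Keep confident pixels; elsewhere fall back to the annotated neighbour's label.
            pseudo_labels[t] = torch.where(conf > conf_thresh, pred, labelled_mask)
    return pseudo_labels

# Toy usage with a hypothetical 1x1-conv "segmenter" and random data
toy_model = torch.nn.Conv2d(1, 4, kernel_size=1)      # 4 classes: background, LV, RV, MYO
frames = torch.randn(10, 1, 64, 64)
mask = torch.randint(0, 4, (64, 64))
print(len(propagate_labels(toy_model, frames, labelled_idx=0, labelled_mask=mask)))
```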