96 research outputs found
Transcriptome and Comparative Gene Expression Analysis of Sogatella furcifera (Horváth) in Response to Southern Rice Black-Streaked Dwarf Virus
BACKGROUND: The white-backed planthopper (WBPH), Sogatella furcifera (Horváth), causes great damage to many crops through direct feeding or by transmitting plant viruses. Southern rice black-streaked dwarf virus (SRBSDV), transmitted by WBPH, has become a serious threat to rice production in East Asia. METHODOLOGY/PRINCIPAL FINDINGS: By de novo transcriptome assembly and massively parallel sequencing, we constructed two transcriptomes of WBPH and profiled the alteration of gene expression in response to SRBSDV infection at the transcriptional level. Over 25 million high-quality DNA sequence reads and 81,388 distinct unigenes were generated using Illumina technology from both viruliferous and non-viruliferous WBPH. WBPH has a gene ontology distribution very similar to those of two other closely related rice planthoppers, Nilaparvata lugens and Laodelphax striatellus. We also predicted 7,291 microsatellite loci, which could be useful for further evolutionary analyses. Furthermore, comparative analysis of the two transcriptomes generated from viruliferous and non-viruliferous WBPH yielded a list of candidate transcripts potentially elicited in response to viral infection. Pathway analyses of a subset of these transcripts indicated that SRBSDV infection may perturb primary metabolism and the ubiquitin-proteasome pathway. In addition, 5.5% (181 out of 3,315) of the genes in the cell cytoskeleton organization pathway showed obvious changes. Our data also demonstrated that SRBSDV infection activates the immune regulatory systems of WBPH, such as RNA interference, autophagy, and antimicrobial peptide production. CONCLUSIONS/SIGNIFICANCE: We employed massively parallel sequencing to collect ESTs from viruliferous and non-viruliferous samples of WBPH, obtaining 81,388 distinct unigenes. For the first time, we describe the direct effects of a plant virus of the family Reoviridae on the global gene expression profile of its insect vector using high-throughput sequencing. Our study provides a road map for future investigations of the interactions between Reoviridae viruses and their insect vectors, and suggests new strategies for crop protection.
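To make the comparative step concrete, here is a minimal Python sketch of how differentially expressed unigenes might be flagged from read-count tables by log2 fold change; the unigene IDs, counts, and the 2-fold cutoff are illustrative assumptions, not values or methods from the study.

```python
# Hypothetical sketch: flagging candidate SRBSDV-responsive unigenes by
# log2 fold change between viruliferous and non-viruliferous read counts.
# IDs, counts, and the cutoff below are illustrative, not from the paper.
import math

viruliferous = {"unigene_0001": 480, "unigene_0002": 35, "unigene_0003": 210}
non_viruliferous = {"unigene_0001": 120, "unigene_0002": 30, "unigene_0003": 800}

def log2_fold_change(a: int, b: int, pseudocount: float = 1.0) -> float:
    """log2 ratio with a pseudocount to avoid division by zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))

candidates = {
    uid: log2_fold_change(viruliferous[uid], non_viruliferous[uid])
    for uid in viruliferous
}
# Keep unigenes whose expression shifts at least 2-fold in either direction.
responsive = {uid: fc for uid, fc in candidates.items() if abs(fc) >= 1.0}
print(responsive)
```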
A simplified multi-model statistical approach for predicting the effects of forest management on land surface temperature in Fennoscandia
Forests interact with the local climate through a variety of biophysical mechanisms. Observational and modelling studies have investigated the effects of forested vs. non-forested areas, but the influence of forest management on surface temperature has received far less attention, owing to the inherent challenges of adapting climate models to represent forest dynamics. Further, climate models are complex and highly parameterized, and the time and resources their use demands limit applications. Simple yet reliable statistical models based on high-resolution maps of forest attributes representative of different development stages can link individual forest management practices to local temperature changes, and ultimately support the design of improved strategies. In this study, we investigate how forest management influences local surface temperature (LST) in Fennoscandia through a set of machine learning algorithms. We find that more developed forests are typically associated with higher LST than young or undeveloped forests. The mean multi-model estimates from our statistical system accurately reproduce the observed LST. Relative to the present state of Fennoscandian forests, fully developed forests are found to induce an annual mean warming of 0.26 °C (0.03/0.69 °C as 5th/95th percentile), and an average cooling effect in summer daytime of between -0.85 and -0.23 °C (depending on the model). By contrast, a scenario with undeveloped forests induces an annual average cooling of -0.29 °C (-0.61/-0.01 °C), but summer daytime warming that can exceed 1 °C. A weak annual mean cooling of -0.01 °C is attributed to forest harvest from 2015 to 2018, with an increase in summer daytime temperature of about 0.04 °C. Overall, this approach is a flexible option for studying the effects of forest management on LST that can be applied at various scales and for alternative management scenarios, thereby helping to improve local management strategies with consideration of effects on local climate.
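As an illustration of the multi-model statistical idea, the following Python sketch fits several regressors that map forest-attribute features to LST and averages their predictions into a multi-model mean; the features, synthetic data, and model choices are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch of a multi-model statistical approach: several regressors map
# forest-attribute features to land surface temperature (LST), and the
# multi-model mean is used as the estimate. All inputs here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Toy features: e.g. stand age, canopy height, biomass density per grid cell.
X = rng.random((500, 3))
y = 10 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.2, 500)  # synthetic LST (°C)

models = [
    RandomForestRegressor(n_estimators=200, random_state=0),
    GradientBoostingRegressor(random_state=0),
    Ridge(alpha=1.0),
]
for m in models:
    m.fit(X[:400], y[:400])

# Multi-model mean prediction on held-out cells.
preds = np.column_stack([m.predict(X[400:]) for m in models])
lst_estimate = preds.mean(axis=1)
print(lst_estimate[:5])
```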
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Deep neural networks have recently achieved breakthroughs in sound generation
with text prompts. Despite their promising results, current text-to-sound generation models face issues (e.g., overfitting) when trained on small-scale datasets, significantly limiting their performance. In this paper, we investigate the use
of pre-trained AudioLDM, the state-of-the-art model for text-to-audio
generation, as the backbone for sound generation. Our study demonstrates the
advantages of using pre-trained models for text-to-sound generation, especially
in data-scarcity scenarios. In addition, experiments show that different
training strategies (e.g., training conditions) may affect the performance of
AudioLDM on datasets of different scales. To facilitate future studies, we also
evaluate various text-to-sound generation systems on several frequently used
datasets under the same evaluation protocols, allowing fair comparison and benchmarking of these methods on common ground.
Comment: EUSIPCO 202
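To make one of the compared training strategies concrete, here is a hedged PyTorch sketch of fine-tuning in a data-scarce setting by freezing a pre-trained backbone and training only a small adapter; `PretrainedBackbone`, the adapter, and the toy data are placeholders, not the actual AudioLDM training code.

```python
# Hedged sketch of one possible fine-tuning strategy: freeze a pre-trained
# backbone and train only a lightweight head on a small dataset.
import torch
import torch.nn as nn

class PretrainedBackbone(nn.Module):  # stand-in for a pre-trained model core
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(128, 128)
    def forward(self, z):
        return self.net(z)

backbone = PretrainedBackbone()
for p in backbone.parameters():      # freeze pre-trained weights
    p.requires_grad = False

head = nn.Linear(128, 128)           # small trainable adapter for the new task
opt = torch.optim.AdamW(head.parameters(), lr=1e-4)

for step in range(100):              # toy loop over a small dataset
    z = torch.randn(8, 128)          # stand-in latent batch
    target = torch.randn(8, 128)     # stand-in regression target
    loss = nn.functional.mse_loss(head(backbone(z)), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```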
Text-Driven Foley Sound Generation With Latent Diffusion Model
Foley sound generation aims to synthesise the background sound for multimedia
content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vectors). In this work, we propose a
diffusion model based system for Foley sound generation with text conditions.
To alleviate the data scarcity issue, our model is initially pre-trained with
large-scale datasets and fine-tuned to this task via transfer learning using
the contrastive language-audio pretraining (CLAP) technique. We have observed
that the feature embedding extracted by the text encoder can significantly
affect the performance of the generation model. Hence, we introduce a trainable
layer after the encoder to improve the text embedding produced by the encoder.
In addition, we further refine the generated waveform by generating multiple
candidate audio clips simultaneously and selecting the best one, which is
determined in terms of the similarity score between the embedding of the
candidate clips and the embedding of the target text label. Using the proposed
method, our system ranks among the systems submitted to DCASE
Challenge 2023 Task 7. The results of the ablation studies illustrate that the
proposed techniques significantly improve sound generation performance. The
codes for implementing the proposed system are available online.
Comment: Submitted to DCASE Workshop 2023. arXiv admin note: text overlap with arXiv:2305.1590
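The candidate-selection step can be illustrated with a short PyTorch sketch: several clips are generated for one prompt, and the clip whose audio embedding is most similar to the text embedding is kept; `generate_audio`, `embed_audio`, and `embed_text` are hypothetical stand-ins for the system's generator and its CLAP-style encoders.

```python
# Sketch of candidate selection by embedding similarity: generate several
# audio clips for one prompt and keep the one closest to the text embedding.
import torch
import torch.nn.functional as F

def generate_audio(prompt: str) -> torch.Tensor:     # placeholder generator
    return torch.randn(16000)

def embed_audio(wav: torch.Tensor) -> torch.Tensor:  # placeholder audio encoder
    return torch.randn(512)

def embed_text(prompt: str) -> torch.Tensor:         # placeholder text encoder
    return torch.randn(512)

def best_candidate(prompt: str, n: int = 4) -> torch.Tensor:
    text_emb = embed_text(prompt)
    clips = [generate_audio(prompt) for _ in range(n)]
    scores = [F.cosine_similarity(embed_audio(c), text_emb, dim=0) for c in clips]
    return clips[int(torch.stack(scores).argmax())]

clip = best_candidate("footsteps on gravel")
```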
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Text-to-audio (TTA) systems have recently gained attention for their ability to synthesize general audio from text descriptions. However, previous TTA studies have suffered from limited generation quality and high computational costs. In this
study, we propose AudioLDM, a TTA system that is built on a latent space to
learn the continuous audio representations from contrastive language-audio
pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs
with audio embedding while providing text embedding as a condition during
sampling. By learning the latent representations of audio signals and their
compositions without modeling the cross-modal relationship, AudioLDM is
advantageous in both generation quality and computational efficiency. Trained
on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA
performance measured by both objective and subjective metrics (e.g., Fréchet
distance). Moreover, AudioLDM is the first TTA system that enables various
text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion.
Our implementation and demos are available at https://audioldm.github.io.
Comment: Accepted by ICML 2023. Demo and implementation at https://audioldm.github.io. Evaluation toolbox at https://github.com/haoheliu/audioldm_eva
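A schematic of the CLAP-conditioning idea described in the abstract follows: because CLAP embeds audio and text in a shared space, a latent diffusion model can be trained with audio embeddings as the condition and then sampled with text embeddings; every function below is an illustrative placeholder, not the released implementation.

```python
# Schematic of CLAP conditioning: train the diffusion model on the audio's
# own CLAP embedding, then swap in a text embedding from the shared space
# at sampling time. All components are illustrative stand-ins.
import torch

def clap_audio_embed(wav: torch.Tensor) -> torch.Tensor:
    return torch.randn(512)          # stand-in: 512-d shared CLAP space

def clap_text_embed(prompt: str) -> torch.Tensor:
    return torch.randn(512)          # stand-in: same space as audio embeddings

def diffusion_train_step(latent, cond):   # placeholder training step
    pass

def diffusion_sample(cond):               # placeholder sampler
    return torch.randn(8, 16, 16)         # stand-in audio latent

# Training: condition on the audio's own CLAP embedding (no text needed).
wav = torch.randn(16000)
diffusion_train_step(torch.randn(8, 16, 16), clap_audio_embed(wav))

# Sampling: condition on a text embedding instead.
latent = diffusion_sample(clap_text_embed("a dog barking in the rain"))
```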
Sparks of Large Audio Models: A Survey and Outlook
This survey paper provides a comprehensive overview of the recent
advancements and challenges in applying large language models to the field of
audio signal processing. Audio processing, with its diverse signal
representations and a wide range of sources--from human voices to musical
instruments and environmental sounds--poses challenges distinct from those
found in traditional Natural Language Processing scenarios. Nevertheless,
Large Audio Models, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amounts of data, these models have demonstrated prowess in a variety of audio tasks, spanning Automatic Speech Recognition, Text-To-Speech, and Music Generation, among others. Notably, foundational audio models such as SeamlessM4T have recently started to show the ability to act as universal
translators, supporting multiple speech tasks for up to 100 languages without
any reliance on separate task-specific systems. This paper presents an in-depth
analysis of state-of-the-art methodologies regarding Foundational Large Audio Models, their performance benchmarks, and their applicability to
real-world scenarios. We also highlight current limitations and provide
insights into potential future research directions in the realm of
Large Audio Models, with the intent to spark further discussion,
thereby fostering innovation in the next generation of audio-processing
systems. Furthermore, to keep pace with the rapid development in this area, we will continually update the repository of recent articles and their open-source implementations at https://github.com/EmulationAI/awesome-large-audio-models.
Comment: work in progress, Repo URL: https://github.com/EmulationAI/awesome-large-audio-model
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Although audio generation shares commonalities across different types of
audio, such as speech, music, and sound effects, designing models for each type
requires careful consideration of specific objectives and biases that can
significantly differ from those of other types. To bring us closer to a unified
perspective of audio generation, this paper proposes a framework that utilizes
the same learning method for speech, music, and sound effect generation. Our
framework introduces a general representation of audio, called "language of
audio" (LOA). Any audio can be translated into LOA based on AudioMAE, a
self-supervised pre-trained representation learning model. In the generation
process, we translate any modality into LOA using a GPT-2 model, and we
perform self-supervised audio generation learning with a latent diffusion model
conditioned on LOA. The proposed framework naturally brings advantages such as
in-context learning abilities and reusable self-supervised pretrained AudioMAE
and latent diffusion models. Experiments on the major benchmarks of
text-to-audio, text-to-music, and text-to-speech demonstrate state-of-the-art
or competitive performance against previous approaches. Our code, pretrained
model, and demo are available at https://audioldm.github.io/audioldm2.
Comment: AudioLDM 2 project page is https://audioldm.github.io/audioldm
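The two-stage pipeline can be sketched as follows, with both stages reduced to placeholders: a GPT-2-style model maps the input modality to LOA features (AudioMAE-sized vectors are an assumption here), and an LOA-conditioned latent diffusion model produces the audio latent.

```python
# Schematic of the two-stage design: translate the input into the "language
# of audio" (LOA), then run LOA-conditioned latent diffusion. Placeholders only.
import torch

def text_to_loa(prompt: str, n_tokens: int = 64) -> torch.Tensor:
    """Stand-in for the autoregressive (GPT-2) stage producing LOA features."""
    return torch.randn(n_tokens, 768)   # AudioMAE-sized vectors (assumed)

def loa_to_audio_latent(loa: torch.Tensor) -> torch.Tensor:
    """Stand-in for the LOA-conditioned latent diffusion stage."""
    return torch.randn(8, 16, 16)

loa = text_to_loa("gentle piano over rainfall")
audio_latent = loa_to_audio_latent(loa)   # decoded to a waveform downstream
```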
WavJourney: Compositional Audio Creation with Large Language Models
Large Language Models (LLMs) have shown great promise in integrating diverse
expert models to tackle intricate language and vision tasks. Despite their
significance in advancing the field of Artificial Intelligence Generated
Content (AIGC), their potential in intelligent audio content creation remains
unexplored. In this work, we tackle the problem of creating audio content with
storylines encompassing speech, music, and sound effects, guided by text
instructions. We present WavJourney, a system that leverages LLMs to connect
various audio models for audio content generation. Given a text description of
an auditory scene, WavJourney first prompts LLMs to generate a structured
script dedicated to audio storytelling. The audio script incorporates diverse
audio elements, organized based on their spatio-temporal relationships. As a
conceptual representation of audio, the audio script provides an interactive
and interpretable rationale for human engagement. Afterward, the audio script is fed into a script compiler, which converts it into a computer program. Each line
of the program calls a task-specific audio generation model or computational
operation function (e.g., concatenate, mix). The computer program is then
executed to obtain an explainable solution for audio generation. We demonstrate
the practicality of WavJourney across diverse real-world scenarios, including
science fiction, education, and radio play. The explainable and interactive
design of WavJourney fosters human-machine co-creation in multi-round
dialogues, enhancing creative control and adaptability in audio production.
WavJourney audiolizes the human imagination, opening up new avenues for
creativity in multimedia content creation.
Comment: Project Page: https://audio-agi.github.io/WavJourney_demopage
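As a toy illustration of the script-compiler idea, the sketch below executes a hand-written audio script by calling a placeholder generation model and computational operations such as mix and concatenate; the script schema and the generator are assumptions, not WavJourney's actual format or models.

```python
# Toy script executor: each step calls a (placeholder) generation model or a
# computational operation (mix, concatenate), mirroring the compiler idea.
import numpy as np

SR = 16000

def generate(desc: str, seconds: float) -> np.ndarray:  # placeholder model call
    return np.random.randn(int(SR * seconds)) * 0.1

def mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = max(len(a), len(b))
    out = np.zeros(n)
    out[:len(a)] += a
    out[:len(b)] += b
    return out

script = [
    {"op": "generate", "desc": "narrator speaking", "seconds": 3.0},
    {"op": "mix_with", "desc": "soft rain ambience", "seconds": 3.0},
    {"op": "append", "desc": "thunder clap", "seconds": 1.0},
]

track = np.zeros(0)
for step in script:
    clip = generate(step["desc"], step["seconds"])
    if step["op"] == "generate":
        track = clip
    elif step["op"] == "mix_with":
        track = mix(track, clip)
    elif step["op"] == "append":
        track = np.concatenate([track, clip])
```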