52 research outputs found

    TESSP: Text-Enhanced Self-Supervised Speech Pre-training

    Self-supervised speech pre-training empowers the model with the contextual structure inherent in the speech signal, while self-supervised text pre-training empowers the model with linguistic information. Both are beneficial for downstream speech tasks such as ASR. However, the distinct pre-training objectives make it challenging to jointly optimize the speech and text representations in the same model. To solve this problem, we propose Text-Enhanced Self-Supervised Speech Pre-training (TESSP), which aims to incorporate linguistic information into speech pre-training. Our model consists of three parts: a speech encoder, a text encoder and a shared encoder. The model takes unsupervised speech and text data as input and applies the common HuBERT and MLM losses, respectively. We also propose phoneme up-sampling and representation swapping to enable joint modeling of the speech and text information. Specifically, to fix the length mismatch between speech and text data, we phonemize the text sequence and up-sample the phonemes with the alignment information extracted from a small set of supervised data. Moreover, to close the gap between the learned speech and text representations, we swap the text representation with the speech representation extracted by the respective private encoders according to the alignment information. Experiments on the Librispeech dataset show that the proposed TESSP model achieves more than 10% improvement compared with WavLM on the test-clean and test-other sets. We also evaluate our model on the SUPERB benchmark, showing that it performs better than WavLM on Phoneme Recognition, Automatic Speech Recognition and Speech Translation. Comment: 9 pages, 4 figures

    Multiplex genomic structure variation mediated by TALEN and ssODN

    BACKGROUND: Genomic structure variation (GSV) is widely distributed in various organisms and is an important contributor to human diversity and disease susceptibility. Efficient approaches to induce targeted genomic structure variation are crucial for both analytic and therapeutic studies of GSV. Here, we present an efficient strategy to induce targeted GSV, including chromosomal deletions, duplications and inversions, in a precise manner. RESULTS: Utilizing Transcription Activator-Like Effector Nucleases (TALEN) designed to target two distinct sites, we demonstrated targeted deletions, duplications and inversions of an 8.9 Mb chromosomal segment, which is about one third of the entire chromosome. We developed a novel method that combines TALEN-induced GSV with single-stranded oligodeoxynucleotide (ssODN)-mediated gene modification to reduce the unwanted mutations that occur during targeted GSV using TALEN or Zinc finger nuclease (ZFN). Furthermore, we showed that co-introduction of TALEN and ssODN generated unwanted complex structure variation other than the expected chromosomal deletion. CONCLUSIONS: We demonstrated the ability of TALEN to induce targeted GSV and provided an efficient strategy to perform GSV precisely. Furthermore, this is the first report showing that co-introduction of TALEN and ssODN generates unwanted complex structure variation. It is plausible that the strategies developed in this study can be applied to other organisms and will help in understanding the biological roles of GSV and the therapeutic applications of TALEN and ssODN. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-41) contains supplementary material, which is available to authorized users.

    Genetic Code Expansion System for Tight Control of Gene Expression in Bombyx mori Cell Lines

    Inducible gene expression systems are important tools for studying gene function and controlling protein synthesis. With the completion of the detailed map of the silkworm (Bombyx mori) genome, the study of Bombyx mori has entered the post-genome era. While the functions of many genes have been described in detail, many coding genes remain unidentified. Except for the available tetracycline induction system, there is currently a dearth of other effective induction systems for B. mori. A genetic code expansion system can be used for protein labeling and to regulate gene expression. Here, we have established a genetic code expansion system for B. mori based on the well-researched tRNAPyl/PylRS pair from Methanosarcina mazei. We used H-Lys(Boc)-OH, a lysine derivative, to efficiently and tightly control the expression of the reporter gene DsRed[TAG]EGFP (D[TAG]G), which encodes a H-Lys(Boc)-OH-bearing protein fused with DsRed and EGFP (here regarded as D[Boc]G), in the B. mori cell lines BmE and BmNs. In D[TAG]G, the amber stop codon is recognized by the orthogonal tRNAPyl. Successful application of the genetic code expansion system in silkworm cell lines will support research into the function of silkworm genes and pave the way for the identification of new genes and protein markers in silkworm.

    Overview and Evolution of Insect Fibroin Heavy Chain (FibH)

    The FibH gene, crucial for silk spinning in insects, encodes a protein that significantly influences silk fiber mechanics. Because of its large size and repetitive sequences, few insect FibH sequences are known, which impedes comprehensive understanding. Here, we analyzed 114 complete FibH gene sequences from Lepidoptera (71 moths, 24 butterflies) and 13 Trichoptera, revealing single-copy FibH in most species, with 2–3 copies in Hesperiinae and Heteropterinae (subfamilies of skippers). All FibH genes are structured with two exons and one intron (39–45 bp), with the second exon being notably longer. Moths exhibit higher GC content in FibH compared to butterflies and Trichoptera. The FibH composition varies among species, with moths and butterflies favoring Ala, Gly, Ser, Pro, Gln, and Asn, while Trichoptera FibH is enriched in Gly, Ser, and Arg, and has less Ala. Unique to Trichoptera FibH are Tyr, Val, Arg, and Trp, whereas Lepidoptera FibH is marked by polyAla (polyalanine), polySer (polyserine), and the hexapeptide GAGSGA. A phylogenetic analysis suggests that Lepidoptera FibH evolved from Trichoptera, with skipper FibH evolving from Papilionoidea. This study substantially expands the FibH repertoire, providing a foundation for the development of artificial silk.

    Precise Characterization of Bombyx mori Fibroin Heavy Chain Gene Using Cpf1-Based Enrichment and Oxford Nanopore Technologies

    To study the evolution of gene function and of a species, it is essential to characterize the tandem repetitive sequences distributed across the genome. Cas9-based enrichment combined with nanopore sequencing is an important technique for targeting repetitive sequences. Cpf1 has a low molecular weight, low off-target activity, and the same editing efficiency as Cas9. There are numerous studies on enrichment sequencing using Cas9 combined with nanopore sequencing, but only a few on the enrichment sequencing of long and highly repetitive genes using Cpf1. We developed Cpf1-based enrichment combined with ONT sequencing (CEO) to characterize the B. mori FibH gene, which is composed of many repeat units, has a long and GC-rich sequence of up to 17 kb, and is not easily amplified by polymerase chain reaction (PCR). CEO has four steps: the dephosphorylation of genomic DNA, the Cpf1-targeted cleavage of FibH, adapter ligation, and ONT sequencing. Using CEO, we determined the fine structure of B. mori FibH, which is 16,845 bp long and includes 12 repetitive domains separated by amorphous regions. Apart from a three-base difference in the intron, the sequence is identical to the reference gene. Surprisingly, many methylated CG sites were found, distributed unevenly across the FibH repeat units. The CEO method we established is not only a practical means of characterizing highly repetitive genes but also a supplement to the Cas9-based enrichment method.

    CRISPR-Mediated Endogenous Activation of Fibroin Heavy Chain Gene Triggers Cellular Stress Responses in Bombyx mori Embryonic Cells

    The silkworm Bombyx mori is an economically important insect, as it is the main producer of silk. The fibroin heavy chain (FibH) gene, which encodes the core component of silk protein, is specifically and highly expressed in silk gland cells but not in other cells. Although the transcriptional regulation of the silkworm FibH gene has been well studied, its biological functions in the development of silk gland cells remain elusive. In this study, we constructed a CRISPRa system to activate the endogenous transcription of FibH in Bombyx mori embryonic (BmE) cells, and the mRNA expression of FibH was successfully activated. In addition, we found that FibH expression reached a maximum at 60 h after transient transfection of sgRNA/dCas9-VPR at a molar ratio of 9:1. The qRT-PCR analysis showed that the expression levels of cellular stress response-related genes were significantly up-regulated along with the activated FibH gene. Moreover, the lyso-tracker red and monodansylcadaverine (MDC) staining assays revealed apparent autophagy in FibH-activated BmE cells. Therefore, we conclude that the activation of the FibH gene leads to up-regulation of cellular stress response-related genes in BmE cells, a finding that is essential for understanding silk gland development and the fibroin secretion process in B. mori.

    Bacillus bombysepticus α-Toxin Binding to G Protein-Coupled Receptor Kinase 2 Regulates cAMP/PKA Signaling Pathway to Induce Host Death

    Bacterial pathogens and their toxins target host receptors, leading to aberrant behavior or host death by changing signaling events through subversion of the host intracellular cAMP level. This is an efficient and widespread mechanism of microbial pathogenesis. Previous studies describe toxins that increase cAMP in host cells, resulting in death through G protein-coupled receptor (GPCR) signaling pathways by influencing adenylyl cyclase or G protein activity. G protein-coupled receptor kinase 2 (GRK2) has a central role in the regulation of GPCR desensitization. However, little information is available about the pathogenic mechanisms of toxins associated with GRK2. Here, we report a new bacterial toxin, Bacillus bombysepticus (Bb) α-toxin, that is lethal to the host. We showed that Bb α-toxin interacted with BmGRK2. The data demonstrated that Bb α-toxin bound directly to BmGRK2 to promote death by affecting GPCR signaling pathways. This mechanism involved stimulation of Gαs, an increase in the level of cAMP, and activation of protein kinase A (PKA). Activated cAMP/PKA signal transduction altered downstream effectors that affected homeostasis and fundamental biological processes, disturbing the structural and functional integrity of cells and resulting in death. Blocking cAMP/PKA signal transduction with inhibitors (NF449 or H-89) substantially reduced the pathogenicity of Bb α-toxin. The discovery of a toxin-induced host death specifically linked to the GRK2-mediated signaling pathway suggests a new model for bacterial toxin action. Characterization of host genes whose expression and function are regulated by Bb α-toxin and GRK2 will offer a deeper understanding of the pathogenesis of infectious diseases caused by pathogens that elevate cAMP.

    Comparison of Long-Read Methods for Sequencing and Assembly of Lepidopteran Pest Genomes

    Lepidopteran species are mostly pests, causing serious annual economic losses. High-quality genome sequencing and assembly uncover the genetic foundation of pest occurrence and provide guidance for pest control measures. Advances in long-read sequencing technology and assembly algorithms have improved the ability to produce high-quality genomes in a timely manner. Lepidoptera includes a wide variety of insects with high genetic diversity and heterozygosity. Therefore, the selection of an appropriate sequencing and assembly strategy to obtain high-quality genomic information is urgently needed. This research used the silkworm as a model to test genome sequencing and de novo assembly with high-coverage datasets. We report the first nearly complete telomere-to-telomere reference genome of the silkworm Bombyx mori (P50T strain), produced by Pacific Biosciences (PacBio) HiFi sequencing, and highly contiguous and complete genome assemblies of two other silkworm strains, produced by Oxford Nanopore Technologies (ONT) or PacBio continuous long-read (CLR) sequencing, that were unrepresented in the public database. Assembly quality was evaluated using BUSCO, Inspector, and EagleC. It is necessary to choose an appropriate assembler for draft genome construction, especially for low-depth datasets. For PacBio CLR and ONT sequencing, NextDenovo is superior. For PacBio HiFi sequencing, hifiasm is better. Quality assessment is essential for genome assembly and can provide better and more accurate results. For chromosome-level high-quality genome construction, we recommend using 3D-DNA with EagleC evaluation. Our study provides a reference for how to obtain and evaluate high-quality genome assemblies, and is a resource for biological control, comparative genomics, and evolutionary studies of Lepidopteran pests and related species.