21 research outputs found

    Chromosome-level genome assembly of largemouth bass (Micropterus salmoides) using PacBio and Hi-C technologies

    The largemouth bass (Micropterus salmoides) has become a cosmopolitan species due to its widespread introduction as a game or domesticated fish. Here, a high-quality chromosome-level reference genome of M. salmoides was produced by combining Illumina paired-end sequencing, PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The final assembly spans 844.88 Mb, with a contig N50 of 15.68 Mb and a scaffold N50 of 35.77 Mb. About 99.9% of the assembled sequence (844.00 Mb) could be anchored to 23 chromosomes, and 98.03% could be ordered and oriented. The genome contained 38.19% repeat sequences and 2693 noncoding RNAs. A total of 26,370 protein-coding genes from 3415 gene families were predicted, of which 97.69% were functionally annotated. This high-quality genome assembly will be a fundamental resource for studying how M. salmoides adapts to novel and changing environments around the world, and is expected to contribute to genetic breeding and other research.
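The contig and scaffold N50 values quoted above are length-weighted medians: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the largemouth bass assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a handful of very long contigs can dominate it, which is why contig N50 and scaffold N50 are usually reported together with the total assembly size.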

    SeagrassDB: An open-source transcriptomics landscape for phylogenetically profiled seagrasses and aquatic plants

    © 2018, The Author(s). Seagrasses and aquatic plants are important clades of higher plants, significant for carbon sequestration and marine ecological restoration. They are valuable because they allow us to understand how plants have developed traits to adapt to high-salinity and photosynthetically challenged environments. Here, we present a large-scale, phylogenetically profiled transcriptomics repository covering seagrasses and aquatic plants. SeagrassDB encompasses a total of 1,052,262 unigenes with a minimum and maximum contig length of 8,831 bp and 16,705 bp, respectively. SeagrassDB provides access to 34,455 transcription factors, 470,568 PFAM domains, 382,528 PROSITE models and 482,121 InterPro domains across 9 species. SeagrassDB supports comparative gene mining using BLAST-based approaches and subsequent retrieval of unigene sequences with associated features such as expression (FPKM values), gene ontologies, functional assignments, family-level classification, InterPro domains, KEGG orthology (KO), transcription factors and PROSITE information. SeagrassDB is available to the scientific community for exploring the functional genic landscape of seagrasses and aquatic plants at: http://115.146.91.129/index.php

    Long-read RNA Sequencing Improves the Annotation of the Equine Transcriptome

    A high-quality reference genome assembly, a biobank of diverse equine tissues from the Functional Annotation of the Animal Genome (FAANG) initiative, and the incorporation of long-read sequencing technologies have enabled efforts to build a comprehensive and tissue-specific equine transcriptome. The equine FAANG transcriptome reported here provides up to 45% improvement in transcriptome completeness across tissue types when compared to either the RefSeq or Ensembl transcriptomes. This transcriptome also provides major improvements in the identification of alternatively spliced isoforms, novel noncoding genes, and 3’ transcription termination site (TTS) annotations. The equine FAANG transcriptome will empower future functional studies of important equine traits while providing opportunities to identify allele-specific expression and differentially expressed genes across tissues.

    Integrated de novo transcriptome of Culex pipiens mosquito larvae as a resource for genetic control strategies

    We present a de novo transcriptome of the mosquito vector Culex pipiens, assembled from sequences of susceptible and insecticide-resistant larvae. The high quality of the assembly was confirmed by TransRate and BUSCO. A mapping percentage of up to 94.8% was obtained by aligning contigs to Nr, SwissProt, and TrEMBL, with 27,281 sequences that mapped simultaneously to all three databases. A total of 14,966 ORFs were also functionally annotated using the eggNOG database. Among them, we identified ORF sequences of the main gene families involved in insecticide resistance. This resource therefore stands as a valuable reference for further studies of differential gene expression as well as for identifying genes of interest for genetic-based control tools.

    Active fungal communities in asymptomatic Eucalyptus grandis stems differ between a susceptible and resistant clone

    Fungi represent a common and diverse part of the microbial communities that associate with plants. They also commonly colonise various plant parts asymptomatically. The molecular mechanisms of these interactions are, however, poorly understood. In this study we use transcriptomic data from Eucalyptus grandis, to demonstrate that RNA-seq data are a neglected source of information to study fungal–host interactions, by exploring the fungal transcripts they inevitably contain. We identified fungal transcripts from E. grandis data based on their sequence dissimilarity to the E. grandis genome and predicted biological functions. Taxonomic classifications identified, amongst other fungi, many well-known pathogenic fungal taxa in the asymptomatic tissue of E. grandis. The comparison of a clone of E. grandis resistant to Chrysoporthe austroafricana with a susceptible clone revealed a significant difference in the number of fungal transcripts, while the number of fungal taxa was not substantially affected. Classifications of transcripts based on their respective biological functions showed that the fungal communities of the two E. grandis clones associate with fundamental biological processes, with some notable differences. To shield the greater host defence machinery in the resistant E. grandis clone, fungi produce more secondary metabolites, whereas the environment for fungi associated with the susceptible E. grandis clone is more conducive for building fungal cellular structures and biomass growth. Secreted proteins included carbohydrate active enzymes that potentially are involved in fungal–plant and fungal–microbe interactions. 
While plant transcriptome datasets cannot replace the need for designed experiments to probe plant–microbe interactions at a molecular level, they clearly hold potential to add to the understanding of the diversity of plant–microbe interactions. Tree Protection Co-operative Programme (TPCP) and DST/NRF Centre of Excellence in Tree Health Biotechnology (CTHB). https://www.mdpi.com/journal/microorganisms

    A metagenomic study of DNA viruses from samples of local varieties of common bean in Kenya

    Common bean (Phaseolus vulgaris L.) is the primary source of protein and nutrients in the majority of households in sub-Saharan Africa. However, pests and viral diseases are key drivers in the reduction of bean production. To date, the majority of viruses reported in beans have been RNA viruses. In this study, we carried out a viral metagenomic analysis on virus-symptomatic bean plants. Our virus detection pipeline identified three viral fragments of the double-stranded DNA virus Pelargonium vein banding virus (PVBV) (family Caulimoviridae, genus Badnavirus). To our knowledge, this is the first report of a dsDNA virus, and specifically PVBV, in legumes. In addition, two previously reported +ssRNA viruses, bean common mosaic necrosis virus (BCMNVA) (Potyviridae) and aphid lethal paralysis virus (ALPV) (Dicistroviridae), were identified. Bayesian phylogenetic analysis of the Badnavirus (PVBV) using amino acid sequences of the RT/RNA-dependent DNA polymerase region showed that the Kenyan sequence (SRF019_MK014483) was closely matched with two Badnavirus viruses: Dracaena mottle virus (DrMV) (YP_610965) and Lucky bamboo bacilliform virus (ABR01170). Phylogenetic analysis of BCMNVA was based on amino acid sequences of the NIb region. The BCMNVA phylogenetic tree resolved two clades, identified as clades I and II. The sequence from this study (SRF35_MK014482) clustered within clade I with other Kenyan sequences. Bayesian phylogenetic analysis of ALPV, in contrast, was based on nucleotide sequences of hypothetical protein genes 1 and 2. Three main clades were resolved and identified as clades I–III. The Kenyan sequence from this study (SRF35_MK014481) clustered within clade II and nested within a sub-clade comprising sequences from China and an earlier ALPV sequence from Kenya isolated from maize (MF458892). Our findings support the use of viral metagenomics to reveal nascent viruses, their diversity and their evolutionary history. 
The detection of ALPV and PVBV indicates that these viruses have likely been underreported due to the unavailability of diagnostic tools.

    SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

    High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.

    Improving algorithms of gene prediction in prokaryotic genomes, metagenomes, and eukaryotic transcriptomes

    Next-generation sequencing has generated an enormous amount of DNA and RNA sequence data that potentially carries volumes of genetic information, e.g. protein-coding genes. The thesis is divided into three main parts describing i) GeneMarkS-2, ii) GeneMarkS-T, and iii) MetaGeneTack. In prokaryotic genomes, ab initio gene finders can predict genes with high accuracy. However, the error rate is not negligible and is largely species-specific. Most errors in gene prediction are made in genes located in genomic regions with atypical GC composition, e.g. genes in pathogenicity islands. We describe a new algorithm, GeneMarkS-2, that uses local GC-specific heuristic models for scoring individual ORFs in the first step of analysis. Predicted atypical genes are retained and serve as ‘external’ evidence in subsequent runs of self-training. GeneMarkS-2 also controls the quality of the training process by effectively selecting optimal orders of the Markov chain models as well as duration parameters in the hidden semi-Markov model. GeneMarkS-2 has shown significantly improved accuracy compared with other state-of-the-art gene prediction tools. Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) provides a large number of RNA reads that can be assembled into a full transcriptome. We have developed a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. Unsupervised estimation of the algorithm's parameters makes several steps of conventional gene prediction protocols unnecessary, most importantly the manual curation of training sets. We have demonstrated that GeneMarkS-T self-training is robust to errors in assembled transcripts and that the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably to other existing methods. 
Frameshift (FS) prediction is important for the analysis and biological interpretation of metagenomic sequences. Reads in metagenomic samples are prone to sequencing errors. Insertion and deletion errors that change the coding frame impair the accurate identification of protein-coding genes. Accurate frameshift prediction requires a sufficient amount of data to estimate the parameters of species-specific statistical models of protein-coding and non-coding regions. However, such data are not available; all we have is metagenomic sequences of unknown origin. The challenge of ab initio FS detection is, therefore, twofold: (i) to find a way to infer the necessary model parameters and (ii) to identify positions of frameshifts (if any). We describe a new tool, MetaGeneTack, which uses a heuristic method to estimate parameters of the sequence models used in the FS detection algorithm. It was shown on several test sets that the performance of MetaGeneTack FS detection is comparable to or better than that of the earlier program FragGeneScan.
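To make the frameshift problem concrete, the sketch below enumerates simple forward-strand ORFs (ATG to the next in-frame stop codon) and shows how a single-base deletion destroys one. This is a naive illustration only: it is not the GeneMarkS-T or MetaGeneTack algorithm, both of which rely on trained statistical models. The function name `find_orfs`, the `min_codons` threshold and the toy sequence are assumptions introduced for this example.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from the first ATG seen in a frame to the next
    in-frame stop codon. `end` is exclusive and includes the stop codon.
    `min_codons` counts codons before the stop (hypothetical threshold).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence (not real data): ATG AAA CCC GGG TAG in frame 2.
intact = "CCATGAAACCCGGGTAGTT"
# Simulate a sequencing deletion of one base inside the ORF.
shifted = intact[:6] + intact[7:]

print(find_orfs(intact))   # one ORF covering positions 2..17
print(find_orfs(shifted))  # the deletion shifts the frame; no ORF remains
```

In this toy case the deletion moves the stop codon out of frame, so the naive scan no longer reports the gene, which is the kind of error that frameshift detection tools aim to locate.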

    Venomous gland transcriptome and venom proteomic analysis of the scorpion Androctonus amoreuxi reveal new peptides with anti-SARS-CoV-2 activity

    The recent COVID-19 pandemic shows the critical need for novel broad-spectrum antiviral agents. Scorpion venoms are known to contain highly bioactive peptides, several of which have demonstrated strong antiviral activity against a range of viruses. We have generated the first annotated reference transcriptome for the Androctonus amoreuxi venom gland and used high-performance liquid chromatography, transcriptome mining, circular dichroism and mass spectrometric analysis to purify and characterize twelve previously undescribed venom peptides. Selected peptides were tested for binding to the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein and for inhibition of the spike RBD – human angiotensin-converting enzyme 2 (hACE2) interaction using surface plasmon resonance-based assays. Seven peptides showed dose-dependent inhibitory effects, albeit with IC50 values in the high micromolar range (117–1202 µM). The most active peptide was synthesized using solid-phase peptide synthesis and tested for its antiviral activity against SARS-CoV-2 (Lineage B.1.1.7). When a human lung cell line infected with replication-competent SARS-CoV-2 was exposed to the synthetic peptide, we observed an IC50 of 200 nM, which was nearly 600-fold lower than that observed in the RBD – hACE2 binding inhibition assay. Our results show that scorpion venom peptides can inhibit SARS-CoV-2 replication, although this is unlikely to occur through inhibition of the spike RBD – hACE2 interaction as the primary mode of action. Scorpion venom peptides represent excellent scaffolds for the design of novel anti-SARS-CoV-2 constrained peptides. Future studies should fully explore their antiviral mode of action as well as the structural dynamics of inhibition of target virus-host interactions.
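The "nearly 600-fold" figure can be checked with a quick unit conversion. The sketch below assumes that the most active peptide is the one at the lower end of the reported binding-assay range (117 µM), which the abstract implies but does not state outright; the helper name `fold_difference` is hypothetical.

```python
def fold_difference(higher_molar, lower_molar):
    """Return how many times larger `higher_molar` is than `lower_molar`.

    Both arguments must be concentrations in the same unit (mol/L here).
    """
    if lower_molar <= 0:
        raise ValueError("lower_molar must be positive")
    return higher_molar / lower_molar


# Assumption: the most active peptide had the lowest binding-assay IC50.
binding_ic50 = 117e-6    # 117 µM, RBD-hACE2 binding inhibition assay
antiviral_ic50 = 200e-9  # 200 nM, infected lung cell line assay

print(round(fold_difference(binding_ic50, antiviral_ic50)))
```

Under that assumption the ratio is a little under 600, consistent with the abstract's wording.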