5 research outputs found

    Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species

    Background: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results: Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions: Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species. Peer reviewed

    Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta

    Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and female castes of queens and workers, in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans) in producing de novo assemblies. Second, we annotate the resulting expressed contigs to the currently published genomes of ants and other insects, including the honeybee, to filter genes that have annotation evidence of being true genes. Our pipeline resulted in a final assembly of altogether 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in the related ant species. Of these genes, 536 were unique to one caste or sex only, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontologies. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly can be adapted to any non-model species that has genomic resources available from a related taxon. Peer reviewed

    A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana

    The mycalesine butterfly Bicyclus anynana, the “Squinting bush brown,” is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). Peer reviewed
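
    The N50 quoted in the abstract above is a standard assembly contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. A minimal sketch of that calculation follows, assuming a plain list of scaffold lengths as input. The function name `n50` and the toy lengths are illustrative only and are not taken from the B. anynana assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (NOT data from the paper).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

    On real data the lengths would typically be read from the assembly FASTA file; that step is omitted here to keep the sketch self-contained.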

    Distribution and Characteristics of Colonic Diverticula in a United States Screening Population

    BACKGROUND & AIMS: Colonic diverticula are the most common finding from colonoscopy examinations. Little is known about the distribution of colonic diverticula, which are responsible for symptomatic and costly diverticular disease. We aimed to assess the number, location, and characteristics of colonic diverticula in a large US screening population. METHODS: We analyzed data from a prospective study of 624 patients (mean age, 54 years) undergoing screening colonoscopy at the University of North Carolina Hospital from 2013 through 2015. The examination included a detailed assessment of colonic diverticula. To assess the association between participant characteristics and diverticula, we used logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (CIs). RESULTS: Of our population, 260 patients (42%) had one or more diverticula (mean number 14; range, 1–158). Participants with diverticula were more likely to be older, male, and have a higher body mass index than those without diverticula. The distribution of diverticula differed significantly by race. Among Whites, 75% of diverticula were in the sigmoid colon, 11% in the descending colon or splenic flexure, 6% in the transverse colon, and 8% were in the ascending colon or hepatic flexure; in Blacks 64% of diverticula were in the sigmoid colon, 8% in the descending colon or splenic flexure, 7% in the transverse colon, and 20% in the ascending colon or hepatic flexure (P=.0008). The proportion of patients with diverticula increased with age: 35% were 50 years or younger, 40% were 51–60 years, and 58% were older than 60 years. The proportion of patients with more than 10 diverticula increased with age: 8% were 50 years or younger, 15% were 51–60 years, and 30% were older than 60 years. CONCLUSIONS: Older individuals not only have a higher prevalence of diverticula than younger individuals, but also a greater density, indicating that this is a progressive disease. Blacks have a greater percentage of their diverticula in the proximal colon and fewer in the distal colon compared with Whites. Understanding the distribution and determinants of diverticula is the first step in preventing diverticulosis and its complications.
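
    The methods above report odds ratios with 95% confidence intervals from logistic regression. For a single binary characteristic with no covariates, the logistic-regression OR coincides with the OR of a 2x2 table, which allows a compact illustration. The sketch below assumes that simplified, unadjusted case and a Wald interval; the function name `odds_ratio_wald_ci` and all counts are hypothetical and are not data or code from the study, whose models may have included adjustment variables.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT from the study).
or_, lo, hi = odds_ratio_wald_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

    An interval that excludes 1 would suggest an association at roughly the 5% level under these assumptions; a multivariable logistic model would be needed to reproduce adjusted estimates.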