158 research outputs found

    On the importance of being finished

    The publication of an increasing number of draft genome sequences presents problems that will be resolved only by improved search tools and by complete finishing of the sequences, together with their deposition in publicly accessible databases.

    A comparative analysis of exome capture

    ABSTRACT: BACKGROUND: Human exome resequencing using commercial target capture kits has been, and continues to be, used to sequence large numbers of individuals in the search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution-based exome capture kits. These analyses help clarify the strengths and limitations of the resulting data and systematically identify variables that should be considered when using them. RESULTS: Each exome kit performed well at capturing the targets it was designed to capture, which correspond mainly to the consensus coding sequence (CCDS) annotations of the human genome. In addition, within their respective targets, both capture kits coupled with high-coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and, not surprisingly, the exome kits did not capture these additional regions. CONCLUSIONS: Commercial exome capture kits provide a very efficient way to sequence selected regions of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.

    EST analysis in Ginkgo biloba: an assessment of conserved developmental regulators and gymnosperm specific genes

    BACKGROUND: Ginkgo biloba L. is the only surviving member of one of the oldest living seed plant groups and has medicinal, spiritual and horticultural importance worldwide. As an evolutionary relic, it displays many characters found in the early, extinct seed plants and in extant cycads. To establish a molecular basis for understanding the evolution of seeds and pollen, we created cDNA libraries and an EST dataset from the male (microsporangiate) and female (megasporangiate) reproductive structures and the vegetative organs (leaves) of Ginkgo biloba. RESULTS: RNA from newly emerged male and female reproductive organs and immature leaves was used to create three distinct cDNA libraries, from which 6,434 ESTs were generated. These ESTs were clustered into 3,830 unigenes. A comparison of our Ginkgo unigene set against the fully annotated genomes of rice and Arabidopsis, and against all available ESTs in GenBank, revealed that 256 Ginkgo unigenes match only genes from gymnosperms and non-seed plants, many with multiple matches to genes in non-angiosperm plants. Conversely, another group of Ginkgo unigenes showed highly significant homology to angiosperm transcription factors involved in development, including MADS-box genes, as well as to post-transcriptional regulators. Several of the conserved developmental genes found in Ginkgo had their top BLAST hits to cycad genes. We also note the presence of ESTs in G. biloba similar to genes that to date have been found only in gymnosperms, and an additional 22 Ginkgo genes shared only with cycads. CONCLUSION: Our analysis of an EST dataset from G. biloba revealed genes potentially unique to gymnosperms. Many of these genes were homologous to fully sequenced clones from our cycad EST dataset that are found in common only with gymnosperms. Other Ginkgo ESTs are similar to developmental regulators in higher plants. This work sets the stage for future studies on Ginkgo to better understand seed and pollen evolution and to resolve the ambiguous phylogenetic relationship of G. biloba among the gymnosperms.

    Phylogenomic analysis of transcriptome data elucidates co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae (Asparagaceae)

    Premise of the study: The stability of the bimodal karyotype found in Agave and closely related species has long interested botanists. The origin of the bimodal karyotype has been attributed to allopolyploidy, but this hypothesis has not been tested. Next-generation transcriptome sequence data were used to test whether a paleopolyploid event occurred on the same branch of the Agavoideae phylogenetic tree as the origin of the Yucca-Agave bimodal karyotype. Methods: Illumina RNA-seq data were generated for phylogenetically strategic species in Agavoideae. Paleopolyploidy was inferred from frequency plots of synonymous substitutions per synonymous site (Ks) between paralogous and orthologous gene pairs of Hosta, Agave, and Chlorophytum. Phylogenies of gene families containing paralogous genes from these species and from outgroup species were estimated to place the inferred paleopolyploid events on a species tree. Key results: Ks frequency plots suggested paleopolyploid events in the history of the genera Agave, Hosta, and Chlorophytum. Phylogenetic analyses of gene families estimated from transcriptome data revealed two polyploid events: one predating the last common ancestor of Agave and Hosta, and one within the lineage leading to Chlorophytum. Conclusions: We found that polyploidy and the origin of the Yucca-Agave bimodal karyotype co-occur on the same lineage, consistent with the hypothesis that the bimodal karyotype is a consequence of allopolyploidy. We discuss this and alternative mechanisms for the formation of the Yucca-Agave bimodal karyotype. More generally, we illustrate how next-generation sequencing technology offers a cost-efficient means of assessing genome evolution in nonmodel species.
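    A Ks frequency plot is a histogram of Ks values for paralogous pairs; an excess of pairs at a shared Ks (a peak away from zero) is read as a burst of duplications such as a paleopolyploid event. The sketch below only bins precomputed Ks values and reports interior local maxima. It assumes Ks has already been estimated from codon alignments (not shown), and the bin width, upper cutoff, and function names `ks_histogram` and `local_peaks` are illustrative choices, not the authors' pipeline; published analyses often fit mixture models rather than picking raw peaks.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count paralog-pair Ks values in fixed-width bins on [0, max_ks).

    Values at or above max_ks are dropped, since Ks saturates and becomes
    unreliable for old duplicates (the cutoff here is an assumption).
    """
    n_bins = round(max_ks / bin_width)
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


def local_peaks(counts):
    """Indices of interior bins strictly higher than both neighbours.

    The first bin (Ks near 0, recent duplicates) is never reported because
    it has no left neighbour.
    """
    return [
        i
        for i in range(1, len(counts) - 1)
        if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]
    ]


if __name__ == "__main__":
    # Synthetic Ks values, not data from the study.
    ks = [0.05] * 3 + [0.35] * 4 + [0.42] * 10 + [0.55] * 4 + [1.25] * 2
    counts = ks_histogram(ks)
    for i in local_peaks(counts):
        print(f"peak near Ks = {(i + 0.5) * 0.1:.2f} ({counts[i]} pairs)")
```

A single noisy bin can create a spurious peak in this naive scheme, which is one reason peak calls are usually checked against gene-tree placements, as the abstract describes.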

    Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.

    The free-living flatworm Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate an almost entirely new organism because of its abundant population of somatic stem cells, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of well-assembled and annotated genome sequences, which are fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence consisting of simple repeats and transposon sequences. This made high-quality assembly from Illumina reads alone impossible (N50 = 222 bp). We therefore generated 130× coverage with long reads from the Pacific Biosciences platform, producing a substantially improved assembly with an N50 of 64 kbp. We complemented the reference genome with an assembled and annotated transcriptome and used both datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function. This work is supported by National Institutes of Health Grants R37 GM062534 (to G.J.H.) and R01-HG006677 (to M.S.); National Science Foundation Grant DBI-1350041 (to M.S.); and a Swiss National Science Foundation Grant 31003A-143732 (to L.S.). This work was performed with assistance from Cold Spring Harbor Laboratory Shared Resources, which are funded, in part, by Cancer Center Support Grant 5P30CA045508. This is the final version of the article. It first appeared in PNAS via http://dx.doi.org/10.1073/pnas.151671811
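    The jump from an N50 of 222 bp to 64 kbp is easier to interpret with the definition in hand: N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that standard definition follows; the function name `n50` and the contig lengths are hypothetical and are not taken from the M. lignano assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    Contigs are sorted longest first and their lengths accumulated; the
    N50 is the length of the contig at which the running total first
    reaches half of the total assembly size.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly contains no non-empty contigs")
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy assembly of nine contigs totalling 54 bp.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the toy input, the three longest contigs (10 + 9 + 8 = 27 bp) reach half of the 54 bp total, so the N50 is 8. Because the metric depends only on the longest contigs, a repeat-rich genome assembled from short reads fragments into many tiny contigs and yields a very low N50, as in the Illumina-only assembly above.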

    DNA sequence level analyses reveal potential phenotypic modifiers in a large family with psychiatric disorders

    Psychiatric disorders are a group of genetically related diseases with highly polygenic architectures. Genome-wide association analyses have made substantial progress towards understanding the genetic architecture of these disorders. More recently, exome and whole-genome sequencing of cases and families has identified rare, highly penetrant variants that provide direct functional insight. There remains, however, a gap in the heritability explained by these complementary approaches. To understand how multiple genetic variants combine to modify both the severity and the penetrance of a highly penetrant variant, we sequenced 48 whole genomes from a family with a high burden of psychiatric disorders linked to a balanced chromosomal translocation. The t(1;11)(q42;q14.3) translocation directly disrupts three genes, DISC1, DISC2 and DISC1FP, and has been linked to multiple brain imaging and neurocognitive outcomes in the family. Using DNA sequence-level linkage analysis, functional annotation and population-based association, we identified common and rare variants in GRM5 (minor allele frequency (MAF) > 0.05), PDE4D (MAF > 0.2) and CNTN5 (MAF < 0.01) that may help explain the individual differences in phenotypic expression in the family. We suggest that whole-genome sequencing in large families will improve the understanding of the combined effects of the rare and common sequence variation underlying psychiatric phenotypes.

    Deciphering the genome structure and paleohistory of _Theobroma cacao_

    We sequenced and assembled the genome of _Theobroma cacao_, an economically important tropical fruit tree crop that is the source of chocolate. The assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of them anchored on the 10 _T. cacao_ chromosomes. Analysis of this sequence information highlighted the specific expansion of some gene families during evolution, for example flavonoid-related genes. It also provides a major source of candidate genes for _T. cacao_ disease resistance and quality improvement. Based on the inferred paleohistory of the _T. cacao_ genome, we propose an evolutionary scenario in which the ten _T. cacao_ chromosomes were shaped from an ancestor through eleven chromosome fusions. The _T. cacao_ genome can be considered a simple living relic of higher plant evolution.

    A genome triplication associated with early diversification of the core eudicots

    Background: Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear. Results: To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids and asterids, and from more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications were placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications were placed after the divergence of the Ranunculales and core eudicots, indicating that gamma appears to be restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago. Conclusions: The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have occurred coincident with, or shortly after, the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are available for only a subset of the species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.

    Impact of preoperative antibiotics and other variables on integrated microbiome-host transcriptomic data generated from colorectal cancer resections.

    BACKGROUND: Integrative multi-omic approaches have been increasingly applied to discovery and functional studies of complex human diseases. Short-term preoperative antibiotics have been adopted to reduce surgical site infections in colorectal cancer (CRC) resections. We hypothesize that these antibiotics will affect the analysis of multi-omic datasets generated from resection samples to investigate biological CRC risk factors. AIM: To assess the impact of preoperative antibiotics and other variables on integrated microbiome and human transcriptomic data generated from archived CRC resection samples. METHODS: Genomic DNA (gDNA) and RNA were extracted from 51 prospectively collected pairs of frozen sporadic CRC tumor and adjacent non-tumor mucosal samples from 50 CRC patients, archived at a single medical center from 2010 to 2020. 16S rRNA gene sequencing (V3V4 region, paired end, 300 bp) and confirmatory quantitative polymerase chain reaction (qPCR) assays were conducted on gDNA. RNA sequencing (IPE, 125 bp) was performed on parallel tumor and non-tumor RNA samples with RNA Integrity Number (RIN) scores ≥ 6. RESULTS: PERMANOVA detected significant effects of tumor vs non-tumor histology (P = 0.002) and antibiotics (P = 0.001) on microbial β-diversity, whereas CRC tumor location (left vs right), diabetes mellitus vs no diabetes, and Black/African Ancestry (AA) vs not Black/AA did not reach significance. Linear mixed models detected significant histology (tumor vs non-tumor) × antibiotics interaction terms for 14 genus-level taxa. qPCR confirmed increased Fusobacterium abundance in tumor vs non-tumor groups and detected significantly reduced bacterial load in the (+)antibiotics group. Principal coordinate analysis of the transcriptomic data showed a clear separation between tumor and non-tumor samples. Differentially expressed genes obtained from separate analyses of tumor and non-tumor samples are presented for the antibiotics, CRC location, diabetes and Black/AA race groups. CONCLUSION: The recent adoption of additional preoperative antibiotics as standard of care has a measurable impact on -omics analysis of resected specimens. This study nonetheless confirmed increased Fusobacterium nucleatum in tumor samples.
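    PERMANOVA tests whether a grouping factor (here, histology or antibiotic exposure) explains variation in a between-sample distance matrix, using a pseudo-F statistic and a permutation p-value. The sketch below implements the standard one-factor form (Anderson 2001) with the Python standard library. It assumes Bray-Curtis dissimilarity, which the abstract does not specify, and it ignores the paired tumor/non-tumor design (a real analysis would restrict permutations within patients); the function names and toy abundances are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two taxon-abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity per group, divided by group size.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    if ss_within == 0:
        return float("inf")
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(samples, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a single grouping factor."""
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # Count the observed labelling as one permutation (the usual +1 correction).
    return f_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    tumor = [[10, 0, 5], [12, 1, 4], [9, 0, 6]]
    non_tumor = [[0, 10, 5], [1, 12, 4], [0, 9, 6]]
    f, p = permanova(tumor + non_tumor, ["tumor"] * 3 + ["non-tumor"] * 3)
    print(f"pseudo-F = {f:.2f}, p = {p:.3f}")
```

With only six toy samples there are just ten distinct two-group splits, so even perfectly separated groups cannot reach a p-value much below 0.1; the study's P values near 0.001 reflect its much larger sample set.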