    Empirical comparison of ab initio repeat finding programs

    Identification of dispersed repetitive elements can be difficult, especially when elements share little or no homology with previously described repeats. Consequently, a growing number of computational tools have been designed to identify repetitive elements in an ab initio manner, i.e. without using prior sequence data. Here we present the results of side-by-side evaluations of six of the most widely used ab initio repeat finding programs. Using sequence from rice chromosome 12, tools were compared with regard to time requirements, ability to find known repeats, utility in identifying potential novel repeats, number and types of repeat elements recognized, and compactness of family descriptions. The study reveals profound differences in the utility of the tools, with some identifying virtually their entire substrate as repetitive, others making reasonable estimates of repetition, and some missing almost all repeats. Of note, even when tools recognized similar numbers of repeats, they often showed marked differences in the nature and number of repeat families identified. Within the context of this comparative study, ReAS and RepeatScout showed the most promise in analysis of sequence reads and assembled genomic regions, respectively. Our results should help biologists identify the program(s), if any, best suited to their needs.

    A quick guide for student-driven community genome annotation

    High quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but these are available for only a handful of model organisms that have undergone extensive curation and experimental validation over the course of many years. The majority of gene models present in biological databases today have been identified in draft genome assemblies using automated annotation pipelines that are frequently based on orthologs from distantly related model organisms. Manual curation is time consuming and often requires substantial expertise, but is instrumental in improving gene model structure and identification. Manual annotation may seem to be a daunting and cost-prohibitive task for small research communities but involving undergraduates in community genome annotation consortiums can be mutually beneficial for both education and improved genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control processes through incremental evaluation. Moreover, it gives students an opportunity to increase their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions

    Differential gene expression of Asian citrus psyllids infected with ‘Ca. Liberibacter asiaticus’ reveals hyper-susceptibility to invasion by instar fourth-fifth and teneral adult stages

    The bacterial pathogen Candidatus Liberibacter asiaticus (CLas) is the causal agent of citrus greening disease. This unusual plant pathogenic bacterium also infects its psyllid host, the Asian citrus psyllid (ACP). To investigate gene expression profiles with a focus on genes involved in infection and circulation within the psyllid host of CLas, RNA-seq libraries were constructed from CLas-infected and CLas-free ACP representing the five different developmental stages, namely, nymphal instars 1-2, 3, and 4-5, and teneral and mature adults. The 296 Gbp of paired-end reads, representing the transcriptional landscape of ACP across all life stages, and the official gene set (OGSv3) were annotated based on the chromosomal-length v3 reference genome and used for de novo transcript discovery, resulting in 25,410 genes with 124,177 isoforms. Differential expression analysis across all ACP developmental stages revealed instar-specific responses to CLas infection, with greater overall responses by nymphal instars than by mature adults. More genes were over- or under-expressed in the 4-5th nymphal instars and young (teneral) adults than in instars 1-3 or mature adults, indicating that late immature instars and young maturing adults were highly responsive to CLas infection. Genes identified with potential for direct or indirect involvement in the ACP-CLas circulative, propagative transmission pathway were predominantly responsive during early invasion and infection processes and included canonical cytoskeletal remodeling and endo-exocytosis pathway genes. Genes with predicted functions in defense, development, and immunity exhibited the greatest responsiveness to CLas infection. These results shed new light on ACP-CLas interactions essential for pathogenesis of the psyllid host, some of which share striking similarities with effector protein-animal host mechanisms reported for other culturable and/or fastidious bacterial- or viral-host pathosystems.

    An annotated genetic map of loblolly pine based on microsatellite and cDNA markers

    BACKGROUND: Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety of sources and published cDNA markers into a composite P. taeda genetic map constructed from two reference mapping pedigrees. A dense genetic map that incorporates SSR loci will benefit complete pine genome sequencing, pine population genetics studies, and pine breeding programs. Careful marker annotation using a variety of references further enhances the utility of the integrated SSR map. RESULTS: The updated P. taeda genetic map, with an estimated genome coverage of 1,515 cM (Kosambi) across 12 linkage groups, incorporated 170 new SSR markers and 290 previously reported SSR, RFLP, and ESTP markers. The average marker interval was 3.1 cM. Of 233 mapped SSR loci, 84 were from cDNA-derived sequences (EST-SSRs) and 149 were from non-transcribed genomic sequences (genomic-SSRs). Of all 311 mapped cDNA-derived markers, 77% were associated with NCBI Pta UniGene clusters, 67% with RefSeq proteins, and 62% with functional Gene Ontology (GO) terms. Duplicate (i.e., redundant accessory) and paralogous markers were tentatively identified by evaluating marker sequences by their UniGene cluster IDs, clone IDs, and relative map positions. The average gene diversity, H_e, among polymorphic SSR loci, including those that were not mapped, was 0.43 for 94 EST-SSRs and 0.72 for 83 genomic-SSRs. The genetic map can be viewed and queried at http://www.conifergdb.org/pinemap. CONCLUSIONS: Many polymorphic and genetically mapped SSR markers are now available for use in P. taeda population genetics, studies of adaptive traits, and various germplasm management applications. Annotating mapped genes with UniGene clusters and GO terms allowed assessment of redundant and paralogous EST markers and further improved the quality and utility of the genetic map for P. taeda.
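
    The two quantities reported in the abstract above, map distance in Kosambi centimorgans and gene diversity (H_e), can be illustrated with a minimal sketch. It assumes the textbook definitions: the Kosambi mapping function d = 25 ln((1 + 2r) / (1 - 2r)) cM for a recombination fraction r, and H_e = 1 - sum(p_i^2) over allele frequencies p_i. The abstract does not state which estimator of H_e was used (for example, whether a small-sample correction was applied), so the function names and the uncorrected formula here are illustrative assumptions, not the authors' method.

```python
import math
from typing import Sequence


def kosambi_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to Kosambi centimorgans.

    Textbook Kosambi mapping function; assumed here, not taken from the paper.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def gene_diversity(allele_freqs: Sequence[float]) -> float:
    """Expected heterozygosity H_e = 1 - sum(p_i^2) at one locus.

    Uses the uncorrected formula; the estimator used in the study is not
    specified in the abstract.
    """
    if any(p < 0.0 for p in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(f"r = 0.10 -> {kosambi_cm(0.10):.2f} cM")
    print(f"H_e for two equally frequent alleles: {gene_diversity([0.5, 0.5]):.2f}")
```

    Under these definitions, a locus with many alleles at similar frequencies yields a higher H_e than a locus dominated by one allele, which is the sense in which the genomic-SSRs (0.72) are more diverse than the EST-SSRs (0.43).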

    Skin Transcriptome of Middle-Aged Women Supplemented With Natural Herbo-mineral Shilajit Shows Induction of Microvascular and Extracellular Matrix Mechanisms

    Objective: Shilajit is a pale-brown to blackish-brown organic mineral substance available from Himalayan rocks. We demonstrated that in type I obese humans, shilajit supplementation significantly upregulated extracellular matrix (ECM)-related genes in the skeletal muscle. Such an effect was highly synergistic with exercise. The present study (clinicaltrials.gov ) aimed to evaluate the effects of shilajit supplementation on skin gene expression profile and microperfusion in healthy adult females. Methods: The study design comprised six total study visits including a baseline visit (V1) and a final 14-week visit (V6) following oral shilajit supplementation (125 or 250 mg bid). A skin biopsy of the left inner upper arm of each subject was collected at visit 2 and visit 6 for gene expression profiling using the Affymetrix Clariom™ D Assay. Skin perfusion was determined by MATLAB processing of dermascopic images. Transcriptome data were normalized and subjected to statistical analysis. The differentially regulated genes were subjected to Ingenuity Pathway Analysis (IPA®). The expression of the differentially regulated genes identified by IPA® was verified using real-time polymerase chain reaction (RT-PCR). Results: Supplementation with shilajit for 14 weeks was not associated with any reported adverse effect within this period. At the higher dose (250 mg bid), shilajit improved skin perfusion when compared to baseline or the placebo. Pathway analysis identified shilajit-inducible genes relevant to endothelial cell migration, growth of blood vessels, and ECM, which were validated by quantitative real-time polymerase chain reaction (RT-PCR) analysis. Conclusions: This work provides the first evidence demonstrating that oral shilajit supplementation in adult healthy women induced genes relevant to endothelial cell migration and growth of blood vessels. Shilajit supplementation improved skin microperfusion.

    Graph pangenome captures missing heritability and empowers tomato breeding

    Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits (1,2). The solution to this problem is to identify all causal genetic variants and to measure their individual contributions (3,4). Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits, leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
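
    The 24% figure in the abstract above is a relative increase, not a difference in heritability points: the average estimate rises from 0.33 to 0.41, a gain of 0.08, which is about 24% of the 0.33 baseline. A minimal sketch of that arithmetic follows; the function name `relative_increase` is a hypothetical helper introduced only for illustration.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the fractional increase of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    linear_reference_h2 = 0.33  # average heritability, single linear reference
    graph_pangenome_h2 = 0.41   # average heritability, graph pangenome

    gain = relative_increase(linear_reference_h2, graph_pangenome_h2)
    print(f"absolute gain: {graph_pangenome_h2 - linear_reference_h2:.2f}")
    print(f"relative gain: {gain:.1%}")  # prints "relative gain: 24.2%"
```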