
    Prediction of NB‐LRR resistance genes based on full‐length sequence homology

    The activation of plant immunity is mediated by resistance (R)‐gene receptors, also known as nucleotide‐binding leucine‐rich repeat (NB‐LRR) genes, which in turn trigger the authentic defense response. R‐gene identification is a crucial goal for both classic and modern plant breeding strategies for disease resistance. The conventional method identifies NB‐LRR genes using a protein motif/domain‐based search (PDS) within an automatically predicted gene set of the respective genome assembly. PDS proved to be imprecise since repeat masking prior to automatic genome annotation unwittingly prevented comprehensive NB‐LRR gene detection. Furthermore, R‐genes have diversified in a species‐specific manner, so that NB‐LRR gene identification cannot be universally standardized. Here, we present the full‐length Homology‐based R‐gene Prediction (HRP) method for the comprehensive identification and annotation of a genome's R‐gene repertoire. Our method has substantially addressed the complex genomic organization of tomato (Solanum lycopersicum) NB‐LRR gene loci, proving to be more performant than the well‐established RenSeq approach. HRP efficiency was also tested on three differently assembled and annotated Beta sp. genomes. Indeed, HRP identified up to 45% more full‐length NB‐LRR genes compared to previous approaches. HRP also turned out to be a more refined strategy for R‐gene allele mining, testified by the identification of hitherto undiscovered Fom‐2 homologs in five Cucurbita sp. genomes. In summary, our high‐performance method for full‐length NB‐LRR gene discovery will propel the identification of novel R‐genes towards development of improved cultivars

    Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    BACKGROUND: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. RESULTS: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. CONCLUSIONS: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms

    Disruption and pseudoautosomal localization of the major histocompatibility complex in monotremes

    The characterization and chromosomal mapping of major histocompatibility complex (MHC)-containing BAC clones from platypus and the short-beaked echidna reveal new insights into the evolution of both the mammalian MHC and monotreme sex chromosomes

    Dissecting the effect of genetic variation on the hepatic expression of drug disposition genes across the collaborative cross mouse strains

    A central challenge in pharmaceutical research is to investigate genetic variation in response to drugs. The Collaborative Cross (CC) mouse reference population is a promising model for pharmacogenomic studies because of its large amount of genetic variation, genetic reproducibility, and dense recombination sites. While the CC lines are phenotypically diverse, their genetic diversity in drug disposition processes, such as detoxification reactions, is still largely uncharacterized. Here we systematically measured RNA-sequencing expression profiles from livers of 29 CC lines under baseline conditions. We then leveraged a reference collection of metabolic biotransformation pathways to map potential relations between drugs and their underlying expression quantitative trait loci (eQTLs). By applying this approach on proximal eQTLs, including eQTLs acting on the overall expression of genes and on the expression of particular transcript isoforms, we were able to construct the organization of hepatic eQTL-drug connectivity across the CC population. The analysis revealed a substantial impact of genetic variation acting on drug biotransformation, allowed mapping of potential joint genetic effects in the context of individual drugs, and demonstrated crosstalk between drug metabolism and lipid metabolism. Our findings provide a resource for investigating drug disposition in the CC strains, and offer a new paradigm for integrating biotransformation reactions to corresponding variations in DNA sequences

    Microarray and deep sequencing cross-platform analysis of the mirRNome and isomiR variation in response to epidermal growth factor

    BACKGROUND: Epidermal Growth Factor (EGF) plays an important function in the regulation of cell growth, proliferation, and differentiation by binding to its receptor (EGFR) and providing cancer cells with increased survival responsiveness. Signal transduction carried out by EGF has been extensively studied at both transcriptional and post-transcriptional levels. Little is known about the involvement of microRNAs (miRNAs) in the EGF signaling pathway. miRNAs have emerged as major players in the complex networks of gene regulation, and cancer miRNA expression studies have evidenced a direct involvement of miRNAs in cancer progression. RESULTS: In this study, we have used an integrative high content analysis approach to identify the specific miRNAs implicated in EGF signaling in HeLa cells as potential mediators of cancer mediated functions. We have used microarray and deep-sequencing technologies in order to obtain a global view of the EGF miRNA transcriptome with a robust experimental cross-validation. By applying a procedure based on Rankprod tests, we have delimited a solid set of EGF-regulated miRNAs. After validating regulated miRNAs by reverse transcription quantitative PCR, we have derived protein networks and biological functions from the predicted targets of the regulated miRNAs to gain insight into the potential role of miRNAs in EGF-treated cells. In addition, we have analyzed sequence heterogeneity due to editing relative to the reference sequence (isomiRs) among regulated miRNAs. CONCLUSIONS: We propose that the use of global genomic miRNA cross-validation derived from high throughput technologies can be used to generate more reliable datasets inferring more robust networks of co-regulated predicted miRNA target genes

    Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis

    BACKGROUND: Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer. RESULTS: By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. CONCLUSIONS: We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data

    The genetic architecture of the human cerebral cortex

    The cerebral cortex underlies our complex cognitive capabilities, yet little is known about the specific genetic loci that influence human cortical structure. To identify genetic variants that affect cortical structure, we conducted a genome-wide association meta-analysis of brain magnetic resonance imaging data from 51,665 individuals. We analyzed the surface area and average thickness of the whole cortex and 34 regions with known functional specializations. We identified 199 significant loci and found significant enrichment for loci influencing total surface area within regulatory elements that are active during prenatal cortical development, supporting the radial unit hypothesis. Loci that affect regional surface area cluster near genes in Wnt signaling pathways, which influence progenitor expansion and areal identity. Variation in cortical structure is genetically correlated with cognitive function, Parkinson's disease, insomnia, depression, neuroticism, and attention deficit hyperactivity disorder

    Genomic variation in the genus Beta based on 656 sequenced beet genomes

    Cultivated beets (Beta vulgaris ssp. vulgaris) constitute important crop plants, in particular sugar beet as an indispensable source of sucrose. Several species of wild beets of the genus Beta occur along the European Atlantic coast, in Macaronesia, and throughout the Mediterranean area. Thorough characterization of beet genomes is required for straightforward access to genes promoting genetic resistance against biotic and abiotic stress. Analysing short-read data of 656 sequenced beet genomes, we identified 10 million variant positions in comparison to the sugar beet reference genome RefBeet-1.2. The main groups of species and subspecies were distinguishable based on shared variation, and the separation of sea beets (Beta vulgaris ssp. maritima) into a Mediterranean and an Atlantic subgroup as suggested by previous studies could be confirmed. Complementary approaches of variant-based clustering were employed based on PCA, genotype likelihoods, tree calculations, and admixture analysis. Outliers suggested the occurrence of inter(sub)specific hybridisation, independently confirmed by different analyses. Screens for regions under artificial selection in the sugar beet genome identified 15 Mbp of the genome as variation-poor, enriched for genes involved in shoot system development, stress response, and carbohydrate metabolism. The resources presented herein will be valuable for crop improvement and wild species monitoring and conservation efforts, and for studies on beet genealogy, population structure and population dynamics. Our study provides a wealth of data for in-depth analyses of further aspects of the beet genome towards a thorough understanding of the biology of this important complex of a crop species and its wild relatives

    SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing

    The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that are capable of generating giga base pair data sets quickly and at low cost. Applications of such technologies seem to be limited to resequencing and transcript discovery, due to the shortness of the generated reads. In order to extend the fields of application to de novo sequencing, we developed the SHARCGS algorithm to assemble short-read (25–40-mer) data with high accuracy and speed. The efficiency of SHARCGS was tested on BAC inserts from three eukaryotic species, on two yeast chromosomes, and on two bacterial genomes (Haemophilus influenzae, Escherichia coli). We show that 30-mer-based BAC assemblies have N50 sizes >20 kbp for Drosophila and Arabidopsis and >4 kbp for human in simulations taking missing reads and wrong base calls into account. We assembled 949,974 contigs with length >50 bp, and only one single contig could not be aligned error-free against the reference sequences. We generated 36-mer reads for the genome of Helicobacter acinonychis on the Illumina 1G sequencing instrument and assembled 937 contigs covering 98% of the genome with an N50 size of 3.7 kbp. With the exception of five contigs that differ in 1–4 positions relative to the reference sequence, all contigs matched the genome error-free. Thus, SHARCGS is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy
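
The N50 sizes quoted in the abstract above summarize assembly contiguity. As a minimal sketch, assuming the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases), N50 could be computed as follows. The function name `n50` and the example lengths are illustrative and are not taken from SHARCGS.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: walk the contigs from longest to shortest and
    return the length at which the running sum first reaches at least
    half of the total assembled length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare with integers only to avoid floating-point rounding.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

A single summary value like this hides the shape of the length distribution, so it is usually read alongside the contig count and genome coverage, as the abstract does.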