30 research outputs found

    Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    ABSTRACT: BACKGROUND: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. RESULTS: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data, we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. CONCLUSIONS: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low-abundance polymorphisms.

    The genome of sugar beet (Beta vulgaris) : assembly, annotation and interpretation of a complex plant genome

    Minoche AE. The genome of sugar beet (Beta vulgaris) : assembly, annotation and interpretation of a complex plant genome. Bielefeld; 2013

    Differential Expression Patterns of Non-symbiotic Hemoglobins in Sugar beet (Beta vulgaris ssp. vulgaris).

    Biennial sugar beet (Beta vulgaris ssp. vulgaris) is a Caryophyllidae that has adapted its growth cycle to the seasonal temperature and day length variation of temperate regions. This is the first time a holistic study of the expression pattern of non-symbiotic hemoglobins (nsHbs) is being carried out in a member of this group and under two essential environmental conditions for flowering, namely vernalization and length of photoperiod. BvHbs were identified by sequence homology searches against the latest draft of the sugar beet genome. Three nsHbs (BvHb1.1, BvHb1.2, and BvHb2) and one truncated Hb (BvHb3) were found in the genome of sugar beet. Gene expression profiling of the nsHbs was carried out by quantitative PCR in different organs and developmental stages as well as during vernalization and under different photoperiods. BvHb1.1 and BvHb2 showed differential expression during vernalization as well as during long and short days. The high expression of BvHb2 indicates that it has an active role in the cell, maybe even taking over some BvHb1.2 functions, except during germination, where BvHb1.2 together with BvHb1.1 (both class 1 nsHbs) are highly expressed. The unprecedented finding of a leading peptide at the N-terminus of BvHb1.1, an nsHb from higher plants, together with its observed expression indicates that it may have a very specific role due to its suggested location in chloroplasts. Our findings open up new possibilities for research, breeding and engineering, since Hbs could be more involved in plant development than was previously anticipated.

    Profiling of extensively diversified plant LINEs reveals distinct plant-specific subclades

    Heitkam T, Holtgräwe D, Dohm JC, et al. Profiling of extensively diversified plant LINEs reveals distinct plant-specific subclades. The Plant Journal. 2014;79(3):385-397. A large fraction of eukaryotic genomes is made up of long interspersed nuclear elements (LINEs). Due to their capability to create novel copies via error-prone reverse transcription, they generate multiple families and reach high copy numbers. Although mammalian LINEs are well-described, plant LINEs are only poorly investigated. Here, we present a systematic cross-species survey of LINEs in higher plant genomes shedding light on plant LINE evolution as well as diversity, and facilitating their annotation in genome projects. Applying a Hidden Markov Model-based analysis, 59,390 intact LINE reverse transcriptases (RTs) have been extracted from 23 plant genomes. These fall in only two out of 28 LINE clades (L1 and RTE) known in eukaryotes. While plant RTE LINEs are highly homogenous and mostly constitute only a single family per genome, plant L1 LINEs are extremely diverse and form numerous families. Despite their heterogeneity, all members across the 23 species fall into only seven L1 subclades, some of them defined here. Exemplarily focusing on the L1 LINEs of a basal reference plant genome (Beta vulgaris), we show that the subclade classification level does not only reflect RT sequence similarity, but also mirrors structural aspects of complete LINE retrotransposons, like element size, position and type of encoded enzymatic domains. Our comprehensive catalogue of plant LINE RTs serves the classification of highly diverse plant LINEs, while the provided subclade-specific HMMs facilitate their annotation.

    Cytosine Methylation of an Ancient Satellite Family in the Wild Beet Beta procumbens

    DNA methylation is an essential epigenetic feature for the regulation and maintenance of heterochromatin. Satellite DNA is a repetitive sequence component that often occurs in large arrays in heterochromatin of subtelomeric, intercalary and centromeric regions. Knowledge about the methylation status of satellite DNA is important for understanding the role of repetitive DNA in heterochromatization. In this study, we investigated the cytosine methylation of the ancient satellite family pEV in the wild beet Beta procumbens. The pEV satellite is widespread in species-specific pEV subfamilies in the genus Beta and most likely originated before the radiation of the Betoideae and Chenopodioideae. In B. procumbens, the pEV subfamily occurs abundantly and spans intercalary and centromeric regions. To uncover its cytosine methylation, we performed chromosome-wide immunostaining and bisulfite sequencing of pEV satellite repeats. We found that CG and CHG sites are highly methylated while CHH sites show only low levels of methylation. As a consequence of the low frequency of CG and CHG sites and the preferential occurrence of most cytosines in the CHH motif in pEV monomers, this satellite family displays only low levels of total cytosine methylation.

    Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration

    Weber B, Heitkam T, Holtgräwe D, et al. Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration. Mobile DNA. 2013;4(1):8. BACKGROUND: Chromoviruses are one of the three genera of Ty3-gypsy long terminal repeat (LTR) retrotransposons, and are present in high copy numbers in plant genomes. They are widely distributed within the plant kingdom, with representatives even in lower plants such as green and red algae. Their hallmark is the presence of a chromodomain at the C-terminus of the integrase. The chromodomain exhibits structural characteristics similar to proteins of the heterochromatin protein 1 (HP1) family, which mediate the binding of each chromovirus type to specific histone variants. A specific integration via the chromodomain has been shown for only a few chromoviruses. However, a detailed study of different chromoviral clades populating a single plant genome has not yet been carried out. RESULTS: We conducted a comprehensive survey of chromoviruses within the Beta vulgaris (sugar beet) genome, and found a highly diverse chromovirus population, with significant differences in element size, primarily caused by their flanking LTRs. In total, we identified and annotated full-length members of 16 families belonging to the four plant chromoviral clades: CRM, Tekay, Reina, and Galadriel. The families within each clade are structurally highly conserved; in particular, the position of the chromodomain coding region relative to the polypurine tract is clade-specific. Two distinct groups of chromodomains were identified. The group II chromodomain was present in three chromoviral clades, whereas families of the CRM clade contained a more divergent motif. Physical mapping using representatives of all four clades identified a clade-specific integration pattern. For some chromoviral families, we detected the presence of expressed sequence tags, indicating transcriptional activity. CONCLUSIONS: We present a detailed study of chromoviruses, belonging to the four major clades, which populate a single plant genome. Our results illustrate the diversity and family structure of B. vulgaris chromoviruses, and emphasize the role of chromodomains in the targeted integration of these viruses. We suggest that the diverse sets of plant chromoviruses with their different localization patterns might help to facilitate plant-genome organization in a structural and functional manner.

    Exploiting single-molecule transcript sequencing for eukaryotic gene prediction

    We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.

    Yield of clinically reportable genetic variants in unselected cerebral palsy by whole genome sequencing

    Cerebral palsy (CP) is the most common cause of childhood physical disability, with incidence between 1/500 and 1/700 births in the developed world. Despite increasing evidence for a major contribution of genetics to CP aetiology, genetic testing is currently not performed systematically. We assessed the diagnostic rate of genome sequencing (GS) in a clinically unselected cohort of 150 singleton CP patients, with CP confirmed at >4 years of age. Clinical grade GS was performed on the proband, and variants were filtered and classified according to American College of Medical Genetics and Genomics-Association for Molecular Pathology (ACMG-AMP) guidelines. Variants classified as pathogenic or likely pathogenic (P/LP) were further assessed for their contribution to CP. In total, 24.7% of individuals carried a P/LP variant(s) causing or increasing risk of CP, with 4.7% resolved by copy number variant analysis and 20% carrying single nucleotide or indel variants. A further 34.7% carried one or more rare, high impact variants of uncertain significance (VUS) in variation intolerant genes. Variants were identified in a heterogeneous group of genes, including genes associated with hereditary spastic paraplegia, clotting and thrombophilic disorders, small vessel disease, and other neurodevelopmental disorders. Approximately half of individuals were classified as likely to benefit from changed clinical management as a result of genetic findings. In addition, no significant association between genetic findings and clinical factors was detectable in this cohort, suggesting that systematic sequencing of CP will be required to avoid missed diagnoses.
