
    Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    BACKGROUND: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. RESULTS: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria, we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data, we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found, depending on the GC content range. CONCLUSIONS: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low-abundance polymorphisms.

    p53 Gene Repair with Zinc Finger Nucleases Optimised by Yeast 1-Hybrid and Validated by Solexa Sequencing

    The tumor suppressor gene p53 is mutated or deleted in over 50% of human tumors. As functional p53 plays a pivotal role in protecting against cancer development, several strategies for restoring wild-type (wt) p53 function have been investigated. In this study, we applied an approach using gene repair with zinc finger nucleases (ZFNs). We adapted a commercially available yeast one-hybrid (Y1H) selection kit to allow rapid building and optimization of 4-finger constructs from randomized PCR libraries. We thus generated novel functional zinc finger nucleases against two DNA sites in the human p53 gene, near cancer mutation ‘hotspots’. The ZFNs were first validated using in vitro cleavage assays and in vivo episomal gene repair assays in HEK293T cells. Subsequently, the ZFNs were used to restore wt-p53 status in the SF268 human cancer cell line, via ZFN-induced homologous recombination. The frequency of gene repair and mutation by non-homologous end-joining was then ascertained in several cancer cell lines, using a deep sequencing strategy. Our Y1H system facilitates the generation and optimisation of novel, sequence-specific four- to six-finger peptides, and the p53-specific ZFN described here can be used to mutate or repair p53 in genomic loci.

    The genome of sugar beet (Beta vulgaris) : assembly, annotation and interpretation of a complex plant genome

    Minoche AE. The genome of sugar beet (Beta vulgaris) : assembly, annotation and interpretation of a complex plant genome. Bielefeld; 2013

    Architecture and evolution of a minute plant genome

    It has been argued that the evolution of plant genome size is principally unidirectional and increasing owing to the varied action of whole-genome duplications (WGDs) and mobile element proliferation. However, extreme genome size reductions have been reported in the angiosperm family tree. Here we report the sequence of the 82-megabase genome of the carnivorous bladderwort plant Utricularia gibba. Despite its tiny size, the U. gibba genome accommodates a typical number of genes for a plant, with the main difference from other plant genomes arising from a drastic reduction in non-genic DNA. Unexpectedly, we identified at least three rounds of WGD in U. gibba since common ancestry with tomato (Solanum) and grape (Vitis). The compressed architecture of the U. gibba genome indicates that a small fraction of intergenic DNA, with few or no active retrotransposons, is sufficient to regulate and integrate all the processes required for the development and reproduction of a complex organism.

    Profiling of extensively diversified plant LINEs reveals distinct plant-specific subclades

    Heitkam T, Holtgräwe D, Dohm JC, et al. Profiling of extensively diversified plant LINEs reveals distinct plant-specific subclades. The Plant Journal. 2014;79(3):385-397. A large fraction of eukaryotic genomes is made up of long interspersed nuclear elements (LINEs). Due to their capability to create novel copies via error-prone reverse transcription, they generate multiple families and reach high copy numbers. Although mammalian LINEs are well described, plant LINEs have been only poorly investigated. Here, we present a systematic cross-species survey of LINEs in higher plant genomes, shedding light on plant LINE evolution and diversity and facilitating their annotation in genome projects. Applying a Hidden Markov Model-based analysis, 59,390 intact LINE reverse transcriptases (RTs) have been extracted from 23 plant genomes. These fall into only two of the 28 LINE clades (L1 and RTE) known in eukaryotes. While plant RTE LINEs are highly homogeneous and mostly constitute only a single family per genome, plant L1 LINEs are extremely diverse and form numerous families. Despite their heterogeneity, all members across the 23 species fall into only seven L1 subclades, some of them defined here. Focusing as an example on the L1 LINEs of a basal reference plant genome (Beta vulgaris), we show that the subclade classification level not only reflects RT sequence similarity but also mirrors structural aspects of complete LINE retrotransposons, such as element size and the position and type of encoded enzymatic domains. Our comprehensive catalogue of plant LINE RTs serves the classification of highly diverse plant LINEs, while the provided subclade-specific HMMs facilitate their annotation.

    Cytosine Methylation of an Ancient Satellite Family in the Wild Beet Beta procumbens

    DNA methylation is an essential epigenetic feature for the regulation and maintenance of heterochromatin. Satellite DNA is a repetitive sequence component that often occurs in large arrays in heterochromatin of subtelomeric, intercalary and centromeric regions. Knowledge about the methylation status of satellite DNA is important for understanding the role of repetitive DNA in heterochromatization. In this study, we investigated the cytosine methylation of the ancient satellite family pEV in the wild beet Beta procumbens. The pEV satellite is widespread in species-specific pEV subfamilies in the genus Beta and most likely originated before the radiation of the Betoideae and Chenopodioideae. In B. procumbens, the pEV subfamily occurs abundantly and spans intercalary and centromeric regions. To uncover its cytosine methylation, we performed chromosome-wide immunostaining and bisulfite sequencing of pEV satellite repeats. We found that CG and CHG sites are highly methylated while CHH sites show only low levels of methylation. As a consequence of the low frequency of CG and CHG sites and the preferential occurrence of most cytosines in the CHH motif in pEV monomers, this satellite family displays only low levels of total cytosine methylation.
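
The reasoning in the last two sentences of this abstract (few CG/CHG sites plus many weakly methylated CHH sites gives low total methylation) can be illustrated with a minimal sketch. It assumes the standard context definitions, with H standing for A, C or T, and counts forward-strand cytosines only. The function names, the example sequence and the per-context methylation levels are hypothetical and are not taken from the study.

```python
def cytosine_contexts(seq):
    """Count forward-strand cytosines by context: CG, CHG, CHH (H = A, C or T)."""
    seq = seq.upper()
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1 : i + 3]  # up to two bases downstream of the cytosine
        if any(b not in "ACGT" for b in nxt):
            continue  # ambiguous base: context cannot be classified
        if nxt.startswith("G"):
            counts["CG"] += 1
        elif len(nxt) < 2:
            continue  # too close to the sequence end to tell CHG from CHH
        elif nxt[1] == "G":
            counts["CHG"] += 1
        else:
            counts["CHH"] += 1
    return counts


def total_methylation(counts, levels):
    """Count-weighted mean methylation level across the three contexts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[ctx] * levels[ctx] for ctx in counts) / total


# Hypothetical illustration: CHH-rich sequence, CHH weakly methylated.
example_counts = {"CG": 1, "CHG": 1, "CHH": 8}
example_levels = {"CG": 0.9, "CHG": 0.8, "CHH": 0.05}  # assumed values
print(total_methylation(example_counts, example_levels))
```

With these assumed numbers the weighted mean is dominated by the CHH class, so the total stays low even though the CG and CHG sites are almost fully methylated.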

    Differential Expression Patterns of Non-symbiotic Hemoglobins in Sugar beet (Beta vulgaris ssp. vulgaris).

    Biennial sugar beet (Beta vulgaris ssp. vulgaris) is a Caryophyllidae that has adapted its growth cycle to the seasonal temperature and day length variation of temperate regions. This is the first holistic study of the expression pattern of non-symbiotic hemoglobins (nsHbs) in a member of this group under two environmental conditions essential for flowering, namely vernalization and length of photoperiod. BvHbs were identified by sequence homology searches against the latest draft of the sugar beet genome. Three nsHbs (BvHb1.1, BvHb1.2, and BvHb2) and one truncated Hb (BvHb3) were found in the genome of sugar beet. Gene expression profiling of the nsHbs was carried out by quantitative PCR in different organs and developmental stages as well as during vernalization and under different photoperiods. BvHb1.1 and BvHb2 showed differential expression during vernalization as well as during long and short days. The high expression of BvHb2 indicates that it has an active role in the cell, maybe even taking over some BvHb1.2 functions, except during germination, where BvHb1.2 and BvHb1.1 (both class 1 nsHbs) are highly expressed. The unprecedented finding of a leading peptide at the N-terminus of BvHb1.1, an nsHb from higher plants, together with its observed expression, indicates that it may have a very specific role due to its suggested location in chloroplasts. Our findings open up new possibilities for research, breeding and engineering, since Hbs could be more involved in plant development than was previously anticipated.

    Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration

    Weber B, Heitkam T, Holtgräwe D, et al. Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration. Mobile DNA. 2013;4(1):8. BACKGROUND: Chromoviruses are one of the three genera of Ty3-gypsy long terminal repeat (LTR) retrotransposons, and are present in high copy numbers in plant genomes. They are widely distributed within the plant kingdom, with representatives even in lower plants such as green and red algae. Their hallmark is the presence of a chromodomain at the C-terminus of the integrase. The chromodomain exhibits structural characteristics similar to proteins of the heterochromatin protein 1 (HP1) family, which mediate the binding of each chromovirus type to specific histone variants. A specific integration via the chromodomain has been shown for only a few chromoviruses. However, a detailed study of different chromoviral clades populating a single plant genome has not yet been carried out. RESULTS: We conducted a comprehensive survey of chromoviruses within the Beta vulgaris (sugar beet) genome, and found a highly diverse chromovirus population, with significant differences in element size, primarily caused by their flanking LTRs. In total, we identified and annotated full-length members of 16 families belonging to the four plant chromoviral clades: CRM, Tekay, Reina, and Galadriel. The families within each clade are structurally highly conserved; in particular, the position of the chromodomain coding region relative to the polypurine tract is clade-specific. Two distinct groups of chromodomains were identified. The group II chromodomain was present in three chromoviral clades, whereas families of the CRM clade contained a more divergent motif. Physical mapping using representatives of all four clades identified a clade-specific integration pattern. For some chromoviral families, we detected the presence of expressed sequence tags, indicating transcriptional activity. CONCLUSIONS: We present a detailed study of chromoviruses, belonging to the four major clades, which populate a single plant genome. Our results illustrate the diversity and family structure of B. vulgaris chromoviruses, and emphasize the role of chromodomains in the targeted integration of these viruses. We suggest that the diverse sets of plant chromoviruses with their different localization patterns might help to facilitate plant-genome organization in a structural and functional manner.

    Exploiting single-molecule transcript sequencing for eukaryotic gene prediction

    We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.