28 research outputs found

    PoolHap: Inferring Haplotype Frequencies from Pooled Samples by Next Generation Sequencing

    With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, so the subtype frequencies must be inferred. In addition, investigators may occasionally want to artificially pool samples from a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. The software and user's manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/
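
    The idea of recovering haplotype frequencies from pooled reads when the haplotypes are known can be illustrated with a minimal sketch. The abstract does not say which estimator PoolHap uses, so the code below is an assumption: a generic expectation-maximization (EM) estimate of mixture proportions from per-SNP read counts. All names (`simulate_pool`, `estimate_freqs`), the three-haplotype toy data, and the 1% error rate are hypothetical and chosen only for illustration.

```python
import random


def simulate_pool(haplotypes, freqs, max_depth, err, rng):
    """Simulate (alt, ref) read counts per SNP for a pool of known haplotypes."""
    counts = []
    n_snps = len(haplotypes[0])
    for j in range(n_snps):
        depth = rng.randint(0, max_depth)  # uneven coverage, zero allowed
        alt = 0
        for _ in range(depth):
            # Each read comes from one haplotype, chosen by its frequency.
            k = rng.choices(range(len(freqs)), weights=freqs)[0]
            allele = haplotypes[k][j]
            if rng.random() < err:  # sequencing error flips the allele
                allele = 1 - allele
            alt += allele
        counts.append((alt, depth - alt))
    return counts


def estimate_freqs(haplotypes, counts, err=0.01, n_iter=100):
    """EM estimate of haplotype frequencies from per-SNP read counts."""
    n_hap = len(haplotypes)
    freqs = [1.0 / n_hap] * n_hap  # start from a uniform guess
    total = sum(alt + ref for alt, ref in counts)
    for _ in range(n_iter):
        expected = [0.0] * n_hap
        for j, (alt, ref) in enumerate(counts):
            if alt + ref == 0:
                continue  # no reads at this SNP, nothing to learn
            # Probability that a read from haplotype k shows the alt allele.
            p_alt_k = [(1 - err) if h[j] else err for h in haplotypes]
            p_alt = sum(f * p for f, p in zip(freqs, p_alt_k))
            for k in range(n_hap):
                # E-step: expected number of reads attributed to haplotype k.
                expected[k] += alt * freqs[k] * p_alt_k[k] / p_alt
                expected[k] += ref * freqs[k] * (1 - p_alt_k[k]) / (1 - p_alt)
        # M-step: new frequencies are the normalized expected read counts.
        freqs = [e / total for e in expected]
    return freqs


rng = random.Random(0)
true_freqs = [0.6, 0.3, 0.1]  # hypothetical mixture
haplotypes = [[rng.randint(0, 1) for _ in range(2000)] for _ in true_freqs]
counts = simulate_pool(haplotypes, true_freqs, max_depth=4, err=0.01, rng=rng)
estimated = estimate_freqs(haplotypes, counts)
print([round(f, 3) for f in estimated])
```

    Each SNP here is covered by only about two reads on average, so no single site is informative on its own; the estimate becomes stable because evidence is summed over thousands of SNPs, which is the intuition stated in the abstract.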

    TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data

    Transposable elements (TEs) are common mobile DNA elements present in nearly all genomes. Since the movement of TEs within a genome can sometimes have phenotypic consequences, an accurate report of TE activity is desirable. To this end, we developed TE-Locate, a computational tool that uses paired-end reads to identify the novel locations of known TEs. TE-Locate can utilize either a database of TE sequences or annotated TEs within the reference sequence of interest. This makes TE-Locate useful in the search for any mobile sequence, including retrotransposed gene copies. One major concern is to act on the correct hierarchy level, thereby avoiding the incorrect calling of a single insertion as multiple events of TEs with high sequence similarity. We used the (super)family level, but TE-Locate can also use any other level, right down to the individual transposable element. As an example of analysis with TE-Locate, we used the Swedish population in the 1,001 Arabidopsis genomes project, and presented the biological insights gained from the novel TEs, including the association between different TE superfamilies. The program is freely available, and the URL is provided at the end of the paper.

    On the causes of gene-body methylation variation in Arabidopsis thaliana.

    Gene-body methylation (gbM) refers to sparse CG methylation of coding regions, which is especially prominent in evolutionarily conserved house-keeping genes. It is found in both plants and animals, but is directly and stably (epigenetically) inherited over multiple generations in the former. Studies in Arabidopsis thaliana have demonstrated that plants originating from different parts of the world exhibit genome-wide differences in gbM, which could reflect direct selection on gbM, but which could also reflect an epigenetic memory of ancestral genetic and/or environmental factors. Here we look for evidence of such factors in F2 plants resulting from a cross between a southern Swedish line with low gbM and a northern Swedish line with high gbM, grown at two different temperatures. Using bisulfite-sequencing data with nucleotide-level resolution on hundreds of individuals, we confirm that CG sites are either methylated (nearly 100% methylation across sampled cells) or unmethylated (approximately 0% methylation across sampled cells), and show that the higher level of gbM in the northern line is due to more sites being methylated. Furthermore, methylation variants almost always show Mendelian segregation, consistent with their being directly and stably inherited through meiosis. To explore how the differences between the parental lines could have arisen, we focused on somatic deviations from the inherited state, distinguishing between gains (relative to the inherited 0% methylation) and losses (relative to the inherited 100% methylation) at each site in the F2 generation. We demonstrate that deviations predominantly affect sites that differ between the parental lines, consistent with these sites being more mutable. Gains and losses behave very differently in terms of their genomic distribution, and are influenced by the local chromatin state. We find clear evidence for different trans-acting genetic polymorphisms affecting gains and losses, with those affecting gains showing strong environmental interactions (G×E). Direct effects of the environment were minimal. In conclusion, we show that genetic and environmental factors can change gbM at a cellular level, and hypothesize that these factors can also lead to transgenerational differences between individuals via the inclusion of such changes in the zygote. If true, this could explain the geographic pattern of gbM with selection, and would cast doubt on estimates of epimutation rates from inbred lines in constant environments.

    Comparison of univariate and conditional GWAS of mCHG.

    The analysis was done separately for (A) RdDM-targeted and (B) CMT2-targeted transposons, using the 774 lines in the global panel from the 1001 Epigenomes Project (“SALK leaf in ambient temperature”; see Fig 1). For each case, the upper Manhattan plot shows univariate GWAS of mCHG and the lower shows GWAS of mCHG controlling for mCHH. Horizontal gray lines show genome-wide significance (p = 0.05 after Bonferroni correction). The line plots show enrichment of a priori genes and FDR (see text), with horizontal dashed lines indicating an FDR of 20%.
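
    As a reminder of how a Bonferroni-corrected significance line is placed on a Manhattan plot, the short sketch below divides the nominal level by the number of tests and converts the result to the -log10 scale used on the y-axis. The number of tested SNPs is not given in the caption, so `n_snps` is a hypothetical value, and `bonferroni_threshold` is an illustrative helper name.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 1_000_000  # hypothetical number of tested SNPs, not from the caption
threshold = bonferroni_threshold(0.05, n_snps)
line_height = -math.log10(threshold)  # y-position of the significance line
print(threshold, round(line_height, 2))
```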

    Genetic variation around MSI1 and ROS3.

    (A) Zoom-in Manhattan plots (Fig 3) and the genome structure around Chr5:23553506, 23555910 (top), and 23522001 (bottom) illustrated by mapped short-read DNA-seq data (IGV browser). Vertical colored lines in the IGV plots show SNPs. (B) Conditional GWAS for mCHG in RdDM- and CMT2-targeted transposons. mCHH and Chr5:23555910 were both used as co-factors. Gray vertical lines indicate the Chr5:23555910 position, and horizontal lines show the genome-wide significance (p = 0.05 by Bonferroni correction). r2 was calculated from Chr5:23553506 and Chr5:23555910 for mCHGRdDM and mCHGCMT2, respectively. (PDF)

    Distribution of p-values for two GWAS models.

    QQ plots for univariate models for mCHG levels (A) and conditional models for mCHG|mCHH (B) in RdDM-targeted transposons (left) and CMT2-targeted transposons (right). (PDF)

    Unorthodox mRNA start site to extend the highly structured leader of retrotransposon Tto1 mRNA increases transposition rate

    Retroelement RNAs serve as templates for both translation and reverse transcription into extrachromosomal DNA. DNA copies may be inserted into the host genome to multiply element sequences. This transpositional activity of retroelements is usually restricted to specific conditions, particularly to conditions that impose stress on the host organism. In this work, we examined how the mRNA initiation point, and features of primary and secondary structure, of tobacco retrotransposon Tto1 RNA influence its transpositional activity. We found that the most abundant Tto1 RNA is not a substrate for reverse transcription. It is poorly translated, and its 5′-end does not contain a region of redundancy with the most prominent 3′-end. In contrast, expression of an mRNA with the 5′-end extended by 28 nucleotides allows translation and gives rise to transposition events in the heterologous host, Arabidopsis thaliana. In addition, the presence of extended hairpins and of two short open reading frames in the 5′-leader sequence of Tto1 mRNA suggests that translation does not involve ribosome scanning from the mRNA 5′-end to the translation initiation site.

    GMI1, a structural-maintenance-of-chromosomes-hinge domain-containing protein, is involved in somatic homologous recombination in Arabidopsis

    DNA double-strand breaks (DSBs) pose one of the most severe threats to genome integrity, potentially leading to cell death. After detection of a DSB, the DNA damage and repair response is initiated and the DSB is repaired by non-homologous end joining and/or homologous recombination. Many components of these processes are still unknown in Arabidopsis thaliana. In this work, we characterized γ-irradiation and mitomycin C induced 1 (GMI1), a member of the SMC-hinge domain-containing protein family. RT-PCR analysis and promoter-GUS fusion studies showed that γ-irradiation, the radio-mimetic drug bleocin, and the DNA cross-linking agent mitomycin C strongly enhance GMI1 expression, particularly in meristematic tissues. The induction of GMI1 by γ-irradiation depends on the signalling kinase Ataxia telangiectasia-mutated (ATM) but not on ATM and Rad3-related (ATR). Epistasis analysis of single and double mutants demonstrated that ATM acts upstream of GMI1, while the atr gmi1-2 double mutant was more sensitive than the respective single mutants. A comet assay revealed a reduced rate of DNA double-strand break repair in gmi1 mutants during the early recovery phase after exposure to bleocin. Moreover, the rate of homologous recombination of a reporter construct was strongly reduced in gmi1 mutant plants upon exposure to bleocin or mitomycin C. GMI1 is the first member of its protein family known to be involved in DNA repair.