12 research outputs found

    Detecting Disease-Causing Mutations in the Human Genome by Haplotype Matching

    Comparisons between haplotypes from affected patients and the human reference genome are frequently used to identify candidates for disease-causing mutations, even though these alignments are expected to reveal a high level of background neutral polymorphism. This limits the scope of genetic studies to relatively small genomic intervals, because current methods for distinguishing potential causal mutations from neutral variation are inefficient. Here we describe a new strategy for detecting mutations that is based on comparing affected haplotypes with closely matched control sequences from healthy individuals, rather than with the human reference genome. We use theory, simulation, and a real data set to show that this approach is expected to reduce the number of sequence variants that must be subjected to follow-up analysis by at least a factor of 20 when closely matched control sequences are selected from a reference panel with as few as 100 control genomes. We also define a reference data resource that would allow efficient application of this strategy to large critical intervals across the genome
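    The matching idea in this abstract can be illustrated with a toy sketch. It assumes each haplotype is represented as a set of positions that differ from the reference genome; the function names and the example data are hypothetical, and this is not the authors' method, only a simplified picture of why a closely matched control leaves fewer variants to follow up.

```python
def closest_control(affected, controls):
    """Return the control haplotype with the fewest differences from `affected`.

    Haplotypes are sets of variant positions relative to the reference
    (an assumed, simplified representation).
    """
    return min(controls, key=lambda c: len(affected ^ c))


def candidate_variants(affected, controls):
    """Variants carried by the affected haplotype but absent from its best match."""
    return affected - closest_control(affected, controls)


# Hypothetical toy data: four variants relative to the reference genome.
affected = {101, 250, 399, 512}
controls = [{101, 250, 777}, {101, 250, 399}, set()]

# Against the reference alone, all four variants would need follow-up;
# against the closest control, only the unshared ones remain.
print(len(affected), sorted(candidate_variants(affected, controls)))
```

    In this toy case the closest control shares three of the four variants, so only one candidate remains for follow-up analysis.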

    GC-Rich DNA Elements Enable Replication Origin Activity in the Methylotrophic Yeast Pichia pastoris

    The well-studied DNA replication origins of the model budding and fission yeasts are A/T-rich elements. However, unlike their yeast counterparts, both plant and metazoan origins are G/C-rich and are associated with transcription start sites. Here we show that an industrially important methylotrophic budding yeast, Pichia pastoris, simultaneously employs at least two types of replication origins: a G/C-rich type associated with transcription start sites and an A/T-rich type more reminiscent of typical budding and fission yeast origins. We used a suite of massively parallel sequencing tools to map and dissect P. pastoris origins comprehensively, to measure their replication dynamics, and to assay the global positioning of nucleosomes across the genome. Our results suggest that some functional overlap exists between promoter sequences and G/C-rich replication origins in P. pastoris and imply an evolutionary bifurcation of the modes of replication initiation.

    Profiling of accessible chromatin regions across multiple plant species and cell types reveals common gene regulatory principles and new control modules

    The transcriptional regulatory structure of plant genomes remains poorly defined relative to animals. It is unclear how many cis-regulatory elements exist, where these elements lie relative to promoters, and how these features are conserved across plant species. We employed the Assay for Transposase-Accessible Chromatin (ATAC-seq) in four plant species (Arabidopsis thaliana, Medicago truncatula, Solanum lycopersicum, and Oryza sativa) to delineate open chromatin regions and transcription factor (TF) binding sites across each genome. Despite 10-fold variation in intergenic space among species, the majority of open chromatin regions lie within 3 kb upstream of a transcription start site in all species. We find a common set of four TFs that appear to regulate conserved gene sets in the root tips of all four species, suggesting that TF-gene networks are generally conserved. Comparative ATAC-seq profiling of Arabidopsis root hair and non-hair cell types revealed extensive similarity as well as many cell type-specific differences. Analyzing TF binding sites in differentially accessible regions identified a MYB-driven regulatory module unique to the hair cell, which appears to control both cell fate regulators and abiotic stress responses. Our analyses revealed common regulatory principles among species and shed light on the mechanisms producing cell type-specific transcriptomes during development

    Replication timing of the P. pastoris genome.

    (A) Genomic DNA from G1 and S phase cells was sheared and sequenced. Normalized S/G1 DNA copy ratios (in 1 kbp windows) were smoothed and plotted against chromosomal coordinates. Peaks correspond to positions of replication initiation. The profile of chromosome 4 is shown (all chromosomes are shown in Figure S6, http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004169#pgen.1004169.s006) with ARS locations indicated by open (AT-ARSs) and shaded (GC-ARSs) circles. Unsmoothed ratio data for one of the replicates are shown in grey. Coordinates of replication timing peaks are indicated by dashed vertical lines. (B) The distributions of smoothed S/G1 ratio data. The distribution of all ratios (“Genome”) is shown adjacent to the distribution of values at bins containing midpoints of GC-ARSs (“GC”) or AT-ARSs (“AT”). Values for ARSs that have no other ARSs within 40 kb in both directions are shown on the right (“isolated”). (C) The complete genomic ratio distribution is shown relative to distributions after removal of data within 60 kb ranges centered on AT-ARSs (“AT”), GC-ARSs (“GC”), or all ARSs (“all ARS”). (D) For each ARS, the distance to the nearest replication peak was calculated. The ARS-peak distances are shown as distributions separately for GC-ARSs (blue) and AT-ARSs (orange). Peak distances from simulated random sets of loci are shown in grey.
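    Two of the calculations named in this legend, the normalized S/G1 ratio per window and the distance from each ARS to the nearest replication peak, can be sketched as follows. This is a minimal illustration under stated assumptions: normalization is taken to mean scaling each sample by its total read count, the function names are hypothetical, and the smoothing step is omitted. The legend does not specify the procedure actually used.

```python
from bisect import bisect_left


def normalized_ratios(s_counts, g1_counts):
    """Per-window S/G1 ratio, with each sample scaled by its total read count.

    Windows with no G1 coverage get None because the ratio is undefined.
    The depth normalization used here is an assumption.
    """
    s_total, g1_total = sum(s_counts), sum(g1_counts)
    ratios = []
    for s, g in zip(s_counts, g1_counts):
        if g == 0:
            ratios.append(None)
        else:
            ratios.append((s / s_total) / (g / g1_total))
    return ratios


def nearest_peak_distance(position, sorted_peaks):
    """Distance from `position` to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_peaks, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)


# Hypothetical read counts for two 1 kbp windows, and hypothetical peak coordinates.
print(normalized_ratios([10, 30], [20, 20]))
print(nearest_peak_distance(450, [100, 500, 900]))
```

    Comparing such distances for real ARS coordinates against distances for randomly placed loci would give the kind of contrast described in panel D.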

    Sequence features of GC-ARSs.

    (A) Average nucleotide frequencies around 107 GC-ARS sites (top) and twenty-eight non-ARS intergenic occurrences of the GC-ACS (bottom), centered on the best match of the GC-ACS. The nucleotide frequencies are calculated at all flanking regions around the motif independent of whether the flanking region is present in ARS contigs or cores. (B) The distribution of distances between the GC-ACS motif (in the orientation shown) and the TSS for adjacent genes transcribing away from the ARS with available TSS annotations. Distances to the 5′ side of the motif are shown in blue; distances to the 3′ side of the motif are shown in red. (C) The distribution of sequence lengths between the GC-ACS and the end of the inferred functional core region for each GC-ARS. The 5′ distance is indicated in blue; the 3′ distance is indicated in red. Numbers indicate the upper limit of the bin.

    The GC-ACS is required for GC-ARS function.

    Wild type (WT) and mutant (MUT) alleles of the twelve ARSs indicated were cloned into a URA3 ARS-less vector and used to transform ura3 yeast on selective medium plates lacking uracil. Plates were grown at 30°C for five days before pictures were taken. Colony formation indicates plasmid maintenance and ARS activity. The GC-ACS was positioned <15 bp away from the 5′ endpoint in all ARS sequences. The sequences of the fragments tested are listed in Table S4 (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004169#pgen.1004169.s012).

    Mapping and Dynamics of Regulatory DNA and Transcription Factor Networks in A. thaliana

    Our understanding of gene regulation in plants is constrained by our limited knowledge of plant cis-regulatory DNA and its dynamics. We mapped DNase I hypersensitive sites (DHSs) in A. thaliana seedlings and used genomic footprinting to delineate ∼700,000 sites of in vivo transcription factor (TF) occupancy at nucleotide resolution. We show that variation associated with 72 diverse quantitative phenotypes localizes within DHSs. TF footprints encode an extensive cis-regulatory lexicon subject to recent evolutionary pressures, and widespread TF binding within exons may have shaped codon usage patterns. The architecture of A. thaliana TF regulatory networks is strikingly similar to that of animals in spite of diverged regulatory repertoires. We analyzed regulatory landscape dynamics during heat shock and photomorphogenesis, disclosing thousands of environmentally sensitive elements and enabling mapping of key TF regulatory circuits underlying these fundamental responses. Our results provide an extensive resource for the study of A. thaliana gene regulation and functional biology