
    Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. Our approach also reduces mutant sequencing costs dramatically: whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340.
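    A minimal sketch of how a gene might be scored from independent mutants, assuming (our simplification, not necessarily the authors' exact statistic) that each strain's mutations fall uniformly and independently across the genome. A gene of length L in a genome of size G is then hit in a given strain with probability p = 1 - (1 - L/G)^m for m mutations per strain, and the chance that it is hit in at least k of N independent strains is a binomial upper tail. The function names, the 1 kb gene, and the 4 Mb genome size are hypothetical.

```python
from math import comb


def hit_probability(gene_length: int, genome_size: int, mutations_per_strain: int) -> float:
    """Chance that at least one of a strain's random mutations lands in the gene,
    assuming mutations are uniform and independent across the genome."""
    return 1.0 - (1.0 - gene_length / genome_size) ** mutations_per_strain


def gene_p_value(hit_strains: int, n_strains: int, p_hit: float) -> float:
    """Binomial upper tail: P(the gene is hit in >= hit_strains of n_strains by chance)."""
    return sum(
        comb(n_strains, k) * p_hit**k * (1.0 - p_hit) ** (n_strains - k)
        for k in range(hit_strains, n_strains + 1)
    )


if __name__ == "__main__":
    # Illustrative inputs: 32 strains with ~128 mutations each (4099 / 32),
    # a hypothetical 1 kb gene, and an assumed 4 Mb genome.
    p = hit_probability(1_000, 4_000_000, 128)
    for hits in (1, 3, 6):
        print(f"hit in {hits:>2} of 32 strains: p = {gene_p_value(hits, 32, p):.3g}")
```

    A gene under selection tends to be hit in many more strains than this background model predicts, so its tail probability becomes very small; in practice the p-values would also need a multiple-testing correction across all genes in the genome.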

    Evaluation of Clonal Hematopoiesis in Pediatric ADA-SCID Gene Therapy Participants

    Autologous stem cell transplant with gene therapy (ASCT-GT) provides curative therapy while reducing pretransplant immune-suppressive conditioning and eliminating posttransplant immune suppression. Clonal hematopoiesis of indeterminate potential (CHIP)-associated mutations increase, and telomere lengths (TLs) shorten, with natural aging and DNA-damaging processes. If CHIP is present before ASCT-GT, or if mutagenesis occurs after busulfan exposure, hematopoietic stem cells carrying these somatic variants may survive the conditioning chemotherapy and gain a selective reconstitution advantage, increasing the risk of hematologic malignancy and overall mortality. Seventy-four peripheral blood samples (ranging from baseline to 120 months after ASCT-GT) from 10 pediatric participants who underwent ASCT-GT for adenosine deaminase-deficient severe combined immune deficiency (ADA-SCID) after reduced-intensity conditioning with busulfan and 16 healthy controls were analyzed for TL and CHIP. One participant had a significant decrease in TL. No CHIP-associated mutations were identified by next-generation sequencing in any of the ADA-SCID participants. This suggests that further studies are needed to determine the utility of germline analyses in revealing the underlying genetic risk of malignancy in participants who undergo gene therapy. Although these results are promising, larger-scale studies are needed to corroborate the effect of ASCT-GT on TL and CHIP. This trial was registered at www.clinicaltrials.gov as #NCT00794508.

    Schematic diagram of phenotype sequencing and key parameters.

    Overview of phenotype sequencing stages: mutagenesis, screening, and sequencing. Conventional unpooled sequencing of individual strains (left) is contrasted with pooled sequencing of multiple strains per library (right), comparing the expected frequency with which a real mutation is observed in each case.
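    The pooling trade-off in this schematic can be made concrete with a small sketch. Assuming (hypothetically) that P strains are pooled in equal amounts into one library sequenced to coverage c, and that variant-supporting reads follow a Binomial(c, 1/P) distribution, a mutation carried by just one strain is expected in about c/P reads, versus c reads when that strain is sequenced alone. The c = 75 value echoes the sequencing-error caption elsewhere in this listing; the function name is illustrative.

```python
import math


def variant_read_stats(coverage: float, pool_size: int) -> tuple[float, float]:
    """Mean and standard deviation of the number of reads supporting a mutation
    carried by one of pool_size strains pooled in equal amounts into one library,
    modelling variant-supporting reads as Binomial(coverage, 1 / pool_size)."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    fraction = 1.0 / pool_size
    mean = coverage * fraction
    sd = math.sqrt(coverage * fraction * (1.0 - fraction))
    return mean, sd


if __name__ == "__main__":
    # Unpooled (P = 1) vs. pooled libraries at the same library coverage.
    for pool in (1, 2, 4, 8):
        mean, sd = variant_read_stats(75, pool)
        print(f"P = {pool}: {mean:.2f} +/- {sd:.2f} variant reads")
```

    At c = 75 and P = 4 this gives 18.75 ± 3.75 supporting reads, which is why the mutation-call threshold has to be chosen with the pooling factor in mind.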

    Modeled vs. experimental target gene yield as a function of increasing number of strains sequenced.

    A. Bioinformatic model of the expected yield for discovery of 3 target genes as a function of the number of strains sequenced, plotted vs. experiment cost, assuming one lane of sequencing at a cost of $37.50 per sequenced strain. B. Experimentally measured target gene discovery yields as a function of the number of strains sequenced, plotted vs. experiment cost. Each data point is the average of all sub-experiments containing that number of strains; the error bar gives the standard error of this average over that set of sub-experiments. Red line (inverted triangles): one lane of sequencing (32x coverage per library); blue line (+ signs): three lanes of sequencing (96x coverage per library, resulting in a total cost of $81.25 per strain).
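    The x-axis of these plots converts strain counts into experiment cost. A minimal sketch of that conversion, assuming cost scales linearly with the number of strains at the per-strain rates quoted in the caption (the constant and function names are hypothetical):

```python
ONE_LANE_COST_PER_STRAIN = 37.50    # USD per strain, one lane (32x coverage per library)
THREE_LANE_COST_PER_STRAIN = 81.25  # USD per strain, three lanes (96x coverage per library)


def experiment_cost(n_strains: int, cost_per_strain: float) -> float:
    """Total reagent cost, assuming cost scales linearly with the number of strains."""
    return n_strains * cost_per_strain


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        one = experiment_cost(n, ONE_LANE_COST_PER_STRAIN)
        three = experiment_cost(n, THREE_LANE_COST_PER_STRAIN)
        print(f"{n:>2} strains: ${one:,.2f} (1 lane)  ${three:,.2f} (3 lanes)")
```

    For example, 32 strains at $37.50 each come to $1,200 on one lane, and 32 strains at $81.25 each come to $2,600 on three lanes.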

    Effects of sequencing error and pooling on average target gene discovery yields.

    A. The probability of reporting a SNP at a single site as a function of the mutation call threshold (read count), assuming a coverage of c = 75, due either to sequencing error (red) or to a real mutation (green), assuming a 1% sequencing error rate and a 25% true mutation fraction (i.e., a library-pooling factor of P = 4). Circles indicate the expected mean read counts on each plot. B. The expected number of total mutation calls per genome as a function of the mutation call threshold, due either to sequencing error (red) or to real mutations (green), assuming a 4 Mb genome size. The dashed red line indicates the lowest mutation call threshold at which the number of false-positive mutation calls falls below one. The dashed green line indicates the maximum mutation call threshold at which the number of false negatives remains below one. C. The average number of true target genes discovered (at an FDR < 0.67) as a function of the mutation call threshold, for library-pooling levels from P = 2 to P = 9, assuming sequencing of 80 mutant strains with a mutation density of 50 mutations per genome and 20 true target genes.
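    Panels A and B follow from a binomial read-count model. A minimal sketch, assuming every genome site is tested independently and that the full 1% error rate contributes to a false call at a site (a simplification that ignores which alternative base is observed); the helper names are hypothetical and the parameter values are taken from the caption:

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def lowest_safe_threshold(coverage: int, error_rate: float, genome_size: int) -> int:
    """Smallest read-count threshold at which the expected number of
    false-positive calls per genome (one independent test per site) drops below one."""
    for t in range(1, coverage + 1):
        if genome_size * binom_tail(coverage, error_rate, t) < 1.0:
            return t
    return coverage + 1


if __name__ == "__main__":
    c, err, true_frac, genome = 75, 0.01, 0.25, 4_000_000  # caption values (P = 4)
    t = lowest_safe_threshold(c, err, genome)
    print("lowest safe threshold:", t)
    print("expected false calls per genome:", genome * binom_tail(c, err, t))
    print("per-site detection probability:", binom_tail(c, true_frac, t))
```

    Under these assumptions the lowest threshold that keeps expected false positives below one per 4 Mb genome is 9 supporting reads, while a real mutation at 25% frequency still clears that threshold with probability above 0.99.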

    Effect of uniform vs. non-uniform gene size distributions on p-value scoring.

    Uniform gene-size model (blue circles, dashed line); variable gene-size model based on subdividing the E. coli gene size distribution into ten size classes, each containing 424 genes represented by the average size within that class (green + markers); variable gene-size model based on the exact sizes of all 4244 E. coli genes (red line).
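    The ten-class variant in this comparison can be sketched as follows, assuming (our reading of the caption) that genes are sorted by length and cut into ten equal-count classes, each represented by its mean length; the per-class hit probability then replaces the single gene length of the uniform model. The function names, synthetic lengths, genome size, and mutation count are hypothetical.

```python
from statistics import mean


def size_classes(gene_lengths: list[int], n_classes: int = 10) -> list[tuple[int, float]]:
    """Sort gene lengths, split them into n_classes near-equal-count classes,
    and return (number of genes, average length) for each class."""
    if len(gene_lengths) < n_classes:
        raise ValueError("need at least one gene per class")
    ordered = sorted(gene_lengths)
    n = len(ordered)
    classes = []
    for i in range(n_classes):
        chunk = ordered[i * n // n_classes:(i + 1) * n // n_classes]
        classes.append((len(chunk), mean(chunk)))
    return classes


def class_hit_probabilities(classes: list[tuple[int, float]], genome_size: int,
                            mutations_per_strain: int) -> list[float]:
    """Per-strain chance that a gene of each class's average length is mutated,
    assuming uniform, independent mutations across the genome."""
    return [1.0 - (1.0 - avg / genome_size) ** mutations_per_strain for _, avg in classes]


if __name__ == "__main__":
    lengths = [300 + 7 * i for i in range(400)]  # synthetic gene lengths (bp)
    classes = size_classes(lengths)
    for (count, avg), p in zip(classes, class_hit_probabilities(classes, 4_000_000, 50)):
        print(f"{count} genes, mean {avg:.0f} bp, per-strain hit probability {p:.4f}")
```

    With 4244 genes, this integer split gives classes of 424 or 425 genes.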

    Target discovery yield as a function of mutations per strain and number of strains sequenced.

    A. For five target genes. Gray (upper-left corner) represents discovery of all 5 targets; red = zero targets. B. For ten target genes. Gray represents discovery of all 10 targets. C. For twenty target genes. Gray represents discovery of all 20 targets.
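    A toy analytic version of this yield model, for illustration only. It assumes equal-size genes (4244 of them, the E. coli gene count cited elsewhere in this listing), uniformly placed background mutations, and exactly one causal mutation per strain in a uniformly chosen target. A gene counts as "discovered" when it is hit in at least as many strains as the smallest cutoff that keeps the expected number of false-positive genes below one. These assumptions and names are ours, not the authors' scoring procedure.

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def expected_yield(n_strains: int, mutations_per_strain: int, n_targets: int,
                   n_genes: int = 4244) -> float:
    """Expected number of target genes discovered under a toy equal-gene-size model."""
    # Chance a given gene picks up at least one background mutation in one strain.
    p_bg = 1.0 - (1.0 - 1.0 / n_genes) ** mutations_per_strain
    # Smallest hit-count cutoff giving < 1 expected false-positive gene genome-wide
    # (k = n_strains + 1 always qualifies, because its tail probability is 0).
    cutoff = next(k for k in range(1, n_strains + 2)
                  if n_genes * binom_tail(n_strains, p_bg, k) < 1.0)
    # Each strain carries one causal mutation in a uniformly chosen target gene,
    # plus background hits, so a target is mutated with probability p_target.
    p_target = 1.0 - (1.0 - 1.0 / n_targets) * (1.0 - p_bg)
    return n_targets * binom_tail(n_strains, p_target, cutoff)


if __name__ == "__main__":
    for targets in (5, 10, 20):
        for m in (25, 50, 100):
            row = [expected_yield(n, m, targets) for n in (20, 40, 80, 160)]
            print(targets, m, " ".join(f"{y:5.2f}" for y in row))
```

    In this sketch, yield broadly rises with the number of strains sequenced and falls as background mutations per strain increase, since a noisier background forces a higher hit-count cutoff.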