
16 research outputs found

Percent of reads mapping to Prevotella and Human (background) sequences (15 Mbp sequenced).

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

FigShare
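
The percentages in this figure are read-classification arithmetic: each sequenced read is assigned to the target (Prevotella), the background (Human), or neither, and each class is reported as a share of all reads. The minimal sketch below assumes those per-read labels already exist; the label strings and the `mapping_percentages` helper are hypothetical, and the alignment step that would produce the labels is not shown.

```python
from collections import Counter


def mapping_percentages(read_labels):
    """Return the percent of reads assigned to each reference label.

    read_labels: one label per read, e.g. "Prevotella", "Human" or
    "unmapped". These labels are hypothetical; in practice they would come
    from aligning each read against the target and background references.
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    reads = ["Prevotella"] * 30 + ["Human"] * 60 + ["unmapped"] * 10
    print(mapping_percentages(reads))
    # → {'Prevotella': 30.0, 'Human': 60.0, 'unmapped': 10.0}
```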

P. melaninogenica is a more difficult genome to design primers for than M. tuberculosis.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

Searching the two genomes with the same parameters during Stages 2 and 3 produces much larger lists of candidate primers for M. tuberculosis than for P. melaninogenica. Values are the number of candidate primers remaining after Stage 3 (parenthetical numbers are the candidate primers remaining after Stage 2).

FigShare
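
Stage 2 filters motifs partly on how often they occur in the target versus the off-target genome, which is why a genome with fewer target-enriched motifs yields shorter candidate lists. The sketch below illustrates only that frequency criterion (swga2.0 also filters on individual primer properties, which are omitted here). The thresholds `min_target_freq` and `max_bg_freq`, and the choice to count both strands and normalize by single-strand length, are assumptions for illustration, not swga2.0's actual parameters.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count k-mers on both strands of seq."""
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        counts.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return counts


def filter_candidates(target, background, k, min_target_freq, max_bg_freq):
    """Keep k-mers that are frequent in the target and rare in the background.

    Frequencies are occurrences per base of single-strand sequence length.
    Both thresholds are hypothetical illustration parameters.
    """
    target_counts = kmer_counts(target, k)
    bg_counts = kmer_counts(background, k)
    kept = []
    for motif, n in target_counts.items():
        target_freq = n / len(target)
        bg_freq = bg_counts[motif] / len(background)  # 0 if motif is absent
        if target_freq >= min_target_freq and bg_freq <= max_bg_freq:
            kept.append(motif)
    return sorted(kept)


if __name__ == "__main__":
    print(filter_candidates("AAAAAAAA", "CCCCCCCC", k=3,
                            min_target_freq=0.5, max_bg_freq=0.0))
    # → ['AAA', 'TTT']
```

Tightening either threshold shrinks the candidate list, which is the effect the figure compares across the two genomes.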

Primer set search algorithm.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

Addressing many of the major outstanding questions in the fields of microbial evolution and pathogenesis will require analyses of populations of microbial genomes. Although population genomic studies provide the analytical resolution to investigate evolutionary and mechanistic processes at fine spatial and temporal scales (precisely the scales at which these processes occur), microbial population genomic research is currently hindered by the practical difficulty of obtaining sufficient quantities of the relatively pure microbial genomic DNA necessary for next-generation sequencing. Here we present swga2.0, an optimized and parallelized pipeline to design selective whole genome amplification (SWGA) primer sets. Unlike previous methods, swga2.0 incorporates active learning and machine learning methods to evaluate the amplification efficacy of individual primers and primer sets. Additionally, swga2.0 optimizes primer set search and evaluation strategies, including parallelization at each stage of the pipeline, to dramatically decrease program runtime. Here we describe the swga2.0 pipeline, including the empirical data used to identify the primer and primer set characteristics that improve amplification performance. Finally, we evaluate the novel swga2.0 pipeline by designing primer sets that successfully amplify Prevotella melaninogenica, an important component of the lung microbiome in cystic fibrosis patients, from samples dominated by human DNA.

FigShare
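
Stage 4 searches over aggregations of primers and evaluates each candidate set. The search strategy itself is not described in this listing, so the sketch below uses a generic greedy forward search as a stand-in: starting from an empty set, it repeatedly adds the primer that most improves a set score and stops when nothing helps. The scoring function (target binding sites per background binding site) and all names are hypothetical; this is not presented as swga2.0's algorithm.

```python
def set_score(primer_set, target_sites, bg_sites):
    """Hypothetical set score: target binding sites per (background sites + 1)."""
    t = sum(target_sites[p] for p in primer_set)
    b = sum(bg_sites[p] for p in primer_set)
    return t / (b + 1)


def greedy_primer_set(candidates, target_sites, bg_sites, max_size):
    """Add, one at a time, the primer that most improves set_score.

    Stops when max_size primers are chosen or no candidate improves the score.
    """
    chosen, best = [], 0.0
    remaining = list(candidates)
    while remaining and len(chosen) < max_size:
        # Score every one-primer extension of the current set.
        score, primer = max(
            (set_score(chosen + [p], target_sites, bg_sites), p) for p in remaining
        )
        if score <= best:
            break  # no remaining primer improves the set
        chosen.append(primer)
        remaining.remove(primer)
        best = score
    return chosen, best


if __name__ == "__main__":
    target = {"A": 10, "B": 8, "C": 1}  # hypothetical target binding counts
    bg = {"A": 0, "B": 4, "C": 0}       # hypothetical background binding counts
    print(greedy_primer_set(["A", "B", "C"], target, bg, max_size=3))
    # → (['A', 'C'], 11.0)
```

Primer B is skipped even though it binds the target often, because its background binding lowers the set score; a greedy search is fast but can miss sets that only score well jointly.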

Poor-performing primers are filtered out in Stage 3 of swga2.0.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

Higher threshold values in the random forest regression model filter greater proportions of lower-amplification primers but few moderate or efficient primers.

FigShare
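
The Stage 3 filter amounts to keeping primers whose predicted amplification score clears a threshold. In swga2.0 the scores come from a random forest regression model; that model is not reproduced here, so the sketch below takes a hypothetical dictionary of precomputed scores and shows how the number of surviving primers falls as the threshold rises.

```python
def filter_by_prediction(predicted, threshold):
    """Keep primers whose predicted amplification score is >= threshold."""
    return {primer: s for primer, s in predicted.items() if s >= threshold}


def survivors_per_threshold(predicted, thresholds):
    """Number of primers kept at each threshold (non-increasing as it rises)."""
    return {t: len(filter_by_prediction(predicted, t)) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical model outputs, not values from the study.
    predicted = {"primer1": 0.2, "primer2": 0.5, "primer3": 0.9}
    print(survivors_per_threshold(predicted, [0.0, 0.5, 0.8, 1.0]))
    # → {0.0: 3, 0.5: 2, 0.8: 1, 1.0: 0}
```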

Differences between swga1.0 and swga2.0.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

FigShare

The final random forest regression model reliably predicts the amplification efficacy of individual primers.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

The amplification predicted for Round 2 was accurate only for low-performing primers (orange). Updating the model with Round 2 amplification data produced a model that predicted highly effective primers (green). The amplification efficacy of the primers selected for Round 3 was highly variable despite similar predictions. Nevertheless, the final random forest regression model did not select any poor-performing primers. The model was not used to predict the primers used in Round 1.

FigShare

Overview of the swga2.0 pipeline.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

The process is broken into four stages: 1) preprocessing of locations in the target and off-target genomes, 2) filtering motifs in the target genome based on individual primer properties and frequencies in the genomes, 3) scoring the remaining primers for amplification efficacy using a machine learning model, and 4) searching and evaluating aggregations of primers as candidate primer sets.

FigShare
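
Stage 1 preprocesses motif locations in the target and off-target genomes. A minimal sketch, assuming that preprocessing means recording where each k-mer starts, is a dictionary from k-mer to positions: the length of each position list gives the motif frequency that Stage 2 filters on. The `gaps_between_sites` helper is an illustrative addition showing another statistic derivable from the same index. Real bacterial and human genomes would need a far more memory-efficient structure than this, and only the forward strand is indexed here.

```python
from collections import defaultdict


def index_kmer_positions(genome, k):
    """Map each k-mer to the 0-based start positions where it occurs.

    Only the forward strand is indexed; a fuller version would also
    index the reverse complement.
    """
    genome = genome.upper()
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return dict(positions)


def gaps_between_sites(sites):
    """Distances between consecutive sorted binding sites."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    sites = index_kmer_positions("ACGACGTTACG", 3)["ACG"]
    print(sites, gaps_between_sites(sites))
    # → [0, 3, 8] [3, 5]
```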

Ridge regression variable descriptions and coefficient values for primer set evaluation.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

FigShare
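
This entry lists ridge regression variables and coefficients used to evaluate primer sets, but neither is reproduced in this listing. The sketch below shows only the standard closed-form ridge estimator, w = (XᵀX + αI)⁻¹Xᵀy, solved with plain Gaussian elimination, plus a linear score for one primer set. The feature vectors and targets are hypothetical, and no intercept is fitted (data assumed centered); in practice a library estimator such as scikit-learn's `Ridge` would normally be used.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, alpha):
    """Return weights w minimizing ||X w - y||^2 + alpha * ||w||^2.

    No intercept is fitted, so X and y are assumed to be centered.
    """
    p = len(X[0])
    # Normal equations with the ridge penalty on the diagonal.
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(A, b)


def ridge_predict(w, features):
    """Score one primer set from its (hypothetical) feature vector."""
    return sum(wi * xi for wi, xi in zip(w, features))


if __name__ == "__main__":
    X, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
    print(ridge_fit(X, y, alpha=0.0), ridge_fit(X, y, alpha=14.0))
    # → [2.0] [1.0]
```

Raising `alpha` shrinks the coefficients toward zero (here from 2.0 to 1.0), which trades a little fit for stability when set features are correlated.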

Deeper sequencing of the three successful primer sets—Prev03, Prev06, and Prev04—confirms the efficient and even selective amplification of P. melaninogenica.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

The solid colored lines indicate individual replicates, and the green dashed line represents the pooled total. Each of the three primer sets yields dramatic increases in sequencing depth compared with the unamplified samples (black dashed line). With 500 Mbp of sequencing effort, each primer set reached 10× coverage across 23–74% of the target genome, while the unamplified samples reached 10× coverage across <1% of the target genome.

FigShare
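
The 10× figures above are a breadth-of-coverage measure: the fraction of target-genome positions covered by at least ten reads. The sketch below computes it from read alignment intervals using a difference array. The 0-based, half-open `(start, end)` interval format and the example reads are assumptions; in a real workflow per-base depths would typically come from an alignment tool such as `samtools depth`.

```python
def depth_from_alignments(genome_length, alignments):
    """Per-base read depth from 0-based, half-open (start, end) alignments."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # coverage begins at start
        diff[end] -= 1    # and stops before end
    depths, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depths.append(running)
    return depths


def breadth_at_depth(depths, min_depth=10):
    """Fraction of positions covered by at least min_depth reads."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


if __name__ == "__main__":
    # Ten hypothetical reads covering the first 100 bp of a 400 bp genome.
    depths = depth_from_alignments(400, [(0, 100)] * 10)
    print(breadth_at_depth(depths))  # → 0.25
```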

Primer set statistics and sequences.

Author: Dustin Brisson (210833)
Jane A. Dwivedi-Yu (15307239)
Matthew W. Mitchell (14817749)
Yun S. Song (14817761)
Zachary J. Oppler (7436867)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/04/2023

FigShare