4 research outputs found

    An Improved Protocol for Sequencing of Repetitive Genomic Regions and Structural Variations Using Mutagenesis and Next Generation Sequencing

    <div><p>The rise of Next Generation Sequencing (NGS) technologies has transformed <em>de novo</em> genome sequencing into an accessible research tool, but obtaining high-quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning to isolate and amplify individual mutant copies, making it hard to scale up for use with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to an NGS workflow. NG-SAM therefore has the potential to be scaled up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM can robustly reconstruct a wide range of potential target sequences of varying lengths and repetitive structures.</p> </div>
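The consensus idea behind SAM can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes substitution-only mutations, copies that are already aligned and of equal length, and a toy target; the function names, the mutation rate of 0.1 and the 25 copies are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate` (assumed model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick one of the three other bases uniformly at random.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def consensus(copies):
    """Column-wise plurality vote over equal-length, pre-aligned copies."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))

rng = random.Random(0)
unit = "ACGTTGCAAG"                       # hypothetical repeat unit
target = "TTAGC" + unit * 3 + "GGATC"     # unique flanks around three identical units
copies = [mutate(target, 0.1, rng) for _ in range(25)]
recovered = consensus(copies)
```

Each mutated copy carries its own random substitutions, so its repeat units tend to differ from one another, while the vote across copies averages the mutations out again. A real workflow would first have to assemble and align the copies, which this sketch skips.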

    Performance of NG-SAM in simulated experiments.

    <p>The hexagons are colored according to the mean of the metrics from all covered simulated experiments. White areas represent unexplored parameter space. <b>A</b>. The percentage of successful simulated experiments in the first simulation setting, as a function of length and number of repetitive units. The black circle [at the point (3813, 3)] marks the repetitive structure of the target region used in the second simulation setting. The dashed line corresponds to target regions with a total size of 10 kb. <b>B</b>. Percentage of correctly reconstructed bases in the successful experiments from the first simulation setting, as a function of length and number of repetitive units in the target sequence (black circle and dashed line as in <b>A</b>). <b>C</b>. The percentage of successful simulated experiments in the second simulation setting, as a function of the dilution factors ( and in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0043359#pone-0043359-g002" target="_blank">Figure 2</a>). The black circle corresponds to the dilution factors used in the first simulation setting. <b>D</b>. Percentage of correctly reconstructed bases in the second simulation setting as a function of the dilution factors. Black circle as in <b>C</b>; see text for further details.</p>

    Overview of the simulated NG-SAM protocol.

    <p>The numbering corresponds to the steps enumerated above in the main text. The trapezoids shaded in light blue represent PCR amplifications (with – being the number of cycles), while the rectangles shaded in yellow represent sampling of molecules by dilution. – are the number of molecules present in the various stages of the simulated experiment, with unique variants symbolised by different coloured dots. and are the dilution factors corresponding to the first and second dilution steps. The black lines represent the “lineages” of the molecules sampled by the second dilution, traced back to the initial molecule pool of size . The steps <b>A</b>–<b>C</b> correspond to the mutagenic PCR, dilution and cleanup PCR steps of the mutagenic protocol. simNGS <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0043359#pone.0043359-simNGS1" target="_blank">[35]</a> is software for simulating Illumina sequencing and Velvet <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0043359#pone.0043359-Zerbino1" target="_blank">[8]</a> is a short read assembler.</p>
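The alternation of PCR amplification and sampling by dilution described in this caption can be sketched roughly as follows. This is an illustrative toy model, not the simulation pipeline from the paper: the per-cycle copying probability, the independent 1/factor sampling rule and every parameter value below are assumptions, and mutations are not modelled at all; molecules carry only an integer label so that their lineage can be traced back to the starting pool.

```python
import random

def pcr(pool, cycles, efficiency, rng):
    """Each cycle, copy every molecule with probability `efficiency` (assumed model)."""
    for _ in range(cycles):
        pool = pool + [m for m in pool if rng.random() < efficiency]
    return pool

def dilute(pool, factor, rng):
    """Keep each molecule independently with probability 1/factor (assumed model)."""
    keep = 1.0 / factor
    return [m for m in pool if rng.random() < keep]

rng = random.Random(1)
initial = list(range(100))            # 100 starting molecules, labelled by lineage id
amplified = pcr(initial, cycles=5, efficiency=1.0, rng=rng)
sampled = dilute(amplified, factor=40, rng=rng)
lineages = sorted(set(sampled))       # which starting molecules survived the dilution
```

With perfect efficiency the pool doubles every cycle, so five cycles turn 100 molecules into 3200. The dilution then keeps only a small random subset, and `lineages` lists the starting molecules that still have descendants afterwards.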

    Assembly problems caused by the presence of repeats.

    <p><b>A</b>. The structure of the target region. Red units are identical or near-identical; other colours are unique. <b>B</b>. Fragments ordered by their origin. <b>C</b>. Pool of reads obtained by short read sequencing. Note that in this example the full length of the fragments is sequenced. <b>D</b>. A graph structure summarizing assembly uncertainty. The thickness of the arrows representing the units is indicative of the depth of coverage. <b>E</b>. The two possible resolutions of the assembly graph, given that the copy numbers of all of the units are estimated correctly.</p>
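To see why a repeat leaves more than one valid layout, consider a hypothetical region in which a repeat unit R occurs three times between unique units A, B, C and D. The figure's actual unit layout is not given here, so this example and its names are assumptions. If reads cannot span R, only the order of the unique units enclosed between repeat copies is undetermined, and the candidate layouts can be enumerated directly:

```python
from itertools import permutations

def resolutions(start, end, inner_units, repeat):
    """List every layout start-R-x-R-y-...-R-end, one per ordering of the inner units."""
    layouts = []
    for order in permutations(inner_units):
        path = [start]
        for unit in order:
            path += [repeat, unit]
        path += [repeat, end]
        layouts.append(path)
    return layouts

# Hypothetical region: A R B R C R D, with R indistinguishable between copies.
layouts = resolutions("A", "D", ["B", "C"], "R")
for layout in layouts:
    print(" ".join(layout))
```

Both layouts use every unit with the same copy number, so read depth alone cannot tell them apart; this is the kind of ambiguity that mutagenesis is meant to break.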