19 research outputs found

    Heterozygous variations, including heterozygous SNPs and hemizygous insertions/deletions/inversions, detected during assembly of diploid genome.

    No full text

    Overview of HapSVAssembler.

    No full text
    <p>Overview of HapSVAssembler. Stage I: a <i>de novo</i> assembler reconstructs a reference genome; Stage II: SNPs and heterozygous SVs are detected against the reference genome assembled in Stage I; Stage III: two consensus haplotypes are reconstructed from the SNP/SV matrix.</p>

    A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing

    No full text
    <div><p>The genomes of many species are diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing the paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler on the basis of a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA. Experimental results indicate that HapSVAssembler has high accuracy and contiguity under various sequencing coverages, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although SVs are far fewer than SNPs, the genomic regions occupied by SVs are much larger, implying that heterozygosity computed from SNPs or the <i>k</i>-mer spectrum may be underestimated.</p></div>
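The abstract does not state which objective the GA optimizes. As a hedged illustration only, the sketch below scores a candidate haplotype with the minimum error correction (MEC) cost, a commonly used objective in haplotype assembly; it is an assumption here, not the paper's confirmed fitness function. The names `hamming` and `mec_cost`, and the 0/1/`None` allele encoding, are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0 or 1 for the two alleles, None where a fragment
# does not cover the variant site.
Allele = Optional[int]


def hamming(fragment: Sequence[Allele], haplotype: Sequence[int]) -> int:
    """Count mismatches between a fragment and a haplotype over covered sites."""
    return sum(1 for a, h in zip(fragment, haplotype) if a is not None and a != h)


def mec_cost(fragments: List[Sequence[Allele]], haplotype: Sequence[int]) -> int:
    """MEC cost: each fragment is charged its distance to the closer of the
    candidate haplotype and its complement (the second haplotype)."""
    complement = [1 - h for h in haplotype]
    return sum(
        min(hamming(f, haplotype), hamming(f, complement)) for f in fragments
    )


if __name__ == "__main__":
    toy_fragments = [[0, 0, None], [1, 1, 1], [0, 1, 0]]
    print(mec_cost(toy_fragments, [0, 0, 0]))
```

A GA could use the negated cost as the fitness of a chromosome that encodes the candidate haplotype; a lower cost means fewer allele calls must be treated as sequencing errors.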

    The percentage of reduced problem sizes of CMEC model.

    No full text
    <p>(a) At 99% similarity between the two haplotypes of the diploid genome, 0.3% of reads can be constrained together; (b) the problem size decreases as the difference between the two haplotypes increases.</p>

    Assembly accuracy and contiguity for different sequencing coverage and error rates.

    No full text
    <p>(a) Accuracy above 90% is obtained in low-error-rate simulations even at low coverage; (b) comparison of N10/N50 across sequencing coverages.</p>
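N10 and N50 are standard contiguity statistics: Nx is the length of the contig at which the running total, taken from the longest contig downward, first reaches x% of the total assembled length. A minimal sketch follows; the function name `nx` is hypothetical and not taken from HapSVAssembler.

```python
from typing import Iterable


def nx(contig_lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 when fraction=0.5, N10 when fraction=0.1)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    threshold = fraction * sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the cumulative length past the
        # threshold defines the statistic.
        if running >= threshold:
            return length
    return lengths[-1]


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(nx(contigs, 0.5), nx(contigs, 0.1))
```

A higher N50 at the same total length indicates that more of the assembly sits in long contigs.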

    Heuristic population initialization for GA chromosomes.

    No full text
    <p>(a) The set of the starting fragment <i>f</i><sub>4</sub> is randomly assigned as 1, and pseudo-haplotype<sub>1</sub> is updated accordingly; (b) the pseudo-haplotypes are extended from <i>f</i><sub>3</sub> and <i>f</i><sub>5</sub>, and the sets of <i>f</i><sub>3</sub> and <i>f</i><sub>5</sub> are determined by similarity.</p>

    Illustration of converting paired-reads to SNP matrix and SV matrix.

    No full text
    <p>(a) Paired-end reads <i>r</i><sub>1</sub> and <i>r</i><sub>2</sub> both contain SNPs but <i>r</i><sub>3</sub> does not; therefore, <i>r</i><sub>1</sub> and <i>r</i><sub>2</sub> can be converted to read fragments <i>f</i><sub>1</sub> and <i>f</i><sub>2</sub>, respectively. SNP <i>s</i><sub>2</sub> is covered by <i>r</i><sub>2</sub>, and the allele at <i>s</i><sub>2</sub> is given by the 4th nucleotide of <i>r</i><sub>2</sub>; (b) single-end mapped reads <i>r</i><sub>1</sub> and <i>r</i><sub>2</sub>, whose unmapped ends overlap <i>sv</i><sub>1</sub> (e.g., a deletion), can both be assigned 1.</p>
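The conversion in panel (a) can be sketched as follows, under simplifying assumptions that are not from the source: each read is a single mapped segment with a 0-based start coordinate, alleles are encoded as 0 (reference) and 1 (alternate), and uncovered sites are `None`. The function name `reads_to_snp_matrix` and the input layout are hypothetical.

```python
from typing import List, Optional, Tuple

Read = Tuple[int, str]       # (0-based mapping start, read sequence)
Snp = Tuple[int, str, str]   # (0-based position, reference allele, alternate allele)


def reads_to_snp_matrix(reads: List[Read], snps: List[Snp]) -> List[List[Optional[int]]]:
    """Build one matrix row per read that covers at least one SNP site."""
    matrix = []
    for start, seq in reads:
        row: List[Optional[int]] = []
        for pos, ref, alt in snps:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                # 0 for the reference allele, 1 for the alternate allele,
                # None for any other base (treated as uninformative).
                row.append(0 if base == ref else 1 if base == alt else None)
            else:
                row.append(None)
        # Reads covering no SNP (like r3 in the caption) yield no fragment.
        if any(entry is not None for entry in row):
            matrix.append(row)
    return matrix


if __name__ == "__main__":
    snps = [(2, "A", "G"), (8, "C", "T")]
    reads = [(0, "TTGTT"), (5, "AAATA"), (20, "CCCC")]
    print(reads_to_snp_matrix(reads, snps))
```

In the toy input, the second read starts at position 5, so the SNP at position 8 falls on its 4th nucleotide, mirroring the caption's example. For true paired-end data, the rows produced by the two ends of a pair would be merged into one fragment.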

    Illustration of extended Haplotype blocks via heterozygous SVs.

    No full text
    <p>Each read end is represented by a solid arrow, and the two ends of the same read are connected by a dotted line. There is a heterozygous SV<sub>1</sub> between SNP<sub>10</sub> and SNP<sub>11</sub>. (a) Without considering SVs, the entire haplotype is broken into three haplotype blocks; (b) in our approach, Block<sub>2</sub> and Block<sub>3</sub> in (a) are merged by bridging reads <i>x</i> and <i>y</i> in Block<sub>2</sub> and bridging read <i>z</i> in Block<sub>3</sub>, which indicate heterozygous SV<sub>1</sub>.</p>

    The accuracy for different genome size and read length.

    No full text
    <p>The paternal and maternal genomes differ by 1% in SNPs. The mean insert size is 250 bp with a standard deviation of 25 bp, the sequencing coverage is 20X, and the sequencing error rate is 1%. (a) The accuracy for different genome sizes; (b) the accuracy for different read lengths.</p>