
    Improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies

    BACKGROUND: Ion Torrent and Ion Proton are semiconductor-based sequencing technologies that offer rapid sequencing and low upfront and operating costs, because they avoid modified nucleotides and optical measurements. Despite these advantages, Ion semiconductor sequencing technologies suffer from much lower sequencing accuracy at genomic loci with homopolymer repeats of the same nucleotide. This limitation significantly reduces their usefulness for biological applications that aim to accurately identify genetic variants. RESULTS: In this study, we propose a Bayesian inference-based method that takes advantage of the distributions of the electrical voltage signals measured for all homopolymers of a fixed length. By cross-referencing the homopolymer length in the reference genome with the voltage signal distribution derived from the experiment, the proposed integrated model significantly improves alignment accuracy around homopolymer regions. CONCLUSIONS: Beyond improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies with the proposed model, similar strategies can also be applied to other high-throughput sequencing technologies that share similar limitations.
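The general idea of cross-referencing a reference-genome homopolymer length with a measured signal can be illustrated with Bayes' rule: the posterior over candidate lengths is proportional to the likelihood of the observed signal under each length times a prior informed by the reference. The sketch below is a hypothetical toy model, not the authors' method. It assumes a normalized flow signal whose mean equals the homopolymer length, Gaussian noise whose standard deviation grows linearly with length (`signal_sd`), and a prior that places `ref_weight` on the reference length and spreads the remaining mass uniformly. The abstract describes empirically derived voltage distributions instead; all function and parameter names here are illustrative.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_lengths(signal: float, ref_length: int, max_length: int = 8,
                      signal_sd: float = 0.2, ref_weight: float = 0.8) -> dict:
    """Posterior P(length | signal) over homopolymer lengths 1..max_length.

    Hypothetical model (an assumption, not the published one):
      prior:      ref_weight on ref_length, the rest split evenly over other lengths
      likelihood: signal ~ N(L, (signal_sd * L)^2), i.e. noise grows with length
    """
    other = (1.0 - ref_weight) / (max_length - 1)
    unnormalized = {}
    for length in range(1, max_length + 1):
        prior = ref_weight if length == ref_length else other
        likelihood = gaussian_pdf(signal, mean=float(length), sd=signal_sd * length)
        unnormalized[length] = prior * likelihood
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to 1.
    return {length: p / total for length, p in unnormalized.items()}


def call_length(signal: float, ref_length: int, **kwargs) -> int:
    """Maximum a posteriori homopolymer length under the toy model."""
    post = posterior_lengths(signal, ref_length, **kwargs)
    return max(post, key=post.get)


if __name__ == "__main__":
    # An ambiguous signal is resolved by the reference prior.
    print(call_length(4.5, ref_length=5), call_length(4.5, ref_length=4))
    # A clear signal overrides a conflicting reference length.
    print(call_length(2.0, ref_length=5))
```

With these assumed parameters, an ambiguous signal of 4.5 is called as 5 when the reference says 5 and as 4 when it says 4, while a clear signal of 2.0 overrides a reference length of 5. Replacing the Gaussian likelihood with per-length empirical signal histograms would be closer in spirit to the distributions the abstract describes.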

    Additional file 1 of Improving alignment accuracy on homopolymer regions for semiconductor-based sequencing technologies

    Supplementary results. This file contains all supplementary results not covered in the manuscript, including 5 figures and 1 table on Ion Proton data. Figure S1 shows the profile of retrieved homopolymers by (a) nucleotide type and (b) position in the sequencing reads. Figure S2 shows the prior probabilities of the detected voltages when the nucleotide type is A and the position in the sequencing reads belongs to Z1. Figure S3 shows other factors in identifying homopolymer length: (a) nucleotide type, when the homopolymer length is 4 and the position in the sequencing reads belongs to Z1, and (b) position in the sequencing reads, when the homopolymer length is 4 and the nucleotide type is A. Figure S4 shows the identification results of homopolymer lengths when the nucleotide type is A and the position in the sequencing reads belongs to Z1, presented as (a) the frequency of identification errors and (b) the distribution of identification results. Figure S5 compares identification results across methods: (a) all methods and (b) the two methods that use only reference information together with the proposed method. Table S1 lists the identification errors of homopolymer length for the different methods. (PDF 396 kb)