14 research outputs found

    NGS Based Haplotype Assembly Using Matrix Completion

    We apply matrix completion methods to haplotype assembly from NGS reads and develop three new algorithms: HapSVT, HapNuc, and HapOPT. A mathematical model converts the reads to an incomplete matrix, whose unknown components are then estimated. The completed matrix is quantized and decoded to estimate the haplotypes. These algorithms are compared to state-of-the-art algorithms using simulated data as well as real fosmid data. It is shown that the SNP missing rate and the haplotype block length of the proposed HapOPT are better than those of HapCUT2, with comparable accuracy in terms of reconstruction rate and switch error rate. A program implementing the proposed algorithms in MATLAB is freely available at https://github.com/smajidian/HapMC
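
The first and last steps of the pipeline described above (reads to incomplete matrix, then quantize and decode) can be sketched in Python. This is only an illustration under stated assumptions, not the HapSVT, HapNuc, or HapOPT code: the +1/-1/0 encoding, the function names, and the majority-vote decoding are hypothetical choices, and the completion step itself is left out, so `completed` stands in for the output of any matrix completion routine.

```python
def reads_to_matrix(reads):
    """Encode reads as an incomplete matrix (assumed encoding).

    '0' -> +1, '1' -> -1, '-' (SNP site not covered by the read) -> 0.
    """
    code = {"0": 1, "1": -1, "-": 0}
    return [[code[c] for c in read] for read in reads]


def decode_haplotype(completed):
    """Quantize a completed real-valued matrix and decode one haplotype.

    Each row is assumed to approximate +h or -h for a diploid haplotype h.
    Rows are sign-quantized, flipped to agree with the first row, and a
    per-site majority vote gives the haplotype (its complement is the other).
    """
    signs = [[1 if x >= 0 else -1 for x in row] for row in completed]
    ref = signs[0]
    votes = [0] * len(ref)
    for row in signs:
        agreement = sum(a * b for a, b in zip(row, ref))
        flip = 1 if agreement >= 0 else -1
        for j, s in enumerate(row):
            votes[j] += flip * s
    return "".join("0" if v >= 0 else "1" for v in votes)


# Hypothetical completed matrix, as a completion step might return it.
completed = [
    [0.9, -1.1, -0.8, 1.0],
    [-1.0, 0.7, 1.2, -0.9],
    [0.8, -0.9, 0.1, 1.1],
]
haplotype = decode_haplotype(completed)
```

Because the sign of a rank-one factor is arbitrary, the decoded string identifies the haplotype pair only up to swapping the two haplotypes.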

    Minimum error correction-based haplotype assembly: considerations for long read data

    The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome-phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes, exploiting reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used for reconstruction of haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach to demonstrate that it may result in incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific Biosciences and Oxford Nanopore Technologies. We show that imprecise haplotypes may be reconstructed with a lower MEC than that of the exact haplotype. The performance of MEC is explored for different coverage levels and error rates of data. Our simulation results reveal that in order to avoid incorrect MEC-based haplotypes, a coverage of 25 is needed for reads generated by Pacific Biosciences RS systems. Comment: 17 pages, 6 figures
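
To make the MEC metric concrete, here is a minimal Python sketch for the diploid case. The read encoding ('0'/'1' for alleles, '-' for uncovered sites) and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def mismatches(read, hap):
    """Count covered sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)


def mec_score(reads, hap):
    """Minimum Error Correction score of a diploid haplotype pair.

    Each read is assigned to whichever of the two haplotypes (hap or its
    complement) it matches best; the score is the total number of allele
    calls that would have to be corrected under that assignment.
    """
    comp = "".join("1" if a == "0" else "0" for a in hap)
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


reads = ["0011-", "-1001", "0010-", "11-01"]
score = mec_score(reads, "00110")  # only the third read needs one correction
```

An assembler that minimizes this score picks the candidate with the fewest corrections, which is why, as the abstract notes, a wrong haplotype can win when read errors happen to favour it.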

    Assessment of Soil-Nailed Excavations Seismic Failure under Cyclic Loading and Pseudo-Static forces

    In this paper, two numerical analysis methods (cyclic time history and pseudo-static) are applied to simulate the seismic behaviour and failure mechanism of soil-nailed structures. The numerical simulations are performed using finite difference software (FLAC). Nevada sand soil parameters are used, and the construction sequences of the nailed structures are simulated prior to the cyclic and pseudo-static analyses. The results reveal that the failure patterns of the two kinds of analysis are similar and comprise bilinear sliding surfaces. Furthermore, good agreement is found between the failure patterns of the two types of numerical analysis and previous experimental tests. Based on a comparison of the facing displacements in the two analysis methods, a simple process is presented to obtain the seismic coefficient consistent with the peak ground acceleration. The method rests on the supposition that failure occurs at a constant pullout displacement of the bottom-row nails for both analysis methods.

    Haplotype Assembly Using Manifold Optimization and Error Correction Mechanism

    Recent matrix-completion-based methods have not been able to properly model the Haplotype Assembly Problem (HAP) for noisy observations. To cope with such a case, in this letter we propose a new Minimum Error Correction (MEC) based matrix completion optimization problem over the manifold of rank-one matrices. The convergence of a specific iterative algorithm for solving this problem is proved. Simulation results illustrate that the proposed method not only outperforms some well-known matrix-completion-based methods, but also gives more accurate results than a recent MEC-based algorithm for haplotype estimation.

    Hap10: Reconstructing accurate and long polyploid haplotypes using linked reads

    Background: Haplotype information is essential for many genetic and genomic analyses, including genotype-phenotype associations in humans, animals and plants. Haplotype assembly is a method for reconstructing haplotypes from DNA sequencing reads. With the advent of new sequencing technologies, new algorithms are needed to ensure long and accurate haplotypes. While a few linked-read haplotype assembly algorithms are available for diploid genomes, to the best of our knowledge, no algorithms have yet been proposed for polyploids specifically exploiting linked reads. Results: The first haplotyping algorithm designed for linked reads generated from a polyploid genome is presented, built on a typical short-read haplotyping method, SDhaP. Using the input aligned reads and called variants, the haplotype-relevant information is extracted. Next, reads with the same barcodes are combined to produce molecule-specific fragments. Then, these fragments are clustered into strongly connected components, which are then used as input to a haplotype assembly core in order to estimate accurate and long haplotypes. Conclusions: Hap10 is a novel algorithm for haplotype assembly of polyploid genomes using linked reads. The performance of the algorithm is evaluated in a number of simulation scenarios and its applicability is demonstrated on a real dataset of sweet potato.
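
The barcode-merging step mentioned in the Results can be illustrated with a short Python sketch. This is not Hap10's implementation: the input layout (a barcode plus a map from SNP index to allele), the function name, and the rule of dropping sites with tied conflicting calls are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def merge_by_barcode(reads):
    """Combine reads sharing a barcode into one fragment per barcode.

    reads: iterable of (barcode, {snp_index: allele}) pairs.
    Conflicting calls at a site are resolved by majority vote; a site with
    a tie is dropped from the fragment (an assumed policy).
    """
    calls = defaultdict(lambda: defaultdict(Counter))
    for barcode, alleles in reads:
        for site, allele in alleles.items():
            calls[barcode][site][allele] += 1

    fragments = {}
    for barcode, sites in calls.items():
        fragment = {}
        for site, counts in sites.items():
            ranked = counts.most_common(2)
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                fragment[site] = ranked[0][0]
        fragments[barcode] = fragment
    return fragments


reads = [
    ("BX1", {0: 0, 1: 1}),
    ("BX1", {1: 1, 5: 0}),
    ("BX2", {2: 1}),
    ("BX1", {5: 1}),  # conflicts with the earlier call at site 5
]
fragments = merge_by_barcode(reads)
```

Each merged fragment spans more SNP sites than any single short read, which is what lets linked reads extend haplotype blocks.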

    Bioinformatics Evaluation of SPATA19 Gene Expression in Different Parts of Brain

    Background: Determining the expression pattern of testis genes in the brain is essential for understanding tissue functions and correlation or inter-correlation between testis and the brain. In this study, we examined expression of the spermatogenesis-associated 19 (SPATA19) gene in 10 parts of the brain with bioinformatics analysis. Materials and Methods: The public dataset GSE46706, comprising 1231 samples from 134 Caucasian individuals, was downloaded from NCBI Gene Expression Omnibus (GEO). SPATA19 gene expression in the cerebellar cortex, frontal cortex, hippocampus, medulla, occipital cortex, putamen, substantia nigra, temporal cortex, thalamus, and white matter was compared pairwise using R software and the t-test. Results: Of the 10 brain parts examined, the cerebellar cortex and white matter showed the highest expression of the gene, and the temporal cortex showed the lowest. The cerebellar cortex had a 5.6% and 6.2% increase in gene expression relative to the putamen and temporal cortex, with P values of 6.04e-13 and 2.15e-17, respectively. The white matter had a 4% increase in gene expression over the temporal cortex, with a P value of 1.89e-13. Conclusion: SPATA19 had higher expression in the cerebellar cortex and white matter than in other brain parts. These two parts make up the cerebellum.