23 research outputs found

    Minimum error correction-based haplotype assembly: considerations for long read data

    The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome-phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes from reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used to reconstruct haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach and demonstrate that it may yield incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific BioSciences and Oxford Nanopore Technologies. We show that an imprecise haplotype may be reconstructed with a lower MEC than that of the exact haplotype. The performance of MEC is explored for different coverage levels and error rates. Our simulation results reveal that, in order to avoid incorrect MEC-based haplotypes, a coverage of 25 is needed for reads generated by Pacific BioSciences RS systems.
    Comment: 17 pages, 6 figures
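To make the MEC metric concrete, here is a minimal sketch of how a MEC score could be computed for one candidate haplotype. It assumes a diploid sample with heterozygous, biallelic SNP sites (so the second haplotype is the complement of the first), alleles encoded as `0`/`1`, and `-` marking sites a read does not cover. The function names `mec_score` and `min_mec_haplotype` and the toy reads are illustrative and not taken from the paper.

```python
from itertools import product


def mec_score(reads, haplotype):
    """Count the allele corrections needed to make every read agree with
    either the candidate haplotype or its complement (assumed encoding:
    '0'/'1' alleles, '-' for uncovered sites)."""
    complement = "".join("1" if a == "0" else "0" for a in haplotype)

    def mismatches(read, hap):
        # Uncovered sites ('-') never count as errors.
        return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)

    # Each read is assigned to whichever of the two haplotypes it fits better.
    return sum(
        min(mismatches(read, haplotype), mismatches(read, complement))
        for read in reads
    )


def min_mec_haplotype(reads, n_sites):
    """Brute-force search over all 2**n_sites candidates (toy sizes only)."""
    candidates = ("".join(bits) for bits in product("01", repeat=n_sites))
    return min(candidates, key=lambda h: mec_score(reads, h))


# Hypothetical toy reads over four SNP sites.
reads = ["0101", "010-", "1010", "-011", "1110"]
print(mec_score(reads, "0101"))
print(min_mec_haplotype(reads, 4))
```

The brute-force search only illustrates the objective. A haplotype that minimizes the score is not guaranteed to be the true one, which is the failure mode the abstract describes for error-prone reads.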

    Low-Rank Isomap Algorithm

    Isomap is a well-known nonlinear dimensionality reduction method that suffers from high computational complexity. This complexity arises mainly from two stages: (a) embedding a full graph on the data in the ambient space, and (b) a complete eigenvalue decomposition. Although reducing the computational complexity of the graph stage has been investigated, the eigenvalue decomposition stage remains a bottleneck. In this paper, we propose the Low-Rank Isomap algorithm, which introduces a projection operator on the embedded graph from the ambient space to a low-rank latent space to facilitate applying a partial eigenvalue decomposition. This approach reduces the complexity of Isomap to linear order while preserving the structural information during the dimensionality reduction process. The superiority of the Low-Rank Isomap algorithm over some state-of-the-art algorithms is experimentally verified on facial image clustering in terms of speed and accuracy.
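The two stages named in the abstract can be sketched for standard Isomap as follows. This is not the Low-Rank Isomap algorithm itself: the projection operator is not specified in the abstract, so the sketch only shows the baseline pipeline, with a single-eigenvector power iteration standing in for a partial eigenvalue decomposition. The function name `isomap_1d`, the neighbour count `k`, and the test data are assumptions made for illustration.

```python
import math


def isomap_1d(points, k=2, iters=200):
    """Plain Isomap to one dimension (illustrative baseline, pure Python)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Stage (a): k-nearest-neighbour graph, then geodesic distances
    # via Floyd-Warshall shortest paths.
    inf = float("inf")
    geo = [[inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        nearest = sorted(range(n), key=lambda j: dist[i][j])[1 : k + 1]
        for j in nearest:
            geo[i][j] = geo[j][i] = dist[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]

    # Double-centre the squared geodesic distances (classical MDS kernel).
    sq = [[geo[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    kern = [
        [-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
        for i in range(n)
    ]

    # Stage (b): only the leading eigenpair is computed, by power iteration.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(kern[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(kern[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(lam) * x for x in v]


# Nine points on a unit semicircle: a curved 1-D manifold in 2-D space.
arc = [(math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)) for t in range(9)]
coords = isomap_1d(arc)
print([round(c, 2) for c in coords])
```

The embedding unrolls the arc, so the distance between the two end points in the output approximates the arc length rather than the straight-line chord of length 2. The dense shortest-path and kernel matrices in this sketch are exactly the costs that motivate a low-rank approach.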

    Demixing Sines and Spikes Using Multiple Measurement Vectors

    In this paper, we address the line spectral estimation problem with multiple corrupted measurement vectors. Such scenarios appear in many practical applications, such as radar, optics, and seismic imaging, in which the signal of interest can be modeled as the sum of a spectrally sparse signal and a block-sparse signal known as an outlier. Our aim is to demix the two components, and to that end we design a convex problem whose objective function promotes both structures. Using positive trigonometric polynomial (PTP) theory, we reformulate the dual problem as a semi-definite program (SDP). Our theoretical results state that, for a fixed number of measurements N and a constant number of outliers, up to O(N) spectral lines can be recovered using our SDP problem as long as a minimum frequency separation condition is satisfied. Our simulation results also show that increasing the number of samples per measurement vector reduces the minimum frequency separation required for successful recovery.
    Comment: 9 pages, 3 figures
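One way to write down the signal model described in this abstract is sketched below. Apart from N, every symbol is an assumption introduced here for illustration: L measurement vectors, K spectral lines with frequencies f_k and amplitudes c_{k,l}, outliers z_l supported on a small shared index set S, and a separation threshold Δ. The paper's own notation and the exact form of its separation condition may differ.

```latex
% Assumed notation: N samples per vector, L measurement vectors,
% K spectral lines, outlier support S shared across vectors.
\begin{align}
  y_l(n) &= \sum_{k=1}^{K} c_{k,l}\, e^{\,i 2\pi f_k n} + z_l(n),
    \qquad n = 0,\dots,N-1,\quad l = 1,\dots,L, \\
  z_l(n) &= 0 \quad \text{for all } n \notin S,\ |S| \ll N, \\
  \min_{j \neq k} \lvert f_j - f_k \rvert &\ge \Delta
    \quad \text{(distance taken modulo 1).}
\end{align}
```

Under this reading, demixing means recovering the frequencies f_k and the outlier support S from the observed y_l alone.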