Minimum error correction-based haplotype assembly: considerations for long read data
The single nucleotide polymorphism (SNP) is the most widely studied type of
genetic variation. A haplotype is defined as the sequence of alleles at SNP
sites on each haploid chromosome. Haplotype information is essential in
unravelling the genome-phenotype association. Haplotype assembly is a
well-known approach for reconstructing haplotypes, exploiting reads generated
by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often
used for reconstruction of haplotypes from reads. However, problems with the
MEC metric have been reported. Here, we investigate the MEC approach to
demonstrate that it may result in incorrectly reconstructed haplotypes for
devices that produce error-prone long reads. Specifically, we evaluate this
approach for devices developed by Illumina, Pacific BioSciences and Oxford
Nanopore Technologies. We show that imprecise haplotypes may be reconstructed
with a lower MEC than that of the exact haplotype. The performance of MEC is
explored for different coverage levels and error rates of data. Our simulation
results reveal that in order to avoid incorrect MEC-based haplotypes, a
coverage of 25 is needed for reads generated by Pacific BioSciences RS systems.
Comment: 17 pages, 6 figures
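The MEC score referred to in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each read is assigned to the closer of the two candidate haplotypes and the mismatches are summed; the function names and the toy reads are hypothetical and are not taken from the paper. The toy data are invented only to show how a haplotype pair that differs from the true one can still obtain a lower MEC when reads share the same error.

```python
def mismatches(read, haplotype):
    """Count positions where a read disagrees with a haplotype.

    Positions the read does not cover are marked '-' and are skipped.
    """
    return sum(1 for r, h in zip(read, haplotype) if r != "-" and r != h)


def mec_score(reads, hap_a, hap_b):
    """Sum, over all reads, of the distance to the closer haplotype."""
    return sum(min(mismatches(r, hap_a), mismatches(r, hap_b)) for r in reads)


# Hypothetical toy data: four SNP sites, alleles coded as 0/1.
true_pair = ("0101", "1010")
# Three reads carry the same error at the third site; one read is partial.
reads = ["0111", "0111", "1000", "01--"]
# A candidate pair that simply absorbs the shared error.
wrong_pair = ("0111", "1000")

true_mec = mec_score(reads, *true_pair)
wrong_mec = mec_score(reads, *wrong_pair)
print("MEC of true pair:", true_mec)
print("MEC of wrong pair:", wrong_mec)
```

In this toy case the incorrect pair scores lower than the true pair, which is the kind of failure the abstract describes for error-prone reads at low coverage.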
Low-Rank Isomap Algorithm
Isomap is a well-known nonlinear dimensionality reduction method that
suffers from high computational complexity. This complexity
arises mainly from two stages: a) embedding a full graph on the data in the
ambient space, and b) a complete eigenvalue decomposition. Although
reducing the computational complexity of the graphing stage has been
investigated, the eigenvalue decomposition stage remains a bottleneck.
In this paper, we propose the Low-Rank Isomap algorithm by
introducing a projection operator on the embedded graph from the ambient space
to a low-rank latent space to facilitate applying the partial eigenvalue
decomposition. This approach reduces the complexity of Isomap to
linear order while preserving the structural information during the
dimensionality reduction process. The superiority of the Low-Rank Isomap
algorithm over some state-of-the-art algorithms is experimentally verified
on facial image clustering in terms of speed and accuracy.
Demixing Sines and Spikes Using Multiple Measurement Vectors
In this paper, we address the line spectral estimation problem with multiple
corrupted measurement vectors. Such scenarios appear in many practical
applications such as radar, optics, and seismic imaging, in which the signal of
interest can be modeled as the sum of a spectrally sparse signal and a
block-sparse signal known as the outlier. Our aim is to demix the two
components, and for that,
we design a convex problem whose objective function promotes both
structures. Using positive trigonometric polynomials (PTP) theory, we
reformulate the dual problem as a semi-definite program (SDP). Our theoretical
results state that, for a fixed number of measurements N and a constant number of
outliers, up to O(N) spectral lines can be recovered using our SDP problem as
long as a minimum frequency separation condition is satisfied. Our simulation
results also show that increasing the number of samples per measurement
vector reduces the minimum required frequency separation for successful
recovery.
Comment: 9 pages, 3 figures
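The measurement model described in this abstract, a spectrally sparse component plus a block-sparse outlier observed across several measurement vectors, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the normalized frequencies, the amplitudes, and the spike values are all hypothetical, and the sketch generates data only. It does not implement the convex program or the SDP from the paper.

```python
import cmath
import math


def sines_plus_spikes(freqs, amps, spikes, n_samples):
    """Build an N x L measurement matrix (list of rows).

    freqs  : K normalized frequencies in [0, 1), shared by all vectors
    amps   : K x L complex amplitudes (one column per measurement vector)
    spikes : dict mapping a sample index to its L outlier values; the same
             few rows are corrupted in every vector (block sparsity)
    """
    n_vectors = len(amps[0])
    rows = []
    for n in range(n_samples):
        outlier = spikes.get(n, [0] * n_vectors)
        row = []
        for col in range(n_vectors):
            sinusoids = sum(
                amps[k][col] * cmath.exp(2j * math.pi * freqs[k] * n)
                for k in range(len(freqs))
            )
            row.append(sinusoids + outlier[col])
        rows.append(row)
    return rows


# Hypothetical example: 2 spectral lines, 2 measurement vectors, 8 samples.
freqs = [0.10, 0.35]
amps = [[1, 0.5], [2j, 1]]
spikes = {3: [5, -4]}  # only sample 3 is corrupted

clean = sines_plus_spikes(freqs, amps, {}, 8)
observed = sines_plus_spikes(freqs, amps, spikes, 8)
corrupted_rows = [
    n for n in range(8)
    if any(abs(o - c) > 1e-9 for o, c in zip(observed[n], clean[n]))
]
print("corrupted sample indices:", corrupted_rows)
```

Demixing, in the sense of the abstract, means recovering the sinusoidal part and the spike part separately from `observed` alone.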