NGS Based Haplotype Assembly Using Matrix Completion
We apply matrix completion methods for haplotype assembly from NGS reads to
develop the new HapSVT, HapNuc, and HapOPT algorithms. This is performed by
applying a mathematical model to convert the reads to an incomplete matrix and
estimating unknown components. This process is followed by quantizing and
decoding the completed matrix in order to estimate haplotypes. These algorithms
are compared to the state-of-the-art algorithms using simulated data as well as
real fosmid data. It is shown that the SNP missing rate and the haplotype
block length of the proposed HapOPT are better than those of HapCUT2 with
comparable accuracy in terms of reconstruction rate and switch error rate. A
program implementing the proposed algorithms in MATLAB is freely available at
https://github.com/smajidian/HapMC
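
The pipeline described above (reads converted to an incomplete matrix, unknown entries estimated, then quantization and decoding) can be illustrated with a minimal sketch. This is not HapSVT, HapNuc, or HapOPT; it swaps in a plain rank-one iterative imputation with power iteration, and it assumes a diploid, noise-free toy input. The encoding (allele 0 as +1, allele 1 as -1, uncovered SNP as 0) and the function names `rank_one_factor` and `assemble` are assumptions made for illustration only.

```python
def rank_one_factor(X, iters=200):
    """Approximate X by outer(u, v) using power iteration (u has unit norm)."""
    n, m = len(X), len(X[0])
    v = [1.0] * m  # fixed start; assumed not orthogonal to the dominant direction
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm = sum(a * a for a in u) ** 0.5
        u = [a / norm for a in u]
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
    return u, v


def assemble(R, rounds=5):
    """R[i][j] is +1 (allele 0), -1 (allele 1), or 0 (read i does not cover SNP j)."""
    n, m = len(R), len(R[0])
    X = [[float(x) for x in row] for row in R]
    for _ in range(rounds):
        u, v = rank_one_factor(X)
        # Keep observed entries, refill unobserved ones from the rank-one estimate.
        X = [[X[i][j] if R[i][j] != 0 else u[i] * v[j] for j in range(m)]
             for i in range(n)]
    _, v = rank_one_factor(X)
    # Quantize and decode: the sign of each column factor gives the allele.
    h1 = [0 if x > 0 else 1 for x in v]
    h2 = [1 - a for a in h1]  # diploid assumption: second haplotype is the complement
    return h1, h2


# Four hypothetical error-free reads sampled from the haplotype pair
# [0, 0, 1, 1, 0] and [1, 1, 0, 0, 1].
reads = [
    [1, 1, -1, 0, 0],
    [0, -1, 1, 1, 0],
    [0, 0, -1, -1, 1],
    [-1, -1, 0, 0, -1],
]
h1, h2 = assemble(reads)
print(h1, h2)  # the two haplotypes; which one comes first is arbitrary
```

The global sign of the recovered factor is arbitrary, so the two haplotypes may come back in either order. With noisy reads, a single sign-based quantization step like this one would not be expected to match the published algorithms.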
Minimum error correction-based haplotype assembly: considerations for long read data
The single nucleotide polymorphism (SNP) is the most widely studied type of
genetic variation. A haplotype is defined as the sequence of alleles at SNP
sites on each haploid chromosome. Haplotype information is essential in
unravelling the genome-phenotype association. Haplotype assembly is a
well-known approach for reconstructing haplotypes, exploiting reads generated
by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often
used for reconstruction of haplotypes from reads. However, problems with the
MEC metric have been reported. Here, we investigate the MEC approach to
demonstrate that it may result in incorrectly reconstructed haplotypes for
devices that produce error-prone long reads. Specifically, we evaluate this
approach for devices developed by Illumina, Pacific BioSciences and Oxford
Nanopore Technologies. We show that imprecise haplotypes may be reconstructed
with a lower MEC than that of the exact haplotype. The performance of MEC is
explored for different coverage levels and error rates of data. Our simulation
results reveal that in order to avoid incorrect MEC-based haplotypes, a
coverage of 25 is needed for reads generated by Pacific BioSciences RS systems.
Comment: 17 pages, 6 figures
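
The claim that an incorrect haplotype can have a lower MEC than the exact one can be seen on a toy case. The sketch below computes the MEC score for a diploid haplotype pair under the usual definition (each read is assigned to the closer haplotype, and the mismatches are summed). The function name `mec_score`, the dictionary representation of reads, and the three example reads are hypothetical and chosen only for illustration.

```python
def mec_score(fragments, h1):
    """Minimum error correction score of a diploid haplotype pair.

    fragments: list of dicts mapping SNP index -> observed allele (0 or 1).
    h1: list of alleles (0 or 1); the second haplotype is taken as its complement.
    """
    h2 = [1 - a for a in h1]
    total = 0
    for frag in fragments:
        d1 = sum(1 for i, a in frag.items() if a != h1[i])
        d2 = sum(1 for i, a in frag.items() if a != h2[i])
        total += min(d1, d2)  # assign the read to the closer haplotype
    return total


# True haplotype pair: [0, 0, 1, 1] and [1, 1, 0, 0].
# Coverage is low, and the second read carries one sequencing error at SNP 2.
fragments = [
    {0: 0, 1: 0},  # error-free read from the first haplotype
    {1: 0, 2: 0},  # read from the first haplotype with an error at SNP 2
    {2: 0, 3: 0},  # error-free read from the second haplotype
]

true_mec = mec_score(fragments, [0, 0, 1, 1])
wrong_mec = mec_score(fragments, [0, 0, 0, 0])
print(true_mec, wrong_mec)  # 1 0
```

Here the incorrect pair explains every read without any correction, while the exact pair needs one, so a pure MEC minimizer would prefer the wrong answer. Extra reads covering SNPs 1 and 2 would be needed to outvote the erroneous one, which is consistent with the coverage requirement reported in the abstract.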
Assessment of Soil-Nailed Excavations Seismic Failure under Cyclic Loading and Pseudo-Static forces
In this paper, two numerical analysis methods (cyclic time history and pseudo-static) are applied to simulate the seismic behaviour and failure mechanism of soil-nailed structures. The numerical simulations are performed using finite difference software (FLAC). Nevada sand soil parameters are used, and the construction sequences of the nailed structures are simulated prior to the cyclic and pseudo-static analyses. The results reveal that the failure patterns of the two kinds of analysis are approximately similar and consist of bilinear sliding surfaces. Furthermore, good agreement is found between the failure patterns of the two types of numerical analysis and previous experimental tests. Based on a comparison between facing displacements in the two analysis methods, a simple process is presented to obtain the seismic coefficient consistent with the peak ground acceleration. The presented method is based on the supposition that failure occurs at a constant pullout displacement of the bottom-row nails for both analysis methods.
The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms
Haplotype Assembly Using Manifold Optimization and Error Correction Mechanism
Recent matrix completion based methods have not been able to properly model
the Haplotype Assembly Problem (HAP) for noisy observations. To cope with such
a case, in this letter we propose a new Minimum Error Correction (MEC) based
matrix completion optimization problem over the manifold of rank-one matrices.
The convergence of a specific iterative algorithm for solving this problem is
proved. Simulation results illustrate that the proposed method not only
outperforms some well-known matrix completion based methods, but also presents
a more accurate result than a recent MEC-based algorithm for
haplotype estimation.
Hap10: Reconstructing accurate and long polyploid haplotypes using linked reads
Background: Haplotype information is essential for many genetic and genomic analyses, including genotype-phenotype associations in humans, animals and plants. Haplotype assembly is a method for reconstructing haplotypes from DNA sequencing reads. With the advent of new sequencing technologies, new algorithms are needed to ensure long and accurate haplotypes. While a few linked-read haplotype assembly algorithms are available for diploid genomes, to the best of our knowledge, no algorithms have yet been proposed for polyploids that specifically exploit linked reads.
Results: The first haplotyping algorithm designed for linked reads generated from a polyploid genome is presented, built on a typical short-read haplotyping method, SDhaP. Using the input aligned reads and called variants, the haplotype-relevant information is extracted. Next, reads with the same barcodes are combined to produce molecule-specific fragments. Then, these fragments are clustered into strongly connected components, which are used as input to a haplotype assembly core in order to estimate accurate and long haplotypes.
Conclusions: Hap10 is a novel algorithm for haplotype assembly of polyploid genomes using linked reads. The performance of the algorithm is evaluated in a number of simulation scenarios, and its applicability is demonstrated on a real dataset of sweet potato.
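
The step in which reads with the same barcode are combined into molecule-specific fragments can be sketched as follows. This is not the Hap10 implementation: it assumes each barcode corresponds to a single molecule, resolves conflicting alleles by majority vote, and drops tied sites. The function name `merge_by_barcode`, the barcode strings, and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def merge_by_barcode(reads):
    """Combine reads sharing a barcode into one fragment per barcode.

    reads: iterable of (barcode, {snp_index: allele}) pairs.
    Returns {barcode: {snp_index: allele}}.
    """
    # votes[barcode][snp] counts how often each allele was observed.
    votes = defaultdict(lambda: defaultdict(Counter))
    for barcode, alleles in reads:
        for snp, allele in alleles.items():
            votes[barcode][snp][allele] += 1

    fragments = {}
    for barcode, per_snp in votes.items():
        frag = {}
        for snp, counter in sorted(per_snp.items()):
            ranked = counter.most_common(2)
            # Keep the site only if one allele strictly wins the vote.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                frag[snp] = ranked[0][0]
        fragments[barcode] = frag
    return fragments


reads = [
    ("BC1", {0: 0, 1: 1}),
    ("BC1", {1: 1, 5: 0}),
    ("BC2", {2: 1}),
    ("BC1", {5: 1}),  # conflicts with the earlier call at SNP 5
]
fragments = merge_by_barcode(reads)
print(fragments)  # {'BC1': {0: 0, 1: 1}, 'BC2': {2: 1}}
```

Because alleles are stored as plain integers, the same sketch accepts the multi-allelic codes a polyploid caller might emit. Separating distinct molecules that happen to share a barcode would need an additional distance-based split, which is not shown here.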
Bioinformatics Evaluation of SPATA19 Gene Expression in Different Parts of Brain
Background: Determining the expression pattern of testis genes in the brain is essential for understanding tissue functions and correlation or inter-correlation between testis and the brain. In this study, we examined spermatogenesis-associated 19 (SPATA19 gene) expression in 10 parts of the brain with bioinformatics analysis.
Materials and Methods: The public dataset GSE46706, including 1231 samples originated from 134 Caucasian individuals, was downloaded from NCBI Gene Expression Omnibus (GEO). SPATA19 gene expression in the cerebellar cortex, frontal cortex, hippocampus, medulla, occipital cortex, putamen, substantia nigra, temporal cortex, thalamus, and white matter was examined against each other using R software and the t-test.
Results: Of the 10 brain parts examined, the cerebellar cortex and white matter showed the highest expression, and the temporal cortex showed the lowest expression of the gene. The cerebellar cortex had 5.6% and 6.2% increases in gene expression relative to the putamen and temporal cortex, with P values of 6.04e-13 and 2.15e-17, respectively. The white matter also had a 4% increase in gene expression over the temporal cortex, with a P value of 1.89e-13.
Conclusion: SPATA19 showed higher expression in the cerebellar cortex and white matter than in other brain parts. These two parts make up the cerebellum.