NGS Based Haplotype Assembly Using Matrix Completion

Author: Kahaei MH
Majidian Sina
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2019

We apply matrix completion methods to haplotype assembly from NGS reads and develop three new algorithms: HapSVT, HapNuc, and HapOPT. A mathematical model converts the reads into an incomplete matrix whose unknown entries are then estimated. The completed matrix is quantized and decoded to estimate the haplotypes. The algorithms are compared with state-of-the-art algorithms on simulated data as well as real fosmid data. The SNP missing rate and the haplotype block length of the proposed HapOPT are better than those of HapCUT2, with comparable accuracy in terms of reconstruction rate and switch error rate. A MATLAB program implementing the proposed algorithms is freely available at https://github.com/smajidian/HapMC
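
The pipeline described in this abstract (reads to an incomplete matrix, completion, quantization) can be illustrated with a minimal sketch. It is not HapSVT, HapNuc, or HapOPT: it substitutes a generic iterative rank-1 SVD imputation, and the +1/-1/0 encoding, the function name `phase_by_completion`, and the toy read matrix are all assumptions made for illustration. It relies on the fact that error-free reads from a diploid individual form a rank-1 matrix when restricted to heterozygous SNPs.

```python
import numpy as np


def phase_by_completion(reads, n_iter=20):
    """Estimate one haplotype (as +1/-1 alleles) from a read-by-SNP matrix.

    reads: 2-D integer array; +1/-1 are observed alleles and 0 marks a SNP
    the read does not cover. This encoding is an assumption of the sketch.
    """
    reads = np.asarray(reads)
    observed = reads != 0
    filled = reads.astype(float)  # missing entries start at 0
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        rank1 = s[0] * np.outer(u[:, 0], vt[0])    # best rank-1 approximation
        filled = np.where(observed, reads, rank1)  # keep observed entries fixed
    # Quantize: the sign pattern of the leading right singular vector of the
    # completed matrix is the haplotype estimate, up to a global sign flip.
    _, _, vt = np.linalg.svd(filled, full_matrices=False)
    hap = np.where(vt[0] >= 0, 1, -1)
    return hap if hap[0] == 1 else -hap  # fix the sign so that hap[0] == +1


# Hypothetical toy data: six error-free reads over six heterozygous SNPs.
toy_reads = np.array([
    [ 1, -1,  1,  0,  0,  0],
    [ 0,  1, -1, -1,  0,  0],
    [ 0,  0,  1,  1, -1,  0],
    [ 0,  0,  0, -1,  1,  1],
    [-1,  1, -1,  0,  0,  0],
    [ 0,  0,  0,  1, -1, -1],
])
print(phase_by_completion(toy_reads))
```

The second haplotype is the element-wise negation of the returned vector. The sketch assumes the reads overlap enough to connect all SNPs; disconnected blocks would have to be phased separately.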

arXiv.org e-Print Archive

Directory of Open Access Journals

Haplotype Assembly: An Information Theoretic View

Author: Si Hongbo
Vikalo Haris
Vishwanath Sriram
Publication date: 11/05/2014

This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differ from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on the required number of reads for reliable haplotype reconstruction, and both the necessary and sufficient conditions are presented with order-wise optimal bounds.

arXiv.org e-Print Archive

Crossref

ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

Author: Kamatani Naoyuki
Misawa Kazuharu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Missing-genotype imputation and haplotype reconstruction are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results: We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We evaluated the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples of the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions: ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

Author: Abney Mark
Alkorta-Aranburu Gorka
Han Lide
Livne Oren E.
Nicolae Dan L.
Ober Carole
Wentworth-Sheilds William
Publication date: 03/01/2024

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

Knowledge UChicago

Complete haplotype phasing of the MHC and KIR loci with targeted HaploSeq

Author: Anthony D. Schmitt
Bing Ren
Jesse R. Dixon
Siddarth Selvaraj
Publication venue: Springer Nature
Publication date: 01/01/2015

Background: The MHC and KIR loci are clinically relevant regions of the genome. Typing the sequence of these loci has a wide range of applications including organ transplantation, drug discovery, pharmacogenomics and furthering fundamental research in immune genetics. Rapid advances in biochemical and next-generation sequencing (NGS) technologies have enabled several strategies for precise genotyping and phasing of candidate HLA alleles. Nonetheless, as typing of candidate HLA alleles alone reveals limited aspects of the genetics of the MHC region, it is insufficient for the comprehensive utility of the aforementioned applications. For this reason, we believe phasing the entire MHC and KIR locus onto a single locus-spanning haplotype can be a critical improvement for better understanding transplantation biology. Results: Generating long-range (>1 Mb) phase information is traditionally very challenging. As proximity-ligation based methods of DNA sequencing preserve chromosome-span phase information, we have utilized this principle to demonstrate its utility towards generating full-length phasing of MHC and KIR loci in human samples. We accurately (~99%) reconstruct the complete haplotypes for over 90% of sequence variants (coding and non-coding) within these two loci, which collectively span 4 megabases. Conclusions: By haplotyping a majority of coding and non-coding alleles at the MHC and KIR loci in a single assay, this method has the potential to assist transplantation matching and facilitate investigation of the genetic basis of human immunity and disease.

Springer - Publisher Connector

eScholarship - University of California

A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Author: Ke Ziqi
Vikalo Haris
Publication date: 13/11/2019

Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovery of the reads' origin, an NP-hard problem that the existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network which essentially trains to ignore sequencing errors and infers the posterior probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

On the design of clone-based haplotyping

Publication venue: BioMed Central
Publication date: 12/09/2013

Springer - Publisher Connector

Characterizing the admixed African ancestry of African Americans

Author: Absher Devin
Assimes Themistocles L
Basu Analabha
Go Alan S
Hlatky Mark A
Iribarren Carlos
Knowles Joshua W
Li Jun
Myers Richard M
Narasimhan Balasubramanian
Quertermous Thomas
Risch Neil
Sidney Steven
Southwick Audrey
Tang Hua
Zakharia Fouad
Publication venue: BioMed Central
Publication date: 01/01/2009

Genome-wide SNP analyses reveal the admixed African genetic ancestry of African Americans

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Deep Blue Documents at the University of Michigan

Variation rs2235503 C > A Within the Promoter of MSLN Affects Transcriptional Rate of Mesothelin and Plasmatic Levels of the Soluble Mesothelin-Related Peptide.

Author: Bonotti Alessandra
Cipollini Monica
Corrado Alda
Cristaudo Alfonso
De Santi Chiara
Dell'Anno Irene
Evangelista Monica
Foddis Rudy
Gemignani Federica
Landi Stefano
Marolda Daniela
Miglietta Simona
Nicolí Vanessa
Pellegrino Enrica
Pucci Perla
Silvestri Roberto
Publication venue: Front Genet
Publication date: 01/09/2020

Soluble mesothelin-related peptide (SMRP) is a promising biomarker for malignant pleural mesothelioma (MPM), but several confounding factors can reduce the accuracy of SMRP-based tests. The identification of these confounders could improve the diagnostic performance of SMRP. In this study, we evaluated the sequence of 1,000 base pairs encompassing the minimal promoter region of the MSLN gene to identify expression quantitative trait loci (eQTL) that can affect SMRP. We assessed the association between four MSLN promoter variants and SMRP levels in a cohort of 72 MPM and 677 non-MPM subjects, and we carried out in vitro assays to investigate their functional role. Our results show that rs2235503 is an eQTL for MSLN associated with increased levels of SMRP in non-MPM subjects. Furthermore, we show that this polymorphic site affects the accuracy of SMRP, highlighting the importance of evaluating the individual's genetic background and giving novel insights to refine SMRP specificity as a diagnostic biomarker.

Apollo (Cambridge)

Variation rs2235503 C > A Within the Promoter of MSLN Affects Transcriptional Rate of Mesothelin and Plasmatic Levels of the Soluble Mesothelin-Related Peptide

Author: Bonotti A.
Cipollini M.
Corrado A.
Cristaudo A.
De Santi C.
Dell'Anno I.
Evangelista M.
Foddis R.
Gemignani F.
Landi S.
Marolda D.
Miglietta S.
Nicoli V.
Pellegrino E.
Pucci P.
Silvestri R.
Publication venue: Frontiers Media SA
Publication date: 01/01/2020

Soluble mesothelin-related peptide (SMRP) is a promising biomarker for malignant pleural mesothelioma (MPM), but several confounding factors can reduce the accuracy of SMRP-based tests. The identification of these confounders could improve the diagnostic performance of SMRP. In this study, we evaluated the sequence of 1,000 base pairs encompassing the minimal promoter region of the MSLN gene to identify expression quantitative trait loci (eQTL) that can affect SMRP. We assessed the association between four MSLN promoter variants and SMRP levels in a cohort of 72 MPM and 677 non-MPM subjects, and we carried out in vitro assays to investigate their functional role. Our results show that rs2235503 is an eQTL for MSLN associated with increased levels of SMRP in non-MPM subjects. Furthermore, we show that this polymorphic site affects the accuracy of SMRP, highlighting the importance of evaluating the individual's genetic background and giving novel insights to refine SMRP specificity as a diagnostic biomarker.

Archivio della Ricerca - Università di Pisa