
1,569 research outputs found

An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data

Author: Anthony YC Kuk
Jinfeng Xu
Xiang Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

DOI: 10.1186/1471-2156-14-82 (BMC Genetics 14)

Crossref

Springer

Springer - Publisher Connector

PubMed Central

ScholarBank@NUS

HKU Scholars Hub


Haplotype Inference through Sequential Monte Carlo

Author: Iliadis Alexandros
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Technological advances in the last decade have given rise to large genome-wide studies, which have helped researchers gain better insight into the genetic basis of many common diseases. As the number of samples and genome coverage have increased dramatically, individuals are now typically genotyped on high-throughput platforms at more than 500,000 single nucleotide polymorphisms. At the same time, theoretical and empirical arguments have been made for the use of haplotypes, i.e. combinations of alleles at multiple loci on individual chromosomes, as opposed to genotypes, so the problem of haplotype inference is particularly relevant. Existing haplotyping methods include population-based methods, methods for pooled DNA samples, and methods for family and pedigree data. Furthermore, the vast amount of available data poses new challenges for haplotyping algorithms. Candidate methods should scale well with the size of the datasets, as the number of loci and the number of individuals run well into the thousands. In addition, as genotyping can be performed routinely, researchers encounter a number of specific new scenarios, which can be seen as hybrids of the population and pedigree inference scenarios and require special care to incorporate the maximum amount of information. In this thesis we present a Sequential Monte Carlo framework (TDS) and tailor it to address instances of haplotype inference and frequency estimation problems. Specifically, we first adjust our framework to perform haplotype inference in trio families, resulting in a methodology that demonstrates an excellent tradeoff between speed and accuracy. Subsequently, we extend our method to handle general nuclear families and demonstrate the gain from using our approach as opposed to alternative scenarios. We further address the problem of haplotype inference in pooling data, in which we show that our method achieves improved performance over existing approaches in datasets with a large number of markers.
We finally present a framework to handle the haplotype inference problem in regions of CNV/SNP data. Using our approach, we can phase datasets where the ploidy of an individual can vary along the region and each individual can have different breakpoints.

Columbia University Academic Commons

Estimating the effect of SNP genotype on quantitative traits from pooled DNA samples

Author: Barendse William
Dominik Sonja
Hawken Rachel J
Henshall John M
Publication venue: BioMed Central
Publication date: 01/01/2012

Springer - Publisher Connector

PubMed Central

University of Queensland eSpace

Haplotype frequency inference from pooled genetic data with a latent multinomial model

Author: Flegg Jennifer A.
Foo Yong See
Publication date: 31/08/2023

In genetic studies, haplotype data provide more refined information than data about separate genetic markers. However, large-scale studies that genotype hundreds to thousands of individuals may only provide results of pooled data, where only the total allele counts of each marker in each pool are reported. Methods for inferring haplotype frequencies from pooled genetic data that scale well with pool size rely on a normal approximation, which we observe to produce unreliable inference when applied to real data. We illustrate cases where the approximation breaks down, due to the normal covariance matrix being near-singular. As an alternative to approximate methods, in this paper we propose exact methods to infer haplotype frequencies from pooled genetic data based on a latent multinomial model, where the observed allele counts are considered integer combinations of latent, unobserved haplotype counts. One of our methods, latent count sampling via Markov bases, achieves approximately linear runtime with respect to pool size. Our exact methods produce more accurate inference than existing approximate methods for synthetic data and for data based on haplotype information from the 1000 Genomes Project. We also demonstrate how our methods can be applied to time-series of pooled genetic data, as a proof of concept of how our methods are relevant to more complex hierarchical settings, such as spatiotemporal models. Comment: 35 pages, 16 figures, 3 algorithms, submitted to Biometrics journal
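To illustrate the phrase "integer combinations of latent, unobserved haplotype counts", the following is a minimal sketch, not the authors' method. It assumes biallelic markers coded 0/1 and a tiny pool, and it enumerates every latent haplotype count vector by brute force instead of sampling via Markov bases. All function names are hypothetical.

```python
from itertools import product


def haplotypes(num_markers):
    """All 0/1 haplotypes over the given number of biallelic markers."""
    return list(product((0, 1), repeat=num_markers))


def allele_counts(hap_counts, haps):
    """Observed per-marker counts of allele 1 implied by latent haplotype counts."""
    num_markers = len(haps[0])
    return tuple(
        sum(count * hap[j] for count, hap in zip(hap_counts, haps))
        for j in range(num_markers)
    )


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def consistent_latent_counts(observed, pool_size):
    """Brute-force list of latent haplotype counts that reproduce `observed`.

    Only feasible for toy sizes; shown purely to make the model concrete.
    """
    haps = haplotypes(len(observed))
    return [
        z
        for z in compositions(pool_size, len(haps))
        if allele_counts(z, haps) == tuple(observed)
    ]


if __name__ == "__main__":
    # Pool of 4 haplotypes, 2 markers, allele 1 seen twice at each marker.
    for z in consistent_latent_counts((2, 2), 4):
        print(dict(zip(haplotypes(2), z)))
```

In this toy pool, several different latent count vectors reproduce the same observed allele counts, which is why the haplotype counts have to be treated as unobserved. The number of candidates grows quickly with pool size and marker count, so an enumeration like this is useful only as an illustration.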

arXiv.org e-Print Archive

A whole genome association study of neuroticism using DNA pooling.

Author: A Aluja
A Bansal
A Bhomra
A Darvasi
A Papassotiropoulos
AI Scott
BJ Barratt
BJ Cox
BM Neale
C M Middeldorp
CM Middeldorp
D Altshuler
D I Boomsma
DA Hinds
DC Rettew
DI Boomsma
DJ Schaid
DM Benbrook
E J C G van den Oord
E Meaburn
F De Fruyt
G Kirov
GM Wilson
GW Smith
HJ Eysenck
HT Zhang
I Craig
IR Le Jeune
J Angst
J Flint
J Fullerton
J J Hottenga
J M Hettema
JA Lee
JM Hettema
JP Ioannidis
K S Kendler
KL Mohlke
KM Kirk
KS Kendler
LM Butcher
M C Neale
M R James
ML Wong
MW Nash
N G Martin
N Martin
N Norton
N R Wray
OJ Bienvenu
P Boyce
P E Slagboom
P Jylha
P Sham
R Jardine
R Kessler
R Redon
RM Hirschfeld
S Macgregor
S S An
S Sawcer
S Shifman
S Smiley
SA Willis-Owen
WW Fleischhacker
X Chen
Publication date: 01/01/2008

We describe a multistage approach to identify single nucleotide polymorphisms (SNPs) associated with neuroticism, a personality trait that shares genetic determinants with major depression and anxiety disorders. Whole genome association with 452 574 SNPs was performed on DNA pools from approximately 2000 individuals selected on extremes of neuroticism scores from a cohort of 88 142 people from southwest England. The most significant SNPs were then genotyped on independent samples to replicate findings. We were able to replicate association of one SNP within the PDE4D gene in a second sample collected by our laboratory and in a family-based test in an independent sample; however, the SNP was not significantly associated with neuroticism in two other independent samples. We also observed an enrichment of low P-values in known regions of copy number variations. Simulation indicates that our study had approximately 80% power to identify neuroticism loci in the genome with odds ratio (OR) > 2, and approximately 50% power to identify small effects (OR = 1.5). Since we failed to find any loci accounting for more than 1% of the variance, the heritability of neuroticism probably arises from many loci each explaining much less than 1%. Our findings argue for the need for much larger samples than anticipated in genetic association studies and indicate that the biological basis of emotional disorders is extremely complex.

Crossref

VU Research Portal

PubMed Central

Oxford University Research Archive

Application of next generation sequencing in genetic and genomic studies

Author: Wang Jingwen
Publication venue: 'Elsevier BV'
Publication date: 09/09/2016

Genetic variants spread along the human genome play vital roles in determining our traits, affecting development and potentially causing disorders. Most common disorders have complex underlying mechanisms involving genetic or environmental factors and the interaction between them. Over the past decade, genome-wide association studies (GWAS) have identified thousands of common variants that contribute to complex disorders and partially explain the heritability. However, there is still a large portion that is unexplained, and the missing heritability may be caused by several factors, such as rare or low-frequency variants with high effect that are not covered by GWAS and linkage analysis. With the development of next generation sequencing (NGS), it is possible to rapidly detect large numbers of novel rare and low-frequency variants simultaneously at a low cost. This new technology provides vast information for studying the association of genetic variations and complex disorders. Once the susceptibility gene is mapped, model organisms such as zebrafish (Danio rerio) are popular for further investigating the possible function of the disease-associated gene in determining the phenotype. However, the genome annotation of zebrafish is not complete, which affects the characterization of gene functions. Accordingly, high-throughput RNA sequencing can be employed for identifying new transcripts. In our studies, pooled DNA samples were used for whole genome sequencing (WGS) and exome sequencing. In Paper I, we evaluated minor allele frequency (MAF) estimates using three variant detection tools with two sets of pooled exome sequencing and one set of pooled WGS data. The MAFs from the pooled sequencing data demonstrated high concordance (r = 0.88-0.94) with those from the individual genotyping data. In Paper II, exome sequencing implementing a pooling strategy was performed on 100 idiopathic scoliosis (IS) patients for mapping susceptibility genes.
After validating 20 candidate single nucleotide variants (SNVs), we did not find associations between them and IS. However, the previously reported common variant rs11190870 near LBX1 was validated in a large Scandinavian cohort. In Paper III, we analyzed WGS of pooled DNA samples performed on 19 affected individuals who shared a phenotype-linked haplotype in a dyslexic Finnish family. Two of the individuals were also whole-genome sequenced individually. The screen for causative variants was narrowed down to a rare SNV, which might affect the binding affinity of LHX2 that regulated the dyslexia-associated gene ROBO1. In Paper IV, RNA sequencing (RNA-seq) data were analyzed for identifying novel transcripts in zebrafish early development using an in-house pipeline. We discovered 152 novel transcribed regions (NTRs), validated more than 10 NTRs and quantified their expression in early developmental stages. In our studies, we evaluated and applied a pooling approach for identifying variants susceptible to disease using high-throughput DNA sequencing. Based on RNA sequencing data, we provided new information for genome annotation of the model organism zebrafish, which is valuable for studying the function of disease-causative genes. In summary, the whole series of studies demonstrates how NGS can be applied in studying the genetic basis of complex disorders and assisting in follow-up functional studies in model organisms.

Publications from Karolinska Institutet

Exploiting natural selection to study adaptive behavior

Author: Pulido Tamayo Sergio
Publication venue: Ghent University. Faculty of Sciences ; KU Leuven. Faculty of Bioscience Engineering
Publication date: 01/01/2016

The research presented in this dissertation explores different computational and modeling techniques that, combined with predictions from evolution by natural selection, lead to the analysis of the adaptive behavior of populations under selective pressure. For this thesis three computational methods were developed: EXPLoRA, EVORhA and SSA-ME. EXPLoRA finds genomic regions associated with a trait of interest (QTL) by explicitly modeling the expected linkage disequilibrium of a population of segregants under selection. Data from BSA experiments was analyzed to find genomic loci associated with ethanol tolerance. EVORhA explores the interplay between driving and hitchhiking mutations during evolution to reconstruct the subpopulation structure of clonal bacterial populations based on deep sequencing data. Data from mixed infections and evolution experiments of E. coli was used and their population structure reconstructed. SSA-ME uses mutual exclusivity in cancer to prioritize cancer driver genes. TCGA data of breast cancer tumor samples were analyzed.

Lirias

Ghent University Academic Bibliography


Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data

Author: Anastassiou Dimitris
Iliadis Alexandros
Wang Xiaodong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Background: Typically, the first phase of a genome wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data. Results: We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples, focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and markers. We demonstrate that our algorithm provides improvements in terms of accuracy and computational time over competing methods for large numbers of markers while demonstrating comparable performance for smaller numbers of markers. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool. Conclusions: Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with a large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central


Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA

Author: Anastassiou Dimitris
Iliadis Alexandros
Jajamovich Guido H.
Wang Xiaodong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Background: DNA pooling constitutes a cost-effective alternative in genome wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identification of haplotype frequencies from pooled data in addition to single locus analysis is of separate interest within these studies, as haplotypes could increase statistical power and provide additional insight. Results: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data is noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes and their associated frequencies of the AGT gene. We further evaluate our methodology on datasets consisting of SNPs from the first 7Mb of the HapMap CEU population. Noise and missing data were further introduced in the datasets in order to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performances. Conclusions: We evaluate our methodology on scenarios where pooling is more efficient relative to individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.

Columbia University Academic Commons

Springer - Publisher Connector

PubMed Central