59 research outputs found

Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

Author: Palmer Cameron Douglas
Pe’er Itsik
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

FigShare

The variance of identity-by-descent sharing in the Wright-Fisher model

Author: Ariel Darvasi
Bennet
Hollenbeck
Itsik Pe’er
Kong
Pier Francesco Palamara
Shai Carmi
Todd Lencz
Vladimir Vacic
Publication venue: 'Genetics Society of America'
Publication date: 12/08/2013

Widespread sharing of long, identical-by-descent (IBD) genetic segments is a hallmark of populations that have experienced recent genetic drift. Detection of these IBD segments has recently become feasible, enabling a wide range of applications from phasing and imputation to demographic inference. Here, we study the distribution of IBD sharing in the Wright-Fisher model. Specifically, using coalescent theory, we calculate the variance of the total sharing between random pairs of individuals. We then investigate the cohort-averaged sharing: the average total sharing between one individual and the rest of the cohort. We find that for large cohorts, the cohort-averaged sharing is distributed approximately normally. Surprisingly, the variance of this distribution does not vanish even for large cohorts, implying the existence of "hyper-sharing" individuals. The presence of such individuals has consequences for the design of sequencing studies, since, if they are selected for whole-genome sequencing, a larger fraction of the cohort can be subsequently imputed. We calculate the expected gain in power of imputation by IBD, and subsequently, in power to detect an association, when individuals are either randomly selected or specifically chosen to be the hyper-sharing individuals. Using our framework, we also compute the variance of an estimator of the population size that is based on the mean IBD sharing and the variance in the sharing between inbred siblings. Finally, we study IBD sharing in an admixture pulse model, and show that in the Ashkenazi Jewish population the admixture fraction is correlated with the cohort-averaged sharing. Comment: Includes Supplementary Material.

arXiv.org e-Print Archive

Crossref

Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History

Author: Darvasi Ariel
Lencz Todd
Palamara Pier Francesco
Pe’er Itsik
Publication venue: The American Society of Human Genetics. Published by Elsevier Inc.
Publication date: 02/11/2012

Data-driven studies of identity by descent (IBD) were recently enabled by high-resolution genomic data from large cohorts and scalable algorithms for IBD detection. Yet, haplotype sharing currently represents an underutilized source of information for population-genetics research. We present analytical results on the relationship between haplotype sharing across purportedly unrelated individuals and a population’s demographic history. We express the distribution of IBD sharing across pairs of individuals for segments of arbitrary length as a function of the population’s demography, and we derive an inference procedure to reconstruct such demographic history. The accuracy of the proposed reconstruction methodology was extensively tested on simulated data. We applied this methodology to two densely typed data sets: 500 Ashkenazi Jewish (AJ) individuals and 56 Kenyan Maasai (MKK) individuals (HapMap 3 data set). Reconstructing the demographic history of the AJ cohort, we recovered two subsequent population expansions, separated by a severe founder event, consistent with previous analysis of lower-throughput genetic data and historical accounts of AJ history. In the MKK cohort, high levels of cryptic relatedness were detected. The spectrum of IBD sharing is consistent with a demographic model in which several small-sized demes intermix through high migration rates and result in enrichment of shared long-range haplotypes. This scenario of historically structured demographies might explain the unexpected abundance of runs of homozygosity within several populations.

Elsevier - Publisher Connector

PubMed Central

Current status of artificial intelligence methods for skin cancer survival analysis: a scoping review

Author: Brigit A. Lapolla
Caroline Chen
Celine M. Schreidah
Chunhua Weng
Emily R. Gordon
George Bingham Reynolds
Herbert S. Chase
Itsik Pe’er
Joshua A. Kent
Larisa J. Geskin
Lauren M. Fahmy
Nicholas P. Tatonetti
Oluwaseyi Adeuyan
Publication venue: Frontiers Media S.A.
Publication date: 01/04/2024

Skin cancer mortality rates continue to rise, and survival analysis is increasingly needed to understand who is at risk and what interventions improve outcomes. However, current statistical methods are limited by their inability to synthesize multiple data types, such as patient genetics, clinical history, demographics, and pathology, and to reveal significant multimodal relationships through predictive algorithms. Advances in computing power and data science enabled the rise of artificial intelligence (AI), which synthesizes vast amounts of data and applies algorithms that enable personalized diagnostic approaches. Here, we analyze AI methods used in skin cancer survival analysis, focusing on supervised learning, unsupervised learning, deep learning, and natural language processing. We illustrate strengths and weaknesses of these approaches with examples. Our PubMed search yielded 14 publications meeting inclusion criteria for this scoping review. Most publications focused on melanoma, particularly histopathologic interpretation with deep learning. Such concentration on a single type of skin cancer amid increasing focus on deep learning highlights growing areas for innovation; however, it also demonstrates opportunity for additional analysis that addresses other types of cutaneous malignancies and expands the scope of prognostication to combine genetic, histopathologic, and clinical data. Moreover, researchers may leverage multiple AI methods for enhanced benefit in analyses. Expanding AI to this arena may enable improved survival analysis, targeted treatments, and outcomes.

Directory of Open Access Journals

Integrative eQTL-Based Analyses Reveal the Biology of Breast Cancer Risk Loci

Author: Aaron McKenna
Barbara Stranger
Itsik Pe’er
Ji-Heui Seo
Matthew L. Freedman
Myles Brown
Qiyuan Li
Svitlana Tyekucheva
Thomas LaFramboise
李奇渊
Publication venue: 'Elsevier BV'
Publication date: 31/01/2013

This work was carried out in the laboratory of the paper's corresponding author, Professor Matthew Freedman, at the Dana-Farber Cancer Institute, Harvard Medical School, USA. Germline determinants of gene expression in tumors are infrequently studied due to the complexity of transcript regulation caused by somatically acquired alterations. We performed expression quantitative trait locus (eQTL)-based analyses using the multi-level information provided in The Cancer Genome Atlas (TCGA). Of the factors we measured, cis-acting eQTLs accounted for 1.2% of the total variation of tumor gene expression, while somatic copy-number alteration and CpG methylation accounted for 7.3% and 3.3%, respectively. eQTL analyses of 15 previously reported breast cancer risk loci resulted in the discovery of three variants that are significantly associated with transcript levels (false discovery rate [FDR] < 0.1). Our trans-based analysis identified an additional three risk loci as acting through ESR1, MYC, and KLF4. These findings provide a more comprehensive picture of gene expression determinants in breast cancer as well as insights into the underlying biology of breast cancer risk loci.

Elsevier - Publisher Connector

PubMed Central

Xiamen University Institutional Repository

Elevated GM3 plasma concentration in idiopathic Parkinson’s disease: A lipidomic analysis

Author: Alcalay Roy Nissim
Chan Robin Barry
Di Paolo Gilbert
Kang Un
Levy Oren Abraham
Liong Christopher
Marder Karen
Perotte Adler J.
Pe’er Itsik
Shim Hong Bin
Shorr Evan Jack
Waters Cheryl H.
Xu Yimeng
Zhou Bowen
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

Parkinson’s disease (PD) is a common neurodegenerative disease whose pathological hallmark is the accumulation of intracellular α-synuclein aggregates in Lewy bodies. Lipid metabolism dysregulation may play a significant role in PD pathogenesis; however, large plasma lipidomic studies in PD are lacking. In the current study, we analyzed the lipidomic profile of plasma obtained from 150 idiopathic PD patients and 100 controls, taken from the ‘Spot’ study at Columbia University Medical Center in New York. Our mass spectrometry-based analytical panel consisted of 520 lipid species from 39 lipid subclasses including all major classes of glycerophospholipids, sphingolipids, glycerolipids and sterols. Each lipid species was analyzed using a logistic regression model. The plasma concentrations of two lipid subclasses, triglycerides and monosialodihexosylganglioside (GM3), were different between PD and control participants. GM3 ganglioside concentration had the most significant difference between PD and controls (1.531±0.037 pmol/μl versus 1.337±0.040 pmol/μl respectively; p-value = 5.96E-04; q-value = 0.048; when normalized to total lipid: p-value = 2.890E-05; q-value = 2.933E-03). Next, we used a collection of 20 GM3 and glucosylceramide (GlcCer) species concentrations normalized to total lipid to perform a ROC curve analysis, and found that these lipids compare favorably with biomarkers reported in previous studies (AUC = 0.742 for males, AUC = 0.644 for females). Our results suggest that higher plasma GM3 levels are associated with PD. GM3 lies in the same glycosphingolipid metabolic pathway as GlcCer, a substrate of the enzyme glucocerebrosidase, which has been associated with PD. These findings are consistent with previous reports implicating lower glucocerebrosidase activity in PD risk.

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

FigShare

A Hidden Markov Model for Copy Number Variant prediction from whole genome resequencing data

Author: A McKenna
B Langmead
C Alkan
C Xie
DR Bentley
ES Lander
F Hach
H Li
Itsik Pe’er
J Wang
JO Korbel
K Chen
P Medvedev
R Durbin
R Li
S Lee
S Sarin
S Yoon
Y Shen
Yiwei Gu
Yufeng Shen
Publication venue: BioMed Central
Publication date: 01/01/2011

Motivation: Copy Number Variants (CNVs) are important genetic factors for studying human diseases. While high-throughput whole genome re-sequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored for different types or sizes of CNVs under different experimental designs. Results: To achieve optimal power and resolution of detecting CNVs at low depth of coverage, we implemented a Hidden Markov Model that integrates both depth of coverage and mate-pair relationship. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrate that the power of detecting medium-size deletions is close to theoretical values. Availability: A program implemented in Java, Zinfandel, is available at http://www.cs.columbia.edu/~itsik/zinfandel

Crossref

Springer - Publisher Connector

Columbia University Academic Commons

PubMed Central

Extended haplotype association study in Crohn’s disease identifies a novel, Ashkenazi Jewish-specific missense mutation in the NF-κB pathway gene, HEATR3

Author: Abraham Clara
Brant Steven R.
Burberry Aaron
Cardinale Christopher J.
Cho Judy H.
Choi Murim
Chowers Yehuda
Desnick Robert J.
Evelyn Ng Sok Meng
Ferguson John
Gregersen Peter K.
Gusev Alexander
Hakonarson Hakon
Hui Ken Y.
Karban Amir
Katz Seymour
Lifton Richard P.
Mayer Lloyd
Nuñez Gabriel
Peter Inga
Pe’er Itsik
Silverberg Mark S.
Warner Neil
Waterman Matti
Zhang Wei
Zhao Hongyu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/03/2014

The Ashkenazi Jewish population has a several-fold higher prevalence of Crohn’s disease compared to non-Jewish European ancestry populations and has a unique genetic history. Haplotype association is critical to Crohn’s disease etiology in this population, most notably at NOD2, in which three causal, uncommon, and conditionally independent NOD2 variants reside on a shared background haplotype. We present an analysis of extended haplotypes which showed significantly greater association to Crohn’s disease in the Ashkenazi Jewish population compared to a non-Jewish population (145 haplotypes and no haplotypes with P-value < 10⁻³, respectively). Two haplotype regions, one each on chromosomes 16 and 21, conferred increased disease risk within established Crohn’s disease loci. We performed exome sequencing of 55 Ashkenazi Jewish individuals and follow-up genotyping focused on variants in these two regions. We observed Ashkenazi Jewish-specific nominal association at R755C in TRPM2 on chromosome 21. Within the chromosome 16 region, R642S of HEATR3 and rs9922362 of BRD7 showed genome-wide significance. Expression studies of HEATR3 demonstrated a positive role in NOD2-mediated NF-κB signaling. The BRD7 signal showed conditional dependence with only the downstream rare Crohn’s disease-causal variants in NOD2, but not with the background haplotype; this elaborates NOD2 as a key illustration of synthetic association.

Harvard University - DASH

WGS-based telomere length analysis in Dutch family trios implicates stronger maternal inheritance and a role for RRM1 gene

Author: Abdellaoui A. (Abdel)
Amin N. (Najaf)
Arakelyan A. (Arsen)
Beekman M. (Marian)
Boomsma D.I. (Dorret)
Bot J. (Jan)
Bovenberg J.A. (Jasper)
Byelas H. (Heorhiy)
Cao H. (Hongzhi)
Cao S. (Sujie)
Chen R. (Ruoyan)
Cox D.R. (David R.)
Craen A.J.M. (Anton) de
de Bakker P.I.W. (Paul I. W.)
Deelen P. (Patrick)
Dijk F. (Freerk) van
Dijkstra M. (Martijn)
Du Y. (Yuanping)
Duijn C.M. (Cornelia) van
Dunnen J.T. (Johan) den
Elbers C.C. (Clara C.)
Enckevort D. (David) van
Estrada K. (Karol)
Francioli L.C. (Laurent)
Guryev V. (Victor)
Handsaker R.E. (Robert)
Hehir-Kwa J.Y. (Jayne)
Hofman A. (Albert)
Hormozdiari F. (Fereydoun)
Isaacs A. (Aaron)
Jan Hottenga J. (Jouke)
Kanterakis A. (Alexandros)
Karssen L.C. (Lennart)
Kattenberg M. (Mathijs)
Kayser M. (Manfred)
Kloosterman W.P. (Wigard)
Knijff P. (Peter) de
Koval V. (Vyacheslav)
Lameijer E.-W. (Eric-Wubbo)
Laros J.F.J. (Jeroen)
Li M. (Mingkun)
Li N. (Ning)
Li Q. (Qibin)
Li Y. (Yingrui)
Marschall T. (Tobias)
McCarroll S.A. (Steven A.)
Medina-Gomez C. (Carolina)
Mei H. (Hailiang)
Menelaou A. (Androniki)
Moed M.H. (Matthijs H.)
Neerincx P.B.T. (Pieter)
Nersisyan L. (Lilit)
Nijman I.J. (Isaac)
Nikoghosyan M. (Maria)
Ommen G.-J.B. (Gert-Jan) van
Oostra B. (Ben)
Palamara P.F. (Pier Francesco)
Pe’er I. (Itsik)
Pitts S.J. (Steven J.)
Platteel M. (Mathieu)
Polak P. (Paz)
Potluri S. (Shobha)
Pulit S.L. (Sara L.)
Renkens I. (Ivo)
Rivadeneira F. (Fernando)
Schaik B.D.C. (Barbera) van
Schönhuth A. (Alexander)
Slagboom P.E. (Eline)
Sohail M. (Mashaal)
Stoneking M. (Mark)
Suchiman H.E.D. (H. Eka D.)
Sundar P. (Purnima)
Sunyaev S.R. (Shamil R.)
Swertz M.A. (Morris A.)
The Genome of the Netherlands Consortium
Uitterlinden A.G. (André)
van den Berg L.H. (Leonard H.)
van der Velde K.J. (K. Joeri)
van Leeuwen E.M. (Elisabeth M.)
van Oven M. (Mannis)
van Setten J. (Jessica)
Veldink J. (Jan)
Vermaat M. (Martijn)
Vuzman D. (Dana)
Wang J. (Jun)
Wijmenga C. (Cisca)
Willemsen G. (Gonneke)
Ye K. (Kai)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2019

Telomere length (TL) regulation is an important factor in ageing, reproduction and cancer development. Genetic, hereditary and environmental factors regulating TL are currently widely investigated; however, their relative contribution to TL variability is still understudied. We have used whole genome sequencing data of 250 family trios from the Genome of the Netherlands project to perform computational measurement of TL and a series of regression and genome-wide association analyses to reveal TL inheritance patterns and associated genetic factors. Our results confirm that TL is a largely heritable trait, primarily with mother’s, and, to a lesser extent, with father’s TL having the strongest influence on the offspring. In this cohort, mother’s, but not father’s age at conception was positively linked to offspring TL. Age-related TL attrition of 40 bp/year had a relatively small influence on TL variability. Finally, we have identified TL-associated variations in ribonucleotide reductase catalytic subunit M1 (RRM1 gene), which is known to regulate telomere maintenance in yeast. We also highlight the importance of a multivariate approach and the limitations of existing tools for the analysis of TL as a polygenic heritable quantitative trait.

CWI's Institutional Repository

Erasmus University Digital Repository