Search CORE


An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function

Author: A Birnbaum
A Buu
Anne Buu
CC Brown
D Hedeker
DF Morrison
F Liu
H Hotelling
H Smith
I Olkin
IT Jolliffe
J Liu
J Mullahy
James J. Yang
Jia Li
JJ Yang
K Wang
L. Keoki Williams
LJ Bierut
LN He
M McGue
MB Brown
MD Mailman
MX Li
N Solovieff
NW Galwey
P Good
R Core Team
RA Fisher
RC Littell
RJ Simes
S van der Sluis
SL Zeger
The 1000 Genomes Project Consortium
The International HapMap Consortium
Publication venue: 'Springer Science and Business Media LLC'
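The title refers to Fisher's combination function. The Python below is a minimal sketch of the classical version only. It combines K p-values that are assumed to be independent using X = -2 Σ log p_k, which follows a chi-square distribution with 2K degrees of freedom under the global null. The function name fisher_combine is hypothetical. The paper's test for multivariate phenotypes is not reproduced here.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (hypothetical helper).

    X = -2 * sum(log p_k) is chi-square with 2K degrees of freedom under the
    global null. For even df the survival function has the closed form
    exp(-t) * sum_{j<K} t**j / j!, where t = X / 2.
    """
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    t = sum(-math.log(p) for p in p_values)  # t = X / 2
    stat = 2.0 * t
    if t == 0.0:  # every p-value equals 1
        return stat, 1.0
    k = len(p_values)
    # Sum the Poisson-type terms in log space so large K does not overflow.
    p_comb = sum(math.exp(-t + j * math.log(t) - math.lgamma(j + 1)) for j in range(k))
    return stat, min(1.0, p_comb)


stat, p = fisher_combine([0.01, 0.02, 0.03])  # combined p is smaller than any input
```

A single p-value comes back unchanged, because for K = 1 the survival function is exp(-X/2) = p. This is a quick sanity check. When the p-values come from correlated phenotypes, the independence assumption fails and X is no longer chi-square with 2K degrees of freedom.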

A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data

Author: Fan Ruzong
Lin Nan
Xiong Momiao
Zhu Yun
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 15/09/2016

Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for a deeper understanding of the complex genetic structure of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple-phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants analyzed jointly) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high-dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high-dimensional genotype and phenotype data. We propose a new association analysis framework, quadratically regularized functional CCA (QRFCCA). It aims to explore multiple levels of representation of genetic variants, learn the internal patterns involved in disease development, and overcome critical barriers to developing novel statistical methods and computational algorithms for genetic pleiotropic analysis. QRFCCA combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than the nine competing statistics while retaining appropriate type I error rates. To further evaluate performance, QRFCCA and the nine other statistics are applied to the whole-genome sequencing dataset from the TwinsUK study. Using QRFCCA, we identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits. The results show that QRFCCA substantially outperforms the nine other statistics. Comment: 64 pages, including 12 figures
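QRFCCA builds on canonical correlation analysis. The Python below is a minimal sketch of classical CCA only, computed by whitening followed by a singular value decomposition. It omits the quadratically regularized matrix factorization and the functional data analysis steps. The function names and toy data are hypothetical, and the tiny ridge term is only a numerical guard, not QRFCCA's regularizer.

```python
import numpy as np


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x, y, ridge=1e-8):
    """Classical CCA via whitening and SVD.

    Returns canonical correlations (descending) and weight matrices a, b such
    that the k-th canonical variates are xc @ a[:, k] and yc @ b[:, k].
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + ridge * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + ridge * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of Cxx^{-1/2} Cxy Cyy^{-1/2} are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    return s, wx @ u, wy @ vt.T


# Toy data: one latent factor z shared by block x (standing in for genetic
# features) and block y (standing in for phenotypes); other columns are noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=(n, 2))])
y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
rho, a, b = cca(x, y)
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
first_corr = np.corrcoef(xc @ a[:, 0], yc @ b[:, 0])[0, 1]
```

The correlation between the first pair of canonical variates reproduces rho[0], which here is close to the 0.8 correlation built into the shared factor. Plain CCA needs well-conditioned covariance matrices. Sample covariances of genotype data with far more variants than individuals are singular, which is one reason regularized variants are used in high-dimensional settings.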

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Sparse Probit Linear Mixed Model

Author: Cunningham John P.
Kloft Marius
Lippert Christoph
Mandt Stephan
Nakajima Shinichi
Wenzel Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/07/2017

Linear mixed models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow one to find a sparse set of genetic traits that best predict a continuous phenotype of interest. At the same time, they correct for confounding factors such as age, ethnicity and population structure. Because they are formulated as linear regression models, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), which generalizes the LMM modeling paradigm to binary phenotypes. A technical challenge is that the model no longer has a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We test it on three real-world examples from different domains with binary labels. Our algorithm achieves better prediction accuracy and selects features that are less correlated with the confounding factors. Comment: Published version, 21 pages, 6 figures
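The abstract notes that the probit model loses the closed-form likelihood. The generic probit LMM below shows why. It is written under assumed conventions: a latent liability y*, a random effect u with kinship-type covariance K, and unit residual variance to fix the probit scale. The paper's sparsity prior on β and its exact confounder model are not shown.

```latex
\begin{aligned}
y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + \varepsilon_i,
\qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma_g^{2}\mathbf{K}),
\qquad \varepsilon_i \sim \mathcal{N}(0, 1),
\qquad y_i = \mathbb{1}[\,y_i^{*} > 0\,], \\
p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta})
&= \int \prod_{i=1}^{n} \Phi\!\big((2y_i - 1)(\mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i)\big)\,
\mathcal{N}(\mathbf{u};\, \mathbf{0}, \sigma_g^{2}\mathbf{K})\, d\mathbf{u}.
\end{aligned}
```

Conditional on u, the likelihood factorizes into ordinary probit terms Φ(±η_i). Integrating out u with a non-diagonal K leaves an n-dimensional Gaussian orthant probability that has no closed form. With σ_g² = 0 the model reduces to ordinary probit regression, whose likelihood is closed-form. This integral is the quantity that approximate inference has to handle.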

arXiv.org e-Print Archive

MDC Repository

Development of analysis algorithms for family-based rare variant association analysis

Author: Longfei Wang
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2019

Thesis (Ph.D.), Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, College of Natural Sciences, August 2019. Advisor: 원성호 (Sungho Won).
Despite tens of thousands of genome-wide association studies (GWASs), the so-called missing heritability indicates that analyses of common variants have identified only a limited number of disease susceptibility loci. A substantial number of causal variants remain undiscovered by GWASs.
Sequencing technology was expected to supply this additional information by obtaining large stretches of DNA spanning the entire genome. Improvements in this technology have enabled genetic association analysis of rare and common causal variants. However, the single-variant association tests commonly used in GWASs yield false-negative findings unless very large samples are available. Alternatively, aggregating association signals across multiple genetic variants in a biologically relevant region is expected to boost statistical power for rare variant analysis. Numerous statistical methods have been proposed for region-based rare variant association studies, such as burden, variance component, and combined omnibus tests. Region-based association tests were expected to substantially improve statistical power for rare variant analyses and to identify additional disease susceptibility loci. However, very few significant results have been identified, owing to genetic heterogeneity and relatively small sample sizes. Various approaches have been developed to address these limitations. First, family-based designs play an important role in controlling genetic heterogeneity and population stratification. Second, disease status is often diagnosed from the outcomes of different but related phenotypes, so multiple-phenotype analysis is expected to provide additional information and increase power. Third, for the small-sample issue, combining results from multiple studies through meta-analysis has repeatedly been shown to be an effective strategy. In this study, I compared the performance of a selection of popular family-based rare variant association tests and found that FARVAT is the most statistically robust and computationally efficient method. I also extended FARVAT to multiple-phenotype analysis (mFARVAT) and meta-analysis (metaFARVAT).
mFARVAT is a quasi-likelihood-based score test for rare variant association analysis with multiple phenotypes. It tests both homogeneous and heterogeneous effects of each variant on multiple phenotypes. metaFARVAT combines quasi-likelihood scores from multiple studies and generates burden, variable threshold, variance component, and combined omnibus test statistics. metaFARVAT tests homogeneous and heterogeneous genetic effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. Extensive simulation studies under various scenarios showed that the proposed methods are generally robust and efficient across different underlying genetic architectures. I also identified some promising candidate genes associated with chronic obstructive pulmonary disease (COPD), including DLEC1.

Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 The background on rare variant association studies
    1.1.1 Overview of rare variant association studies
    1.1.2 Challenges of rare variant association studies
  1.2 Purpose of this study
  1.3 Outline of the thesis
2 Overview of family-based rare variant association tests
  2.1 Overview of family-based association studies
  2.2 Comparison of the selected family-based rare variant association tests
    2.2.1 Rare Variant Transmission Disequilibrium Test (RV-TDT)
    2.2.2 Generalized Estimating Equations based Kernel Machine test (GEE-KM)
    2.2.3 Combined Multivariate and Collapsing test for Pedigrees (PedCMC)
    2.2.4 Gene-level kernel and burden tests for Pedigrees (PedGene)
    2.2.5 FAmily-based Rare Variant Association Test (FARVAT)
    2.2.6 Comparison of the methods with GAW19 data
  2.3 Conclusions
3 Family-based Rare Variant Association Test for Multivariate Phenotypes
  3.1 Introduction
  3.2 Methods
    3.2.1 Notations and the disease model
    3.2.2 Choice of offset
    3.2.3 Score for quasi-likelihood
    3.2.4 Homogeneous mFARVAT
    3.2.5 Heterogeneous mFARVAT
  3.3 Simulation study
    3.3.1 The simulation model
    3.3.2 Evaluation of mFARVAT with simulated data
  3.4 Application to COPD data
  3.5 Discussion
4 Family-based Rare Variant Association Test for Meta-analysis
  4.1 Introduction
  4.2 Methods
    4.2.1 Notation
    4.2.2 Choices of Offset
    4.2.3 Score for Quasi-likelihood
    4.2.4 Homogeneous Model
    4.2.5 Heterogeneous Model
  4.3 Simulation study
    4.3.1 The simulation model
    4.3.2 Evaluation of metaFARVAT with simulated data
  4.4 Application to COPD data
  4.5 Discussion
5 Summary & Conclusions
Bibliography
Abstract (Korean)
Docto
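The abstract names burden and variance component tests as region-based ways to aggregate signal across rare variants. The Python below is a minimal sketch of both statistics under simplifying assumptions: unrelated individuals, a continuous trait, no covariates, and permutation p-values. It is not FARVAT, which uses family-based quasi-likelihood scores and accounts for relatedness. The function names, unit weights and simulated data are hypothetical.

```python
import numpy as np


def score_stats(g, y, w):
    """Burden and variance-component (SKAT-type) statistics.

    g: n x m matrix of minor-allele counts (0/1/2); y: length-n continuous
    trait; w: length-m variant weights. Null model: intercept only.
    """
    r = y - y.mean()                    # residuals under the intercept-only null
    u = g.T @ r                         # per-variant scores U_j = sum_i g_ij * r_i
    q_burden = float((w @ u) ** 2)      # square of the weighted sum of scores
    q_vc = float(np.sum((w * u) ** 2))  # weighted sum of squared scores
    return q_burden, q_vc


def permutation_pvalues(g, y, w, n_perm=2000, seed=0):
    """Permutation p-values for both statistics (assumes exchangeable, unrelated samples)."""
    rng = np.random.default_rng(seed)
    obs_b, obs_v = score_stats(g, y, w)
    hits_b = hits_v = 0
    for _ in range(n_perm):
        perm_b, perm_v = score_stats(g, rng.permutation(y), w)
        hits_b += perm_b >= obs_b
        hits_v += perm_v >= obs_v
    # The +1 correction counts the observed data as one of the permutations.
    return (hits_b + 1) / (n_perm + 1), (hits_v + 1) / (n_perm + 1)


# Hypothetical region: 10 rare variants (MAF 1%) in 1,000 unrelated individuals.
rng = np.random.default_rng(1)
n, m = 1000, 10
g = rng.binomial(2, 0.01, size=(n, m)).astype(float)
w = np.ones(m)
y_causal = 0.8 * g.sum(axis=1) + rng.normal(size=n)  # same-direction effects
y_null = rng.normal(size=n)                           # no association
p_causal = permutation_pvalues(g, y_causal, w)
p_null = permutation_pvalues(g, y_null, w)
```

The two statistics behave differently. The burden statistic squares the weighted sum of per-variant scores, so it is most powerful when causal variants act in the same direction. The variance component statistic sums squared scores and is insensitive to the direction of effect. Combined omnibus tests mix the two. Permuting y is valid here only because unrelated individuals without covariates are exchangeable under the null; with family data that no longer holds.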

SNU Open Repository and Archive

Multi view based imaging genetics analysis on Parkinson disease

Author: Altmann Andre
Avesani Simone
Cerri Guglielmo
Giugno Rosalba
Oxtoby Neil P.
Tognon Manuel
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2021

Longitudinal studies integrating imaging and genetic data have recently become widespread among bioinformatics researchers. Combining such heterogeneous data allows a better understanding of the origins and causes of complex diseases. Through a proposed multi-view workflow, we show the common steps and tools used in imaging genetics analysis, integrating genotyping, neuroimaging and transcriptomic data. We describe the advantages of existing methods for analyzing heterogeneous datasets, using Parkinson's disease (PD) as a case study. Parkinson's disease is associated with both genetic and neuroimaging factors; however, such imaging genetics associations are still at an early stage of investigation. Therefore, it is desirable to have a free and open-source workflow that integrates different analysis flows in order to recover potential genetic biomarkers in PD, as in other complex diseases.

Catalogo dei prodotti della ricerca