Learning the optimal scale for GWAS through hierarchical SNP aggregation

Authors: Ambroise Christophe; Guinot Florent; Samson Franck; Szafranski Marie
Publication date: 01/01/2018

Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual's genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs.

Results: We propose a dimension-reduction approach that can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and multivariate approaches on both synthetic and real GWAS data, and we show that reducing the dimension of the predictor matrix by aggregating SNPs gives greater precision in the detection of associations between the phenotype and genomic regions.
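The motivation above, that grouping SNPs reduces the number of hypotheses to test, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method of the paper: the helper names `aggregate_snps` and `bonferroni_threshold` are hypothetical, SNPs are grouped into fixed-size contiguous blocks instead of blocks learned from haplotype structure, and a Bonferroni correction stands in for whatever multiple-testing control is actually used.

```python
# Illustrative only: fixed-size contiguous blocks and a Bonferroni correction
# are assumptions here, not the aggregation or testing procedure of the paper.

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def aggregate_snps(genotypes, block_size):
    """Sum minor-allele counts over contiguous blocks of SNPs, per individual."""
    aggregated = []
    for row in genotypes:
        blocks = [sum(row[i:i + block_size]) for i in range(0, len(row), block_size)]
        aggregated.append(blocks)
    return aggregated


# Toy genotype matrix: 3 individuals x 6 SNPs, coded as 0/1/2 minor-allele counts.
genotypes = [
    [0, 1, 2, 0, 0, 1],
    [2, 2, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 2],
]

# Aggregating 6 SNPs into 2 blocks leaves 2 hypotheses to test instead of 6.
grouped = aggregate_snps(genotypes, block_size=3)

per_snp = bonferroni_threshold(0.05, n_tests=len(genotypes[0]))
per_group = bonferroni_threshold(0.05, n_tests=len(grouped[0]))

print(grouped)
print(per_snp, per_group)
```

With fewer predictors, the per-test threshold is less stringent, which is the intuition behind testing aggregated regions rather than single loci.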

arXiv.org e-Print Archive

HAL Evry

Directory of Open Access Journals

HAL Descartes

Hal-Diderot

Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data

Authors: Braun Rosemary; Leibon Gregory; Pauls Scott; Rockmore Daniel
Publication date: 01/01/2011

We present the extension and application of a new unsupervised statistical learning technique, the Partition Decoupling Method (PDM), to gene expression data. Because it can reveal non-linear and non-convex geometries present in the data, the PDM improves on typical gene expression analysis algorithms, permitting a multi-gene analysis that can reveal phenotypic differences even when the individual genes do not exhibit differential expression. Here, we apply the PDM to publicly available gene expression data sets and demonstrate that we are able to identify cell types and treatments with higher accuracy than is obtained through other approaches. By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may be used to find sets of mechanistically related genes that discriminate phenotypes.

arXiv.org e-Print Archive

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.

Authors: Boymoushakian Lari; Chiu Alec M; Coller Hilary A; Mitra Mithun
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous based on genetic variations, tumor histology, and clinical outcomes. We used high-throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering, combining gene expression, miRNA expression, and copy number variation, revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single data type. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features, and had a distinctive signature of oncogenic mutations, miRNAs, and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity.

eScholarship - University of California