Search CORE

15 research outputs found

Fractal Characterizations of MAX Statistical Distribution in Genetic Association Studies

Author: Azzalini A.
Heisenberg W.
Mao J.
Press W. H.
Sham P.
Sokal R. K.
Thomas D. C.
Weir B. S.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 14/09/2009

Two non-integer parameters are defined for MAX statistics, which are maxima of $d$ simpler test statistics. The first parameter, $d_{MAX}$, is the fractional number of tests, representing the equivalent number of independent tests in MAX. If the $d$ tests are dependent, $d_{MAX} < d$. The second parameter is the fractional degrees of freedom $k$ of the chi-square distribution $\chi^2_k$ that fits the MAX null distribution. These two parameters, $d_{MAX}$ and $k$, can be defined independently, and $k$ can be non-integer even if $d_{MAX}$ is an integer. We illustrate these two parameters using the example of MAX2 and MAX3 statistics in genetic case-control studies. We speculate that $k$ is related to the amount of ambiguity of the model inferred by the test. In case-control genetic association, tests with low $k$ (e.g. $k=1$) are able to provide definitive information about the disease model, as opposed to tests with high $k$ (e.g. $k=2$) that are completely uncertain about the disease model. Similar to Heisenberg's uncertainty principle, the ability to infer the disease model and the ability to detect significant association may not be simultaneously optimized, and $k$ seems to measure the level of their balance.
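
To make the two parameters concrete, here is a small Monte Carlo sketch for MAX2-like statistics: the maximum of two chi-square(1) statistics whose underlying z-scores have correlation rho. It rests on simplifying assumptions that are not taken from the paper: $k$ is estimated by matching the mean (a $\chi^2_k$ variable has mean $k$), and $d_{MAX}$ is obtained by solving $1-(1-p_1)^{d_{MAX}} = P(\text{MAX} > x)$ at a single threshold $x$, where $p_1$ is the single-test tail probability. The function names are hypothetical.

```python
import math
import random


def simulate_max_null(rho, n_sim=100_000, seed=1):
    """Draw n_sim null MAX values, each the larger of two chi-square(1)
    statistics whose underlying standard-normal z-scores have correlation rho."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + c * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
        draws.append(max(z1 * z1, z2 * z2))
    return draws


def fractional_k(draws):
    """Moment-matching estimate of k: a chi-square with k df has mean k."""
    return sum(draws) / len(draws)


def fractional_d(draws, x=3.841):
    """Effective number of independent tests at threshold x: solve
    1 - (1 - p1)**d = P(MAX > x), with p1 = P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    p1 = math.erfc(math.sqrt(x / 2.0))
    p_max = sum(1 for v in draws if v > x) / len(draws)
    return math.log(1.0 - p_max) / math.log(1.0 - p1)


for rho in (0.0, 0.7, 1.0):
    draws = simulate_max_null(rho)
    print(f"rho={rho}: d_MAX ~ {fractional_d(draws):.2f}, k ~ {fractional_k(draws):.2f}")
```

With rho = 0 the two tests are independent, so d_MAX should come out near 2 while k comes out near 1 + 2/π ≈ 1.64, a non-integer k alongside an integer d_MAX; with rho = 1 both collapse to about 1, and intermediate correlations give a fractional d_MAX between 1 and 2.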

arXiv.org e-Print Archive

Crossref

Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

Author: Alvord W. G.
Auer P. L.
Chen Y.
Chen Z.
Cohen J.
Fechner G. T.
Guyon I.
Göhlmann H.
Lee J.
Li C.
Schwender H.
Smyth G. K.
Snedecor G. W.
Trevino V.
Vandesompele J.
Welsh B. L.
Wentian Li
Zhao C.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 28/08/2013

A volcano plot displays an unstandardized signal (e.g. log-fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t-test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines of the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays. Comment: 8 figures
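
The contrast between the two perpendicular cut-offs and the curved line can be sketched numerically. This assumes a SAM-style regularized t in which a constant s0 is added to the standard error; the review discusses several regularizations, so this is one illustrative choice, and all function names are hypothetical.

```python
import math
import statistics


def volcano_point(x, y, s0=0.0):
    """Volcano-plot coordinates for one gene from log2 expression values in
    groups x and y: (log fold change, t-like statistic). s0 = 0 gives an
    ordinary Welch-type t; s0 > 0 gives a SAM-style regularized t."""
    lfc = statistics.mean(y) - statistics.mean(x)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return lfc, lfc / (se + s0)


def double_filter(lfc, t, lfc_cut=1.0, t_cut=3.0):
    """'Double filtering': two perpendicular cut-offs in the (lfc, t) plane."""
    return abs(lfc) >= lfc_cut and abs(t) >= t_cut


def joint_boundary_t(lfc, s0, cut):
    """Smallest ordinary |t| that lets a gene with this |lfc| pass the
    regularized criterion |lfc| / (se + s0) >= cut. Since se = |lfc| / |t|,
    this is |t| >= cut * |lfc| / (|lfc| - cut * s0): a curve, not a line."""
    a = abs(lfc)
    if a <= cut * s0:
        return math.inf  # no standard error is small enough
    return cut * a / (a - cut * s0)


# A gene with a small fold change but tiny variance: large ordinary t,
# strongly shrunk once s0 is added to the denominator.
lfc, t = volcano_point([1.0, 1.01, 0.99], [1.2, 1.21, 1.19])
_, t_reg = volcano_point([1.0, 1.01, 0.99], [1.2, 1.21, 1.19], s0=0.1)
print(round(lfc, 3), round(t, 2), round(t_reg, 2))
print([round(joint_boundary_t(v, s0=0.1, cut=3.0), 2) for v in (0.4, 1.0, 4.0)])
```

The required |t| falls toward the cut-off as |lfc| grows and rises without bound as |lfc| approaches cut × s0, which traces the curved discriminant line rather than a corner.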

arXiv.org e-Print Archive

Crossref

Association Mapping Approach into Type 2 Diabetes using Biomarkers and Clinical Data

Author: Abdulaimma B
Aday Curbelo Montañez C
Al-Jumeily D
Fergus P
Hind J
Hussain A
Radi N

The global growth in incidence of Type 2 Diabetes (T2D) has become a major international health concern, so understanding the aetiology of T2D is vital. This paper investigates a variety of statistical methodologies at various levels of complexity to analyse genotype data and identify biomarkers that show evidence of increased susceptibility to T2D and related traits. A critical overview of several selected statistical methods for population-based association mapping, particularly case-control genetic association analysis, is presented. The dataset analysed in this paper, which includes 3435 female subjects (cases and controls) with genotype information across 879071 Single Nucleotide Polymorphisms (SNPs), is described. Quality control steps are performed during a pre-processing phase to remove samples and markers that failed the quality control tests. Association analysis is discussed to address which statistical method is appropriate for the dataset. Our genetic association analysis produced promising results: the allelic association test showed one SNP, rs10519107, passing the genome-wide significance threshold of 5×10−8 (odds ratio (OR) = 0.7409, P = 1.813×10−9), and several further SNPs passed the suggestive association threshold of 5×10−6; these SNPs may be worth further investigation. Furthermore, logistic regression analysis adjusted for multiple confounding factors indicated that none of the genotyped SNPs passed the genome-wide significance threshold of 5×10−8. Nevertheless, four SNPs (rs10519107, rs4368343, rs6848779, rs11729955) passed the suggestive association threshold.
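
As a rough illustration of the allelic association test and the two thresholds quoted above, the sketch below computes a Pearson chi-square (1 df), its P-value and the allelic odds ratio from a 2×2 table of allele counts. The counts are invented for illustration, the function names are hypothetical, and the sketch omits the covariate adjustment that the logistic regression analysis adds.

```python
import math


def allelic_test(case_alleles, control_alleles):
    """Allelic association test on a 2x2 table of allele counts.
    case_alleles = (risk, other) counted over the 2N chromosomes of the cases;
    control_alleles likewise. Returns (chi-square with 1 df, P-value, odds ratio)."""
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio


def significance_label(p, genome_wide=5e-8, suggestive=5e-6):
    """Classify a P-value against the genome-wide and suggestive thresholds."""
    if p < genome_wide:
        return "genome-wide significant"
    if p < suggestive:
        return "suggestive"
    return "not significant"


# Hypothetical allele counts (not from the study): 500 cases and 500 controls.
chi2, p, odds = allelic_test((300, 700), (400, 600))
print(f"chi2={chi2:.2f}  P={p:.2e}  OR={odds:.3f}  -> {significance_label(p)}")
```

An odds ratio below 1, as here, means the counted allele is less frequent in cases than in controls.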

LJMU Research Online (Liverpool John Moores University)

Exploring Case-Control Genetic Association Tests Using Phase Diagrams

Author: Amos
Amos
Balding
Blackwelder
Devlin
Falconer
Fisher
Freidlin
Gibbs
Gibbs
González
Kuo
Lewis
Li
Li
Lifshitz
Parkes
Sasieni
Scheet
Slager
Suh
Tapper
Todd
Tokuhiro
Ury
Weir
Wellcome Trust Case Control Consortium
Wellcome Trust Case Control Consortium & Australo-Anglo-American Spondylitis Consortium (TASC)
Wentian Li
Wilcox
Wittke-Thompson
Yamada
Yaning Yang
Young Ju Suh
Zeggini
Zheng
Zheng
Zheng
Publication venue: 'Elsevier BV'
Publication date: 14/09/2009

Background: Using a new concept called the "phase diagram", we compare two commonly used genotype-based tests for case-control genetic analysis: the Cochran-Armitage trend test (CAT at $x=0.5$, or CAT0.5) and MAX2, the maximum of two chi-square test results, one from the two-by-two genotype count table that combines baseline homozygotes and heterozygotes, and another from the table that combines heterozygotes with risk homozygotes. CAT0.5 is more suitable for multiplicative disease models, and MAX2 is better for dominant/recessive models. Methods: We define the CAT0.5-MAX2 phase diagram on the disease model space such that regions where MAX2 is more powerful than CAT0.5 are separated from regions where CAT0.5 is more powerful; the task is to choose a parameterization that makes this separation possible. Results: We find that the difference of allele frequencies ($\delta_p$) and the difference of Hardy-Weinberg disequilibrium coefficients ($\delta_\epsilon$) separate the two phases well, and the phase boundaries are determined by the angle $\tan^{-1}(\delta_p/\delta_\epsilon)$, which improves on disease model selection using $\delta_\epsilon$ only. Conclusions: We argue that phase diagrams similar to the one for CAT0.5-MAX2 have graphical appeal for understanding the power performance of various tests, clarifying simulation schemes, summarizing case-control datasets, and guessing the possible mode of inheritance.
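
The two tests and the phase-diagram coordinates can be sketched as follows. The trend statistic uses the standard Armitage form with genotype scores (0, x, 1); the Hardy-Weinberg disequilibrium coefficient is taken as $\epsilon = P(\text{risk homozygote}) - p^2$ in each group, which is one common definition and may differ in detail from the paper's; and math.atan2 stands in for $\tan^{-1}(\delta_p/\delta_\epsilon)$ so the angle stays defined when $\delta_\epsilon = 0$ (the two agree when $\delta_\epsilon > 0$). Genotype counts and function names are hypothetical.

```python
import math


def trend_chi2(cases, controls, scores):
    """Cochran-Armitage trend statistic (1 df) for genotype counts ordered as
    (baseline homozygote, heterozygote, risk homozygote) with the given scores."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    sxr = sum(x * r for x, r in zip(scores, cases))
    sxn = sum(x * m for x, m in zip(scores, n))
    sxxn = sum(x * x * m for x, m in zip(scores, n))
    return N * (N * sxr - R * sxn) ** 2 / (R * S * (N * sxxn - sxn ** 2))


def cat(cases, controls, x=0.5):
    """CAT with heterozygote score x; x = 0.5 gives CAT0.5."""
    return trend_chi2(cases, controls, (0.0, x, 1.0))


def max2(cases, controls):
    """Larger of the two collapsed 2x2 chi-squares. Scores (0, 0, 1) pool baseline
    homozygotes with heterozygotes; (0, 1, 1) pool heterozygotes with risk
    homozygotes. With 0/1 scores the trend statistic equals the 2x2 Pearson chi-square."""
    return max(cat(cases, controls, 0.0), cat(cases, controls, 1.0))


def phase_coordinates(cases, controls):
    """(delta_p, delta_eps, angle): case-minus-control differences in risk-allele
    frequency p and in the HWD coefficient eps = P(risk homozygote) - p**2."""
    def p_eps(g):
        total = sum(g)
        p = (g[1] / 2 + g[2]) / total
        return p, g[2] / total - p * p

    p1, e1 = p_eps(cases)
    p0, e0 = p_eps(controls)
    dp, de = p1 - p0, e1 - e0
    return dp, de, math.atan2(dp, de)


cases, controls = (100, 200, 100), (150, 200, 50)  # hypothetical counts
print(round(cat(cases, controls), 2), round(max2(cases, controls), 2))
print(phase_coordinates(cases, controls))
```

For these counts CAT0.5 (about 25.8) exceeds MAX2 (about 20.5). A single dataset only yields one pair of statistics; the phase diagram itself compares the power of the two tests across disease models, placed by their ($\delta_\epsilon$, $\delta_p$) coordinates.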

arXiv.org e-Print Archive

Crossref

SAERMA: Stacked Autoencoder Rule Mining Algorithm for the Interpretation of Epistatic Interactions in GWAS for Extreme Obesity

Author: Abdulaimma B
Chalmers C
Falciani F
Fergus P
Malim N
Montenaz C
Reilly D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/08/2019

One of the most important challenges in the analysis of high-throughput genetic data is the development of efficient computational methods to identify statistically significant Single Nucleotide Polymorphisms (SNPs). Genome-wide association studies (GWAS) use single-locus analysis, in which each SNP is independently tested for association with phenotypes. The limitation of this approach, however, is its inability to explain genetic variation in complex diseases. Alternative approaches are required to model the intricate relationships between SNPs. Our proposed approach extends GWAS by combining deep learning stacked autoencoders (SAEs) and association rule mining (ARM) to identify epistatic interactions between SNPs. Following traditional GWAS quality control and association analysis, the most significant SNPs are selected and used in the subsequent analysis to investigate epistasis. SAERMA controls the classification results produced by the final fully connected multi-layer feedforward artificial neural network (MLP) by manipulating the interestingness measures, support and confidence, in the rule generation process. The best classification results were achieved with 204 SNPs compressed to 100 units (77% AUC, 77% SE, 68% SP, 53% Gini, logloss = 0.58, and MSE = 0.20), although it was possible to achieve 73% AUC (77% SE, 63% SP, 45% Gini, logloss = 0.62, and MSE = 0.21) with 50 hidden units; both results are supported by close model interpretation.
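
The support and confidence measures mentioned above can be illustrated with a plain association-rule sketch. It encodes each subject as a set of hypothetical "SNP=genotype" items plus a phenotype label and enumerates small antecedents exhaustively; it does not reproduce SAERMA's stacked-autoencoder step or its rule generation, and all identifiers are made up.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def rules_for(transactions, items, target, min_support, min_confidence, max_len=2):
    """Keep rules {items...} -> {target} that meet both interestingness thresholds."""
    kept = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            s = support(transactions, set(antecedent) | {target})
            if s < min_support:
                continue
            c = confidence(transactions, antecedent, [target])
            if c >= min_confidence:
                kept.append((antecedent, s, c))
    return kept


# Hypothetical subjects encoded as item sets: "SNP=genotype" plus a phenotype label.
subjects = [
    frozenset({"snpA=1", "snpB=2", "obese"}),
    frozenset({"snpA=1", "snpB=2", "obese"}),
    frozenset({"snpA=1", "snpB=0"}),
    frozenset({"snpA=0", "snpB=2"}),
]
print(rules_for(subjects, ["snpA=1", "snpB=2"], "obese", min_support=0.5, min_confidence=0.8))
```

Raising the confidence threshold keeps only the two-SNP rule, which is the kind of interaction (epistasis-like) pattern that single-locus tests miss; lowering it lets the weaker single-SNP rules through.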

arXiv.org e-Print Archive

LJMU Research Online (Liverpool John Moores University)

University of Liverpool Repository

Crossref

Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs

Author: Al-Jumeily D
Chalmers C
Curbelo Montañez CA
Fergus P
Hussain A
Montañez AC
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

In this paper, association results from genome-wide association studies (GWAS) are combined with a deep learning framework to test the predictive capacity of statistically significant single nucleotide polymorphisms (SNPs) associated with the obesity phenotype. Our approach demonstrates the potential of deep learning as a powerful framework for GWAS analysis that can capture information about SNPs and the important interactions between them. Basic statistical methods and techniques for the analysis of genetic SNP data from population-based genome-wide studies are considered. Statistical association testing between individual SNPs and obesity was conducted under an additive model using logistic regression. After quality control (QC) and association analysis, four subsets of loci were selected using P-value thresholds of 1×10−5 (5 SNPs), 1×10−4 (32 SNPs), 1×10−3 (248 SNPs) and 1×10−2 (2465 SNPs). A deep learning classifier is initialised using these sets of SNPs and fine-tuned to classify obese and non-obese observations. Using the deep learning classifier with the genetic variants at P < 1×10−2 (2465 SNPs), it was possible to obtain SE = 0.9604, SP = 0.9712, Gini = 0.9817, LogLoss = 0.1150, AUC = 0.9908 and MSE = 0.0300. As the P-value threshold became more stringent (and fewer SNPs were included), an evident deterioration in performance was observed. The results demonstrate that single-SNP analysis fails to capture the cumulative effect of less significant variants and their overall contribution to the outcome in disease prediction, which is captured using a deep learning framework.
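
The reported metrics can be computed from true labels and predicted probabilities as sketched below. It assumes SE and SP denote sensitivity and specificity at a 0.5 cut-off, and uses Gini = 2·AUC − 1 (the usual relation, consistent with the reported AUC = 0.9908 and Gini = 0.9817 up to rounding). The threshold-based SNP selection is included as a small helper; all names and data are hypothetical.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5):
    """SE, SP, AUC, Gini, log loss and MSE for binary labels (1 = obese)
    and predicted probabilities of class 1."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    se = sum(p >= threshold for p in pos) / len(pos)  # sensitivity
    sp = sum(p < threshold for p in neg) / len(neg)  # specificity
    # AUC as the Mann-Whitney probability that a case outranks a control (ties = 1/2)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    eps = 1e-15  # clip probabilities so log() is defined
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    n = len(y_true)
    log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(y_true, clipped)) / n
    mse = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    return {"SE": se, "SP": sp, "AUC": auc, "Gini": 2 * auc - 1,
            "LogLoss": log_loss, "MSE": mse}


def snps_below(pvalues, threshold):
    """SNP identifiers whose association P-value is below the threshold."""
    return [snp for snp, p in pvalues.items() if p < threshold]


print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
print(snps_below({"snp1": 1e-6, "snp2": 5e-3, "snp3": 0.2}, 1e-2))
```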

LJMU Research Online (Liverpool John Moores University)

Copy-number-variation and copy-number-alteration region detection by cumulative plots

Author: A Grigoriev
A Kallioniemi
A Ulgen
Annette Lee
B Xu
C Barnes
C Berthelsen
C Melodelima
C Zhang
CK Peng
CR Marshall
D Botstein
D Peiffer
DA Peiffer
E Eichler
G Wilson
H Döhner
J Fickett
J Freeman
J Fridlyand
J Gibson
J Huang
J Lupski
J Ott
J Sebat
J Simon-Sanchez
JR Vermeesch
K Wang
LA Weiss
M Lin
N Carter
N Chiorazzi
N Rosenberg
P Bernaola-Galván
P Cahan
P Szatmari
Peter K Gregersen
R Beroukhim
R Zhang
S Colella
S Sutrala
T Lencz
T Walsh
TL Newman
W Li
W Li
W Li
W Li
Wentian Li
X Zhao
Y Nannya
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Regions with copy number variations (in germline cells) or copy number alterations (in somatic cells) are of great interest for human disease gene mapping and cancer studies. They represent a new type of mutation and are larger in scale than single nucleotide polymorphisms. Using genotyping microarrays for copy number variation detection has become standard, and there is a need for improved analysis methods. Results: We apply the cumulative plot to the detection of regions with copy number variation/alteration, on samples taken from a chronic lymphocytic leukemia patient. Two sets of whole-genome genotyping data of 317k single nucleotide polymorphisms, one from normal cells and one from cancer cells, are analyzed. We demonstrate the utility of the cumulative plot in detecting a 9 Mb (9×10^6 bases) hemizygous deletion and a 1 Mb homozygous deletion on chromosome 13. We also show the possibility of detecting smaller copy number variation/alteration regions below the 100 kb range. Conclusions: As a graphical tool, the cumulative plot is an intuitive and scale-free (window-less) way of detecting copy number variation/alteration regions, especially when such regions are small.
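
A minimal sketch of the idea, under assumptions the abstract does not state: the per-SNP signal is taken to be a log intensity ratio, the plotted curve is its running sum after subtracting the overall mean, and the largest-drop search is a simple CUSUM-style heuristic added here for illustration rather than the paper's procedure, which is presented as a visual tool. Names and data are hypothetical.

```python
def cumulative_plot(values):
    """Running sum of per-SNP signal minus its overall mean, in genomic order.
    A region whose mean differs (e.g. reduced intensity in a deletion) shows up
    as a stretch with a different slope; no window size is needed."""
    mean = sum(values) / len(values)
    cum = [0.0]
    for v in values:
        cum.append(cum[-1] + (v - mean))
    return cum


def largest_drop(cum):
    """Half-open SNP index range [start, end) over which the cumulative curve
    falls the most, i.e. the stretch with the largest total deficit below
    the mean: a simple candidate for a deletion."""
    peak = 0
    best, best_drop = (0, 0), 0.0
    for j in range(1, len(cum)):
        if cum[j] > cum[peak]:
            peak = j
        if cum[peak] - cum[j] > best_drop:
            best_drop, best = cum[peak] - cum[j], (peak, j)
    return best


# Hypothetical log-intensity ratios: 0 at normal copy number, -1 in a deleted run.
signal = [0.0] * 10 + [-1.0] * 5 + [0.0] * 10
print(largest_drop(cumulative_plot(signal)))
```

Because the curve accumulates over the whole chromosome, a short deleted run still appears as a visible change of slope without choosing a smoothing window, which is the scale-free property the abstract emphasizes.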

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

PubMed Central