159 research outputs found

    A note on the simultaneous computation of thousands of Pearson’s Chi^2-statistics

    In genetic association studies, two common and important goals are the identification of single nucleotide polymorphisms (SNPs) whose distribution differs between several groups and the detection of SNPs with a coherent pattern. In the former situation, tens of thousands of SNPs have to be tested, whereas in the latter case typically several dozen SNPs are considered, leading to thousands of statistics that need to be computed. A test statistic appropriate for both goals is Pearson’s Chi^2-statistic. However, computing this (or another) statistic for each SNP or pair of SNPs separately is very time-consuming. In this article, we show how simple matrix computation can be employed to calculate the Chi^2-statistic for all SNPs simultaneously.
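The idea of computing the statistic for all SNPs at once can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `chi2_all_snps`, the 0/1/2 genotype coding, the integer group labels, and the absence of missing values are all assumptions made for illustration. The contingency tables for every SNP are obtained from one matrix product per genotype level, so no loop over SNPs is needed.

```python
import numpy as np


def chi2_all_snps(genotypes, groups, n_levels=3):
    """Pearson chi-square statistic for every SNP at once (illustrative sketch).

    genotypes: (n_snps, n_samples) integer array coded 0..n_levels-1 (assumed coding)
    groups:    (n_samples,) integer array of group labels 0..K-1 (assumed coding)
    """
    genotypes = np.asarray(genotypes)
    groups = np.asarray(groups, dtype=int)
    n_groups = int(groups.max()) + 1

    # One-hot group membership matrix of shape (n_samples, n_groups).
    group_ind = np.eye(n_groups)[groups]

    # observed[s, g, k] = number of samples in group k with genotype g at SNP s.
    # Each genotype level needs a single matrix product covering all SNPs.
    observed = np.stack(
        [(genotypes == g).astype(float) @ group_ind for g in range(n_levels)],
        axis=1,
    )

    row_tot = observed.sum(axis=2, keepdims=True)   # genotype totals per SNP
    col_tot = observed.sum(axis=1, keepdims=True)   # group totals per SNP
    n = observed.sum(axis=(1, 2), keepdims=True)    # samples per SNP
    expected = row_tot * col_tot / n

    # Cells with an expected count of zero contribute nothing to the statistic.
    contrib = np.divide(
        (observed - expected) ** 2,
        expected,
        out=np.zeros_like(expected),
        where=expected > 0,
    )
    return contrib.sum(axis=(1, 2))


# Toy data: two SNPs, six samples, two groups of three samples each.
genotypes = np.array([[0, 0, 1, 1, 2, 2],
                      [0, 1, 2, 0, 1, 2]])
groups = np.array([0, 0, 0, 1, 1, 1])
print(chi2_all_snps(genotypes, groups))  # first SNP: 4.0, second SNP: 0.0
```

In the toy data, the second SNP has identical genotype counts in both groups, so its statistic is zero, while the first SNP has an expected count of 1 in every cell and a statistic of 4.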

    Empirical Bayes analysis of single nucleotide polymorphisms

    Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors. Results: In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes (version 1.10.0 and later), which can be downloaded from http://www.bioconductor.org. Conclusion: As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can be used not only to analyze continuous gene expression data but also to analyze categorical SNP data, where the response is not restricted to be binary. In association studies, in which typically several dozen to a few hundred SNPs are considered, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (prespecified) interactions/genotypes can also be used to quantify the importance of these interactions.

    Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

    Motivation: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions often explain only a relatively small subgroup of patients. Most of the feature selection methods proposed in the literature, unfortunately, fail at this task, since they can either only identify individual variables or interactions of a low order, or try to find rules that explain a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can be used not only for feature selection but also for discrimination. Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs that lead to a considerably increased breast cancer risk for different subsets of patients and that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several dozen SNPs, but can also be employed to analyze whole-genome data.

    The role of common genetic variants for predicting the modulation of cardiovascular outcomes

    Attrition is a major issue in the drug development process, with 79% of clinical failures due to safety and efficacy concerns. Genetic research can provide supporting evidence of a clear causal relationship between the drug target and disease, or reveal unintended effects through associations with non-relevant phenotypes, informing on potential drug safety. However, due to the underlying genetic architecture, it is often unclear which gene or variant in the loci identified through genetic analyses is driving the association. Due to recent advancements in CRISPR-Cas9 gene editing, it is now relatively easy to perform whole-gene knock-out studies and single base edits to validate genetic findings of the most likely causal variant and gene. Utilising a combination of genetic approaches and functional studies can provide supporting evidence of the therapeutic profile and potential effects of drug therapies and improve our overall understanding of biological pathways and disease mechanisms. The primary aim of this thesis is to provide genetic data to support the ongoing clinical development of hypoxia-inducible factor (HIF)-prolyl hydroxylase inhibitors (PHIs) for treating anaemia of chronic kidney disease (CKD). Genome-wide association studies (GWAS) were used to identify genetic variants lying within or near genes encoding the drug target (prolyl hydroxylase [PHD] enzymes). These variants were used in Mendelian randomisation analysis and phenome-wide association studies to genetically mirror the pharmaceutical effects of PHIs and investigate cardiovascular safety. Functional studies were employed to validate a genetic variant for use as a proxy and to obtain a better understanding of the downstream causal pathways and biological mechanisms of the drug target.
    In summary, this thesis demonstrates that a combination of genetic analyses and functional validation studies is a powerful approach to validate GWAS results and further characterise therapeutic effects. This PhD project identified relevant genetic markers to genetically proxy therapeutic modulation of biomarker levels through PHD inhibition and could potentially inform further research using patient-level clinical data from Phase III trials.

    Discovering Biomarkers of Alzheimer's Disease by Statistical Learning Approaches

    In this work, statistical learning approaches are exploited to discover biomarkers for Alzheimer's disease (AD). Contributions have been made in both biomarker-driven and software-driven studies. Surprising discoveries were made in the search for blood-based biomarkers. With the inclusion of existing biological knowledge and a proposed novel feature selection method, several blood-based protein models were found to have a promising ability to separate AD patients from healthy individuals. A new statistical pattern was discovered that could serve as a new guideline for diagnostic methodology. In the field of brain-based biomarkers, the positive contribution of covariates such as age, gender and APOE genotype to an AD classifier was verified, and a panel of highly informative biomarkers comprising 26 RNA transcripts was discovered. The classifier trained on this panel of genes shows excellent capacity in discriminating patients from controls. Apart from biomarker-driven studies, statistical packages and applications were also developed. The R package metaUnion was designed and developed to provide an advanced meta-analytic approach applicable to microarray data. This package overcomes defects of previous meta-analytic packages: 1) the neglect of missing data, 2) the inflexibility of feature dimension, and 3) the lack of functions to support post-analysis summary. metaUnion has been applied in a published study as part of the integrated genomic approaches and resulted in significant findings. To provide dementia researchers with benchmark references on the significance of features, a web-based platform, AlzExpress, was built to provide granular differential expression test and meta-analysis results.
    A combination of fashionable big data technologies and robust data mining algorithms makes AlzExpress a flexible, scalable and comprehensive bioinformatics platform for dementia research. Plymouth University

    Molecular Prognostic Markers in Uveal Melanoma: Expression Profiling and Genomic Studies

    Uveal melanomas (UMs) arise from melanocytes. This cell type originates from neural crest cells, and uveal melanomas thereby share their origin with pheochromocytomas, neuroblastomas, paragangliomas and cutaneous melanomas, other tumors that develop from neural-crest-derived cells. Uveal melanoma is the most common primary tumor in the eye, with an incidence of approximately 7 per million per year in the Western world. In adults, 80 percent of all intraocular tumors are uveal melanomas. The mean age at diagnosis is 60 years. Most uveal melanomas arise in the choroid (72%) or the ciliary body (23%), and a small fraction originates in the iris (5%). Predispositions for UM are a light eye color, fair skin color and ability to tan, which are all related to a fair phenotype. Despite advances in treatment with enucleation, preenucleation radiotherapy, stereotactic radiotherapy, brachytherapy, charged particle irradiation, thermotherapy and local eye wall resection, the mortality rate has not changed significantly. As many as 50% of all newly diagnosed patients will die from distant metastases, which are mainly located in the liver.

    Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer

    We report a comprehensive analysis of 412 muscle-invasive bladder cancers characterized by multiple TCGA analytical platforms. Fifty-eight genes were significantly mutated, and the overall mutational load was associated with APOBEC-signature mutagenesis. Clustering by mutation signature identified a high-mutation subset with 75% 5-year survival. mRNA expression clustering refined prior clustering analyses and identified a poor-survival “neuronal” subtype in which the majority of tumors lacked small cell or neuroendocrine histology. Clustering by mRNA, long non-coding RNA (lncRNA), and miRNA expression converged to identify subsets with differential epithelial-mesenchymal transition status, carcinoma in situ scores, histologic features, and survival. Our analyses identified 5 expression subtypes that may stratify response to different treatments. A multiplatform analysis of 412 muscle-invasive bladder cancer patients provides insights into mutational profiles with prognostic value and establishes a framework associating distinct tumor subtypes with clinical options.

    Deciphering causal genetic determinants of red blood cell traits

    Genome-wide association studies (GWAS) have revealed several genetic variants associated with complex phenotypes. This is the case for red blood cell (RBC) traits, which are particularly amenable to GWAS as they are routinely and accurately measured. Understanding RBC trait variation is important given their significance as clinical markers and modifiers of disease severity. Notably, increased fetal hemoglobin (HbF) production in sickle cell disease (SCD) patients is associated with a higher life expectancy and decreased morbidity. Nonetheless, most variants identified through GWAS fall in non-coding regions of the human genome, increasing the difficulty of identifying causal links. The main goal of this project was to identify and characterize genes influencing complex traits, and in particular RBC phenotypes. First, I developed an approach to identify and test potential gene knockouts affecting anthropometric traits in a large sample from the general population, which did not yield significant associations. Then, I characterized the DNA methylome and transcriptome of erythroblasts differentiated ex vivo from hematopoietic progenitor stem cells (HPSC), and identified several genes potentially implicated in fetal and adult-stage erythroid programs. I also identified microRNAs (miRNA) that show specific developmental expression patterns and that are enriched in inversely expressed targets. Finally, I mapped expression quantitative trait loci (eQTL) in erythroblasts and identified erythroid-specific eQTLs for ATP2B4, the main calcium ATPase of RBCs. These genetic variants are associated with RBC traits and malaria susceptibility, and overlap an erythroid-specific enhancer of ATP2B4. Deletion of this regulatory element using CRISPR/Cas9 in human erythroid cells strongly reduced ATP2B4 expression and increased intracellular calcium levels. In conclusion, large and comprehensive genotyping datasets will be necessary to test the role of rare gene knockouts in complex phenotypes. The transcriptomes and DNA methylomes of erythroblasts show substantial differences that correlate with their developmental stages and may be implicated in HbF production. These results also suggest a strong implication of erythroid enhancers and miRNAs in developmental stage specificity. Finally, characterizing the erythroid-specific enhancer of ATP2B4 suggests that integrating epigenomic, transcriptomic and gene editing experiments can be a powerful approach to characterize non-coding genetic variants. These results implicate ATP2B4 in erythroid cell hydration, which is associated with malaria susceptibility and SCD severity, suggesting that therapies targeting this gene could impact diseases affecting millions of individuals worldwide.