
    Analysis of epistasis in human complex traits

    Thousands of genetic mutations have been associated with human complex traits and diseases, improving our understanding of the biological mechanisms underlying these phenotypes. The great majority of genetic association studies have focused exclusively on the direct effects of single mutations, ignoring possible interactions between them (epistasis). However, since genes operate within complex networks, interactions are expected to exist. Modelling epistasis could further our biological understanding, but detecting such effects is complicated by a vast search space. In this thesis, we present a new statistical method to detect genetic interactions affecting quantitative traits in large-scale datasets. Our approach tests for an interaction between a variant and a polygenic score (PGS) comprising a group of other mutations. We develop a new computational algorithm for PGS construction and show through simulations that the method is robust to false positives while retaining statistical power. We apply our approach to 97 quantitative traits in the UK Biobank (UKB) and find 144 independent interactions with the PGS for 52 different traits, including important variants known to affect disease risk at the APOE, FTO and LDLR genes. We also develop a test to identify, for each variant interacting with the PGS, the variants driving that interaction. This recovers previously known interactions and identifies several novel signals, primarily for biomarker traits. Examples include a large network of genes (including ABO, ASGR1, FUT2, FUT6, PIGC and TREH) affecting alkaline phosphatase levels, and an interaction between IL33 and ALOX15 affecting eosinophil count that is potentially implicated in asthma. Lastly, we extend our analysis to a new dataset of imputed variation at HLA genes in the UKB and find, among others, a new interaction for glycated haemoglobin involving HLA-DQA1*03:01, an allele previously associated with diabetes.
Our results demonstrate the potential for detecting epistatic effects in currently available genomic datasets. Such analyses can uncover key 'core' genes that modulate the effects of other regions of the genome, and identify subgroups of interacting variants of likely functional relevance.
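A minimal sketch of the basic variant-by-PGS interaction test, assuming an ordinary least-squares model y ~ g + PGS + g × PGS with a per-individual genotype dosage g (0, 1 or 2) and an already-computed PGS. The function name `pgs_interaction_test` and the simulated data are hypothetical. The sketch does not reproduce the thesis's PGS-construction algorithm, covariate adjustment or false-positive controls, and it uses a normal approximation for the p-value.

```python
import math

import numpy as np


def pgs_interaction_test(y, g, pgs):
    """Hypothetical helper: fit y ~ 1 + g + pgs + g*pgs by OLS and test g*pgs.

    Returns (beta_int, se, t, p), where p is a two-sided normal-approximation p-value.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    pgs = np.asarray(pgs, dtype=float)
    # Standardise the PGS so the interaction is expressed per SD of score.
    pgs = (pgs - pgs.mean()) / pgs.std()
    X = np.column_stack([np.ones_like(g), g, pgs, g * pgs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance of the coefficients
    se = math.sqrt(cov[3, 3])
    t = beta[3] / se
    p = math.erfc(abs(t) / math.sqrt(2.0))      # two-sided p from N(0, 1)
    return beta[3], se, t, p


# Simulated example: genotype dosages, a standard-normal PGS, and a trait
# with a true interaction effect of 0.15 per allele per PGS SD.
rng = np.random.default_rng(0)
n = 5000
g = rng.binomial(2, 0.3, size=n)
pgs = rng.normal(size=n)
y_int = 0.1 * g + 0.2 * pgs + 0.15 * g * pgs + rng.normal(size=n)
y_null = 0.1 * g + 0.2 * pgs + rng.normal(size=n)

print(pgs_interaction_test(y_int, g, pgs))
print(pgs_interaction_test(y_null, g, pgs))
```

In a biobank-scale scan this test would be repeated for every candidate variant, so results would need to be judged against a stringent multiple-testing threshold.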

    Visualization of biological data: Infrastructure, design and application

    Visualization is an important component of biological data analysis. Ideally, visual methods are tightly integrated with analysis methods, so that data from different intermediate stages of the analysis can be plotted seamlessly. Bioconductor provides a substantial analysis platform but limited tools for genomic data visualization. Visual tools for genomic data, e.g. GenomeView, IGV and IGB, are primarily detached from the analysis engine. This research fills the gap by developing visualization methods that are integrated into the Bioconductor suite. The research has three main components:
    * New visual tools for genomic data that draw on the latest research in visualization.
    * Infrastructure development to support the visual tools and the analysis of other types of biological data.
    * Application of the visualization methods to the analysis of RNA-seq and DNA-seq data.

    The molecular genetic basis of the association of TNFSF4 with SLE

    The tumour necrosis factor ligand superfamily member 4 gene (TNFSF4), also known as OX40L, is an established susceptibility locus for the autoimmune disease systemic lupus erythematosus (SLE). Genetic association studies map polymorphisms that associate with disease, but linkage disequilibrium often hinders the identification of the actual causal allele(s) at a disease susceptibility locus. At TNFSF4, genetic association studies had shown that an extended 100 kb haplotype upstream of the coding region of the gene was associated with SLE risk. The principal aim of the project was to conduct genetic association analyses in cohorts of different ancestry in an attempt to fine-map the TNFSF4 association signal and thereby identify the causal genetic variants that underlie the genetic risk. A trans-ancestral fine-mapping analysis was performed in >17,900 subjects of European, African-American, Hispanic-American and Southeast Asian ancestry. The results demonstrate strong association of TNFSF4 risk alleles in all populations tested. The most consistent and strongest evidence of association came from the single nucleotide polymorphism (SNP) rs2205960-T (P = 7.1 × 10⁻³², odds ratio = 1.63). This variant was also associated with autoantibody production in three independent cohorts. In silico analysis of the DNA sequence encompassing rs2205960-T predicts that it forms part of a decameric motif that binds the RelA (p65) component of the NF-κB transcription factor complex. A second associated SNP, rs16845607-A in TNFSF4 intron 1, was identified in Hispanic-Americans (P = 9.17 × 10⁻⁹, odds ratio = 2.06). To further refine the association, resequencing was performed in 80 individuals selected on the basis of their genotype to carry risk or non-risk haplotypes upstream of TNFSF4. This sequencing study identified >200 novel variants, mostly small insertion-deletion polymorphisms (indels).
The data presented in this thesis largely resolve the genetic basis of the immediate upstream association signal observed at TNFSF4 in SLE and will facilitate unravelling the molecular basis of this genetic risk in systemic autoimmunity.
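The odds ratios quoted above summarise allele-level association. A minimal sketch of how an allelic odds ratio and its Wald 95% confidence interval can be computed from a 2 × 2 table of allele counts, followed by a fixed-effect inverse-variance combination of log odds ratios across cohorts. The counts are hypothetical and the helper names are illustrative; the abstract does not state how the ancestry-specific results were combined, so the meta-analysis step is an assumption.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio of the risk allele with a Wald 95% CI (hypothetical helper).

    Arguments are allele counts: risk/other allele in cases and in controls.
    Returns (odds_ratio, log_or, se_log_or, (ci_low, ci_high)).
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, log_or, se, ci


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of log odds ratios.

    Returns (pooled_odds_ratio, pooled_se, two_sided_p).
    """
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return math.exp(pooled), pooled_se, p


# Two hypothetical cohorts (allele counts, not real data).
cohort_a = allelic_odds_ratio(600, 400, 500, 500)
cohort_b = allelic_odds_ratio(330, 270, 900, 1100)
print(cohort_a[0], cohort_a[3])
print(fixed_effect_meta([cohort_a[1], cohort_b[1]], [cohort_a[2], cohort_b[2]]))
```

A fixed-effect model assumes one shared effect across ancestries; when effects differ between populations, a random-effects or trans-ethnic meta-analysis model would be the more cautious choice.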

    Translational software infrastructure for medical genetics

    Deep inside the core of our cells resides the deoxyribonucleic acid (DNA) molecule known as the genome. DNA encodes the information that allows life to grow, survive, diversify and evolve. Unfortunately, the same mechanisms that let us adapt to a changing environment can also cause genetic disorders. While we are able to diagnose a number of these disorders using modern technological advances, much remains to be discovered and understood. This thesis presents software infrastructure for investigating the molecular etiology of genetic disease using data from model organisms, demonstrates how to translate findings from fundamental research into new software tools for genome diagnostics, and introduces a downstream genome analysis framework that supports the automation and validation of the latest tools for applied patient care. We first develop data models and software to help determine which regions of the genome are responsible for diseases and other physical traits. We then extend these principles to model organisms. By exploiting molecular similarities, we discover new ways to use nematodes for research into human diseases. Additionally, we use our knowledge of the genome and evolution to predict how pathogenic new mutations are. The result is a public website where DNA can be scanned quickly and accurately for probable pathogenic mutations. Finally, we present a complete system for automated DNA analysis, including a protocol specific to genome diagnostics that produces clear patient reports, enabling medical experts to make a diagnosis faster and more easily.

    The Genetic Architecture of Structural Renal and Urinary Tract Malformations

    Structural renal and urinary tract malformations are the most common cause of kidney failure in children. These congenital anomalies of the kidneys and urinary tract (CAKUT) are a phenotypically diverse group of malformations that result from defects in embryonic kidney, ureter, and bladder development. A genetic basis for CAKUT has been proposed, with over 50 monogenic causes reported; however, a molecular diagnosis is made in fewer than 20% of patients. In this thesis, I used bioinformatic and statistical genetics methods to investigate the genetic architecture of structural renal and urinary tract malformations using whole-genome sequencing (WGS) data from the 100,000 Genomes Project. Population-based rare and common variant association testing was performed in over 800 cases and 20,000 controls of diverse ancestry, seeking enrichment of single-nucleotide variants, indels and structural variation on a genome-wide, per-gene, and cis-regulatory element basis. Using a sequencing-based genome-wide association study (GWAS), I identified the first robust genetic associations for posterior urethral valves (PUV), the most common cause of kidney failure in boys. Bayesian fine-mapping and functional annotation mapped these two loci to the transcription factor gene TBX5 and the planar cell polarity gene PTK7, with both signals replicated in an independent cohort. Significant enrichment of rare structural variation affecting cis-regulatory elements was also detected, providing novel insights into the pathogenesis of this poorly understood disorder. I also demonstrated that the contribution of known monogenic disease to CAKUT has been overestimated and that common and low-frequency variation plays an important role in phenotypic variability. These findings support an omnigenic rather than monogenic model of inheritance for CAKUT and are consistent with the extensive genotypic and phenotypic heterogeneity, variable expressivity, and incomplete penetrance observed in this condition.
Finally, this work demonstrates the value of sequencing-based GWAS methodology in rare disease, beyond conventional monogenic gene discovery, and provides strong support for an inclusive, diverse-ancestry approach.
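A minimal sketch of a per-gene rare-variant burden test: individuals carrying at least one qualifying rare variant in a gene are counted among cases and controls, and the resulting 2 × 2 table is assessed with a two-sided Fisher's exact test. The function names and counts are hypothetical, and the abstract does not specify the test statistic used in the thesis. A population-based analysis of diverse-ancestry cohorts would typically also adjust for ancestry (for example, with a regression model that includes genetic principal components), which this sketch omits.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # A small relative tolerance guards against floating-point ties.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def burden_test(case_carriers, n_cases, control_carriers, n_controls):
    """Hypothetical gene-level burden test on carrier counts."""
    return fisher_exact_two_sided(
        case_carriers, n_cases - case_carriers,
        control_carriers, n_controls - control_carriers,
    )


# Hypothetical gene: 12 carriers among 800 cases vs 60 among 20,000 controls.
print(burden_test(12, 800, 60, 20000))
```

Collapsing variants into a single carrier indicator gains power when most qualifying variants act in the same direction; it loses power when protective and risk variants are mixed in one gene.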

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool, Virtual Reality to Observe Oncology data Models (VROOM), together with methods to interrogate cancer patient cohorts in an immersive virtual environment. The overall objective is to develop an immersive analytics platform that includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms using a game engine; the visualisation is implemented in Unity3D. This work could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patient prognosis, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is understanding transcriptomics in cancer samples. Specifically, the thesis investigates gene expression data to determine the biological similarity between patients revealed by the transcriptomic profiles of their tumour samples, which show the active genes in each patient.
In summary, the thesis contributes i) a novel immersive analytics platform for interrogating patient cohort data in a similarity space based on the patients' biological and genomic similarity; ii) an effective immersive environment design, optimised through a usability study of exocentric and egocentric visualisation and through audio and sound design; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) a novel use of game theory as the decision-making engine to assist the analytics process, and an application of optimal transport theory to missing data imputation that preserves the data distribution; and v) case studies showcasing the real-world application of the visualisation and its effectiveness.
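The "similarity space" described above can be illustrated with a minimal sketch, assuming Pearson correlation between patients' expression profiles as the similarity measure and classical multidimensional scaling (MDS) to place patients as points in three dimensions, as a VR scene might. The abstract does not specify VROOM's similarity metric or layout method; the function names and random data are hypothetical. The distance sqrt(2(1 - r)) is used because it equals the Euclidean distance between mean-centred, unit-norm profiles, so classical MDS reproduces it exactly whenever there are at most k + 1 patients.

```python
import numpy as np


def patient_similarity(expr):
    """Pearson correlation between patients.

    expr: array of shape (n_patients, n_genes), e.g. log-transformed expression.
    Returns an (n_patients, n_patients) correlation matrix.
    """
    return np.corrcoef(expr)  # rows are treated as variables (patients)


def embed_similarity_space(corr, k=3):
    """Classical MDS of the correlation distance sqrt(2 * (1 - r)) into k dimensions."""
    d2 = 2.0 * (1.0 - corr)                    # squared distances
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # double-centring matrix
    gram = -0.5 * centre @ d2 @ centre
    vals, vecs = np.linalg.eigh(gram)          # ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]
    vals = np.clip(vals[top], 0.0, None)       # drop tiny negative round-off
    return vecs[:, top] * np.sqrt(vals)        # (n_patients, k) coordinates


rng = np.random.default_rng(1)
expr = rng.normal(size=(4, 50))                # 4 hypothetical patients, 50 genes
corr = patient_similarity(expr)
coords = embed_similarity_space(corr)
print(coords.shape)
```

With larger cohorts the three-dimensional layout is only an approximation of the full similarity structure, which is one reason an immersive view might be paired with familiar 2D plots.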