
    Gene Association Mapping in the Era of Next-Generation Sequencing and Systems Biology

    In the past decade, advances in genotyping technology, first microarrays and then “next-generation” sequencing, have enabled scientists to examine the susceptibility genes that contribute to the risk of complex disorders using a genome-wide, “hypothesis-free” strategy. However, despite this “hypothesis-free” label, these genome-wide approaches (including genome-wide association and whole-genome sequencing studies) depend on two implicit assumptions. The first is that the genetic risk of complex traits is contributed by independent genes or variants (assumption of independence). The second is that different genes have equal potential to contribute to the genetic predisposition to complex traits (assumption of equality). Despite the great success of susceptibility gene association mapping in the last decade, growing evidence indicates that these two underlying assumptions may not be sound. Rather than studying one locus at a time, alternative methods that carry out global analyses of biological molecules in populations have been developed to understand the influence of the whole biological system on complex traits. Network-based approaches, in particular, have proven informative. This dissertation covers several important issues concerning sequencing-based study design and its applications in Chapters II, III and IV. A human protein-protein interaction network is constructed, and several issues related to human gene networks are studied and discussed in Chapters V and VI. Abstracts for each chapter are summarized as follows. Chapter 2: In this chapter, we proposed a two-stage, gene-based method for association mapping of rare variants by applying four different non-collapsing algorithms.
Using the Genetic Analysis Workshop 18 whole-genome sequencing dataset of simulated blood pressure phenotypes, we studied and contrasted the false positive rate of each algorithm using receiver operating characteristic curves. The statistical power of these methods was also evaluated and compared through the analysis of 200 simulated replications in a smaller genotype data set. We showed that Fisher’s method was superior to the other three non-collapsing methods, but was no better than the standard method implemented in famSKAT. Chapter 3: In this chapter, we aimed to identify potential susceptibility variants for bipolar disorder by combining exome sequencing and linkage analysis on 6 related subjects from a four-generation family. Our study identified a list of five potential candidate genes for bipolar disorder. Among these five genes, GRID1 (Glutamate Receptor Delta-1 Subunit), which was previously reported to be associated with several psychiatric disorders and brain-related traits, is of particular interest. Our findings suggest a potential role for these genes and the related rare variants in the onset and development of bipolar disorder in this one family. Chapter 4: In this chapter, we investigated the potential of FMO genes to confer risk of nicotine dependence via deep targeted sequencing in 2,820 study subjects, comprising 1,583 nicotine dependents and 1,237 controls of European American and African American ancestry. Specifically, we focused on the two genomic segments that include FMO1, FMO3 and the pseudogene FMO6P, and aimed to investigate the potential association between FMO genes and nicotine dependence. We identified different clusters of significant common variants in European Americans (with the most significant SNP rs6674596, P=0.0004, OR=0.67, MAF_EA=0.14) and African Americans (with the most significant SNP rs6608453, P=0.001, OR=0.64, MAF_AA=0.1).
Most of the significant variants identified were SNPs located within intronic regions or with unknown functional significance. Chapter 5: In this chapter, we aimed to investigate the following three scientific questions: 1) Can centrality reflect the biological significance of genes in a general human gene network? 2) Among four commonly used centrality measures, does any one outperform the others? 3) Do they perform better if several centrality measures are combined using machine learning algorithms? To answer these questions, we constructed a comprehensive human gene-gene network using protein-protein interaction data. Four essential gene sets were extracted from a variety of data sources to serve as ground truth in the evaluation and optimization process. Our analytic results indicated that there is a connection between the essentiality and centrality of human genes. A pattern of strong correlations was identified among the four commonly used centrality measures for a general human PPI network, and each centrality measure performed similarly to the others as a predictor of gene essentiality. Combining several different centrality measures yielded only limited improvement in the prediction models. Chapter 6: In this chapter, we aimed to investigate whether susceptibility genes for certain complex disorders show an enrichment pattern in centrality within function-specific sub-networks. Gene expression data for human brain tissue recorded in the Human Protein Atlas were extracted and used to construct a series of brain-function-specific sub-networks. Susceptibility genes from three categories of complex disorders, namely neurodegenerative disorders, psychiatric disorders and non-brain-related disorders, were extracted from the GWAS Catalog. We identified a significant enrichment pattern of high centrality among susceptibility genes contributing to neurodegenerative and psychiatric disorders in these sub-networks.
Our findings indicate that susceptibility genes for complex disorders might have higher centralities in function-specific sub-networks.
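
As an illustration of the kind of p-value combination mentioned for Chapter 2, the following is a minimal sketch that assumes "Fisher's method" refers to Fisher's combined probability test. The function name `fisher_combined_p` and the input p-values are hypothetical and are not taken from the dissertation; the sketch also assumes the combined p-values are independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's combined probability test.

    Hypothetical helper for illustration only; not the dissertation's code.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees
    # of freedom. We work with X / 2 directly.
    half_x = -sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Example with made-up per-variant p-values for one gene.
combined = fisher_combined_p([0.04, 0.20, 0.01])
print(f"combined p-value: {combined:.4g}")
```

The closed-form survival function keeps the sketch free of third-party dependencies; in practice a statistics library routine would normally be used instead.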

    Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms

    Coronary artery disease (CAD) is a leading cause of morbidity and mortality worldwide. Although 58 genomic regions have been associated with CAD thus far, most of the heritability is unexplained, indicating that additional susceptibility loci await identification. An efficient discovery strategy may be larger-scale evaluation of promising associations suggested by genome-wide association studies (GWAS). Hence, we genotyped 56,309 participants using a targeted gene array derived from earlier GWAS results and performed meta-analysis of results with 194,427 participants previously genotyped, totaling 88,192 CAD cases and 162,544 controls. We identified 25 new SNP-CAD associations (P < 5 × 10⁻⁸, in fixed-effects meta-analysis) from 15 genomic regions, including SNPs in or near genes involved in cellular adhesion, leukocyte migration and atherosclerosis (PECAM1, rs1867624), coagulation and inflammation (PROCR, rs867186 (p.Ser219Gly)) and vascular smooth muscle cell differentiation (LMOD1, rs2820315). Correlation of these regions with cell-type-specific gene expression and plasma protein levels sheds light on potential disease mechanisms.
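
The fixed-effects meta-analysis mentioned above can be sketched as follows, assuming the common inverse-variance weighted form. The function name `fixed_effects_meta` and the per-study effect sizes are hypothetical; the abstract does not give the study's actual pipeline or per-cohort estimates.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    Hypothetical helper for illustration only.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)

    # Two-sided p-value from the standard normal distribution.
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Made-up log odds ratios and standard errors for one SNP in two cohorts.
beta, se, z, p = fixed_effects_meta([0.06, 0.04], [0.015, 0.010])
print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}, p={p:.3g}")
```

A pooled p-value would then be compared against the genome-wide significance threshold quoted in the abstract.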

    Definition of genetic and pathogenic mechanisms regulating neuroinflammation

    Although complex inflammatory diseases affect 5% of the population, we still do not understand fully the underlying disease triggers and mechanisms. This partly explains why current treatments are not curative but only modify disease. These diseases arise from combined genetic, environmental and unknown effects. In this thesis, I have focused on identifying the genetic factors that regulate complex disease in experimental models. The rationale is that these genetic determinants will provide insight into the conserved mechanisms also important for human disease. These mechanisms can then be targeted therapeutically. I have worked with the neuroinflammatory diseases multiple sclerosis (MS) and Guillain-Barré syndrome (GBS) and their respective animal models, experimental autoimmune encephalomyelitis (EAE) and experimental autoimmune neuritis (EAN). To identify risk genes in an unbiased phenotype-driven manner, we established intercrosses and recombinant lines between rat strains with opposing susceptibilities to EAE and EAN. Linkage analyses and functional studies in rat lines then successfully positioned five genes that regulate experimental neuroinflammation, namely Il22ra2, Vav1, Raet1, Klrk1 and Ncf1. IL22RA2 and VAV1 were also translated as risk genes in MS cohorts. More importantly, however, the five genes targeted immune mechanisms and events that correlated well with disease. In our hands, Il22ra2 regulated macrophage activation, Vav1 controlled effector T cell activity and regulatory T cell proportions, Raet1 and Klrk1 displayed a gene-gene interaction that modified NK cell activity and Ncf1 controlled oxidative burst from mononuclear cells. All these mechanisms also have described roles in both MS and GBS. By finding genetic determinants of distinct pathogenic mechanisms we may both discover novel targets for treatment and also more accurately define which current therapies are more suitable for different patients.

    Immunopathogenesis of rheumatoid arthritis

    Rheumatoid arthritis (RA) is the most common inflammatory arthropathy. The majority of evidence, derived from genetics, tissue analyses, models, and clinical studies, points to an immune-mediated etiology associated with stromal tissue dysregulation that together propagate chronic inflammation and articular destruction. A pre-RA phase lasting months to years may be characterized by the presence of circulating autoantibodies, increasing concentration and range of inflammatory cytokines and chemokines, and altered metabolism. Clinical disease onset comprises synovitis and systemic comorbidities affecting the vasculature, metabolism, and bone. Targeted immune therapeutics and aggressive treatment strategies have substantially improved clinical outcomes and informed pathogenetic understanding, but no cure as yet exists. Herein we review recent data that support intriguing models of disease pathogenesis. They allude to the possibility of restoration of immunologic homeostasis and thus a state of tolerance associated with drug-free remission. This target represents a bold vision for the future of RA therapeutics.

    Strategies for the intelligent integration of genetic variance information in multiscale models of neurodegenerative diseases

    A more complete understanding of the genetic architecture of complex traits and diseases can maximize the utility of human genetics in disease screening, diagnosis, prognosis, and therapy. The identification of genetic variants linked to polygenic and complex diseases is of great interest to clinicians, geneticists, patients, and the public. Furthermore, determining how genetic variants affect an individual’s health and translating this knowledge into the development of new medicines could transform the treatment of the most common deleterious diseases. However, this requires the correlation of genetic variants with specific diseases, and accurate functional assessment of genetic variation in human DNA sequencing studies is still a nontrivial challenge in clinical genomics. Assigning functional consequences and clinical significance to genetic variants is an important step in human genome interpretation. Translating genetic variants into functional molecular mechanisms is essential for understanding disease pathogenesis and, eventually, for therapy design. Although various statistical methods help to short-list genetic variants for fine-mapping investigation, demonstrating their role in a molecular mechanism requires knowledge of their functional consequences, which in turn requires comprehensive investigation. Experimental interpretation of all observed genetic variants is still impractical. Thus, predicting the functional and regulatory consequences of genetic variants using in-silico approaches is an important step in the discovery of clinically actionable knowledge. Because the interactions between phenotypes and genotypes are multi-layered and biologically complex, such associations present several challenges and simultaneously offer many opportunities to design new protocols for in-silico variant evaluation strategies.
This thesis presents a comprehensive protocol based on a causal reasoning algorithm that harvests and integrates multifaceted genetic and biomedical knowledge with various types of entities from several resources and repositories to understand how genetic variants perturb molecular interactions and initiate a disease mechanism. Firstly, as a case study of genetic susceptibility loci of Alzheimer’s disease, I reviewed and summarized the existing methodologies for interpreting Genome Wide Association Studies (GWAS), currently available algorithms, and computable modelling approaches. In addition, I formulated a new approach for modelling and simulation of genetic regulatory networks as an extension of the syntax of the Biological Expression Language (OpenBEL). This could allow the representation of genetic variation information in cause-and-effect models to predict the functional consequences of disease-associated genetic variants. Secondly, using the new OpenBEL syntax, I generated an OpenBEL model for Alzheimer’s disease (AD) together with genetic variants, including their DNA, RNA or protein position, variant type and associated allele. To better understand the role of genetic variants in a disease context, I subsequently tried to predict the consequences of genetic variation based on the functional context provided by the network model. I further explained how genetic variation information could help to identify candidate molecular mechanisms for aetiologically complex diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD). Given that integration of genetic variation information can enhance the evidence base for shared pathophysiological pathways in complex diseases, I addressed one of the key questions, namely the role of shared genetic variants in initiating shared molecular mechanisms between neurodegenerative diseases.
I systematically analysed shared genetic variation information for AD and PD and mapped it to find shared molecular aetiology between neurodegenerative diseases. My methodology highlighted that a comprehensive understanding of genetic variation needs integration and analysis of all omics data, in order to build a joint model that captures all datasets concurrently. Moreover, the effects of GWAS variants should be investigated at the level of genomic loci rather than individual genetic variants, whose effects are hard to predict in a biologically complex molecular mechanism, particularly when investigating shared pathology.

    DomainRBF: a Bayesian regression approach to the prioritization of candidate domains for complex diseases

    BACKGROUND: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named domainRBF (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases. RESULTS: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), and we do so in terms of three criteria (precision, mean rank ratio, and AUC score). We further show that the proposed approach is robust to the parameters involved and the underlying domain-domain interaction network through a series of permutation tests. Having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidence from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource. CONCLUSIONS: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved.
The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.
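
One of the evaluation criteria named above, the AUC score, can be illustrated with a small sketch. It assumes the standard rank-based (Mann-Whitney) definition of AUC; the function name `auc_score` and the scores are hypothetical, and the paper's exact validation protocol is not reproduced here.

```python
def auc_score(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true association is
    scored above a randomly chosen control, counting ties as one half.
    Hypothetical helper for illustration only.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")

    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up prioritization scores: held-out true domains vs. control domains.
true_domains = [0.90, 0.80]
control_domains = [0.10, 0.85]
print(auc_score(true_domains, control_domains))  # 3 of 4 pairs ranked correctly
```

The quadratic pairwise loop is chosen for clarity; a sort-based rank computation would scale better on genome-wide candidate lists.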

    Novel Comprehensive Bioinformatics Approaches to Determine the Molecular Genetic Susceptibility Profile of Moderate and Severe Asthma.

    Asthma is a chronic inflammatory condition linked to hyperresponsiveness in the airways. There is currently no cure available for asthma, and therapy choices are limited. Asthma is the result of the interplay between genes and the environment. The exact molecular genetic mechanism of asthma remains elusive. The aim of this study is to provide a comprehensive, detailed molecular etiology profile of the factors that regulate the severity and pathogenicity of asthma using integrative bioinformatics tools. The Gene Expression Omnibus dataset GSE43696, which contains 50 moderate cases, 38 severe cases, and 20 healthy controls, was used to investigate differentially expressed genes (DEGs), susceptible chromosomal loci, gene networks, pathways, gene ontologies, and protein-protein interactions (PPIs) using an intensive bioinformatics pipeline. The PPI network analysis yielded DEGs that contribute to interactions that differ between moderate and severe asthma. The combined interaction scores resulted in higher interactions for the genes and for moderate asthma and and for severe asthma. Enrichment analysis (EA) demonstrated differential enrichment between moderate and severe asthma phenotypes; the ion transport regulation pathway was significantly enhanced in severe asthma phenotypes compared to that in moderate asthma phenotypes and involved , and The most enriched common pathway in both moderate and severe asthma is the development of the glucocorticoid receptor (GR) signaling pathway, followed by glucocorticoid-mediated inhibition of proinflammatory and proconstrictory signaling in airway smooth muscle cells. Gene sets were shared between severe and moderate asthma at 16 chromosome locations, including 17p13.1, 16p11.2, 17q21.31, 1p36, and 19q13.2, while 60 and 48 chromosomal locations were unique to moderate and severe asthma, respectively.
Phylogenetic analysis of the DEGs showed that several genes shared between phases of asthma fall within the same cluster of genes. This could indicate that several asthma-associated genes have a common ancestor and could be linked to the same biological function or gene family, implying the importance of these genes in the pathogenesis of asthma. New genetic risk factors for the development of moderate-to-severe asthma were identified in this study; these could provide a better understanding of the molecular pathology of asthma and might provide a platform for its treatment.

    Statistical Inference for Propagation Processes on Complex Networks

    The methods of network theory are enjoying growing popularity because they allow complex systems to be represented as networks. These are described solely by a set of nodes connected by edges. Currently available methods are mainly limited to descriptive analysis of the network structure. This thesis presents several approaches to inference about processes on complex networks. These processes influence measurable quantities at the network nodes and are described by a set of random variables. All of the methods presented are motivated by practical applications, such as the transmission of food-borne infections, the propagation of train delays, or the regulation of genetic effects. First, a general dynamic metapopulation model for the spread of food-borne infections is introduced, which combines the local infection dynamics with the network-based transport routes of contaminated food. This model enables the efficient simulation of various realistic food-borne disease epidemics. Second, an exploratory approach to identifying the origin of propagation processes is developed. Based on a network-based redefinition of geodesic distance, complex propagation patterns can be projected onto a systematic, circular spreading pattern. This holds exactly when the origin node of the network is chosen as the reference point. The method is successfully applied to the 2011 EHEC/HUS epidemic in Germany. The results suggest that the method can usefully complement the laborious standard investigations carried out during food-borne disease epidemics. In addition, this exploratory approach can be applied to identify the origin of delays in transport networks.
The results of extensive simulation studies with a wide variety of transmission mechanisms give hope that the approach is generally applicable to identifying the origin of propagation processes in diverse fields. Finally, it is shown that kernel-based methods can be an alternative for the statistical analysis of processes on networks. A network-based kernel was developed for the logistic kernel machine test, which allows the seamless integration of biological knowledge into the analysis of data from genome-wide association studies. The method is successfully tested in the analysis of genetic causes of rheumatoid arthritis and lung cancer. In summary, the results of the methods presented make clear that the network-theoretic analysis of propagation processes can contribute substantially to answering a wide range of questions in different applications.

    Large–scale data–driven network analysis of human–plasmodium falciparum interactome: extracting essential targets and processes for malaria drug discovery

    Background: Plasmodium falciparum malaria is an infectious disease considered to have great impact on public health due to its associated high mortality rates, especially in sub-Saharan Africa. The emergence in Africa of drug-resistant falciparum strains, notably to chloroquine and sulfadoxine-pyrimethamine, is traced mainly to Southeast Asia, where the rate of artemisinin resistance is increasing. Although careful surveillance to monitor the emergence and spread of artemisinin-resistant parasite strains in Africa is ongoing, research into new drugs, particularly for African populations, is critical since there is not yet a replacement drug for artemisinin combination therapies (ACTs). Objective: The overall objective of this study is to identify potential protein targets through host-pathogen protein-protein functional interaction network analysis, in order to understand the underlying mechanisms of drug failure and to identify essential targets that can help predict potential drug candidates specific to African populations through a protein-based approach to both host and Plasmodium falciparum genomic analysis. Methods: We leveraged malaria-specific genome-wide association study summary statistics obtained from Gambian, Kenyan and Malawian populations, Plasmodium falciparum selective pressure variants and functional datasets (protein sequences, interologs, host-pathogen intra-organism and host-pathogen inter-organism protein-protein interactions (PPIs)) from various sources (STRING, Reactome, HPID, UniProt, IntAct and the literature) to construct an overlapping functional network for both host and pathogen. Purpose-built algorithms and a large-scale data-driven computational framework were used to analyze the datasets and the constructed networks and to identify densely connected subnetworks or hubs essential for network stability and integrity.
The host-pathogen network was analyzed to elucidate the influence of parasite candidate key proteins within the network and to predict possible resistance pathways arising from host-pathogen candidate key protein interactions. We performed biological and pathway enrichment analysis on the critical proteins identified to elucidate their functions. To leverage disease-target-drug relationships and identify already approved drug candidates that could be repurposed to treat malaria, pharmaceutical datasets from DrugBank were explored using a semantic similarity approach based on target-associated biological processes. Results: About 600,000 significant SNPs (p-value < 0.05) from the summary statistics data were mapped to their associated genes, and we identified 79 human malaria-associated genes. The assembled parasite network comprised 8 clusters containing 799 functional interactions between 155 reviewed proteins, of which 5 clusters contained 43 key proteins (selective variants) and 2 clusters contained 2 candidate key proteins (key proteins characterized by a high centrality measure), C6KTB7 and C6KTD2. The human network comprised 32 clusters containing 4,133,136 interactions between 20,329 unique reviewed proteins, of which 7 clusters contained 760 key proteins and 2 clusters contained 6 significant human malaria-associated candidate key proteins or genes: P22301 (IL10), P05362 (ICAM1), P01375 (TNF), P30480 (HLA-B), P16284 (PECAM1) and O00206 (TLR4). The generated host-pathogen network comprised 31,512 functional interactions between 8,023 host and pathogen proteins. We also explored the association of the pfk13 gene within the host-pathogen network. We observed that pfk13 clusters with host kelch-like proteins and other regulatory genes but has no direct association with our identified host candidate key malaria targets.
We implemented a semantic similarity based approach, complemented by Kappa and Jaccard statistical measures, to identify 115 malaria-similar diseases and 26 potential repurposable drug hits that can be evaluated experimentally for malaria treatment. Conclusion: In this study, we reviewed existing antimalarial drugs and resistance-associated variants contributing to the diminished sensitivity to antimalarials, especially chloroquine, sulfadoxine-pyrimethamine and artemisinin combination therapy, within the African population. We also described various computational techniques used to predict drug targets and leads in drug research. In our data analysis, we showed that possible mechanisms of resistance to artemisinin in Africa may arise from the combinatorial effects of many genes conferring resistance to chloroquine and sulfadoxine-pyrimethamine. We investigated the role of pfk13 within the host-pathogen network. We predicted key targets that have been proposed to be essential for malaria drug and vaccine development through structural and functional analysis of host and pathogen functional networks. Based on our analysis, we propose these targets as essential co-targets for combinatorial malaria drug discovery.
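
The Jaccard measure mentioned above can be sketched as follows, assuming it is applied to sets of biological-process annotations attached to two diseases or drug targets. The function name `jaccard_similarity` and the annotation labels are hypothetical and are not taken from the study.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard index: size of the intersection over size of the union.

    Returns 0.0 when both inputs are empty, which is a convention chosen
    for this sketch. Hypothetical helper for illustration only.
    """
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Made-up biological-process annotations for two diseases.
malaria_processes = {"immune response", "cell adhesion", "inflammation"}
other_disease_processes = {"cell adhesion", "inflammation", "coagulation"}

similarity = jaccard_similarity(malaria_processes, other_disease_processes)
print(f"Jaccard similarity: {similarity:.2f}")  # 2 shared of 4 distinct terms
```

Diseases whose annotation sets score above a chosen cutoff could then be treated as candidates for drug repurposing; the cutoff itself would be an additional assumption.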