61 research outputs found

    Genetic contribution to the aggregation of schizophrenia and bipolar disorder in multiplex consanguineous Pakistani pedigrees

    Schizophrenia (SCZ) and bipolar disorder (BP) are two major psychiatric disorders. SCZ is a primary psychotic disorder that typically involves delusions and hallucinations; by comparison, BP is a mood disorder characterized by mania and depression, although it can also involve psychosis. A 2010 estimate indicated that these disorders contributed ~7.4% and ~7.0%, respectively, of the global burden of disease. The high prevalence (~0.4% for SCZ and ~2.4% for BP) and estimated heritability (~80%) suggest a strong genetic influence. Evidence shows that there is some genetic overlap between the two conditions but also disorder-specific genetic components. 
Over the past decade, genome-wide association studies (GWAS) have identified hundreds of SCZ and BP loci, and other approaches have identified further potential genetic risk factors, for instance rare copy number variants (CNVs), rare single nucleotide variants (SNVs) and de novo mutations (DNMs). While our knowledge of the genetic architecture of these conditions grows, a large portion of the heritability of each disorder remains unexplained. The combination of a long history of genetic admixture and the tradition of consanguineous marriages (50% of unions are consanguineous) makes Pakistani families promising for population-based medical genetics studies. Consanguinity has previously been associated with an increased risk of numerous traits in epidemiological studies. Family-based designs have been widely applied in the genetic mapping of Mendelian and complex traits. However, few studies have used large multiplex consanguineous families to thoroughly investigate the role of consanguinity in neuropsychiatric disorders such as SCZ and BP. CNVs have been implicated in SCZ and BP since the discovery of 22q11.2 deletions; although most of them are rare in the population, they contribute significantly to risk. Association studies of CNVs found enrichment of rare deletions and duplications, and a higher rate of de novo CNVs, in cases relative to controls. Whole-genome sequencing of multiplex SCZ families reported an increased burden of rare exonic CNVs in SCZ probands, as well as genetic heterogeneity. Large multiplex families could provide the statistical power to investigate the role of segregating, and possibly pathogenic, CNVs. 
In order to better understand the genetic heterogeneity and look for the missing heritability of these two common disorders in Pakistani families, we used SNP genotyping and whole-exome sequencing to examine the genetic profile of ten large multiplex consanguineous pedigrees; each of these families included more than ten members affected by SCZ or BP. In this thesis, we characterized the population background of our cohort, including admixture and recent inbreeding. We tested whether the level of inbreeding was associated with the binary phenotype and its subphenotype dimensions. We also included a large external dataset of matched population control individuals to compute and compare the inbreeding coefficient. Our approach, which included linkage analysis, autozygosity mapping, detection of runs of homozygosity (ROH), and segregation analysis of rare deleterious homozygous variants, led us to reject the hypothesis of a recessive inheritance model across these families (despite their high inbreeding). We subsequently examined whether any CNVs segregated with the phenotype in some of the families. This examination involved multiple steps: 1 - a systematic comparison of a range of CNV detection algorithms currently available through different platforms, 2 - a cross-validation of true and false positive CNV calls using in silico or experimental approaches, 3 - the development of our own segregation and annotation software. This effort highlighted both the methodological advances and the limitations of CNV studies. In the end, none of the potentially pathogenic CNVs identified appeared to account for the genetic variance of SCZ and BP observed in the families examined here. The results presented in this thesis support an alternative hypothesis involving a polygenic pattern in which both rare and common variants are at play.
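As an illustration of the runs-of-homozygosity idea mentioned in the abstract above, here is a minimal sketch. It is not the thesis's method or software: the function names, the genotype encoding (0/1/2 alternate-allele counts) and the marker-count threshold are assumptions made for this example. Real ROH callers typically work on physical length in base pairs and tolerate occasional heterozygous or missing calls.

```python
from typing import List, Tuple


def runs_of_homozygosity(genotypes: List[int], min_length: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of homozygous stretches.

    genotypes: per-marker alternate-allele counts along one chromosome,
    where 0 and 2 are homozygous and 1 is heterozygous (assumed encoding).
    min_length: minimum number of consecutive homozygous markers to report.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i  # a new homozygous stretch begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None  # a heterozygous call ends the stretch
    # Close a stretch that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes)))
    return runs


def marker_froh(genotypes: List[int], min_length: int = 3) -> float:
    """Fraction of markers inside reported runs: a crude, marker-count
    stand-in for an ROH-based inbreeding estimate."""
    if not genotypes:
        return 0.0
    covered = sum(end - start for start, end in runs_of_homozygosity(genotypes, min_length))
    return covered / len(genotypes)


if __name__ == "__main__":
    example = [0, 2, 2, 0, 1, 0, 0, 1, 2, 2, 2, 2]
    print(runs_of_homozygosity(example))
    print(marker_froh(example))
```

Comparing such a per-individual fraction between affected and unaffected relatives, or against population controls, is the kind of test the abstract describes, although the actual analysis would rely on established tools and base-pair lengths.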

    Statistical methods for clinical genome interpretation with specific application to inherited cardiac conditions

    Background: While next-generation sequencing has enabled us to rapidly identify sequence variants, clinical application is limited by our ability to determine which rare variants impact disease risk. Aim: To develop computational methods to identify clinically important variants. Methods and Results: (1) I built a disease-specific variant classifier for inherited cardiac conditions (ICCs), which outperforms genome-wide tools across a wide range of benchmarks. It discriminates pathogenic variants from benign variants with global accuracy improved by 4-24% over existing tools. Variants classified with >90% confidence are significantly associated with both disease status and clinical outcomes. (2) To better interpret missense variants, I examined evolutionarily equivalent residues across protein domain families to identify positions intolerant of variation. Homologous residue constraint is a strong predictor of variant pathogenicity. It can identify a subset of de novo missense variants whose impact on developmental disorders is comparable to that of protein-truncating variants. Independently of existing approaches, it can also improve the prioritisation of disease-relevant genes for both developmental disorders and inherited hypertrophic cardiomyopathy. (3) TTN-truncating variants are known to cause dilated cardiomyopathy (DCM), but the effect of missense variants is poorly understood. Using the approach in (2), I studied the role of TTN missense variants in DCM. Our prioritised residues are enriched for known pathogenic variants, including the two known to cause DCM and others involved in skeletal myopathies. I also found a significant association between constrained variants of TTN I-set domains and DCM in a case-control burden test of Caucasian samples (OR=3.2, 95%CI=1.3-9.4). Within subsets of DCM, the association is replicated in alcoholic cardiomyopathy. (4) Finally, I developed a tool to annotate 5’UTR variants that create or disrupt upstream open reading frames (uORFs). 
I demonstrate its utility by detecting high-impact uORF-disturbing variants in ClinVar, gnomAD and Genomics England. Conclusion: These studies established broadly applicable methods and improved understanding of ICCs.
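To make point (4) of the abstract above concrete, the sketch below checks whether a single-nucleotide change in a 5'UTR sequence creates or removes an upstream ATG start codon. It is a simplified illustration, not the tool described in the abstract: the function names and the 0-based coordinate convention are assumptions, and it ignores reading frame, stop codons, Kozak context and strand handling, all of which a real uORF annotator would need.

```python
from typing import Dict, List, Set


def atg_positions(seq: str) -> Set[int]:
    """Return the 0-based start positions of every ATG in the sequence."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}


def classify_utr_snv(utr: str, pos: int, alt: str) -> Dict[str, List[int]]:
    """Compare upstream ATGs before and after a single-nucleotide substitution.

    utr: 5'UTR sequence on the transcript strand (hypothetical input).
    pos: 0-based position of the substituted base within utr.
    alt: the alternate base.
    """
    utr = utr.upper()
    alt = alt.upper()
    if not 0 <= pos < len(utr):
        raise ValueError("pos lies outside the UTR sequence")
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base A, C, G or T")
    mutated = utr[:pos] + alt + utr[pos + 1:]
    before = atg_positions(utr)
    after = atg_positions(mutated)
    return {
        "created": sorted(after - before),    # ATGs present only after the change
        "disrupted": sorted(before - after),  # ATGs lost because of the change
    }


if __name__ == "__main__":
    utr = "GCCACGGCATGCC"  # toy sequence with one upstream ATG at index 8
    print(classify_utr_snv(utr, 4, "T"))  # C>T turns ACG into ATG at index 3
    print(classify_utr_snv(utr, 9, "C"))  # T>C destroys the ATG at index 8
```

A fuller annotator would go on to ask whether a created ATG has an in-frame stop before or after the main coding start, since that determines the likely effect on translation.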

    High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

    The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs, as well as among INDELs and SVs across the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. All of these processes must be investigated in order to evaluate which variant may be causal for the deleterious phenotype. Variant effect scores are a great help in this regard. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment of the model for application. Here, I present a generalized workflow for this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. 
All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.