35 research outputs found
A two-step multiple-marker strategy for genome-wide association studies
Genome-wide association studies raise study-design and analytical issues that are still being debated. One of them is how to reduce the number of markers to be genotyped without losing efficiency in identifying trait loci, which can lower the cost of studies and minimize the multiple testing problem. With this aim, we proposed a two-step strategy based on two analytical methods suited to examining sets of markers rather than single markers: the local score, which screens the genome to select candidate regions in Step 1, and FBAT-LC, a multiple-marker family-based association test used to obtain significance levels of regions in Step 2. The performance of this strategy was evaluated on all replicates of Genetic Analysis Workshop 15 Problem 3 simulated data, using the answers to that problem. Overall, seven of the nine generated trait loci were detected in at least 87% of the replicates using a framework designed to handle either association with the disease or association with the severity of disease. This multiple-marker strategy was compared to the single-marker approach. By considering regions instead of single markers, this strategy minimizes the multiple testing problem and the number of false-positive results.
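As an illustration of the Step 1 screening idea, the following is a minimal sketch of a local score computed along an ordered list of markers. It assumes, hypothetically, that each marker contributes a score of -log10(p) minus a fixed threshold, and that the local score is the highest-scoring contiguous segment. The function name, the threshold value, and the p-values are illustrative assumptions, not details taken from the abstract.

```python
import math


def local_score(p_values, threshold=1.0):
    """Return (best_score, (start, end)) for the highest-scoring marker segment.

    Assumption: each marker scores -log10(p) - threshold, so only markers
    with p < 10**(-threshold) contribute positively. The running sum is
    reset to zero whenever it would become negative.
    """
    best_score = 0.0
    best_segment = None  # stays None if no marker scores positively
    running = 0.0
    start = 0
    for i, p in enumerate(p_values):
        if running == 0.0:
            start = i  # a new candidate segment begins at this marker
        running = max(0.0, running + (-math.log10(p) - threshold))
        if running > best_score:
            best_score = running
            best_segment = (start, i)
    return best_score, best_segment


# Hypothetical p-values for five consecutive markers.
score, segment = local_score([0.5, 0.01, 0.001, 0.5, 0.2])
print(score, segment)
```

In this toy input the two adjacent small p-values reinforce each other, so the selected segment covers markers 1 and 2 (zero-based) rather than a single marker.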
Molecular apocrine differentiation is a common feature of breast cancer in patients with germline PTEN mutations
INTRODUCTION: Breast carcinoma is the main malignant tumor occurring in patients with Cowden disease, a cancer-prone syndrome caused by germline mutation of the tumor suppressor gene PTEN characterized by the occurrence throughout life of hyperplastic, hamartomatous and malignant growths affecting various organs. The absence of known histological features for breast cancer arising in a PTEN-mutant background prompted us to explore them for potential new markers. METHODS: We first performed a microarray study of three tumors from patients with Cowden disease in the context of a transcriptomic study of 74 familial breast cancers. A subsequent histological and immunohistochemical study including 12 additional cases of Cowden disease breast carcinomas was performed to confirm the microarray data. RESULTS: Unsupervised clustering of the 74 familial tumors followed the intrinsic gene classification of breast cancer except for a group of five tumors that included the three Cowden tumors. The gene expression profile of the Cowden tumors shows considerable overlap with that of a breast cancer subgroup known as molecular apocrine breast carcinoma, which is suspected to have increased androgenic signaling and shows frequent ERBB2 amplification in sporadic tumors. The histological and immunohistochemical study showed that several cases had apocrine histological features and expressed GGT1, which is a potential new marker for apocrine breast carcinoma. CONCLUSIONS: These data suggest that activation of the ERBB2-PI3K-AKT pathway by loss of PTEN at early stages of tumorigenesis promotes the formation of breast tumors with apocrine features.
A new molecular classification to drive precision treatment strategies in primary Sjögren’s syndrome
There is currently no approved treatment for primary Sjögren's syndrome, a disease that primarily affects adult women. The difficulty in developing effective therapies is due in part to the heterogeneity in the clinical manifestation and pathophysiology of the disease. Finding common molecular signatures among patient subgroups could improve our understanding of disease etiology and facilitate the development of targeted therapeutics. Here, we report a molecular classification scheme for Sjögren's syndrome patients based on the multi-omic profiling of whole blood samples from a cross-sectional European cohort of over 300 patients and a similar number of age- and gender-matched healthy volunteers. Using transcriptomic, genomic, epigenetic, cytokine expression and flow cytometry data, combined with clinical parameters, we identify four groups of patients with distinct patterns of immune dysregulation. The biomarkers we identify can be used by machine learning classifiers to sort future patients into subgroups, allowing the re-evaluation of response to treatments in clinical trials.
Association of a schizophrenia-risk nonsynonymous variant with putamen volume in adolescents
Importance
Deviation from normal adolescent brain development precedes manifestations of many major psychiatric symptoms. Such altered developmental trajectories in adolescents may be linked to genetic risk for psychopathology.
Objective
To identify genetic variants associated with adolescent brain structure and explore psychopathologic relevance of such associations.
Design, Setting, and Participants
Voxelwise genome-wide association study in a cohort of healthy adolescents aged 14 years and validation of the findings using 4 independent samples across the life span with allele-specific expression analysis of top hits. Group comparison of the identified gene-brain association among patients with schizophrenia, unaffected siblings, and healthy control individuals. This was a population-based, multicenter study combined with a clinical sample that included participants from the IMAGEN cohort, Saguenay Youth Study, Three-City Study, and Lieber Institute for Brain Development sample cohorts and UK biobank who were assessed for both brain imaging and genetic sequencing. Clinical samples included patients with schizophrenia and unaffected siblings of patients from the Lieber Institute for Brain Development study. Data were analyzed between October 2015 and April 2018.
Main Outcomes and Measures
Gray matter volume was assessed by neuroimaging and genetic variants were genotyped by Illumina BeadChip.
Results
The discovery sample included 1721 adolescents (873 girls [50.7%]), with a mean (SD) age of 14.44 (0.41) years. The replication samples consisted of 8690 healthy adults (4497 women [51.8%]) from 4 independent studies across the life span. A nonsynonymous genetic variant (minor T allele of rs13107325 in SLC39A8, a gene implicated in schizophrenia) was associated with greater gray matter volume of the putamen (variance explained of 4.21% in the left hemisphere; t = 8.66; 95% CI, 6.59-10.81; P = 5.35 × 10^−18; and 4.44% in the right hemisphere; t = 8.90; 95% CI, 6.75-11.19; P = 6.80 × 10^−19) and also with a lower gene expression of SLC39A8 specifically in the putamen (t(127) = −3.87; P = 1.70 × 10^−4). The identified association was validated in samples across the life span but was significantly weakened in both patients with schizophrenia (z = −3.05; P = .002; n = 157) and unaffected siblings (z = −2.08; P = .04; n = 149).
Conclusions and Relevance
Our results show that a missense mutation in gene SLC39A8 is associated with larger gray matter volume in the putamen and that this association is significantly weakened in schizophrenia. These results may suggest a role for aberrant ion transport in the etiology of psychosis and provide a target for preemptive developmental interventions aimed at restoring the functional effect of this mutation
TGFβ receptor gene variants in systemic sclerosis-related pulmonary arterial hypertension: results from a multicentre EUSTAR study of European Caucasian patients
Introduction: Systemic sclerosis (SSc)-related pulmonary arterial hypertension (PAH) has emerged as a major mortality prognostic factor. Mutations of transforming growth factor beta (TGFβ) receptor genes strongly contribute to idiopathic and familial PAH.
Objective: To explore the genetic bases of SSc–PAH, we combined direct sequencing and genotyping of candidate genes encoding TGFβ receptor family members.
Materials and methods: TGFβ receptor genes, BMPR2, ALK1, TGFR2 and ENG, were sequenced in 10 SSc–PAH patients, nine SSc and seven controls. In addition, 22 single-nucleotide polymorphisms (SNP) of these four candidate genes were tested for association in a first set of 824 French Caucasian SSc patients (including 54 SSc–PAH) and 939 controls. The replication set consisted of 1516 European SSc (including 219 SSc–PAH) and 3129 controls from the European League Against Rheumatism Scleroderma Trials and Research group network.
Results: No mutation was identified by direct sequencing. However, two previously catalogued SNP, ENG rs35400405 and ALK1 rs2277382, were found in SSc–PAH patients only. The genotyping of 22 SNP including the latter showed that only rs2277382 was associated with SSc–PAH (p=0.0066, OR 2.13, 95% CI 1.24 to 3.65). Nevertheless, this association was not replicated, and the combined analysis gave the following result: p=0.123, OR 0.79, 95% CI 0.59 to 1.07.
Conclusions: This study demonstrates the lack of association between these TGFβ receptor gene polymorphisms and SSc–PAH using both sequencing and genotyping methods
Statistical methods to account for different sources of bias in genome-wide association studies
Genome-wide association studies have become powerful tools to detect genetic variants associated with diseases. This PhD thesis focuses on several key aspects of the new computational and statistical problems that have arisen with such research.
The results of genome-wide association studies have been questioned, in part because of the bias induced by population stratification. We propose a comparison study of the existing strategies to account for this problem. Their advantages and limitations are discussed under various population structure scenarios in order to propose practical guidelines. We then focus on the inference of population structure, which has many applications in genetic research. We have developed a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS). This algorithm was applied to a set of simulated and real datasets, alongside many other algorithms used in practice, in order to compare their performances. Finally, the issue of multiple testing in genome-wide association studies is discussed on several levels. We propose a general review of multiple-testing methods and discuss their validity for different study designs. We then focus on obtaining results that are interpretable at the gene level, which corresponds to a multiple-testing problem with dependent tests. We discuss and analyze the different approaches dedicated to this end.
Statistical methods for robust transcriptome analysis through the integration of biological prior knowledge
Over the last decade, advances in molecular biology have accelerated the development of high-throughput investigation techniques. In particular, the study of the human transcriptome offers unprecedented opportunities for understanding cellular and disease mechanisms.
In this PhD thesis, we focus on providing robust statistical methods dedicated to the processing and analysis of high-throughput transcriptome data. We discuss the differential analysis approaches available in the literature for identifying genes associated with a phenotype of interest and propose a comparison study. We provide practical recommendations on the appropriate method to use, based on various simulation models and real datasets. To overcome the inherent instability of differential analysis strategies, we have developed an innovative approach called DiAMS (DIsease Associated Modules Selection). This method selects significant modules of genes rather than individual genes; it relies on an extension of the local score and integrates both transcriptome and protein interaction data. We then focus on a framework to infer gene regulatory networks using Gaussian graphical models, integrating a biologically informative prior over network structures. This approach offers the possibility of exploring the molecular relationships between genes, leading to the identification of altered regulations potentially involved in disease processes. Finally, all of these methodological developments are integrated into an analysis pipeline that we apply to the study of metastatic relapse in breast cancer.
Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation
Background: The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, consequently, to the multiple-testing problem. As an alternative to the overly conservative Family-Wise Error Rate (FWER), the False Discovery Rate (FDR) has emerged over the last ten years as more appropriate to handle this problem. However, one drawback of the FDR is that it is tied to a given rejection region for the considered statistics, attributing the same value to those that are close to the boundary and those that are not. As a result, the local FDR has recently been proposed to quantify the specific probability for a given null hypothesis to be true. Results: In this context we present a semi-parametric approach based on kernel estimators, which is applied to different high-throughput biological data such as patterns in DNA sequences, gene expression and genome-wide association studies. Conclusion: The proposed method has several practical advantages over existing approaches: it considers complex heterogeneities in the alternative hypothesis, takes into account prior information (from an expert judgment or previous studies) by allowing a semi-supervised mode, and deals with truncated distributions such as those obtained in Monte-Carlo simulations. This method has been implemented and is available through the R package kerfdr via the CRAN or at http://stat.genopole.cnrs.fr/software/kerfdr
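To make the notion of a local FDR concrete, here is a minimal sketch under the generic two-group mixture model, in which a statistic comes from the null density with probability pi0 and from an alternative density otherwise. The local FDR at x is then the posterior probability of the null, pi0 * f0(x) / (pi0 * f0(x) + (1 - pi0) * f1(x)). This sketch assumes both densities and pi0 are known; it does not reproduce the kernel-based estimation performed by kerfdr, and all names and numeric values below are illustrative assumptions.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution, used here as a stand-in density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(x, pi0, null_pdf, alt_pdf):
    """Posterior probability that the statistic x comes from the null.

    Assumption: two-group mixture with known null proportion pi0 and
    known null and alternative densities.
    """
    null_part = pi0 * null_pdf(x)
    alt_part = (1.0 - pi0) * alt_pdf(x)
    return null_part / (null_part + alt_part)


# Hypothetical setting: null N(0, 1), alternative N(2, 1), 90% true nulls.
def alt(x):
    return normal_pdf(x, mean=2.0)


for x in (0.0, 1.0, 3.0):
    print(x, local_fdr(x, 0.9, normal_pdf, alt))
```

Unlike a single FDR attached to a rejection region, the value changes with x: statistics far into the alternative's range receive a lower local FDR than those near the null's center.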
Detecting local high-scoring segments: a first stage approach for genome-wide studies
A fast, unbiased and exact allelic test for case-control association studies