An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn's disease patients
Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn's disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
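The abstract does not say how the gene mutational burden representation is computed. One simple reading, sketched below purely as an assumption, is to count the variants each sample carries in each gene, giving a fixed-length feature vector per sample. The function name, the toy gene list, and the variant identifiers are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def gene_burden_features(variants_per_sample, genes):
    """Return one row per sample: the number of variants in each gene, in `genes` order."""
    rows = []
    for variants in variants_per_sample:
        # Count how many variants of this sample fall in each gene.
        counts = Counter(gene for gene, _variant_id in variants)
        rows.append([counts.get(gene, 0) for gene in genes])
    return rows


# Hypothetical toy input: each sample is a list of (gene, variant_id) pairs.
samples = [
    [("NOD2", "v1"), ("NOD2", "v2"), ("ATG16L1", "v3")],
    [("IL23R", "v4")],
]
genes = ["ATG16L1", "IL23R", "NOD2"]  # illustrative gene panel, fixed column order
features = gene_burden_features(samples, genes)
```

Collapsing variants to per-gene counts keeps the number of features equal to the number of genes rather than the number of distinct variants, which is one way such a representation could suit small cohorts.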
Molecular Reclassification of Crohn's Disease by Cluster Analysis of Genetic Variants
Background: Crohn's Disease (CD) has a heterogeneous presentation, and is typically classified according to extent and location of disease. The genetic susceptibility to CD is well known and genome-wide association scans (GWAS) and meta-analysis thereof have identified over 30 susceptibility loci. Except for the association between ileal CD and NOD2 mutations, efforts in trying to link CD genetics to clinical subphenotypes have not been very successful. We hypothesized that the large number of confirmed genetic variants enables (better) classification of CD patients. Methodology/Principal Findings: To look for genetic-based subgroups, genotyping results of 46 SNPs identified from CD GWAS were analyzed by Latent Class Analysis (LCA) in CD patients and in healthy controls. Six genetic-based subgroups were identified in CD patients, which were significantly different from the five subgroups found in healthy controls. The identified CD-specific clusters are therefore likely to contribute to disease behavior. We then looked at whether we could relate the genetic-based subgroups to the currently used clinical parameters. Although modest differences in prevalence of disease location and behavior could be observed among the CD clusters, Random Forest analysis showed that patients could not be allocated to one of the 6 genetic-based subgroups based on the typically used clinical parameters alone. This points to a poor relationship between the genetic-based subgroups and the clinical subphenotypes in use. Conclusions/Significance: This approach serves as a first step to reclassify Crohn's disease. The technique used can be applied to other common complex diseases as well, and will help to complete patient characterization, in order to evolve towards personalized medicine.
Mucosal Gene Expression of Antimicrobial Peptides in Inflammatory Bowel Disease Before and After First Infliximab Treatment
Background: Antimicrobial peptides (AMPs) protect the host intestinal mucosa against microorganisms. Abnormal expression of defensins was shown in inflammatory bowel disease (IBD), but it is not clear whether this is a primary defect. We investigated the impact of anti-inflammatory therapy with infliximab on the mucosal gene expression of AMPs in IBD. Methodology/Principal Findings: Mucosal gene expression of 81 AMPs was assessed in 61 IBD patients before and 4-6 weeks after their first infliximab infusion and in 12 control patients, using Affymetrix arrays. Quantitative real-time reverse-transcription PCR and immunohistochemistry were used to confirm microarray data. The dysregulation of many AMPs in colonic IBD in comparison with control colons was widely restored by infliximab therapy, and only DEFB1 expression remained significantly decreased after therapy in the colonic mucosa of IBD responders to infliximab. In ileal Crohn's disease (CD), expression of two neuropeptides with antimicrobial activity, PYY and CHGB, was significantly decreased before therapy compared to control ileums, and ileal PYY expression remained significantly decreased after therapy in CD responders. Expression of the AMPs that remained downregulated before and after treatment (DEFB1 and PYY) correlated with expression of villin 1, a gut epithelial cell marker, indicating that the decrease is a consequence of epithelial damage. Conclusions/Significance: Our study shows that the dysregulation of AMPs in IBD mucosa is the consequence of inflammation, but may be responsible for perpetuation of inflammation due to ineffective clearance of microorganisms.
Extended analysis of a genome-wide association study in primary sclerosing cholangitis detects multiple novel risk loci.
A limited number of genetic risk factors have been reported in primary sclerosing cholangitis (PSC). To discover further genetic susceptibility factors for PSC, we followed up on a second tier of single nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS). We analyzed 45 SNPs in 1221 PSC cases and 3508 controls. The association results from the replication analysis and the original GWAS (715 PSC cases and 2962 controls) were combined in a meta-analysis comprising 1936 PSC cases and 6470 controls. We performed an analysis of bile microbial community composition in 39 PSC patients by 16S rRNA sequencing. Seventeen SNPs representing 12 distinct genetic loci achieved nominal significance (p(replication) < 0.05) in the replication. The most robust novel association was detected at chromosome 1p36 (rs3748816; p(combined) = 2.1 × 10^-8) where the MMEL1 and TNFRSF14 genes represent potential disease genes. Eight additional novel loci showed suggestive evidence of association (p(repl) < 0.05). FUT2 at chromosome 19q13 (rs602662; p(comb) = 1.9 × 10^-6, rs281377; p(comb) = 2.1 × 10^-6 and rs601338; p(comb) = 2.7 × 10^-6) is notable due to its implication in altered susceptibility to infectious agents. We found that FUT2 secretor status and genotype defined by rs601338 significantly influence biliary microbial community composition in PSC patients. We identify multiple new PSC risk loci by extended analysis of a PSC GWAS. FUT2 genotype needs to be taken into account when assessing the influence of microbiota on biliary pathology in PSC.
Funding: Norwegian PSC Research Center; German Ministry of Education and Research (BMBF) through the National Genome Research Network (NGFN); Integrated Research and Treatment Center - Transplantation (01EO0802); PopGen biobank; NIH (DK 8496)
Polymorphisms near TBX5 and GDF7 are associated with increased risk for Barrett's esophagus.
BACKGROUND & AIMS: Barrett's esophagus (BE) increases the risk of esophageal adenocarcinoma (EAC). We found the risk of BE to be associated with single nucleotide polymorphisms (SNPs) on chromosome 6p21 (within the HLA region) and on 16q23, where the closest protein-coding gene is FOXF1. Subsequently, the Barrett's and Esophageal Adenocarcinoma Consortium (BEACON) identified risk loci for BE and esophageal adenocarcinoma near CRTC1 and BARX1, and within 100 kb of FOXP1. We aimed to identify further SNPs that increased BE risk and to validate previously reported associations. METHODS: We performed a genome-wide association study (GWAS) to identify variants associated with BE and further analyzed promising variants identified by BEACON by genotyping 10,158 patients with BE and 21,062 controls. RESULTS: We identified 2 SNPs not previously associated with BE: rs3072 (2p24.1; odds ratio [OR] = 1.14; 95% CI: 1.09-1.18; P = 1.8 × 10^-11) and rs2701108 (12q24.21; OR = 0.90; 95% CI: 0.86-0.93; P = 7.5 × 10^-9). The closest protein-coding genes were, respectively, GDF7 (rs3072), which encodes a ligand in the bone morphogenetic protein pathway, and TBX5 (rs2701108), which encodes a transcription factor that regulates esophageal and cardiac development. In BE cases, our data also supported 3 risk SNPs identified by BEACON (rs2687201, rs11789015, and rs10423674). Meta-analysis of all data identified another SNP associated with BE and esophageal adenocarcinoma: rs3784262, within ALDH1A2 (OR = 0.90; 95% CI: 0.87-0.93; P = 3.72 × 10^-9). CONCLUSIONS: We identified 2 loci associated with risk of BE and provided data to support a further locus. The genes we found to be associated with risk for BE encode transcription factors involved in thoracic, diaphragmatic, and esophageal development or proteins involved in the inflammatory response.
Complete sequence of the 22q11.2 allele in 1,053 subjects with 22q11.2 deletion syndrome reveals modifiers of conotruncal heart defects
The 22q11.2 deletion syndrome (22q11.2DS) results from non-allelic homologous recombination between low-copy repeats termed LCR22. About 60%-70% of individuals with the typical 3 megabase (Mb) deletion from LCR22A-D have congenital heart disease, mostly of the conotruncal type (CTD), whereas others have normal cardiac anatomy. In this study, we tested whether variants in the hemizygous LCR22A-D region are associated with risk for CTDs on the basis of the sequence of the 22q11.2 region from 1,053 22q11.2DS individuals. We found a significant association (FDR p < 0.05) of the CTD subset with 62 common variants in a single linkage disequilibrium (LD) block in a 350 kb interval harboring CRKL. A total of 45 of the 62 variants were associated with increased risk for CTDs (odds ratio [OR] ranges: 1.64-4.75). Associations of four variants were replicated in a meta-analysis of three genome-wide association studies of CTDs in affected individuals without 22q11.2DS. One of the replicated variants, rs178252, is located in an open chromatin region and resides in the double-elite enhancer, GH22J020947, that is predicted to regulate CRKL (CRK-like proto-oncogene, cytoplasmic adaptor) expression. Approximately 23% of patients with nested LCR22C-D deletions have CTDs, and inactivation of Crkl in mice causes CTDs, thus implicating this gene as a modifier. Rs178252 and rs6004160 are expression quantitative trait loci (eQTLs) of CRKL. Furthermore, set-based tests identified an enhancer that is predicted to target CRKL and is significantly associated with CTD risk (GH22J020946, sequence kernel association test (SKAT) p = 7.21 × 10^-5) in the 22q11.2DS cohort. These findings suggest that variance in CTD penetrance in the 22q11.2DS population can be explained in part by variants affecting CRKL expression.
Genetic Evidence Supporting the Association of Protease and Protease Inhibitor Genes with Inflammatory Bowel Disease: A Systematic Review
As part of the European research consortium IBDase, we addressed the role of proteases and protease inhibitors (P/PIs) in inflammatory bowel disease (IBD), characterized by chronic mucosal inflammation of the gastrointestinal tract, which affects 2.2 million people in Europe and 1.4 million people in North America. We systematically reviewed all published genetic studies on populations of European ancestry (67 studies on Crohn's disease [CD] and 37 studies on ulcerative colitis [UC]) to identify critical genomic regions associated with IBD. We developed a computer algorithm to map the 807 P/PI genes with exact genomic locations listed in the MEROPS database of peptidases onto these critical regions and to rank P/PI genes according to the accumulated evidence for their association with CD and UC. 82 P/PI genes (75 coding for proteases and 7 coding for protease inhibitors) were retained for CD based on the accumulated evidence. The cylindromatosis/turban tumor syndrome gene (CYLD) on chromosome 16 ranked highest, followed by acylaminoacyl-peptidase (APEH), dystroglycan (DAG1), macrophage-stimulating protein (MST1) and ubiquitin-specific peptidase 4 (USP4), all located on chromosome 3. For UC, 18 P/PI genes were retained (14 proteases and 4 protease inhibitors), with a considerably lower amount of accumulated evidence. The ranking of P/PI genes as established in this systematic review is currently used to guide validation studies of candidate P/PI genes, and their functional characterization in interdisciplinary mechanistic studies in vitro and in vivo as part of IBDase. The approach used here overcomes some of the problems encountered when subjectively selecting genes for further evaluation and could be applied to any complex disease and gene family.
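The abstract gives no details of the mapping and ranking algorithm. As a rough illustration only, the sketch below assumes that a gene is mapped to a critical region when their genomic intervals overlap, and that "accumulated evidence" is the summed number of studies supporting the overlapping regions. The gene names, coordinates, study counts, and scoring rule are all hypothetical placeholders, not the authors' method.

```python
from collections import defaultdict


def rank_genes_by_evidence(genes, regions):
    """Rank genes by summed study counts of the critical regions they overlap.

    genes:   {name: (chrom, start, end)}
    regions: list of (chrom, start, end, n_studies)
    """
    scores = defaultdict(int)
    for name, (g_chrom, g_start, g_end) in genes.items():
        for r_chrom, r_start, r_end, n_studies in regions:
            # Two closed intervals overlap when each starts before the other ends.
            if g_chrom == r_chrom and g_start <= r_end and r_start <= g_end:
                scores[name] += n_studies
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical genes and critical regions (coordinates are made up).
genes = {
    "GENE_A": ("chr3", 100, 200),
    "GENE_B": ("chr3", 500, 600),
    "GENE_C": ("chr16", 50, 80),
}
regions = [
    ("chr3", 150, 550, 4),
    ("chr3", 580, 700, 2),
    ("chr16", 10, 60, 5),
    ("chr1", 100, 200, 9),
]
ranking = rank_genes_by_evidence(genes, regions)
```

Genes that overlap no critical region are simply absent from the ranking, which mirrors the idea of retaining only a subset of the mapped genes.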