
    Disease Associated Mutations and Functional Variants that Significantly Disrupt RNA Structure

    Genome-Wide Association Studies (GWAS) have revealed a large number of trait- and disease-associated Single Nucleotide Polymorphisms (SNPs) that fall in noncoding or intergenic regions of the human genome. This is congruent with the current understanding that many of these regions are actively transcribed, and that many transcripts and transcript regions that do not code for protein have important roles in the cell. RNA structure plays a critical role in carrying out the functions of many transcripts. We hypothesized that a subset of noncoding disease-associated SNPs significantly change RNA structure. We developed a program called SNPfold to identify SNPs that cause significant RNA structural rearrangement and applied it to a set of 514 disease-associated SNPs in 350 unique noncoding regions of the human transcriptome. We identified six disease states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease, and Hypertension) in which multiple SNPs significantly alter RNA structural ensembles. We then used Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE) to confirm the predicted structure change caused by SNPs associated with Hyperferritinemia Cataract Syndrome (U22G and A56U in the FTL 5′ UTR). Both mutations were shown to disrupt the formation of an Iron Response Element stem-loop that is critical to translational regulation of the mRNA. We identified compensatory mutations that restore these mutant structures to that of the wild-type FTL 5′ UTR. We then identified, from human haplotype data, several regions where SNP pairs inherited together conserve structure. Lastly, we explored the functional effect of common SNPs associated with changes in RNA expression level by calculating the enrichment of their overlap with experimentally derived binding sites for 14 different RNA-binding proteins. 
Consistent with a subset of these SNPs altering structure in functionally important sites of mRNA transcripts, we identified several proteins whose binding sites show enriched proximal overlap with these SNPs. Taken together, these results indicate that both rare disease-associated and common SNPs that significantly change RNA structure are present in human populations, and that such a functional effect may account for a subset of phenotypic differences and complex disease propensities among individuals.
Doctor of Philosophy
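    The abstract does not describe how SNPfold scores structural change. The sketch below illustrates the general idea under an assumption: per-nucleotide base-pairing probabilities have already been computed for the wild-type and variant sequences by an RNA partition-function folding tool (not shown). The two profiles are then compared with a Pearson correlation. A value near 1 suggests an unchanged ensemble, and a low value suggests rearrangement. The function names and toy profiles are hypothetical, and this is not SNPfold's actual code.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def structure_similarity(wt_pairing: Sequence[float],
                         variant_pairing: Sequence[float]) -> float:
    """Hypothetical score: correlation of per-nucleotide pairing
    probabilities; lower values suggest a larger structural change."""
    return pearson(wt_pairing, variant_pairing)


# Toy profiles: a stem (high pairing) flanking a loop (low pairing).
wt = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9]
disrupted = [1 - p for p in wt]  # pairing pattern inverted
print(structure_similarity(wt, wt))
print(structure_similarity(wt, disrupted))
```

    In practice the variant's score would be judged against a background, for example the scores of all other possible substitutions in the same transcript region. That step is omitted here.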

    A novel application of pattern recognition for accurate SNP and indel discovery from high-throughput data: Targeted resequencing of the glucocorticoid receptor co-chaperone FKBP5 in a Caucasian population

    Precise detection of single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) from high-throughput data remains a significant bioinformatics challenge. Accurate detection is necessary before next-generation sequencing can routinely be used in the clinic. In research, scientific advances are hindered by gaps in data, as exemplified by the limited discovery of rare variants, variants in non-coding regions, and indels. The continued presence of false positives and false negatives prevents full automation and requires additional manual verification steps. Our methodology applies both pattern recognition and sensitivity analysis to eliminate false positives and aid in the detection of SNP/indel loci and genotypes from high-throughput data. We chose FK506-binding protein 51 (FKBP5) (6p21.31) as our clinical target because of its role in modulating pharmacological responses to physiological and synthetic glucocorticoids and because of the complexity of the genomic region. We detected genetic variation across a 160 kb region encompassing FKBP5, discovering 613 SNPs and 57 indels, including a 3.3 kb deletion. We validated our method using three independent data sets and achieved 99% concordance with Sanger sequencing and with Affymetrix and Illumina microarrays. Furthermore, we detected 267 novel rare variants and assessed linkage disequilibrium. Our results showed both a sensitivity and a specificity of 98%, indicating near-perfect classification between true and false variants. The process is scalable and amenable to automation, with the downstream filters taking only 1.5 hours to analyze 96 individuals simultaneously. We provide examples of how this level of precision uncovered the interactions of multiple loci, their predicted influences on mRNA stability, perturbations of the hsp90 binding site, and individual variation in FKBP5 expression. 
Finally, we show how our discovery of rare variants may change current conceptions of evolution at this locus.
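    The sensitivity, specificity and concordance figures above are standard ratios. The sketch below shows how they are computed, assuming variant calls have already been classified against a truth set and that genotype calls from two platforms are keyed by a site identifier. The counts, site names and genotypes are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # true variants called as variants
    fp: int  # non-variant sites called as variants
    tn: int  # non-variant sites correctly not called
    fn: int  # true variants that were missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true variants that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-variant sites correctly rejected."""
    return c.tn / (c.tn + c.fp)


def concordance(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> float:
    """Fraction of sites genotyped by both call sets where the calls agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no sites genotyped by both call sets")
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return agree / len(shared)


# Made-up counts for illustration only.
counts = ConfusionCounts(tp=98, fp=2, tn=98, fn=2)
print(sensitivity(counts), specificity(counts))

# Hypothetical sequencing calls versus array calls.
ngs = {"site1": "A/G", "site2": "G/G"}
array = {"site1": "A/G", "site2": "G/T", "site3": "C/C"}
print(concordance(ngs, array))
```

    Concordance is computed only over sites genotyped by both methods. Sites covered by one platform alone therefore do not affect it.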

    Activity profiles in international female team handball using PlayerLoad™

    Team handball matches place diverse physical demands on players, which may result in fatigue and decreased activity levels. However, previous speed-based methods of quantifying player activity may not be sensitive enough to capture short-lasting, team handball-specific movements. Purpose: To examine the activity profiles of a female team handball team and of individual players using inertial measurement units (IMUs). Methods: Match data were obtained from one female national team in nine international matches (n = 85 individual player samples) using the Catapult OptimEye S5. PlayerLoad™·min⁻¹ was used as a measure of intensity in 5- and 10-minute periods. Team profiles were expressed relative to each player's match mean, and individual profiles were expressed relative to the mean of the 5-minute periods with >60% field time. Results: A high initial intensity was observed for team profiles and for players with ≥2 consecutive periods of play. Substantial declines in PlayerLoad™·min⁻¹ were observed throughout matches for the team and for players with several consecutive periods of field time. These trends were found for all positional categories. Intensity increased substantially in the final five minutes of the first half for team profiles. Activity levels were substantially lower in the five minutes after a player's most intense period and were partly restored in the subsequent 5-minute period. Discussion: Possible explanations for the observed declines in activity profiles for the team and individual players include player fatigue, situational factors, and pacing. However, the underlying mechanisms were not accounted for, so these explanations rest on previous team-sport studies.
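    The individual-profile normalization described above can be sketched as follows. The input format is an assumption: a list holding the PlayerLoad™ accumulated in each match minute, with None for minutes the player spent off the court. The function names and numbers are hypothetical, and the actual Catapult export format and the authors' processing may differ.

```python
from typing import List, Optional, Sequence, Tuple


def period_intensities(
    load: Sequence[Optional[float]], period_len: int = 5
) -> List[Tuple[Optional[float], float]]:
    """Return (PlayerLoad per on-court minute, field-time fraction) per period.

    `load` holds the PlayerLoad accumulated in each match minute, or None
    for minutes the player spent off the court (an assumed input format).
    """
    out = []
    for start in range(0, len(load), period_len):
        chunk = load[start:start + period_len]
        on_court = [x for x in chunk if x is not None]
        frac = len(on_court) / len(chunk)
        rate = sum(on_court) / len(on_court) if on_court else None
        out.append((rate, frac))
    return out


def individual_profile(
    load: Sequence[Optional[float]],
    period_len: int = 5,
    min_field_frac: float = 0.6,
) -> List[Optional[float]]:
    """Express each period's intensity as a percentage of the mean
    intensity over periods with more than `min_field_frac` field time."""
    periods = period_intensities(load, period_len)
    eligible = [r for r, f in periods if r is not None and f > min_field_frac]
    if not eligible:
        raise ValueError("no period exceeds the field-time threshold")
    reference = sum(eligible) / len(eligible)
    return [None if r is None else 100.0 * r / reference for r, _ in periods]


# Hypothetical per-minute PlayerLoad for one player over 15 minutes.
minutes = [10, 10, 10, 10, 10, 8, 8, 8, 8, 8, None, None, None, None, 6]
print(individual_profile(minutes))
```

    The third period has only 20% field time, so it does not contribute to the reference mean. It is still expressed relative to that mean.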

    The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the stretches of the human genome that are most important in regulating gene expression. For this reason, it is difficult to apply to regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify risk-influencing mutations in protein-coding regions. To determine whether dosage-sensitive genes have distinct patterns in their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory-sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection against mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity under each of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA. Both scores are significantly predictive of human dosage-sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches, helping to identify the parts of the human genome more likely to harbor mutations that influence disease risk.
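    The abstract describes ncRVIS only as a comparison of observed and predicted standing variation. The sketch below assumes it follows the RVIS recipe: regress each gene's count of common variants on its total variant count, then take the studentized residual. Under that recipe, negative values indicate less common variation than expected, that is, intolerance. The counts are invented, and the published ncRVIS may use different predictors or thresholds.

```python
import math
from typing import List, Sequence


def studentized_residuals(total: Sequence[float],
                          common: Sequence[float]) -> List[float]:
    """Internally studentized residuals from an OLS fit of `common` on `total`.

    In an RVIS-style score, a negative residual means a gene carries less
    common variation than predicted from its overall variant count.
    """
    n = len(total)
    if n != len(common) or n < 3:
        raise ValueError("need at least three genes with paired counts")
    mx = sum(total) / n
    my = sum(common) / n
    sxx = sum((x - mx) ** 2 for x in total)
    if sxx == 0:
        raise ValueError("total variant counts have no variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(total, common)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(total, common)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual std. error
    if s == 0:
        raise ValueError("perfect fit; residuals cannot be studentized")
    scores = []
    for x, e in zip(total, resid):
        leverage = 1 / n + (x - mx) ** 2 / sxx
        if leverage >= 1:
            raise ValueError("a gene has leverage 1; its residual is undefined")
        scores.append(e / (s * math.sqrt(1 - leverage)))
    return scores


# Invented counts for five genes' regulatory regions.
total_variants = [10, 20, 30, 40, 50]
common_variants = [5, 10, 3, 20, 25]
print(studentized_residuals(total_variants, common_variants))
```

    Here the third gene has far fewer common variants than its total count predicts, so it receives the most negative score.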

    Whole Exome Sequencing Reveals Severe Thrombophilia in Acute Unprovoked Idiopathic Fatal Pulmonary Embolism

    Background: Acute unprovoked idiopathic fatal pulmonary embolism (IFPE) causes sudden death without an identifiable thrombogenic risk. We aimed to investigate the underlying genomic risks of IFPE through whole exome sequencing (WES). Methods: We reviewed 14 years of consecutive out-of-hospital fatal pulmonary embolism records (n = 1478) from the ethnically diverse population of New York City and selected 68 qualifying IFPE cases for WES. We compared the WES data of IFPE cases with those of 9332 controls, using collapsing analyses with ethnicity-matched controls to determine whether there is an excess of rare damaging variants in the genome. Findings: Nine of the 68 decedents who died of IFPE (13·2%) had at least one pathogenic or likely pathogenic variant in one of three anti-coagulant genes: SERPINC1 (Antithrombin III), PROC, and PROS1. The odds ratio of IFPE for variant carriers was 144·2 for SERPINC1 (95% CI, 26·3–779·4; P = 1·7 × 10⁻⁷), 85·6 for PROC (95% CI, 13·0–448·9; P = 2·0 × 10⁻⁵), and 56·4 for PROS1 (95% CI, 5·3–351·1; P = 0·001). The average age at death of anti-coagulant gene variant carriers was significantly younger than that of non-carriers (28·56 years versus 38·02 years; P = 0·01). Interpretation: This study shows the important role of severe thrombophilia due to natural anti-coagulant deficiency in IFPE. Evaluating severe thrombophilia in out-of-hospital fatal PE beyond IFPE is warranted.
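    An odds ratio compares the odds of carrying a qualifying variant among cases with the odds among controls. The sketch below computes the sample odds ratio and a Woolf (log-scale Wald) 95% confidence interval from a 2 × 2 table, using invented counts rather than the study's data. The abstract does not state which interval method the authors used. Exact methods, which are common when carrier counts are very small, would give different bounds.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    case_carriers: int,
    case_noncarriers: int,
    control_carriers: int,
    control_noncarriers: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Sample odds ratio with a Woolf (log-scale Wald) confidence interval."""
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented 2 x 2 table: 4 of 68 cases and 2 of 1000 controls carry a variant.
print(odds_ratio_wald(4, 64, 2, 998))
```

    With so few carriers the interval is very wide. This matches the pattern of the wide intervals reported above, although the reported intervals were not necessarily computed this way.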

    Receiver operating characteristic (ROC) curves to measure the ability of RVIS-CHGV, ncRVIS, pcGERP, ncGERP, ncCADD, ncGWAVA scores and two joint models to discriminate genes reported among ClinGen’s dosage sensitivity map from the rest of the human genome.

    Here, for a given score, all assessable genes were used. To obtain the reported levels of significance, we use a logistic regression model to regress the presence or absence of a gene on the ClinGen dosage sensitivity map list on each of the genic scores.
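    The ROC curves summarize how well each score separates genes on the ClinGen dosage sensitivity map from the rest of the genome. The area under such a curve can be computed from its rank (Mann–Whitney) interpretation, as in the minimal sketch below. The scores and labels are invented. Some scores, such as RVIS, assign lower values to more intolerant genes, so those scores are negated before ranking.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive outscores a randomly chosen negative,
    with ties counted as one half. O(n_pos * n_neg), fine for a sketch."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical intolerance scores where LOWER means more intolerant, so the
# scores are negated before ranking; True marks a gene on the ClinGen list.
intolerance = [-2.1, 0.4, -1.5, 1.2]
on_clingen_list = [True, True, False, False]
print(roc_auc([-s for s in intolerance], on_clingen_list))
```

    An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. The logistic-regression p-value tests a related but distinct question: whether the score is associated with list membership at all.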