1,804 research outputs found

    Expression Changes Confirm Genomic Variants Predicted to Result in Allele-Specific, Alternative mRNA Splicing

    Noncoding variants, as well as masquerading coding variants, can alter the structure or abundance of splice isoforms. When these variants are common in the population, the resulting nonconstitutive transcripts are frequent enough to resemble naturally occurring alternative mRNA splicing. Information theory-based methods have been shown to predict the effects of such variants accurately. Single nucleotide polymorphisms (SNPs) predicted to significantly alter the strength of natural and/or cryptic splice sites were shown to affect gene expression. Splicing changes for known SNP genotypes were confirmed in HapMap lymphoblastoid cell lines with gene expression microarrays and custom-designed q-RT-PCR or TaqMan assays. Most of these SNPs (15 of 22), as well as an independent set of 24 variants, were then subjected to RNAseq analysis using the ValidSpliceMut web beacon (http://validsplicemut.cytognomix.com), which is based on data from the Cancer Genome Atlas and the International Cancer Genome Consortium. SNPs from different genes analyzed with gene expression microarrays and q-RT-PCR exhibited significant changes in use of the affected splice sites. Thirteen SNPs directly affected exon inclusion, and 10 altered cryptic site use. Homozygous SNP genotypes resulting in stronger splice sites exhibited higher levels of processed mRNA than alleles associated with weaker sites. Four SNPs exhibited variable expression among individuals with the same genotype, masking statistically significant expression differences between alleles. Genome-wide information theory and expression (RNAseq) analyses of tumor exomes and genomes confirmed splicing effects for 7 of the HapMap SNPs and for 14 SNPs identified from tumor genomes. q-RT-PCR resolved rare splice isoforms whose read abundance was too low for statistical significance in ValidSpliceMut. Nevertheless, the web beacon provides evidence of unanticipated splicing outcomes, for example, intron retention due to compromised recognition of constitutive splice sites. Thus, ValidSpliceMut and q-RT-PCR are complementary resources for identifying allele-specific alternative splicing.
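
The information theory-based prediction mentioned above can be illustrated with a minimal sketch, assuming the usual individual information formula Ri = sum over positions of (2 + log2 f(b, l)) without the small-sample correction. The two-position frequency matrix TOY_FREQS and the helper names are hypothetical, not a real splice-site model or the authors' software. A negative ΔRi indicates that the alternate allele weakens the predicted site; in this line of work a change of ΔRi bits is often read as roughly a 2^|ΔRi|-fold change in site affinity.

```python
import math

# Hypothetical toy frequency matrix f(b, l) for a 2-position site (not a real splice-site model).
# Frequencies must be > 0; real models add pseudocounts to avoid log2(0).
TOY_FREQS = [
    {"A": 0.5, "C": 1 / 6, "G": 1 / 6, "T": 1 / 6},
    {"A": 1 / 6, "C": 1 / 6, "G": 0.5, "T": 1 / 6},
]


def individual_information(seq, freqs):
    """Ri in bits: sum of 2 + log2 f(b_l, l) over positions (small-sample correction omitted)."""
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(freqs[pos][base]) for pos, base in enumerate(seq))


def delta_ri(ref_site, alt_site, freqs):
    """Change in site strength (bits) caused by substituting the alternate allele."""
    return individual_information(alt_site, freqs) - individual_information(ref_site, freqs)


if __name__ == "__main__":
    print(individual_information("AG", TOY_FREQS))  # reference site
    print(delta_ri("AG", "AC", TOY_FREQS))          # alternate allele weakens the site
```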

    Estimating partial body ionizing radiation exposure by automated cytogenetic biodosimetry

    Purpose: Inhomogeneous exposures to ionizing radiation can be detected and quantified with the dicentric chromosome assay (DCA) of metaphase cells. Complete automation of DCA interpretation for whole-body irradiation has significantly improved throughput without compromising accuracy; however, low levels of residual false positive dicentric chromosomes (DCs) have confounded its application to partial-body exposure determination. Materials and Methods: We describe a method of estimating and correcting for false positive DCs in digitally processed images of metaphase cells. Nearly all DCs detected in unirradiated calibration samples are introduced by digital image processing. DC frequencies of irradiated calibration samples and of samples exposed to unknown radiation levels are corrected by subtracting this false positive fraction from each. In partial-body exposures, the fraction of cells exposed and the radiation dose can be quantified after applying this modification of the contaminated Poisson method. Results: Dose estimates of three partially irradiated samples diverged by 0.2 to 2.5 Gy from physical doses, and irradiated cell fractions deviated by 2.3% to 15.8% from the known levels. Synthetic partial-body samples composed of unirradiated and 3 Gy samples from 4 laboratories were correctly discriminated as inhomogeneous by multiple criteria. Root mean squared errors of these dose estimates ranged from 0.52 to 1.14 Gy² and from 8.1 to 33.3 %² for the fraction of cells irradiated. Conclusions: Automated DCA can differentiate whole- from partial-body radiation exposures and provides timely quantification of estimated whole-body equivalent dose.
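
The contaminated Poisson step can be sketched as follows, assuming the standard (unmodified) Dolphin formulation: among irradiated cells, dicentrics follow a Poisson distribution with yield Y, so X / (N - n0) = Y / (1 - e^(-Y)), and the irradiated fraction is f = X / (N Y). The abstract's false positive correction is not reproduced here, and the linear-quadratic calibration coefficients in the usage line are hypothetical. Function names are illustrative only.

```python
import math


def contaminated_poisson(total_dicentrics, total_cells, cells_without_dicentrics, tol=1e-12):
    """Dolphin-style estimate: returns (yield Y in irradiated cells, irradiated cell fraction f)."""
    damaged = total_cells - cells_without_dicentrics
    if damaged <= 0 or total_dicentrics <= 0:
        raise ValueError("need at least one cell with dicentrics")
    ratio = total_dicentrics / damaged  # observed value of Y / (1 - exp(-Y))
    if ratio <= 1.0:
        raise ValueError("mean dicentrics per damaged cell must exceed 1")
    # g(Y) = Y / (1 - e^-Y) increases from 1 (Y -> 0) and g(ratio) > ratio, so Y lies in (0, ratio).
    lo, hi = 0.0, ratio
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (-math.expm1(-mid)) < ratio:  # -expm1(-x) == 1 - exp(-x), accurate for small x
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2
    return y, total_dicentrics / (total_cells * y)


def dose_from_yield(y, c, alpha, beta):
    """Invert a linear-quadratic calibration curve Y = c + alpha*D + beta*D**2 (D >= 0)."""
    disc = alpha ** 2 + 4 * beta * (y - c)
    if disc < 0:
        raise ValueError("yield is below the calibration curve's range")
    return max(0.0, (-alpha + math.sqrt(disc)) / (2 * beta))


if __name__ == "__main__":
    y, f = contaminated_poisson(1000, 1000, 567.7)
    print(y, f, dose_from_yield(y, 0.001, 0.03, 0.06))  # hypothetical calibration coefficients
```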

    Ab Initio Exon Definition Using an Information Theory-based Approach

    Transcribed exons in genes are joined together at donor and acceptor splice sites precisely and efficiently to generate mRNAs capable of being translated into proteins. The sequence variability of individual splice sites can be modeled using Shannon information theory. In the laboratory, the degree to which an individual splice site is used is inferred from the structures of mRNAs and their relative abundance. These structures can be predicted using a bipartite information theory framework guided by current knowledge of the biological mechanisms of exon recognition. We present the results of this analysis for the complete dataset of all expressed human exons.
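
A simplified sketch of pairing splice sites into candidate exons is shown below. It assumes acceptor and donor sites have already been scored in bits (Ri), that a candidate exon's score is simply the sum of its acceptor and donor Ri values, and that exon length is bounded by hypothetical limits. This is an illustration of the general idea, not the authors' bipartite framework, which may weight or constrain the pairing differently.

```python
from itertools import product


def candidate_exons(acceptors, donors, min_len=30, max_len=500):
    """Pair (position, Ri) acceptor and donor sites into exons ranked by combined information.

    The summed-Ri score and the length bounds are simplifying assumptions for illustration.
    """
    exons = []
    for (a_pos, a_ri), (d_pos, d_ri) in product(acceptors, donors):
        length = d_pos - a_pos  # the donor must lie downstream of the acceptor
        if min_len <= length <= max_len:
            exons.append((a_pos, d_pos, a_ri + d_ri))
    # Highest combined information first; ties keep their enumeration order.
    return sorted(exons, key=lambda exon: exon[2], reverse=True)


if __name__ == "__main__":
    acceptor_sites = [(100, 8.0), (150, 3.0)]  # hypothetical (position, Ri in bits)
    donor_sites = [(200, 6.0), (260, 9.0)]
    for exon in candidate_exons(acceptor_sites, donor_sites, min_len=50, max_len=300):
        print(exon)
```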

    Multigene signatures of responses to chemotherapy derived by biochemically-inspired machine learning

    Pharmacogenomic responses to chemotherapy drugs can be modeled by supervised machine learning of the expression and copy number of relevant gene combinations. Such biochemical evidence can form the basis of gene signatures derived from cell line data, which can subsequently be examined in patients who have been treated with the same drugs. These gene signatures typically contain elements of multiple biochemical pathways, which together account for multiple sources of drug resistance or sensitivity. The signatures can capture variation in responses to the same drug among different patients.

    Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations

    Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using machine learning (ML). Methods: Bray-Curtis similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of the corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers to predict these gene targets, and the accuracy, specificity, and sensitivity of each classifier were determined. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and to predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the expression patterns most similar to NR3C1. A Decision Tree classifier performed best in detecting such genes, based on the area under the receiver operating characteristic (ROC) curve. TF target gene prediction was confirmed using siRNA knockdown data, which yielded more accurate predictions than CRISPR/Cas9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters appear to buffer promoters against deleterious mutations in a single TFBS that would otherwise alter regulation of these genes.
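
The Bray-Curtis similarity used to find co-expressed genes can be sketched as below, assuming non-negative expression values per tissue and similarity defined as 1 minus the Bray-Curtis dissimilarity, sum|u_i - v_i| / sum(u_i + v_i). The three-tissue profiles and gene names geneA and geneB are hypothetical; the study itself used 53 tissues.

```python
def bray_curtis_similarity(u, v):
    """1 - sum|u_i - v_i| / sum(u_i + v_i) for non-negative expression profiles of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same tissues")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        raise ValueError("at least one profile must have non-zero expression")
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / total


def most_similar(target_profile, profiles):
    """Rank genes by the Bray-Curtis similarity of their tissue-wide profiles to the target gene."""
    scored = [(name, bray_curtis_similarity(target_profile, p)) for name, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]  # hypothetical expression of the target gene in 3 tissues
    others = {"geneA": [1.0, 2.0, 3.0], "geneB": [3.0, 2.0, 1.0]}
    print(most_similar(target, others))
```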

    BIPAD: A web server for modeling bipartite sequence elements

    BACKGROUND: Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites. RESULTS: We introduce the Bipad Server [1], a web interface to predict sequence elements embedded within unaligned sequences. It can produce either a bipartite model, consisting of a pair of one-block position weight matrices (PWMs) with a gap distribution, or a single PWM for contiguous single-block motifs. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information content among a set of potential models with varying half-site and gap lengths. CONCLUSION: The web service generates information positional weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA-binding proteins.
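
To show what a bipartite model (two half-site matrices plus a gap distribution) represents, here is a minimal sketch that scans a sequence with an already-built model. It is not Bipad's entropy-minimization discovery algorithm. The toy weight matrices, the gap probabilities, and the choice to add log2 p(gap) to the two half-site scores are all assumptions made for illustration.

```python
import math


def score_half(seq, start, matrix):
    """Sum per-position weights of one half-site matrix placed at `start`."""
    return sum(matrix[i][seq[start + i]] for i in range(len(matrix)))


def best_bipartite_site(seq, left, right, gap_probs):
    """Return (start, gap, score) of the best-scoring bipartite placement, or None.

    score = left half + right half + log2 p(gap); the gap term is an assumed penalty.
    """
    best = None
    for start in range(len(seq) - len(left) + 1):
        left_score = score_half(seq, start, left)
        for gap, prob in gap_probs.items():
            right_start = start + len(left) + gap
            if right_start + len(right) > len(seq):
                continue  # right half would run past the end of the sequence
            total = left_score + score_half(seq, right_start, right) + math.log2(prob)
            if best is None or total > best[2]:
                best = (start, gap, total)
    return best


# Hypothetical toy matrices: the left half favors "AC", the right half favors "GT".
LEFT = [{"A": 1, "C": -1, "G": -1, "T": -1}, {"A": -1, "C": 1, "G": -1, "T": -1}]
RIGHT = [{"A": -1, "C": -1, "G": 1, "T": -1}, {"A": -1, "C": -1, "G": -1, "T": 1}]

if __name__ == "__main__":
    print(best_bipartite_site("TTACAAGTTT", LEFT, RIGHT, {1: 0.5, 2: 0.5}))
```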

    Predicting Response to Platin Chemotherapy Agents with Biochemically-inspired Machine Learning

    Selecting genes that accurately predict chemotherapy response could improve cancer outcomes. We compare optimized gene signatures for cisplatin, carboplatin, and oxaliplatin response in the same cell lines and validate each with data from a different cancer patient cohort. Supervised support vector machine learning was used to derive gene sets whose expression was related to cell line GI50 values, by backward feature selection with cross-validation. Specific genes and functional pathways distinguishing sensitive from resistant cell lines are identified by contrasting signatures obtained at extreme versus median GI50 thresholds. Ensembles of gene signatures at different thresholds are combined to reduce dependence on specific GI50 values for predicting drug response. The most accurate gene signatures for each platin are: cisplatin: BARD1, BCL2, BCL2L1, CDKN2C, FAAP24, FEN1, MAP3K1, MAPK13, MAPK3, NFKB1, NFKB2, SLC22A5, SLC31A2, TLR4, TWIST1; carboplatin: AKT1, EIF3K, ERCC1, GNGT1, GSR, MTHFR, NEDD4L, NLRP1, NRAS, RAF1, SGK1, TIGD1, TP53, VEGFB, VEGFC; oxaliplatin: BRAF, FCGR2A, IGF1, MSH2, NAGK, NFE2L2, NQO1, PANK3, SLC47A1, SLCO1B1, UGT1A1. Data from TCGA bladder, ovarian, and colorectal cancer patients were used to test the cisplatin, carboplatin, and oxaliplatin signatures, respectively, yielding 71.0%, 60.2%, and 54.5% accuracy in predicting disease recurrence and 59%, 61%, and 72% accuracy in predicting remission. One cisplatin signature predicted 100% of recurrences in non-smoking bladder cancer patients (57% disease-free; N=19) and 79% of recurrences in smokers (62% disease-free; N=35). This approach should be adaptable to other studies of chemotherapy response, independent of drug or cancer type.
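
The workflow of labeling cell lines by a GI50 threshold and then pruning a gene set by backward feature selection with cross-validation can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier with leave-one-out cross-validation stands in for the support vector machine used in the study; the expression matrix, GI50 values, and threshold are hypothetical.

```python
from statistics import mean


def label_by_gi50(gi50_values, threshold):
    """GI50 at or below the threshold is 'sensitive' (less drug needed), above it is 'resistant'."""
    return ["sensitive" if g <= threshold else "resistant" for g in gi50_values]


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label whose class centroid (over the chosen features) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):  # sorted for deterministic tie-breaking
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [mean(row[f] for row in rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, features):
    """Leave-one-out cross-validated accuracy of the nearest-centroid stand-in classifier."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


def backward_selection(X, y, min_features=1):
    """Greedy backward elimination: drop the gene whose removal keeps CV accuracy highest."""
    features = list(range(len(X[0])))
    best_features, best_acc = list(features), loocv_accuracy(X, y, features)
    while len(features) > min_features:
        candidates = []
        for f in features:
            subset = [g for g in features if g != f]
            candidates.append((loocv_accuracy(X, y, subset), subset))
        acc, features = max(candidates, key=lambda c: c[0])  # first best on ties
        if acc >= best_acc:  # prefer the smaller signature at equal accuracy
            best_acc, best_features = acc, features
    return best_features, best_acc


if __name__ == "__main__":
    # Hypothetical expression of 3 genes in 6 cell lines; only gene 0 tracks drug response.
    X = [[0.1, 0.01, 0.03], [0.2, 0.03, 0.01], [0.15, 0.02, 0.02],
         [0.9, 0.01, 0.03], [1.0, 0.03, 0.01], [0.95, 0.02, 0.02]]
    y = label_by_gi50([1.2, 0.8, 1.0, 6.0, 7.5, 5.1], threshold=3.0)
    print(backward_selection(X, y))
```

Rerunning the selection at several thresholds and combining the resulting signatures would mirror the ensemble idea described in the abstract.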

    Likely community transmission of COVID-19 infections between neighboring, persistent hotspots in Ontario, Canada

    Introduction: This study aimed to produce community-level geospatial mapping of confirmed COVID-19 cases in Ontario, Canada, in near real time to support decision-making. This was accomplished by area-to-area geostatistical analysis, space-time integration, and spatial interpolation of COVID-19 positive individuals. Methods: COVID-19 cases and locations were curated for geostatistical analyses from March 2020 through June 2021, corresponding to the first, second, and third waves of infections. Daily cases were aggregated by forward sortation area (FSA) and postal code (PC) in the municipal regions of Hamilton, Kitchener/Waterloo, London, Ottawa, Toronto, and Windsor/Essex County. Hotspots were identified with area-to-area tests including Getis-Ord Gi*, Global Moran’s I spatial autocorrelation, and Local Moran’s I asymmetric clustering and outlier analyses. Case counts were also interpolated across geographic regions by Empirical Bayesian Kriging, which localizes high concentrations of COVID-19 positive tests independent of FSA or PC boundaries. The Geostatistical Disease Epidemiology Toolbox, which is freely available software, automates the identification of these regions and produces digital maps to assist public health professionals with contact tracing and the distribution of other resources during the pandemic. Results: This study provided real-time indicators of likely community-level disease transmission through innovative geospatial analyses of COVID-19 incidence data. Municipal and provincial results were validated by comparison with known outbreaks at long-term care and other high-density residences and on farms. PC-level analyses revealed hotspots at higher geospatial resolution than public reports of FSAs, and often sooner. Results of different tests and kriging were compared to determine consistency among hotspot assignments. Concurrent or consecutive hotspots in close proximity, identified by cluster and outlier analysis of neighboring PCs and by kriging, suggested potential community transmission of COVID-19. Results were also stratified by population-based categories (sex, age, and presence/absence of comorbidities). Conclusions: Earlier recognition of hotspots could reduce the public health burden of COVID-19 and expedite contact tracing.
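
Two of the area-to-area statistics named above can be sketched with binary contiguity weights (w_ij = 1 for neighboring areas, else 0), which is an assumption; real analyses may use distance-based or row-standardized weights and add permutation-based significance tests, both omitted here. This is not the Geostatistical Disease Epidemiology Toolbox's implementation, and the regions and case counts in the usage lines are hypothetical.

```python
import math


def global_morans_i(values, neighbors):
    """Global Moran's I with binary weights: w_ij = 1 if j is in neighbors[i]."""
    n = len(values)
    avg = sum(values) / n
    z = [x - avg for x in values]
    w_total = sum(len(nb) for nb in neighbors)
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    return (n / w_total) * cross / sum(zi * zi for zi in z)


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights that include each area itself (w_ii = 1)."""
    n = len(values)
    avg = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - avg ** 2)
    scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}
        w_sum = len(window)  # with binary weights, sum(w_ij) == sum(w_ij ** 2)
        local = sum(values[j] for j in window)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local - avg * w_sum) / denom)
    return scores


if __name__ == "__main__":
    # Hypothetical case counts in five areas arranged in a line (each borders the next).
    cases = [1, 1, 1, 9, 9]
    adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
    print(global_morans_i(cases, adjacency))
    print(getis_ord_gi_star(cases, adjacency))  # large positive z-scores flag hotspots
```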