33 research outputs found
Recommended from our members
MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes
BACKGROUND: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity. RESULTS: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several of the cases tested, the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method. CONCLUSION: The search engine, available at , allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition, it allows the user to upload a sequence to query and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.
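For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of a nucleotide weight matrix scan. It is not MAPPER's hidden Markov model search and not code from the paper. The aligned sites, the pseudocount, the uniform background and the function names `build_log_odds` and `scan` are all assumptions made up for illustration.

```python
import math


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # One score per base: log2 of (smoothed frequency / background).
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical aligned sites (assumption: not taken from TRANSFAC or JASPAR).
sites = ["TATAAA", "TATATA", "TATAAA", "TTTAAA"]
matrix = build_log_odds(sites)

# Hypothetical query sequence containing one copy of the consensus.
hits = scan("GGCGTATAAAGCC", matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

A weight matrix scores each position independently. The abstract's models are instead hidden Markov models built from the same kind of alignment, and the abstract reports that they outperformed the weight matrix approach in most of the synthetic comparisons.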
Genome-Wide Analyses for Osteosarcoma in Leonberger Dogs Reveal the CDKN2A/B Gene Locus as a Major Risk Locus
Dogs represent a unique spontaneous cancer model. Osteosarcoma (OSA) is the most common primary bone tumor in dogs (OMIA 001441-9615), and strongly resembles human forms of OSA. Several large- to giant-sized dog breeds, including the Leonberger, have a greatly increased risk of developing OSA. We performed genome-wide association analysis with high-density imputed SNP genotype data from 273 Leonberger cases with a median age of 8.1 [3.1–13.5] years and 365 controls older than eight years. This analysis revealed significant associations at the CDKN2A/B gene locus on canine chromosome 11, mirroring previous findings in other dog breeds, such as the greyhound, that also show an elevated risk for OSA. Heritability (h²SNP) was determined to be 20.6% (SE = 0.08; p-value = 5.7 × 10⁻⁴) based on a breed prevalence of 20%. The 2563 SNPs across the genome accounted for nearly all the h²SNP of OSA, with 2183 SNPs of small effect, 316 SNPs of moderate effect, and 64 SNPs of large effect. As with many other cancers it is likely that regulatory, non-coding variants underlie the increased risk for cancer development. Our findings confirm a complex genetic basis of OSA, moderate heritability, and the crucial role of the CDKN2A/B locus leading to strong cancer predisposition in dogs. It will ultimately be interesting to study and compare the known genetic loci associated with canine OSA in human OSA.
Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content
Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data-driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (http://cancerlandscapes.org/), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets.
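The idea behind inverse covariance selection can be illustrated with a toy example. This is not the generalized SICS method of the paper, only a sketch of the underlying principle: under a Gaussian model, a zero entry in the inverse covariance matrix corresponds to a zero partial correlation, meaning no direct link once the other variables are accounted for. The three-variable chain and its correlation values below are hypothetical.

```python
import math


def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation of a and b after conditioning on c (first-order formula)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical chain x -> y -> z: x influences z only through y.
r_xy, r_yz = 0.8, 0.5
r_xz = r_xy * r_yz  # marginal correlation implied by the chain

# x and z are marginally correlated, but the link vanishes given y.
direct_xz = partial_correlation(r_xz, r_xy, r_yz)

# x and y remain linked even after conditioning on z.
direct_xy = partial_correlation(r_xy, r_xz, r_yz)

print(round(r_xz, 3), round(direct_xz, 3), round(direct_xy, 3))
```

A plain correlation network would draw an edge between x and z here. A sparse inverse covariance model would not, which is the sense in which such methods aim to detect direct links.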
Bayesian Estimation of Transcript Levels Using a General Model of Array Measurement Noise
Gene arrays demonstrate a promising ability to characterize expression levels across the entire genome but suffer from significant levels of measurement noise. We present a rigorous new approach to estimate transcript levels and ratios from one or more gene array experiments, given a model of measurement noise and available prior information.
Whole-genome sequencing of glioblastoma reveals enrichment of non-coding constraint mutations in known and novel genes
Background: Glioblastoma (GBM) has one of the worst 5-year survival rates of all cancers. While genomic studies of the disease have been performed, alterations in the non-coding regulatory regions of GBM have largely remained unexplored. We apply whole-genome sequencing (WGS) to identify non-coding mutations with regulatory potential in GBM, under the hypothesis that regions of evolutionary constraint are likely to be functional, and that somatic mutations in them are likely more damaging than in unconstrained regions. Results: We validate our GBM cohort, finding similar copy number aberrations and mutated genes based on coding mutations as previous studies. Performing analysis on non-coding constraint mutations and their position relative to nearby genes, we find a significant enrichment of non-coding constraint mutations in the neighborhood of 78 genes that have previously been implicated in GBM. Among them, SEMA3C and DYNC1I1 show the highest frequencies of alterations, with multiple mutations overlapping transcription factor binding sites. We find that a non-coding constraint mutation in the SEMA3C promoter reduces the DNA binding capacity of the region. We also identify 1776 other genes enriched for non-coding constraint mutations with likely regulatory potential, providing additional candidate GBM genes. The mutations in the top four genes, DLX5, DLX6, FOXA1, and ISL1, are distributed over promoters, UTRs, and multiple transcription factor binding sites. Conclusions: These results suggest that non-coding constraint mutations could play an essential role in GBM, underscoring the need to connect non-coding genomic variation to biological function and disease pathology. The first three authors share first authorship. Title in thesis list of papers: Whole Genome Sequencing of Glioblastoma Reveals Enrichment of Non-Coding Constraint Mutations in Known and Novel Genes