60 research outputs found

    Statistical Modeling of Transcription Factor Binding Affinities Predicts Regulatory Interactions

    Get PDF
    Recent experimental and theoretical efforts have highlighted the fact that binding of transcription factors to DNA can be more accurately described by continuous measures of their binding affinities, rather than a discrete description in terms of binding sites. While the binding affinities can be predicted from a physical model, it is often desirable to know the distribution of binding affinities for specific sequence backgrounds. In this paper, we present a statistical approach to derive the exact distribution for sequence models with fixed GC content. We demonstrate that the affinity distribution of almost all known transcription factors can be effectively parametrized by a class of generalized extreme value distributions. Moreover, this parameterization also describes the affinity distribution for sequence backgrounds with variable GC content, such as human promoter sequences. Our approach is applicable to arbitrary sequences and all transcription factors with known binding preferences that can be described in terms of a motif matrix. The statistical treatment also provides a proper framework to directly compare transcription factors with very different affinity distributions. This is illustrated by our analysis of human promoters with known binding sites, for many of which we could identify the known regulators as those with the highest affinity. The combination of physical model and statistical normalization provides a quantitative measure which ranks transcription factors for a given sequence, and which can be compared directly with large-scale binding data. Its successful application to human promoter sequences serves as an encouraging example of how the method can be applied to other sequences

    Gitools: Analysis and Visualisation of Genomic Data Using Interactive Heat-Maps

    Get PDF
    Intuitive visualization of data and results is very important in genomics, especially when many conditions are to be analyzed and compared. Heat-maps have proven very useful for the representation of biological data. Here we present Gitools (http://www.gitools.org), an open-source tool to perform analyses and visualize data and results as interactive heat-maps. Gitools contains data import systems from several sources (i.e. IntOGen, Biomart, KEGG, Gene Ontology), which facilitate the integration of novel data with previous knowledge

    Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs

    Get PDF
    The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the “ortholog conjecture”, or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins

    SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data

    Get PDF
    Abstract Background The popularity of massively parallel exome and transcriptome sequencing projects demands new data mining tools with a comprehensive set of features to support a wide range of analysis tasks. Results SeqGene, a new data mining tool, supports mutation detection and annotation, dbSNP and 1000 Genome data integration, RNA-Seq expression quantification, mutation and coverage visualization, allele specific expression (ASE), differentially expressed genes (DEGs) identification, copy number variation (CNV) analysis, and gene expression quantitative trait loci (eQTLs) detection. We also developed novel methods for testing the association between SNP and expression and identifying genotype-controlled DEGs. We showed that the results generated from SeqGene compares favourably to other existing methods in our case studies. Conclusion SeqGene is designed as a general-purpose software package. It supports both paired-end reads and single reads generated on most sequencing platforms; it runs on all major types of computers; it supports arbitrary genome assemblies for arbitrary organisms; and it scales well to support both large and small scale sequencing projects. The software homepage is http://seqgene.sourceforge.net.</p

    Evolution of the TOR Pathway

    Get PDF
    The TOR kinase is a major regulator of growth in eukaryotes. Many components of the TOR pathway are implicated in cancer and metabolic diseases in humans. Analysis of the evolution of TOR and its pathway may provide fundamental insight into the evolution of growth regulation in eukaryotes and provide a practical framework on which experimental evidence can be compared between species. Here we performed phylogenetic analyses on the components of the TOR pathway and determined their point of invention. We find that the two TOR complexes and a large part of the TOR pathway originated before the Last Eukaryotic Common Ancestor and form a core to which new inputs have been added during animal evolution. In addition, we provide insight into how duplications and sub-functionalization of the S6K, RSK, SGK and PKB kinases shaped the complexity of the TOR pathway. In yeast we identify novel AGC kinases that are orthologous to the S6 kinase. These results demonstrate how a vital signaling pathway can be both highly conserved and flexible in eukaryotes

    Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken.</p> <p>Results</p> <p>We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome.</p> <p>Conclusion</p> <p>We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.</p

    Advanced Computational Biology Methods Identify Molecular Switches for Malignancy in an EGF Mouse Model of Liver Cancer

    Get PDF
    The molecular causes by which the epidermal growth factor receptor tyrosine kinase induces malignant transformation are largely unknown. To better understand EGFs' transforming capacity whole genome scans were applied to a transgenic mouse model of liver cancer and subjected to advanced methods of computational analysis to construct de novo gene regulatory networks based on a combination of sequence analysis and entrained graph-topological algorithms. Here we identified transcription factors, processes, key nodes and molecules to connect as yet unknown interacting partners at the level of protein-DNA interaction. Many of those could be confirmed by electromobility band shift assay at recognition sites of gene specific promoters and by western blotting of nuclear proteins. A novel cellular regulatory circuitry could therefore be proposed that connects cell cycle regulated genes with components of the EGF signaling pathway. Promoter analysis of differentially expressed genes suggested the majority of regulated transcription factors to display specificity to either the pre-tumor or the tumor state. Subsequent search for signal transduction key nodes upstream of the identified transcription factors and their targets suggested the insulin-like growth factor pathway to render the tumor cells independent of EGF receptor activity. Notably, expression of IGF2 in addition to many components of this pathway was highly upregulated in tumors. Together, we propose a switch in autocrine signaling to foster tumor growth that was initially triggered by EGF and demonstrate the knowledge gain form promoter analysis combined with upstream key node identification

    Identification of Y-Box Binding Protein 1 As a Core Regulator of MEK/ERK Pathway-Dependent Gene Signatures in Colorectal Cancer Cells

    Get PDF
    Transcriptional signatures are an indispensible source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease and in factors of predictive, as well as of prognostic, value have been deduced from such molecular portraits, e.g. in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes are lagging behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase pathway, computational prediction of regulatory elements in promoters of co-regulated genes, chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties

    Reconstruction of metabolic pathways for the cattle genome

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Metabolic reconstruction of microbial, plant and animal genomes is a necessary step toward understanding the evolutionary origins of metabolism and species-specific adaptive traits. The aims of this study were to reconstruct conserved metabolic pathways in the cattle genome and to identify metabolic pathways with missing genes and proteins. The MetaCyc database and PathwayTools software suite were chosen for this work because they are widely used and easy to implement.</p> <p>Results</p> <p>An amalgamated cattle genome database was created using the NCBI and Ensembl cattle genome databases (based on build 3.1) as data sources. PathwayTools was used to create a cattle-specific pathway genome database, which was followed by comprehensive manual curation for the reconstruction of metabolic pathways. The curated database, CattleCyc 1.0, consists of 217 metabolic pathways. A total of 64 mammalian-specific metabolic pathways were modified from the reference pathways in MetaCyc, and two pathways previously identified but missing from MetaCyc were added. Comparative analysis of metabolic pathways revealed the absence of mammalian genes for 22 metabolic enzymes whose activity was reported in the literature. We also identified six human metabolic protein-coding genes for which the cattle ortholog is missing from the sequence assembly.</p> <p>Conclusion</p> <p>CattleCyc is a powerful tool for understanding the biology of ruminants and other cetartiodactyl species. In addition, the approach used to develop CattleCyc provides a framework for the metabolic reconstruction of other newly sequenced mammalian genomes. It is clear that metabolic pathway analysis strongly reflects the quality of the underlying genome annotations. Thus, having well-annotated genomes from many mammalian species hosted in BioCyc will facilitate the comparative analysis of metabolic pathways among different species and a systems approach to comparative physiology.</p
    corecore