1,045 research outputs found

    ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites

    Motivation: The prediction and annotation of the genomic regions involved in gene expression has been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites, leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps

    Automated Conserved Non-Coding Sequence (CNS) Discovery Reveals Differences in Gene Content and Promoter Evolution among Grasses

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize

    Identification and characterization of non-coding genomic variations associated to cancer diseases

    [eng] Uncovering the genetic and molecular bases of human diseases has become one of the main goals of human biology in recent decades. Unveiling the genetic variations and the affected cellular processes associated with a specific disease is crucial for generating accurate diagnoses and subsequent therapies. The Next Generation Sequencing (NGS) revolution, with the associated reduction in sequencing time and cost, has given scientists access to large numbers of human genomes for their biomedical studies. The study of genetic disorders, cancer in particular, has benefited from NGS through the identification of genetic variations associated with a given disorder. These new results, some of them in regions of unknown function, have posed a double challenge for the scientific community: first, to detect as many as possible of the different variants associated with a disease (several, in some complex diseases); second, to understand the functional impact that those modifications have on the cell. Regarding the first challenge, this thesis contributes to the identification of genetic modifications through the development of a bioinformatics tool named SMUFIN (Moncunill et al. 2014). SMUFIN can detect somatic variants related to tumour development and progression quickly and effectively. Beyond the software development, several tumours have been analysed and their somatic variants characterized. These tumours include mantle cell lymphoma, paediatric medulloblastoma and chronic lymphocytic leukaemia (Moncunill et al. 2014; Puente et al. 2015). For the evaluation of functional impact, the thesis also includes a method, RELA, to determine when these annotated variants play a regulatory role as enhancers or promoters (Gonzalez et al. 2012). 
Combined with other available data and a widespread methodology for unveiling regulatory regions, an evaluation of variants affecting regulatory regions has been performed in chronic lymphocytic leukaemia (details included in the thesis discussion). To sum up, this thesis provides methodology and bioinformatics tools to perform a complete genomic analysis of genetic variants in biomedical studies, from the identification of variants in each patient to the evaluation of their functional impact on disease development and progression. This kind of approach is currently common in research laboratories and will become part of the healthcare system in the near future to diagnose and classify patients.[spa] The study of the genetic and molecular bases of human pathologies has been the focus of much of the research in biology over recent decades, with the ultimate aim of understanding the cellular processes altered in each case and of generating specific diagnostic protocols and therapies. The arrival of so-called Next Generation Sequencing (NGS), with its consequent reduction in time and cost, has made it possible to sequence numerous human genomes in the biomedical setting. The study of genetic diseases, and of cancer in particular, has benefited enormously from being able to incorporate a large number of patient genomes and thus directly identify the mutations associated with each pathology. In turn, this revolution, together with the ability to detect genetic modifications in regions whose function is still unknown, has posed a double challenge for the scientific community: on the one hand, the analysis of the genetic variants associated with each type of disease and, on the other, understanding the functional impact that these modifications have on the cell. 
This thesis contributes to overcoming these limitations through the development of an application, SMUFIN (Moncunill et al. 2014), which allows the fast and effective identification of somatic variations associated with tumour development or progression. It also describes the results obtained on the identification and characterization of chromosomal rearrangements in cancer, as well as the results obtained on their mechanisms and functional impact (Puente et al. 2015). In addition, as part of the genomic annotation for the functional interpretation of the detected variations, this thesis includes the results of developing strategies and methodologies for detecting regulatory regions in eukaryotic genomes (Gonzalez et al. 2012). In summary, this thesis aims to cover, and provide bioinformatics tools for, the steps needed to analyse genomes in biomedicine, from the sequencing of a group of patients to the identification of their variants and the determination of their functional impact. This type of analysis, now taking place in research, will soon be a reality and a routine in the healthcare system

    Topoisomerase II beta interacts with cohesin and CTCF at topological domain borders

    BACKGROUND: Type II DNA topoisomerases (TOP2) regulate DNA topology by generating transient double-stranded breaks during replication and transcription. Topoisomerase II beta (TOP2B) facilitates rapid gene expression and functions at the later stages of development and differentiation. To gain new insight into the genome biology of TOP2B, we used proteomics (BioID), chromatin immunoprecipitation, and high-throughput chromosome conformation capture (Hi-C) to identify novel proximal TOP2B protein interactions and characterize the genomic landscape of TOP2B binding at base pair resolution. RESULTS: Our human TOP2B proximal protein interaction network included members of the cohesin complex and nucleolar proteins associated with rDNA biology. TOP2B associates with DNase I hypersensitivity sites, allele-specific transcription factor (TF) binding, and evolutionarily conserved TF binding sites on the mouse genome. Approximately half of all CTCF/cohesin-bound regions coincided with TOP2B binding. Base pair resolution ChIP-exo mapping of TOP2B, CTCF, and cohesin sites revealed a striking structural ordering of these proteins along the genome relative to the CTCF motif. These ordered TOP2B-CTCF-cohesin sites flank the boundaries of topologically associating domains (TADs) with TOP2B positioned externally and cohesin internally to the domain loop. CONCLUSIONS: TOP2B is positioned to solve topological problems at diverse cis-regulatory elements and its occupancy is a highly ordered and prevalent feature of CTCF/cohesin binding sites that flank TADs

    MotifMap: integrative genome-wide maps of regulatory motif sites for model species

    Background: A central challenge of biology is to map and understand gene regulation on a genome-wide scale. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically map all these elements and understand their relationships. Such computational efforts, however, are significantly hindered by the overwhelming size of non-coding regions and the statistical variability and complex spatial organizations of regulatory elements and interactions. Genome-wide catalogs of regulatory elements for all model species simply do not yet exist. Results: The MotifMap system uses databases of transcription factor binding motifs, refined genome alignments, and a comparative genomic statistical approach to provide comprehensive maps of candidate regulatory elements encoded in the genomes of model species. The system is used to derive new genome-wide maps for yeast, fly, worm, mouse, and human. The human map contains 519,108 sites for 570 matrices with a False Discovery Rate of 0.1 or less. The new maps are assessed in several ways, for instance using high-throughput experimental ChIP-seq data and AUC statistics, providing strong evidence for their accuracy and coverage. The maps can be usefully integrated with many other kinds of omic data and are available at http://motifmap.igb.uci.edu/. Conclusions: MotifMap and its integration with other data provide a foundation for analyzing gene regulation on a genome-wide scale, and for automatically generating regulatory pathways and hypotheses. The power of this approach is demonstrated and discussed using the P53 apoptotic pathway and the Gli hedgehog pathways as examples.

    CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes.

    Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated to transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila

    Predicting transcription factor binding sites using local over-representation and comparative genomics

    BACKGROUND: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. RESULTS: We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. CONCLUSION: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at
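The abstract above mentions binding sites modeled by position weight matrices. As a rough illustration of that building block only, the sketch below converts a count matrix into log-odds scores and scans the forward strand of a sequence for windows above a threshold. It is not the TFM-Explorer algorithm and does not test for local over-representation; the count matrix, the uniform background, the pseudocount, the threshold, and the function names `log_odds_matrix` and `scan` are all assumptions made up for the example.

```python
import math

# Hypothetical count matrix for a 4 bp motif with consensus TGAC
# (one dict per motif position; counts are invented for illustration).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_matrix(COUNTS)
hits = scan("AATGACGGTGACA", pwm, threshold=5.0)
```

A fuller scan would also score the reverse complement of each window and pick the threshold from a score distribution rather than by hand.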

    Sheep Genome Functional Annotation Reveals Proximal Regulatory Elements Contributed to The Evolution of Modern Breeds

    Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequence from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species

    IDENTIFICATION OF A NON-CLASSICAL GLUCOCORTICOID-RESPONSIVE ELEMENT IN THE 5'-FLANKING REGION OF THE CHICKEN GROWTH HORMONE GENE

    Growth hormone (GH) effects growth and contributes to a lean phenotype in broiler chickens. GH secretion by the anterior pituitary begins on embryonic day (e) 14, concomitantly with a rise in adrenal glucocorticoids (GC) or corticosterone (CORT) secretion. CORT treatment of chicken embryonic pituitary (CEP) cells induces GH secretion prematurely. GC induction of the GH gene requires on-going protein synthesis or an intermediary protein, but the gene lacks a classical GC-response element. We hypothesized that a GC-responsive intermediary protein is necessary for the CORT induced increase in GH. Characterization of the upstream region of the gene may help identify such a protein. To this end, a fragment of the GH gene (-1727/+48) was cloned into a luciferase reporter and characterized in e11 CEP cells. CORT treatment increased luciferase activity and mRNA. Inclusion of CHX blocked CORT induction of luciferase mRNA. Through deletion analysis, we found that a GC-responsive region (GCRR) is located at -1045 to -954. By defining the GC-responsive region and cis-acting elements located within, trans-acting proteins involved in GC induction of the GH gene may be identified. The GCRR is CORT-responsive in either orientation, but it is context-dependent. Potential transcription factor motifs in the GCRR include ETS-1 and a degenerate GRE (GREF). Nuclear proteins bound to a GCRR probe in a CORT-regulated manner and unlabeled competitor DNA competed off binding. Mutation of the central portion of the DNA probe resulted in a significant decrease in protein binding. Mutation of the ETS-1 site or GREF site in the -1045/+48 GH construct resulted in ablation of luciferase activity. ETS-1 and GR are associated with the endogenous gene under basal and 1.5 h CORT-treated conditions, while GR recruitment increased after CORT treatment. 
GC regulation of the GH gene during chicken embryonic development requires cis-acting elements located 1 kb upstream from the transcription start site and includes recruitment of ETS-1 and GR. This is the first study to demonstrate involvement of ETS-1 in GC regulation of the GH gene during embryonic development. Characterization of GC regulation of the GH gene during embryonic development enhances our understanding of growth regulation in vertebrates

    The dual transcriptional regulator CysR in Corynebacterium glutamicum ATCC 13032 controls a subset of genes of the McbR regulon in response to the availability of sulphide acceptor molecules

    Background: Regulation of sulphur metabolism in Corynebacterium glutamicum ATCC 13032 has been studied intensively in the last few years, due to its industrial as well as scientific importance. Previously, the gene cg0156 was shown to belong to the regulon of McbR, a global transcriptional repressor of sulphur metabolism in C. glutamicum. This gene encodes a putative ROK-type regulator, a paralogue of the activator of sulphonate utilisation, SsuR. Therefore, it is an interesting candidate for study to further the understanding of the regulation of sulphur metabolism in C. glutamicum. Results: Deletion of cg0156, now designated cysR, results in the inability of the mutant to utilise sulphate and aliphatic sulphonates. DNA microarray hybridisations revealed 49 genes with significantly increased and 48 with decreased transcript levels in presence of the native CysR compared to a cysR deletion mutant. Among the genes positively controlled by CysR were the gene cluster involved in sulphate reduction, fpr2 cysIXHDNYZ, and ssuR. Gel retardation experiments demonstrated that binding of CysR to DNA depends in vitro on the presence of either O-acetyl-L-serine or O-acetyl-L-homoserine. Mapping of the transcription start points of five transcription units helped to identify a 10 bp inverted repeat as the possible CysR binding site. Subsequent in vivo tests proved this motif to be necessary for CysR-dependent transcriptional regulation. Conclusion: CysR acts as the functional analogue of the unrelated LysR-type regulator CysB from Escherichia coli, controlling sulphide production in response to acceptor availability. In both bacteria, gene duplication events seem to have taken place which resulted in the evolution of dedicated regulators for the control of sulphonate utilisation. The striking convergent evolution of network topology indicates the strong selective pressure to control the metabolism of the essential but often toxic sulphur-containing (bio-)molecules
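The abstract above identifies a 10 bp inverted repeat as the possible CysR binding site. As a minimal sketch of what a perfect inverted repeat means computationally, the code below reports every window of a given length that equals its own reverse complement. The example sequence is invented and is not the CysR motif, the function names are hypothetical, and real binding sites often tolerate mismatches or a central spacer, which this check does not handle.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(sequence, length=10):
    """Return start positions of windows that equal their own reverse complement."""
    positions = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        if window == reverse_complement(window):
            positions.append(start)
    return positions


# Invented example: ACGTATACGT is a perfect 10 bp inverted repeat.
example = "GGACGTATACGTCC"
repeat_starts = find_inverted_repeats(example, length=10)
```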