16 research outputs found
ArrayIDer: automated structural re-annotation pipeline for DNA microarrays
Background: Systems biology modeling from microarray data requires the most contemporary structural and functional array annotation. However, microarray annotations, especially for non-commercial, non-traditional biomedical model organisms, are often dated. In addition, most microarray analysis tools do not readily accept EST clone names, which are abundantly represented on arrays. Manual re-annotation of microarrays is impracticable, so we developed a computational re-annotation tool (ArrayIDer) to retrieve the most recent accession mapping files from public databases based on EST clone names or accessions and rapidly generate database accessions for entire microarrays. Results: We utilized the Fred Hutchinson Cancer Research Centre 13K chicken cDNA array, a widely used non-commercial chicken microarray, to demonstrate the principle that ArrayIDer could markedly improve annotation. We structurally re-annotated 55% of the entire array. Moreover, we decreased non-chicken functional annotations by 2-fold. One beneficial consequence of our re-annotation was to identify 290 pseudogenes, of which 66 were previously incorrectly annotated. Conclusion: ArrayIDer allows rapid automated structural re-annotation of entire arrays and provides multiple accession types for use in subsequent functional analysis. This information is especially valuable for systems biology modeling in the non-traditional biomedical model organisms.
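The re-annotation step described in this abstract amounts to looking up each probe's EST clone name in an up-to-date mapping table. A minimal Python sketch of that idea follows. The two-column tab-separated format, the clone names, the accessions, and the function names are all illustrative assumptions, not ArrayIDer's actual file formats or code.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column, tab-separated table: clone name -> accession."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            # Normalise case so lookups are case-insensitive.
            mapping[row[0].strip().upper()] = row[1].strip()
    return mapping


def reannotate(clone_names, mapping):
    """Return (annotations, fraction_mapped) for the probes on an array."""
    annotations = {name: mapping.get(name.strip().upper()) for name in clone_names}
    mapped = sum(1 for acc in annotations.values() if acc is not None)
    fraction = mapped / len(annotations) if annotations else 0.0
    return annotations, fraction


# Hypothetical mapping file contents and probe list, for illustration only.
MAPPING_TEXT = "CLONE_A1\tACC0001\nCLONE_B2\tACC0002\n"
PROBES = ["clone_a1", "CLONE_B2", "CLONE_C3", "CLONE_D4"]

mapping = load_mapping(io.StringIO(MAPPING_TEXT))
annotations, fraction = reannotate(PROBES, mapping)
print(f"{fraction:.0%} of probes re-annotated")
```

Probes with no entry in the mapping table are kept with a value of `None`, so the unmapped remainder of an array stays visible rather than being silently dropped.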
Arboretum: Reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules
Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates expression data from multiple species with species and gene phylogenies to infer modules of coexpressed genes in extant species and their evolutionary histories. We also develop new, generally applicable measures of conservation and divergence in gene regulatory modules to assess the impact of changes in gene content and expression on module evolution. We used Arboretum to study the evolution of the transcriptional response to heat shock in eight species of Ascomycota fungi and to reconstruct modules of the ancestral environmental stress response (ESR). We found substantial conservation in the stress response across species and in the reconstructed components of the ancestral ESR modules. The greatest divergence was in the most induced stress, primarily through module expansion. The divergence of the heat stress response exceeds that observed in the response to glucose depletion in the same species. Arboretum and its associated analyses provide a comprehensive framework to systematically study regulatory evolution of condition-specific responses. Funding: Howard Hughes Medical Institute; Broad Institute of MIT and Harvard; National Institutes of Health (U.S.) (Pioneer Award); National Institutes of Health (U.S.) (R01 2R01CA119176-01); Burroughs Wellcome Fund (Career Award at the Scientific Interface); Alfred P. Sloan Foundation
BioNetBuilder2.0: bringing systems biology to chicken and other model organisms
BACKGROUND: Systems biology research tools, such as Cytoscape, have greatly extended the reach of genomic research. By providing platforms to integrate data with molecular interaction networks, these tools let researchers begin interpreting large data sets collected for a system of interest more rapidly. BioNetBuilder is an open-source client-server Cytoscape plugin that automatically integrates molecular interactions from all major public interaction databases and serves them directly to the user's Cytoscape environment. Until recently, however, chicken and other eukaryotic model systems had little interaction data available. RESULTS: Version 2.0 of BioNetBuilder includes a redesigned synonyms resolution engine that enables transfer and integration of interactions across species; this engine translates between alternate gene names as well as between orthologs in multiple species. Additionally, BioNetBuilder is now implemented as part of the Gaggle, allowing seamless communication of interaction data to any software implementing the widely used Gaggle framework. Using BioNetBuilder, we constructed a chicken interactome possessing 72,000 interactions among 8,140 genes directly in the Cytoscape environment. In this paper, we present a tutorial on how to do so and an analysis of a specific use case. CONCLUSION: BioNetBuilder 2.0 provides numerous user-friendly systems biology tools that were otherwise inaccessible to researchers in chicken genomics, as well as other model systems. We provide a detailed tutorial spanning all required steps in the analysis. BioNetBuilder 2.0, the tools for maintaining its databases, standard operating procedures for creating local copies of its back-end databases, as well as all of the Gaggle and Cytoscape code required, are open-source and freely available at http://err.bio.nyu.edu/cytoscape/bionetbuilder/. This item is part of the UA Faculty Publications collection.
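The cross-species transfer described in the BioNetBuilder abstract can be pictured as two lookups per interaction partner: first resolve an alternate gene name to a canonical identifier, then map that identifier to its ortholog in the target species. The Python sketch below illustrates only this general idea. The dictionaries, gene identifiers, and function names are hypothetical, and the real synonyms resolution engine is certainly more involved.

```python
def resolve(name, synonyms):
    """Map an alternate gene name to its canonical identifier, if known."""
    key = name.upper()
    return synonyms.get(key, key)


def transfer_interactions(interactions, synonyms, orthologs):
    """Project source-species interactions onto target-species genes.

    An interaction is kept only when both partners have a known ortholog.
    """
    transferred = set()
    for a, b in interactions:
        target_a = orthologs.get(resolve(a, synonyms))
        target_b = orthologs.get(resolve(b, synonyms))
        if target_a and target_b:
            # Store undirected edges in sorted order to avoid duplicates.
            transferred.add(tuple(sorted((target_a, target_b))))
    return transferred


# Hypothetical identifiers: HS_* for a source species, GG_* for the target.
SYNONYMS = {"ALIAS_A": "HS_A"}
ORTHOLOGS = {"HS_A": "GG_A", "HS_B": "GG_B"}
INTERACTIONS = [("alias_a", "HS_B"), ("HS_B", "HS_A"), ("HS_A", "HS_C")]

result = transfer_interactions(INTERACTIONS, SYNONYMS, ORTHOLOGS)
print(sorted(result))
```

Requiring an ortholog for both partners is a conservative design choice made for this sketch; the third interaction is dropped because `HS_C` has no entry in the ortholog table.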
Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species. Keywords: regulatory networks; network inference; evolution of gene regulatory networks; evolution of stress response; yeast; probabilistic graphical model; phylogeny; comparative functional genomics. Funding: National Science Foundation (U.S.) (Grant DBI-1350677); National Institutes of Health (U.S.) (Grant R01CA119176-01); National Institutes of Health (U.S.) (Grant DP1OD003958-01)
Evolution of variants of yeast site-specific recombinase Flp that utilize native genomic sequences as recombination target sites
As a tool in directed genome manipulations, site-specific recombination is a double-edged sword. Exquisite specificity, while highly desirable, makes it imperative that the target site be first inserted at the desired genomic locale before it can be manipulated. We describe a combination of computational and experimental strategies, based on the tyrosine recombinase Flp and its target site FRT, to overcome this impediment. We document the systematic evolution of Flp variants that can utilize, in a bacterial assay, two sites from the human interleukin 10 gene, IL10, as recombination substrates. Recombination competence on an end target site is acquired via chimeric sites containing mixed sequences from FRT and the genomic locus. This is the first time that a tyrosine site-specific recombinase has been coaxed successfully to perform DNA exchange within naturally occurring sequences derived from a foreign genomic context. We demonstrate the ability of an Flp variant to mediate integration of a reporter cassette in Escherichia coli via recombination at one of the IL10-derived sites.
A Genome-Wide Analysis of FRT-Like Sequences in the Human Genome
Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.
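A genome-wide search for sites that resemble a target sequence, followed by a unique-versus-duplicated tally like the one reported above, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the template, the mismatch threshold, and the test sequence are invented, only the forward strand is scanned, and the experimentally derived parameters used in the study are not given in the abstract.

```python
from collections import Counter


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_like_sites(genome, template, max_mismatches):
    """Return every window within max_mismatches of the template."""
    k = len(template)
    return [
        genome[i:i + k]
        for i in range(len(genome) - k + 1)
        if hamming(genome[i:i + k], template) <= max_mismatches
    ]


def tally(sites):
    """Return (distinct sequences seen once, distinct sequences seen more than once)."""
    counts = Counter(sites)
    unique = sum(1 for n in counts.values() if n == 1)
    duplicated = sum(1 for n in counts.values() if n > 1)
    return unique, duplicated


# Toy template and sequence, for illustration only (not FRT, not genomic data).
TEMPLATE = "ACGTAC"
GENOME = "TTACGTACTTACGAACTTACGTACTT"

sites = find_like_sites(GENOME, TEMPLATE, max_mismatches=1)
unique, duplicated = tally(sites)
print(len(sites), unique, duplicated)
```

The tally counts distinct sequences rather than occurrences, which is one plausible reading of how the abstract separates unique sequences from those with at least one exact duplicate.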
Elucidation of the Cardiac Myogenesis Regulatory Network
Heart development has been extensively studied in numerous organisms throughout the twentieth century. The timing of key inductive signals and the expression of many critical transcription factors have been mapped across a variety of model systems. A collective image of the various stages of cardiac development is beginning to emerge. Although most of the seminal events are conserved across evolution, it is increasingly clear that subtle differences can have substantive effects on models of heart development processes. Furthermore, the overwhelming majority of work contributing to these models has been performed on a gene-by-gene basis. As a result, we have a loosely stitched cross-evolutionary view of cardiogenesis that leaves much to be desired by way of completeness. Thus, in order to move toward a comprehensive model of heart development, we have a critical need for global network views of heart development processes conducted within one species. Cardiac myogenesis, the development of heart muscle cells, is the earliest heart development process and is required for the formation of all adult heart structures. Key signaling pathways, and their precise timing and targets, have only recently begun to be defined. The downstream targets of these pathways and their timing of activation or repression remain largely unknown. To address this, I compiled data from three genomic microarray studies, each addressing a distinct aspect of cardiac myogenesis signaling and expression, to construct a global preliminary network of the primary inductive signals and their downstream targets in the chick model embryology system. The preliminary cardiac myogenesis network obtained from these studies generates far too many hypotheses to test experimentally. The challenge that lies ahead for elucidating the fine structure of this, or any, network model is in determining the next most enlightening experiments. Headway in sorting out more profitable experiments can be made by selecting from among the universe of known interaction data as well as taking advantage of a property selected for throughout evolution: robustness. Network robustness is loosely defined as the ability of a network to maintain input and output properties in the face of perturbation. It is unsurprising that evolution would sculpt such a characteristic into molecular networks required to perform a task in varied environmental and genetic circumstances. However, the way in which evolution has engendered this quality has opened the door to an exciting new avenue for in silico experimentation. I present in this dissertation the beginnings of a collaborative project for biological network elucidation software called BioNET. The long-term goal of BioNET is to take a description of a network model and phenotype as input and return a set of candidate network models capable of more robustly producing the phenotype. Fundamental to BioNET is the ability to acquire information from the universe of known molecular interaction data for in silico experimentation in any model system. To this end, I redesigned BioNetBuilder, open-source network integration software, to transfer any and all publicly available interaction data across species and serve them via the web. As these data grow in scale, BioNET will be increasingly useful for identifying the more plausible among possible network architectures, such as the preliminary cardiac myogenesis network presented in this dissertation.