164 research outputs found
Evolution of gene regulation of pluripotency - the case for wiki tracks at genome browsers
BACKGROUND: Experimentally validated data on gene regulation are hard to obtain. In particular, information about transcription factor binding sites in regulatory regions is scattered across the literature. This impedes their systematic in-context analysis, e.g. the inference of their conservation in evolutionary history. RESULTS: We demonstrate the power of integrative bioinformatics by including curated transcription factor binding site information into the UCSC genome browser, using wiki and custom tracks, which enable easy publication of annotation data. Data integration makes it possible to investigate the evolution of gene regulation of the pluripotency-associated genes Oct4, Sox2 and Nanog. For the first time, experimentally validated transcription factor binding sites in the regulatory regions of all three genes were assembled together based on manual curation of data from 39 publications. Using the UCSC genome browser, these data were then visualized in the context of multi-species conservation based on genomic alignment. We confirm previous hypotheses regarding the evolutionary age of specific regulatory patterns, establishing their "deep homology". We also confirm some other principles of Carroll's "Genetic theory of Morphological Evolution", such as "mosaic pleiotropy", exemplified by the dual role of Sox2 reflected in its regulatory region. CONCLUSIONS: We were able to elucidate some aspects of the evolution of gene regulation for three genes associated with pluripotency. Based on the expected return on investment for the community, we encourage other scientists to contribute experimental data on gene regulation (original work as well as data collected for reviews) to the UCSC system, to enable studies of the evolution of gene regulation on a large scale, and to report their findings. REVIEWERS: This article was reviewed by Dr. Gustavo Glusman and Dr. Juan Caballero, Institute for Systems Biology, Seattle, USA (nominated by Dr. Doron Lancet, Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel), Dr. Niels Grabe, TIGA Center (BIOQUANT) and Medical Systems Biology Group, Institute of Medical Biometry and Informatics, University Hospital Heidelberg, Germany (nominated by Dr. Mikhail Gelfand, Department of Bioinformatics, Institute of Information Transfer Problems, Russian Academy of Science, Moscow, Russian Federation) and Dr. Franz-Josef Müller, Center for Regenerative Medicine, The Scripps Research Institute, La Jolla, CA, USA and University Hospital for Psychiatry and Psychotherapy (part of ZIP gGmbH), University of Kiel, Germany (nominated by Dr. Trey Ideker, University of California, San Diego, La Jolla CA, United States).
Finding common protein interaction patterns across organisms
Protein interactions are an important resource to obtain an understanding of cell function. Recently, researchers have compared networks of interactions in order to understand network evolution. While current methods first infer homologs and then compare topologies, we here present a method which first searches for interesting topologies and then looks for homologs. PINA (protein interaction network analysis) takes the protein interaction networks of two organisms, scans both networks for subnetworks deemed interesting, and then tries to find orthologs among the interesting subnetworks. The application is very fast because orthology investigations are restricted to subnetworks like hubs and clusters that fulfill certain criteria regarding neighborhood and connectivity. Finally, the hubs or clusters found to be related can be visualized and analyzed according to protein annotation
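The "topology first, homology second" order described in this abstract can be illustrated with a minimal sketch. It is not the PINA implementation: the degree-based hub criterion, the function names, the `is_ortholog` predicate and the toy protein identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def find_hubs(edges, min_degree):
    """Return proteins with at least min_degree distinct partners (assumed hub criterion)."""
    adjacency = build_adjacency(edges)
    return {p for p, partners in adjacency.items() if len(partners) >= min_degree}


def related_hubs(edges_a, edges_b, is_ortholog, min_degree=3):
    """Find hubs in each network first, then test orthology only between hubs."""
    hubs_a = sorted(find_hubs(edges_a, min_degree))
    hubs_b = sorted(find_hubs(edges_b, min_degree))
    return [(a, b) for a in hubs_a for b in hubs_b if is_ortholog(a, b)]


# Toy data with hypothetical protein identifiers.
network_a = [("A1", "A2"), ("A1", "A3"), ("A1", "A4"), ("A5", "A6")]
network_b = [("B1", "B2"), ("B1", "B3"), ("B1", "B4"), ("B7", "B8")]
known_orthologs = {("A1", "B1"), ("A5", "B7")}

hub_pairs = related_hubs(
    network_a, network_b, lambda a, b: (a, b) in known_orthologs
)
print(hub_pairs)
```

Because the orthology predicate is only called for pairs of hubs, the expensive comparison runs on a small fraction of all protein pairs, which is the source of the speed-up the abstract describes.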
IsoSVM – Distinguishing isoforms and paralogs on the protein level
BACKGROUND: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g. for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known if two protein sequences are isoforms (splice variants) or not (i.e. paralogs/orthologs). However, even without knowing the intron/exon structure, visual analysis of the pattern of similarity across the alignment of the two protein sequences is usually helpful since paralogs and orthologs feature substitutions with respect to each other, as opposed to isoforms, which do not. RESULTS: The IsoSVM tool introduces an automated approach to identifying isoforms on the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input of the SVM classifier, it is possible to automatically identify isoforms with little effort and with an accuracy of more than 97%. We show that the SVM is superior to a radial basis function network and to a linear classifier. As an example application we use IsoSVM to estimate that a set of Xenopus laevis EST clusters consists of approximately 81% cases where sequences are each other's paralogs and 19% cases where sequences are each other's isoforms. The number of isoforms and paralogs in this allotetraploid species is of interest in the study of evolution. CONCLUSION: We developed an SVM classifier that can be used to distinguish isoforms from paralogs with high accuracy and without access to the genomic data. It can be used to analyze, for example, EST data and database search results. Our software is freely available on the Web, under the name IsoSVM
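The observation that paralogs and orthologs show substitutions relative to each other, while isoforms do not, can be sketched as follows. This is only an illustration: the abstract does not name the three SVM features, so the two features computed here, the threshold rule that stands in for the SVM, and the toy sequences are all assumptions.

```python
def alignment_features(aligned_a, aligned_b):
    """Compute simple features from two aligned protein sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = substitutions = gapped = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gapped += 1
        elif x == y:
            matches += 1
        else:
            substitutions += 1
    aligned = matches + substitutions
    return {
        "substitution_rate": substitutions / aligned if aligned else 0.0,
        "gap_fraction": gapped / len(aligned_a) if aligned_a else 0.0,
    }


def looks_like_isoforms(aligned_a, aligned_b, max_substitution_rate=0.02):
    """Threshold rule (an assumed stand-in for the SVM): isoforms show almost no substitutions."""
    features = alignment_features(aligned_a, aligned_b)
    return features["substitution_rate"] <= max_substitution_rate


# Toy alignments (hypothetical sequences).
isoform_pair = ("MKTAYIAKQR----LLE", "MKTAYIAKQRGGSSLLE")  # differ only by an indel
paralog_pair = ("MKTAYIAKQR", "MRTAFIAKHR")  # differ by substitutions

print(looks_like_isoforms(*isoform_pair), looks_like_isoforms(*paralog_pair))
```

A trained classifier such as the SVM described in the abstract would replace the fixed threshold and could weigh several such features jointly.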
Simplifying gene trees for easier comprehension
BACKGROUND: In the genomic age, gene trees may contain large amounts of data, making them hard to read and understand. Therefore, an automated simplification is important. RESULTS: We present a simplification tool for gene trees called TreeSimplifier. Based on species tree information and HUGO gene names, it summarizes "monophyla". These monophyla correspond to subtrees of the gene tree where the evolution of a gene follows species phylogeny, and they are simplified to single leaves in the gene tree. Such a simplification may fail, for example, due to genes in the gene tree that are misplaced. In this way, misplaced genes can be identified. Optionally, our tool glosses over a limited degree of "paraphyly" in a further simplification step. In both simplification steps, species can be summarized into groups and treated as equivalent. In the present study we used our tool to derive a simplified tree of 397 leaves from a tree of 1138 leaves. Comparing the simplified tree to a "cartoon tree" created manually, we note that both agree to a high degree. CONCLUSION: Our automatic simplification tool for gene trees is fast, accurate, and effective. It yields results similar in quality to manual simplification. It should be valuable in phylogenetic studies of large protein families. The software is available at
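The idea of collapsing a monophylum into a single leaf can be sketched in a few lines. This is a simplified stand-in, not TreeSimplifier itself: it collapses any subtree whose leaves share one gene name and contain each species at most once, and it omits the comparison against a species tree. The nested-tuple tree encoding and the `GENE_species` label format are assumptions.

```python
def leaves(tree):
    """Return all leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree:
        result.extend(leaves(child))
    return result


def split_label(label):
    """Split an assumed 'GENE_species' leaf label into (gene, species)."""
    gene, species = label.rsplit("_", 1)
    return gene, species


def collapse(tree):
    """Replace subtrees with one gene name and no repeated species by a summary leaf."""
    if isinstance(tree, str):
        return tree
    labels = [split_label(label) for label in leaves(tree)]
    genes = {gene for gene, _ in labels}
    species = [sp for _, sp in labels]
    if len(genes) == 1 and len(species) == len(set(species)):
        return f"{genes.pop()} [{len(labels)} leaves]"
    return tuple(collapse(child) for child in tree)


# Toy gene tree (hypothetical topology): SOX2_frog sits inside the SOX3 clade.
gene_tree = (
    (("SOX2_human", "SOX2_mouse"), "SOX2_chicken"),
    (("SOX3_human", "SOX3_mouse"), "SOX2_frog"),
)
simplified = collapse(gene_tree)
print(simplified)
```

In the toy tree, the leaf that cannot be absorbed into a summary leaf stays visible after simplification, which mirrors how a failed simplification can point to a misplaced gene.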
The Intrinsic Connectome of the Rat Amygdala
The connectomes of nervous systems or parts thereof are becoming important subjects of study as the amount of connectivity data increases. Because most tract-tracing studies are performed on the rat, we conducted a comprehensive analysis of the amygdala connectome of this species resulting in a meta-study. The data were imported into the neuroVIISAS system, where regions of the connectome are organized in a controlled ontology and network analysis can be performed. A weighted digraph represents the bilateral intrinsic (connections of regions of the amygdala) and extrinsic (connections of regions of the amygdala to non-amygdaloid regions) connectome of the amygdala. Its structure as well as its local and global network parameters depend on the arrangement of neuronal entities in the ontology. The intrinsic amygdala connectome is a small-world and scale-free network. The anterior cortical nucleus (72 in- and out-going edges), the posterior nucleus (45), and the anterior basomedial nucleus (44) are the nuclear regions that possess the highest combined in- and outdegrees. The posterior nucleus turns out to be the most important nucleus of the intrinsic amygdala network since its Shapley rate is minimal. Within the intrinsic amygdala, regions were determined that are essential for network integrity. These regions are important for behavioral (processing of emotions and motivation) and functional (memory) performances of the amygdala as reported in other studies
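Ranking regions by their in- and out-going edges, as done in this abstract, amounts to counting edges in a weighted digraph. The sketch below shows that counting step only. The region names and weights are placeholders, not data from the study, and neuroVIISAS is not used.

```python
from collections import Counter


def degree_table(edges):
    """Map each node to (indegree, outdegree, total) for (source, target, weight) edges."""
    in_degree, out_degree = Counter(), Counter()
    for source, target, _weight in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    nodes = set(in_degree) | set(out_degree)
    return {
        node: (in_degree[node], out_degree[node], in_degree[node] + out_degree[node])
        for node in nodes
    }


# Toy weighted digraph with placeholder region names.
toy_edges = [
    ("R1", "R2", 2.0),
    ("R2", "R1", 1.0),
    ("R1", "R3", 3.0),
    ("R3", "R1", 1.0),
    ("R2", "R3", 0.5),
]
table = degree_table(toy_edges)
ranking = sorted(table, key=lambda node: (-table[node][2], node))
print(ranking, table["R1"])
```

The weights are carried along but ignored by the degree count. Measures such as the Shapley rate mentioned in the abstract require a separate computation that is not shown here.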
Health and longevity studies in C. elegans: the "healthy worm database" reveals strengths, weaknesses and gaps of test compound-based studies
Several biogerontology databases exist
that focus on genetic or gene expression data linked
to health as well as survival, subsequent to compound
treatments or genetic manipulations in animal models.
However, none of these has yet collected experimental
results of compound-related health changes. Since
quality of life is often regarded as more valuable than
length of life, we aim to fill this gap with the "Healthy
Worm Database" (http://healthy-worm-database.eu).
Literature describing health-related compound studies
in the aging model Caenorhabditis elegans was
screened, and data for 440 compounds collected. The
database considers 189 publications describing 89
different phenotypes measured in 2995 different conditions. Besides enabling a targeted search for
promising compounds for further investigations, this
database also offers insights into the research field of
studies on healthy aging based on a frequently used
model organism. Some weaknesses of C. elegans-based aging studies, such as underrepresented phenotypes, especially concerning cognitive functions, as
well as the convenience-based use of young worms as
the starting point for compound treatment or phenotype measurement are discussed. In conclusion, the
database provides an anchor for the search for compounds affecting health, with a link to public databases, and it further highlights some potential
shortcomings in current aging research. Peer Reviewe
Comparative computational analysis of pluripotency in human and mouse stem cells
Pluripotent cells can be subdivided into two distinct states, the naïve and
the primed state, the latter being further advanced on the path of
differentiation. There are substantial differences in the regulation of
pluripotency between human and mouse, and in humans only stem cells that
resemble the primed state in mouse are readily available. Reprogramming of
human stem cells into a more naïve-like state is an important research focus.
Here, we developed a pipeline to reanalyze transcriptomics data sets that
describe both states, naïve and primed pluripotency, in human and mouse. The
pipeline consists of identifying regulated start-ups/shut-downs in terms of
molecular interactions, followed by functional annotation of the genes
involved and aggregation of results across conditions, yielding sets of
mechanisms that are consistently regulated in transitions towards similar
states of pluripotency. Our results suggest that one published protocol for
naïve human cells gave rise to human cells that indeed share putative
mechanisms with the prototypical naïve mouse pluripotent cells, such as DNA
damage response and histone acetylation. However, cellular response and
differentiation-related mechanisms are similar between the naïve human state
and the primed mouse state, so the naïve human state did not fully reflect the
naïve mouse state
The PluriNetWork: An Electronic Representation of the Network Underlying Pluripotency in Mouse, and Its Applications
BACKGROUND: Analysis of the mechanisms underlying pluripotency and reprogramming would benefit substantially from easy access to an electronic network of genes, proteins and mechanisms. Moreover, interpreting gene expression data needs to move beyond just the identification of the up-/downregulation of key genes and of overrepresented processes and pathways, towards clarifying the essential effects of the experiment in molecular terms. METHODOLOGY/PRINCIPAL FINDINGS: We have assembled a network of 574 molecular interactions, stimulations and inhibitions, based on a collection of research data from 177 publications until June 2010, involving 274 mouse genes/proteins, all in a standard electronic format, enabling analyses by readily available software such as Cytoscape and its plugins. The network includes the core circuit of Oct4 (Pou5f1), Sox2 and Nanog, its periphery (such as Stat3, Klf4, Esrrb, and c-Myc), connections to upstream signaling pathways (such as Activin, WNT, FGF, BMP, Insulin, Notch and LIF), and epigenetic regulators as well as some other relevant genes/proteins, such as proteins involved in nuclear import/export. We describe the general properties of the network, as well as a Gene Ontology analysis of the genes included. We use several expression data sets to condense the network to a set of network links that are affected in the course of an experiment, yielding hypotheses about the underlying mechanisms. CONCLUSIONS/SIGNIFICANCE: We have initiated an electronic data repository that will be useful to understand pluripotency and to facilitate the interpretation of high-throughput data. To keep up with the growth of knowledge on the fundamental processes of pluripotency and reprogramming, we suggest combining Wiki and social networking software into a community curation system that is easy to use and flexible, tailored to provide a benefit for the scientist, and able to improve communication and exchange of research results.
A PluriNetWork tutorial is available at http://www.ibima.med.uni-rostock.de/IBIMA/PluriNetWork/
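Condensing a network to the links affected in an experiment can be illustrated with a minimal filter. The abstract does not state the criterion used, so the rule below (keep a link when both of its genes change by at least a threshold) is an assumption, as are the function name, the log fold-change input and the placeholder gene names.

```python
def affected_links(links, log_fold_change, threshold=1.0):
    """Keep (source, target, kind) links whose two genes both change by >= threshold.

    Genes missing from the expression data are treated as unchanged (assumption).
    """
    kept = []
    for source, target, kind in links:
        change_source = abs(log_fold_change.get(source, 0.0))
        change_target = abs(log_fold_change.get(target, 0.0))
        if change_source >= threshold and change_target >= threshold:
            kept.append((source, target, kind))
    return kept


# Toy network and expression changes with placeholder gene names.
toy_links = [
    ("GeneA", "GeneB", "stimulation"),
    ("GeneB", "GeneC", "inhibition"),
    ("GeneC", "GeneD", "interaction"),
]
toy_changes = {"GeneA": 2.1, "GeneB": -1.4, "GeneC": 0.2}

condensed = affected_links(toy_links, toy_changes)
print(condensed)
```

Other criteria are possible, for example requiring the direction of change to agree with the kind of link. This sketch only shows the general shape of such a filter.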
In Silico Approaches and the Role of Ontologies in Aging Research
The 2013 Rostock Symposium on Systems Biology and Bioinformatics in Aging Research was again dedicated to dissecting the aging process using in silico means. A particular focus was on ontologies, as these are a key technology to systematically integrate heterogeneous information about the aging process. Related topics were databases and data integration. Other talks tackled modeling issues and applications, the latter including talks focussed on marker development and cellular stress as well as on diseases, in particular on diseases of kidney and skin