31 research outputs found

Correction: Improving the Precision of the Structure–Function Relationship by Considering Phylogenetic Context

Author: Shakhnovich Boris E
Publication venue: Public Library of Science
Publication date: 01/01/2005

Crossref

Directory of Open Access Journals

PubMed Central

A first-principles model of early evolution: Emergence of gene families, species and preferred protein folds

Author: Chen Peiqiu
Shakhnovich Boris
Shakhnovich Eugene
Zeldovich Konstantin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2007

In this work we develop a microscopic physical model of early evolution, where phenotype, organism life expectancy, is directly related to genotype, the stability of its proteins in their native conformations, which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the Big Bang scenario whereby exponential population growth ensues as soon as favorable sequence-structure combinations (precursors of stable proteins) are discovered. After that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at time scales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary time scales between discovery of new folds and generation of new sequences gives rise to the emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe the emergence of species, subpopulations which carry similar genomes. Further, we present a simple theory that relates the stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how the first gene families developed in the course of early evolution. Comment: In press, PLoS Computational Biology

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Protein structure and evolutionary history determine sequence space topology

Author: Deeds Eric J.
Delisi Charles
Shakhnovich Boris E.
Shakhnovich Eugene I.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/03/2005

This is the publisher's version, also available electronically from http://genome.cshlp.org/content/15/3/385. Understanding the observed variability in the number of homologs of a gene is a very important unsolved problem that has broad implications for research into coevolution of structure and function, gene duplication, pseudogene formation, and possibly for emerging diseases. Here, we attempt to define and elucidate some possible causes behind the observed irregularity in sequence space. We present evidence that the sequence variability and functional diversity of a gene or fold family are influenced by quantifiable characteristics of the protein structure. These characteristics reflect the structural potential for sequence plasticity, i.e., the ability to accept mutation without losing thermodynamic stability. We identify a structural feature of a protein domain, contact density, that serves as a determinant of entropy in sequence space, i.e., the ability of a protein to accept mutations without destroying the fold (also known as fold designability). We show that the logarithm of average gene family size exhibits a statistical correlation (R² > 0.9) with the contact density of its three-dimensional structure. We present evidence that the sizes of individual gene families are influenced not only by the designability of the structure, but also by evolutionary history, e.g., the amount of time the gene family has been in existence. We further show that our observed statistical correlation between gene family size and contact density of the structure is valid on many levels of evolutionary divergence, i.e., not only for closely related sequences, but also for less-related fold and superfamily levels of homology.
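
As a rough illustration of the structural feature named in this abstract, contact density can be sketched as the number of residue-residue contacts divided by the number of residues. The abstract does not give the definition used in the paper, so everything specific here is an assumption made for illustration: one representative coordinate per residue, the 7.5 Å distance cutoff, the exclusion of sequence neighbours, and the function name `contact_density`.

```python
import math

def contact_density(coords, cutoff=7.5, min_separation=2):
    """Return contacts per residue for a list of (x, y, z) coordinates.

    Assumptions (not taken from the paper): one representative atom per
    residue, a contact is any pair within `cutoff` angstroms, and pairs
    closer than `min_separation` positions along the chain are ignored.
    """
    n = len(coords)
    if n == 0:
        return 0.0
    contacts = 0
    for i in range(n):
        # Start at i + min_separation to skip chain neighbours.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts += 1
    return contacts / n

# Toy four-residue chain; coordinates are made up for the example.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
density = contact_density(toy)
```

With such a value per domain, the correlation described in the abstract would be computed between contact density and the logarithm of average gene family size.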

KU ScholarWorks

PubMed Central

Improvisation in evolution of genes and genomes: whose structure is it anyway?

Author: Shakhnovich Boris E
Shakhnovich Eugene Isaacovitch
Publication venue: Elsevier BV
Publication date: 18/07/2017

Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models were developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales, from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides the first robust answers to these questions. Chemistry and Chemical Biology

Harvard University - DASH

ELISA: Structure-Function Inferences Based On Statistically Significant and Evolutionarily Inspired Observations

Author: Comeau Steve
DeLisi Charles
Harvey John
Lorenz David
Shakhnovich Boris E
Shakhnovich Eugene Isaacovitch
Publication venue: Springer Science and Business Media LLC
Publication date: 06/10/2010

The problem of functional annotation based on homology modeling is primary to current bioinformatics research. Researchers have noted regularities in sequence, structure and even chromosome organization that allow valid functional cross-annotation. However, these methods produce many false negatives due to limited specificity inherent in the system. We want to create an evolutionarily inspired organization of data that would approach the issue of structure-function correlation from a new, probabilistic perspective. Such an organization has possible applications in phylogeny, modeling of functional evolution and structural determination. ELISA (Evolutionary Lineage Inferred from Structural Analysis) is an online database that combines functional annotation with structure and sequence homology modeling to place proteins into sequence-structure-function "neighborhoods". The atomic unit of the database is a set of sequences and the structural templates that those sequences encode. A graph that is built from the structural comparison of these templates is called the PDUG (protein domain universe graph). We introduce a method of functional inference through a probabilistic calculation done on an arbitrary set of PDUG nodes. Further, all PDUG structures are mapped onto all fully sequenced proteomes, allowing an easy interface for evolutionary analysis and research into comparative proteomics. ELISA is the first database with applicability to evolutionary structural genomics explicitly in mind. Availability: The database is available at http://romi.bu.edu/elisa. Chemistry and Chemical Biology

Harvard University - DASH

ELISA: Structure-Function Inferences based on statistically significant and evolutionarily inspired observations

Author: Comeau Steve
DeLisi Charles
Harvey John M
Lorenz David
Shakhnovich Boris E
Shakhnovich Eugene
Publication venue: BioMed Central
Publication date: 01/01/2003

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites

Author: Boris E Shakhnovich
Charles DeLisi
Gary Stormo
Timothy E Reddy
Publication venue: Public Library of Science
Publication date: 01/01/2007

Computational prediction of nucleotide binding specificity for transcription factors remains a fundamental and largely unsolved problem. Determination of binding positions is a prerequisite for research in gene regulation, a major mechanism controlling phenotypic diversity. Furthermore, an accurate determination of binding specificities from high-throughput data sources is necessary to realize the full potential of systems biology. Unfortunately, a recently performed independent evaluation showed that more than half the predictions from the most widely used algorithms are false. We introduce a graph-theoretical framework to describe local sequence similarity as the pair-wise distances between nucleotides in promoter sequences, and hypothesize that densely connected subgraphs are indicative of transcription factor binding sites. Using a well-established sampling algorithm coupled with simple clustering and scoring schemes, we identify sets of closely related nucleotides and test those for known TF binding activity. Using an independent benchmark, we find our algorithm predicts yeast binding motifs considerably better than currently available techniques and without manual curation. Importantly, we reduce the number of false positive predictions in yeast to less than 30%. We also develop a framework to evaluate the statistical significance of our motif predictions. We show that our approach is robust to the choice of input promoters, and thus can be used in the context of predicting binding positions from noisy experimental data. We apply our method to identify binding sites using data from genome scale ChIP–chip experiments. Results from these experiments are publicly available at http://cagt10.bu.edu/BSG. The graphical framework developed here may be useful when combining predictions from numerous computational and experimental measures. Finally, we discuss how our algorithm can be used to improve the sensitivity of computational predictions of transcription factor binding specificities.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes

Author: Boris E. Shakhnovich
Charles DeLisi
Daniel S. Roberts
Shelley J. Russek
Timothy E. Reddy
Publication venue: Oxford University Press
Publication date: 03/01/2007

Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time consuming. We introduce a simple strategy that dramatically improves the robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However, clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A γ-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.
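
The idea of clustering predictions by nucleotide position, as summarized in this abstract, might be sketched as follows. This is only a minimal illustration and not the authors' procedure: the input format (pairs of sequence identifier and predicted start position, pooled over sampling runs), the `max_gap` and `min_support` parameters, and the function name `cluster_positions` are all assumptions.

```python
from collections import defaultdict

def cluster_positions(predictions, max_gap=5, min_support=2):
    """Group predicted site starts that lie close together on a sequence.

    predictions: iterable of (sequence_id, start) pairs pooled from any
    number of sampling runs. Returns (sequence_id, first_start,
    last_start, count) tuples for clusters with at least `min_support`
    predictions. Both parameters are illustrative, not from the paper.
    """
    by_seq = defaultdict(list)
    for seq_id, start in predictions:
        by_seq[seq_id].append(start)

    clusters = []
    for seq_id, starts in by_seq.items():
        starts.sort()
        current = [starts[0]]
        for pos in starts[1:]:
            if pos - current[-1] <= max_gap:
                # Close enough to the previous prediction: same cluster.
                current.append(pos)
            else:
                if len(current) >= min_support:
                    clusters.append((seq_id, current[0], current[-1], len(current)))
                current = [pos]
        if len(current) >= min_support:
            clusters.append((seq_id, current[0], current[-1], len(current)))
    return clusters

# Made-up predictions pooled from several hypothetical sampler runs.
pooled = [("p1", 100), ("p1", 102), ("p1", 104), ("p1", 300),
          ("p2", 50), ("p2", 53)]
recurrent = cluster_positions(pooled)
```

In this toy input the isolated prediction at position 300 is discarded because no other run places a site nearby, which is the sense in which recurrence at a position acts as a filter.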

Crossref

Boston University Institutional Repository (OpenBU)

PubMed Central

A graph-theoretical treatment of protein domain evolution

Author: Shakhnovich Boris E.
Publication venue: Boston University
Publication date: 01/01/2004

Thesis (Ph.D.)--Boston University. PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at [email protected]. Thank you. Understanding the mechanisms and driving forces behind molecular evolution is the defining challenge of computational biology. However, a comprehensive, quantitative theory of molecular evolution remains elusive. We evaluate a new graph-theoretic treatment of this problem. We start by defining a multi-dimensional protein domain universe graph (PDUG). The nodes in this graph are the atomic units of evolution: structures of recurring domains and sequences that fold into those structures. Each of the three dimensions in PDUG (structure, function and phylogeny) represents a potential constraint from evolutionary pressure. We go on to characterize graph-theoretic properties such as phase transitions, power-law degree distributions, and correlations between the three dimensions. We compare the observed properties with those expected from random graphs. The comparison enables us to identify the likely contours of sets of co-evolved proteins. We further our understanding by assessing several computationally tractable models of evolution that recapitulate some fundamental characteristics of PDUG. We go on to define fitness characteristics derived from simple physical properties of structure and function that serve to clarify the uneven relationship between fold and sequence space topology. However, we also find that evolutionary history plays a crucial role, since structural fitness is only the potential for sequence entropy, while variable time of evolutionary search determines the fulfillment of that potential.
Armed with our new understanding of protein fitness, we describe its progression over time. We establish that eukaryotic domains enjoy a faster exploration of sequence and function space than prokaryotic ones. We further note that biological phenomena such as thermophilic adaptation and duplication success may be explained in light of our newly found understanding of protein fitness. Finally, we employ the newly developed PDUG paradigm to quantify the structure-function relationship. We show through modeling of divergent evolution that functions coalesce non-randomly as structural clusters grow. We find that the widely held hierarchical description of structure space has theoretical underpinnings in the natural clustering of the PDUG. We finish by calculating the theoretical lower limit of uncertainty inherent in structure-function correlation of protein domains.

Boston University Institutional Repository (OpenBU)