
arXiv.org e-Print Archive

Size and structure of the sequence space of repeat proteins

Author: Espada Rocio
Ferreiro Diego
Galpern Ezequiel Alejandro
Marchi Jacopo
Mora Thierry
Walczak Aleksandra M.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/07/2019

The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using maximum entropy models trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat-protein families, whose members are large proteins made of many repetitions of conserved portions of ~30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction in diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes organized in a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family and suggests new strategies for protein design.

Affiliation: Marchi, Jacopo. Ecole Normale Supérieure; France
Affiliation: Galpern, Ezequiel Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Espada, Rocio. PSL University; France
Affiliation: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Walczak, Aleksandra M. Ecole Normale Supérieure; France
Affiliation: Mora, Thierry. Ecole Normale Supérieure; France
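As a rough illustration of how a family "size" follows from an entropy estimate: if a model assigns entropy S (in nats) to a family, the effective number of sequences is roughly N ≈ e^S. The sketch below uses only an independent-site (profile) model, the simplest maximum entropy model constrained by single-column amino acid frequencies. Because it ignores the pairwise correlations that the abstract reports reduce diversity further, its estimate is at least as large as that of a model that also matches pairwise statistics. The toy alignment, the pseudocount value, gap handling, and all function names are hypothetical choices for illustration, not the authors' pipeline.

```python
import math
from collections import Counter

# 20 standard residues; gaps and non-standard symbols are simply ignored here
# (an assumption of this sketch, not necessarily how the study treats gaps).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_entropy(column, pseudocount=0.5):
    """Shannon entropy (nats) of one alignment column, with additive pseudocounts."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    h = 0.0
    for aa in AMINO_ACIDS:
        p = (counts.get(aa, 0) + pseudocount) / total
        h -= p * math.log(p)
    return h


def independent_site_entropy(alignment, pseudocount=0.5):
    """Sum of per-column entropies: the maximum entropy under single-site constraints."""
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    return sum(
        site_entropy([s[i] for s in alignment], pseudocount) for i in range(length)
    )


def log10_sequence_space(alignment, pseudocount=0.5):
    """log10 of the effective number of sequences, using N ~ exp(S)."""
    return independent_site_entropy(alignment, pseudocount) / math.log(10)


if __name__ == "__main__":
    toy = ["ACDEF", "ACDEG", "ACDKF", "SCDEF"]  # hypothetical 5-column alignment
    print(f"profile-model estimate: 10^{log10_sequence_space(toy):.2f} sequences")
    print(f"random sequences of the same length: 10^{5 * math.log10(20):.2f}")
```

Comparing the two printed exponents mirrors, in miniature, the comparison the abstract draws between conserved families and completely random sequences; a faithful reproduction would instead fit a pairwise maximum entropy model and estimate its entropy.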

CONICET Digital

Hal-Diderot

A Model for Protein Sequence Evolution Based on Selective Pressure for Protein Stability: Application to Hemoglobins

Author: Marsh Lorraine
Publication venue: Libertas Academica
Publication date: 01/01/2009

Negative selection against protein instability is a central influence on the evolution of proteins. Protein stability is maintained over evolution despite changes in underlying sequences. An empirical all-site stability-based model of evolution was developed to focus on the selection of residues arising from their contributions to protein stability. In this model, site rates could vary. A structure-based method was used to predict stationary frequencies of hemoglobin residues based on their propensity to promote protein stability at a site. Sites with destabilizing residues were shown to change more rapidly in hemoglobins than sites with stabilizing residues. For diverse proteins, the results were consistent with stability-based selection. Maximum likelihood studies with hemoglobins supported the stability-based model over simple Poisson-based methods. These observations are consistent with suggestions that purifying selection to maintain protein structural stability plays a dominant role in protein evolution.

First complete mitochondrial genome of the South American annual fish Austrolebias charrua (Cyprinodontiformes: Rivulidae): peculiar features among cyprinodontiforms mitogenomes

Author: Graciela García
Hugo Naya
Natalia Rego
Verónica Gutiérrez
Publication venue: Springer Nature
Publication date: 01/01/2015

Selected nucleotide substitution models after the third codon positions were removed from the codon alignments. (PDF 7 kb)

Copenhagen University Research Information System

FigShare

Genome of the pitcher plant Cephalotus reveals genetic changes associated with carnivory

Author: Albert Victor A.
Alvarez-Ponce David
Cai Huimin
Carretero-Paulet Lorenzo
Chang Tien-Hao
Chen Cui
Fang Xiaodong
Farr Kimberly M.
Fujita Tomomichi
Fukushima Kenji
Hasebe Mitsuyasu
Hiwatashi Yuji
Hoshi Yoshikazu
Imai Takamasa
Kasahara Masahiro
Li Shuaicheng
Mao Likai
Mori Hitoshi
Nishiyama Tomoaki
Nozawa Masafumi
Pollard Stephen T.
Pollock David D.
Pálfalvi Gergő
Rozas Julio
Sankoff David
Sanz Pablo Librado
Shibata Tomoko F.
Shigenobu Shuji
Sumikawa Naomi
Sánchez-Gracia Alejandro
Uzawa Taketoshi
Xie Meiying
Zheng Chunfang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Carnivorous plants exploit animals as a nutritional source and have inspired long-standing questions about the origin and evolution of carnivory-related traits. To investigate the molecular bases of carnivory, we sequenced the genome of the heterophyllous pitcher plant Cephalotus follicularis, in which we succeeded in regulating the developmental switch between carnivorous and non-carnivorous leaves. Transcriptome comparison of the two leaf types and gene repertoire analysis identified genetic changes associated with prey attraction, capture, digestion and nutrient absorption. Analysis of digestive fluid proteins from C. follicularis and three other carnivorous plants with independent carnivorous origins revealed repeated co-options of stress-responsive protein lineages coupled with convergent amino acid substitutions to acquire digestive physiology. These results imply constraints on the available routes to evolve plant carnivory.

University of Nevada, Reno ScholarWorks Repository

Computational analyses of eukaryotic promoters

Author: AD Smith
BA Lewis
BC Foat
C Bock
CD Schmid
CE Lawrence
CT Workman
D Das
DJ Huebert
E Segal
EM Conlon
G Cavalli
GC Yuan
GZ Hertz
HJ Bussemaker
J Friedman
M Tompa
MC Thomas
Michael Q Zhang
MJ Buck
MJ Martinez
MQ Zhang
N Maeda
ND Heintzman
NI Gershenzon
P Carninci
P Gross
P Hong
P Sumazin
PJ Sabo
R Das
RA Rollins
RV Davuluri
S Keles
S Sinha
S Sonnenburg
SR Schulze
T Hastie
TA Down
TH Kim
TL Bailey
U Ohler
V Matys
VB Bajic
VX Jin
WW Wasserman
X Zhao
Y Suzuki
Publication venue: BioMed Central
Publication date: 01/01/2007

Computational analysis of eukaryotic promoters is one of the most difficult problems in computational genomics and is essential for understanding gene expression profiles and reverse-engineering gene regulation network circuits. Here I give a basic introduction to the problem and a recent update on both experimental and computational approaches. More details may be found in the extended references. This review is based on a summer lecture given at the Max Planck Institute in Berlin in 2005.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

University of Nevada, Las Vegas Repository

Analysis of Genomic Sequence Data Reveals the Origin and Evolutionary Separation of Hawaiian Hoary Bat Populations

Author: Adam Eyre-Walker
Allendorf
Altschul
Aronesty
Baird
Bankevich
Barclay
Belwood
Biollaz
Boitard
Bonaccorso
Bouckaert
Bryant
Burns
Campana
Clague
Consortium GRD
Corinna A Pinzari
Cryan
DePristo
Donald K Price
Evanno
Excoffier
Fleischer
Frank J Bonaccorso
Frick
Gorresen
Hayes
Holland
Holt
Howarth
Jacobs
Kirch
Korf
Korstian
Lapierre
Lars S Jermiin
Lerner
Li
Lin Kang
Ma
Mian
Novaes
Novembre
Olson
Parker
Parra
Patterson
Pawel Michalak
Paxinos
Pinzari
Poe
Price
Pritchard
Pylant
Rawlinson
Russell
Salgueiro
Salinas-Ramos
Schwartz
Seim
Shen
Shultz
Simão
Sovic
Speer
Stanke
Turmelle
Vonhof
Weyeneth
Wilson
Ziegler
Publication venue: Digital Scholarship@UNLV
Publication date: 27/08/2020

We examine the genetic history and population status of Hawaiian hoary bats (Lasiurus semotus), the most isolated bats on Earth, and their relationship to northern hoary bats (Lasiurus cinereus), through whole-genome analysis of single-nucleotide polymorphisms mapped to a de novo-assembled reference genome. Profiles of genomic diversity and divergence indicate that Hawaiian hoary bats are distinct from northern hoary bats and form a monophyletic group, indicating a single ancestral colonization event 1.34 Ma, followed by substantial divergence between islands beginning 0.51 Ma. Phylogenetic analysis indicates that Maui is central to the radiation across the archipelago, with southward expansion to Hawai‘i and westward expansion to O‘ahu and Kaua‘i. Because this endangered species is of conservation concern, a clearer understanding of the population genetic structure of this bat in the Hawaiian Islands is of timely importance.

Crossref

The Australian National University

Genomic Legacy of the African Cheetah, Acinonyx jubatus

Background: Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations. Results: Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95 % homozygous compared to the genomes of the outbred domestic cat (24.08 % homozygous), Virunga Mountain Gorilla (78.12 %), inbred Abyssinian cat (62.63 %), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084–12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p < 0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah’s extremely high (>80 %) pleiomorphic sperm. Conclusions: The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species’ natural history, physiological adaptations and unique reproductive disposition.

Harvard University - DASH

Copenhagen University Research Information System

ScholarWorks@UNIST

UPF Digital Repository

Digital.CSIC

NSU Works

Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics

Author: Caporaso J Gregory
Easton Brett C
Hunter Lawrence
Huttley Gavin A
Knight Rob
Smit Sandra
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance. Results: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding, however, indicating that the choice of alphabet is critical. Conclusion: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.

Crossref