14 research outputs found

Archaea: The First Domain of Diversified Life

Author: Arshan Nasir
Derek Caetano-Anollés
Feng-Jie Sun
Gustavo Caetano-Anollés
Jay E. Mittenthal
Kaiyue Zhou
Kyung Mo Kim
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space

Author: Abel Haley J
Afgan Enis
Baker Dannon
Banasiewicz M Katie
Banks Eric
Baumann Alexander
Baumann Michael
Bernard Clare
Blauvelt Lon
Cabansay Louise
Caetano-Anollés Derek
Canas Justin
Carey Vincent J
Carroll Robert J
Chaluvadi Sushma
Chilton John
Clements Dave
Cox Katherine EL
Culotti Alessandro
Di Francesco Valentina
Disman William
Ellrott Kyle
Geistlinger Ludwig
Ghanaim Elena M
Goecks Jeremy
Golitsynskiy Sergey
Grossman Robert L
Gupta Namrata
Hajian Allie
Hall Ira M
Hannafious Brian
Hansen Kasper D
Harris Tim
Hastie Mim
Herman Kate
Hutter Carolyn
Jalili Vahid
Kammers Kai
Kiernan Elizabeth
Kovalsy Anton
Kucher Nataliya
Lawson Jonathan
Leek Jeffrey T
Lucas Julian
Luria Anne O’Donnell
Mahmoud Alexandru
McDade Frances
Morgan Martin
Mosher Stephen
Munshi Ruchi
Nekrutenko Anton
Oh Sehyun
Osborn Kevin
Ostrovsky Alexander
Overbeck Charles
O’Connor Brian D
O’Farrell Ash
Paten Benedict
Patterson Candace
Philippakis Anthony A
Ramos Marcel
Reddy Radhika
Reeves Valerie
Reid Charles
Rogers Dave
Rula Andrew
Yuen Denis
Sargent Luke
Schatz Michael C
Sen Shurjo K
Sheets Elizabeth A
Shepherd Lori
Simeon Marianie
Steinberg David Charles
Stevens Ana
Stubbs BJ
Suderman Keith
Tan Frederick J
Taylor Casey Overby
Taylor M Morgan
Thomas Salin
Title Robert
Torstenson Eric
Turaga Nitesh
Van der Auwera Geraldine A
Vessio Jennifer
Vizzier Benton A
Vosburg Trish
Waldron Levi
Walker Jason
Walsh Brian
Wang Qi
Wang Ting
Warren Noah
Wellington Christopher
Wheelan Sarah J
Wiley Ken L
Wuichet Kristin
Yuksel Kaan
Zarate Samantha
Publication venue: Elsevier BV
Publication date: 12/01/2022

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

Cold Spring Harbor Laboratory Institutional Repository

Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks

Author: Derek Caetano-Anollés
Gustavo Caetano-Anollés
Publication venue: MDPI AG
Publication date: 01/12/2016

The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

An Evolutionarily Structured Universe of Protein Architecture

Author: Caetano-Anollés Derek
Caetano-Anollés Gustavo
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2003

Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted and depict histories of protein and proteome diversification. Proteome phylogenies showed two monophyletic sister-groups delimiting Bacteria and Archaea, and a topology rooted in Eucarya. This suggests three dramatic evolutionary events and a common ancestor with a eukaryotic-like, gene-rich, and relatively modern organization. Conversely, a general phylogeny of protein architectures showed that structural classes of globular proteins appeared early in evolution and in defined order, the α/β class being the first. Although most ancestral folds shared a common architecture of barrels or interleaved β-sheets and α-helices, many were clearly derived, such as polyhedral folds in the all-α class and β-sandwiches, β-propellers, and β-prisms in all-β proteins. We also describe transformation pathways of architectures that are prevalently used in nature. For example, β-barrels with increased curl and stagger were favored evolutionary outcomes in the all-β class. Interestingly, we found cases where structural change followed the α-to-β tendency uncovered in the tree of architectures. Lastly, we traced the total number of enzymatic functions associated with folds in the trees and show that there is a general link between structure and enzymatic function.

CiteSeerX

Crossref

PubMed Central

Computing the origin and evolution of the ribosome from its structure — Uncovering processes of macromolecular accretion benefiting synthetic biology

Author: Derek Caetano-Anollés
Gustavo Caetano-Anollés
Publication venue: Elsevier
Publication date: 01/01/2015

Accretion occurs pervasively in nature at widely different timeframes. The process also manifests in the evolution of macromolecules. Here we review recent computational and structural biology studies of evolutionary accretion that make use of the ideographic (historical, retrodictive) and nomothetic (universal, predictive) scientific frameworks. Computational studies uncover explicit timelines of accretion of structural parts in molecular repertoires and molecules. Phylogenetic trees of protein structural domains and proteomes and their molecular functions were built from a genomic census of millions of encoded proteins and associated terminal Gene Ontology terms. Trees reveal a ‘metabolic-first’ origin of proteins, the late development of translation, and a patchwork distribution of proteins in biological networks mediated by molecular recruitment. Similarly, the natural history of ancient RNA molecules inferred from trees of molecular substructures built from a census of molecular features shows patchwork-like accretion patterns. Ideographic analyses of ribosomal history uncover the early appearance of structures supporting mRNA decoding and tRNA translocation, the coevolution of ribosomal proteins and RNA, and a first evolutionary transition that brings ribosomal subunits together into a processive protein biosynthetic complex. Nomothetic structural biology studies of tertiary interactions and ancient insertions in rRNA complement these findings, once concentric layering assumptions are removed. Patterns of coaxial helical stacking reveal a frustrated dynamics of outward and inward ribosomal growth possibly mediated by structural grafting. The early rise of the ribosomal ‘turnstile’ suggests an evolutionary transition in natural biological computation. Results make explicit the need to understand processes of molecular growth and information transfer of macromolecules.

Elsevier - Publisher Connector

Directory of Open Access Journals

Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions

Author: Arshan Nasir
Derek Caetano-Anollés
Gustavo Caetano-Anollés
Kyung Mo Kim
Publication venue: SAGE Publications
Publication date: 01/10/2018

Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a “model” of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the “tree of life,” generally follows narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston's rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.

Directory of Open Access Journals

MPG.PuRe

Evolutionary heat maps describing the amino acid and dipeptide compositions of FF domain structures of different age.

Author: Derek Caetano-Anollés
Gustavo Caetano-Anollés
Minglei Wang

A. Frequency of amino acids in FFs. The color array of 29,480 cells (1,475 rows × 20 columns) describes the amino acid composition of 1,475 FFs along the evolutionary timeline. Columns represent the 20 standard amino acids ordered (from left to right) according to average amino acid frequency and rows represent FFs ordered (from top to bottom) according to domain age (ndFF = 0 ∼ 1). B. Frequency of dipeptides in FFs. The color array of 589,600 cells (1,475 rows × 400 columns) describes the 400-dipeptide composition of FFs along the timeline. Columns represent dipeptide types ordered (from left to right) according to average frequency (from LL to WW) and rows represent FFs ordered according to age. The heat maps confirm the existence of non-random patterns of amino acid and dipeptide compositions along the evolutionary timeline of FFs and reveal unique signatures of amino acid and dipeptide use in FFs. Amino acids are described with single-letter codes.

FigShare

Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world

Author: Caetano-Anollés Derek
Caetano-Anollés Gustavo
Mittenthal Jay E.
Wang Minglei
Yafremava Liudmila S.
Publication venue: Cold Spring Harbor Laboratory Press

The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.

Crossref

PubMed Central