
4 research outputs found

PDBench: Evaluating Computational Methods for Protein Sequence Design

Authors: Castorina Leonardo V.
Petrenas Rokas
Subr Kartic
Wood Christopher W.
Publication date: 16/09/2021

Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, forming the basis of highly performant materials, sensing and much more. While nature has sampled an incredible range of functionality, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could search for novel proteins with useful properties and apply them to the environmental and medical challenges facing humanity. This is the purpose of protein design. Sequence design is an important aspect of protein design, and many successful methods for it have been developed. Recently, deep-learning methods that frame sequence design as a classification problem have emerged as a powerful approach. Beyond their reported improvement in performance, their primary advantage over physics-based methods is that the computational burden is shifted from the user to the developers, making the design method more accessible. Despite this trend, the tools for assessing and comparing such models remain quite generic. The goal of this paper is both to address the timely problem of evaluation and to shine a spotlight, within the Machine Learning community, on specific assessment criteria that will accelerate impact. We present a carefully curated benchmark set of proteins and propose a number of standard tests to assess the performance of deep-learning-based methods. Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility. We compare five existing models with two novel models for sequence prediction. Finally, we test the designs produced by these models with AlphaFold2, a state-of-the-art structure-prediction algorithm, to determine whether they are likely to fold into the intended 3D shapes.

Comment: 9 pages, 5 figures
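Because these models treat sequence design as per-residue classification over the 20 amino acids, a generic metric such as overall accuracy (often called sequence recovery) can hide poor performance on rarer residue types. The sketch below contrasts recovery with macro-averaged recall on a toy example. It is an illustration under stated assumptions, not PDBench's code or its exact metric definitions; the function names are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes), used as the label set.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(native: str, designed: str) -> float:
    """Overall accuracy: fraction of positions where designed == native."""
    if len(native) != len(designed) or not native:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def macro_recall(native: str, designed: str) -> float:
    """Unweighted mean of per-residue-type recall, so rare residues count
    as much as common ones. Only types present in `native` are averaged."""
    if len(native) != len(designed) or not native:
        raise ValueError("expected two non-empty sequences of equal length")
    totals = Counter(native)
    hits = Counter(n for n, d in zip(native, designed) if n == d)
    recalls = [hits[aa] / totals[aa] for aa in AMINO_ACIDS if totals[aa] > 0]
    if not recalls:
        raise ValueError("native sequence contains no standard amino acids")
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # A "designer" that always outputs alanine scores well on recovery
    # but poorly on macro recall for this toy, alanine-rich native sequence.
    native = "AAAAL"
    designed = "AAAAA"
    print(sequence_recovery(native, designed))  # 0.8
    print(macro_recall(native, designed))       # 0.5
```

Averaging per class rather than per position is one way to make an evaluation sensitive to the composition bias a classifier can learn; a real benchmark would aggregate over many aligned native/designed pairs.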

arXiv.org e-Print Archive

Edinburgh Research Explorer

CC+: A Searchable Database of Validated Coiled Coils in PDB Structures and AlphaFold2 Models

Authors: Dawson Will M
Kumar Prasun
Levy Emmanuel D
Petrenas Rokas
Schweke Hugo
Woolfson Dek N
Publication date: 01/01/2023

α‐Helical coiled coils are common tertiary and quaternary elements of protein structure. In coiled coils, two or more α helices wrap around each other to form bundles. This apparently simple structural motif can generate many architectures and topologies. Coiled coil‐forming sequences can be predicted from heptad repeats of hydrophobic (h) and polar (p) residues, hpphppp, although this is not always reliable. Alternatively, coiled‐coil structures can be identified using the program SOCKET, which finds knobs‐into‐holes (KIH) packing between side chains of neighboring helices. SOCKET also classifies coiled‐coil architecture and topology, thus allowing sequence‐to‐structure relationships to be garnered. In 2009, we used SOCKET to create a relational database of coiled‐coil structures, CC+, from the RCSB Protein Data Bank (PDB). Here, we report an update of CC+ following the update of SOCKET to Socket2, the recent explosion of structural data, and the success of AlphaFold2 in predicting protein structures from genome sequences. With the most‐stringent SOCKET parameters, CC+ contains ≈12,000 coiled‐coil assemblies from experimentally determined structures and ≈120,000 potential coiled‐coil structures within single‐chain models predicted by AlphaFold2 across 48 proteomes. CC+ allows these and other less‐stringently defined coiled coils to be searched at various levels of structure, sequence, and side‐chain interactions. The identified coiled coils can be viewed directly from CC+ using the Socket2 application, and their associated data can be downloaded for further analyses. CC+ is freely available at http://coiledcoils.chm.bris.ac.uk/CCPlus/Home.html. It will be updated automatically. We envisage that CC+ could be used to understand coiled‐coil assemblies and their sequence‐to‐structure relationships, and to aid protein design and engineering.
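The heptad repeat hpphppp is conventionally labelled with positions abcdefg, placing hydrophobic residues at a and d. The naive scan below scores how well a sequence fits that pattern in each of the seven possible registers. It is only an illustration: the hydrophobic residue set and the function names are assumptions, it is not how SOCKET works (SOCKET detects knobs-into-holes packing in 3D structures), and, as noted above, sequence-only prediction of this kind is not always reliable.

```python
from __future__ import annotations

# Heptad positions a-g; "h" = hydrophobic, "p" = polar.
PATTERN = "hpphppp"
POSITIONS = "abcdefg"
# Assumption: a crude hydrophobic residue set chosen for illustration only.
HYDROPHOBIC = frozenset("AILMFVW")


def heptad_match(seq: str, register: int) -> float:
    """Fraction of residues whose h/p class agrees with hpphppp when
    seq[0] sits at heptad position POSITIONS[register]."""
    if not seq:
        raise ValueError("empty sequence")
    matches = 0
    for i, residue in enumerate(seq.upper()):
        expects_h = PATTERN[(i + register) % 7] == "h"
        matches += expects_h == (residue in HYDROPHOBIC)
    return matches / len(seq)


def best_register(seq: str) -> tuple[str, float]:
    """Try all seven registers; return (heptad position of seq[0], score)."""
    scores = [heptad_match(seq, r) for r in range(7)]
    best = max(range(7), key=lambda r: scores[r])
    return POSITIONS[best], scores[best]


if __name__ == "__main__":
    # Idealised repeat with L at a and I at d; E/K fill the polar positions.
    print(best_register("LEEIKKK" * 3))  # ('a', 1.0)
    # The same repeat started one residue earlier, at position g.
    print(best_register("KLEEIKK" * 3))  # ('g', 1.0)
```

Practical sequence-based predictors typically use position-specific residue statistics rather than a binary h/p match; this sketch only makes the heptad idea concrete.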

Explore Bristol Research

Archive ouverte UNIGE

Rationally seeded computational protein design of α-helical barrels

Authors: Albanese Katherine I
Borucu Ufuk
Dawson William M
Leggett Graham J.
Naudin Elise A
Oliver Thomas A A
Petrenas Rokas
Pirro Fabio
Scott D A
Weiner Orion
Woolfson Dek N
Publication date: 20/06/2024

Computational protein design is advancing rapidly. Here we describe efficient routes starting from validated parallel and antiparallel peptide assemblies to design two families of α-helical barrel proteins with central channels that bind small molecules. Computational designs are seeded by the sequences and structures of defined de novo oligomeric barrel-forming peptides, and adjacent helices are connected by loop building. For targets with antiparallel helices, short loops are sufficient. However, targets with parallel helices require longer connectors; namely, an outer layer of helix–turn–helix–turn–helix motifs that are packed onto the barrels. Throughout these computational pipelines, residues that define open states of the barrels are maintained. This minimizes sequence sampling, accelerating the design process. For each of six targets, just two to six synthetic genes are made for expression in Escherichia coli. On average, 70% of these genes express to give soluble monomeric proteins that are fully characterized, including high-resolution structures for most targets that match the design models with high accuracy.

Explore Bristol Research