Contrastive learning on protein embeddings enlightens midnight zone

Author: Bordin Nicola
Heinzinger Michael
Littmann Maria
Orengo Christine
Rost Burkhard
Sillitoe Ian
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/06/2022

Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce the use of single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, is better at recognizing distant homologous relationships than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work lies in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, because this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT

UCL Discovery

PubMed Central
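
The ProtTucker abstract combines two ideas: contrastive training that reshapes pLM embeddings using CATH labels, and embedding-based annotation transfer (EAT), in which a query inherits the annotation of its closest embedding. The plain-Python sketch below illustrates both under stated assumptions. It uses a standard triplet margin loss as one common contrastive objective (the abstract does not specify ProtTucker's exact loss or sampling scheme) and treats EAT as a 1-nearest-neighbour lookup by Euclidean distance. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Embedding = Sequence[float]

def euclidean(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor: Embedding, positive: Embedding,
                        negative: Embedding, margin: float = 1.0) -> float:
    """Triplet margin loss (assumed objective): zero once the negative
    (different CATH class) lies at least `margin` farther from the anchor
    than the positive (same class)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def eat_lookup(query: Embedding,
               lookup: List[Tuple[str, Embedding]]) -> Tuple[str, float]:
    """EAT as 1-nearest-neighbour search (assumption: Euclidean distance, k = 1).
    Returns the label of the closest lookup embedding and its distance."""
    label, emb = min(lookup, key=lambda item: euclidean(query, item[1]))
    return label, euclidean(query, emb)

# Toy 2-D vectors with hypothetical labels; real pLM embeddings are far higher-dimensional.
lookup_set = [("superfamily_A", [0.0, 0.0]), ("superfamily_B", [5.0, 5.0])]
label, dist = eat_lookup([1.0, 0.5], lookup_set)
print(label, round(dist, 3))  # superfamily_A 1.118
```

Minimizing the triplet loss over many (anchor, positive, negative) triples pulls same-class embeddings together and pushes different-class embeddings apart, which is what makes a simple nearest-neighbour lookup like `eat_lookup` meaningful in the learned space.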

FunFam protein families improve residue level molecular function prediction

Author: Littmann Maria
Orengo Christine
Rost Burkhard
Scheibenreif Linus
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Background: The CATH database provides a hierarchical classification of protein domain structures, including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations within these FunFams and incorporated FunFams into the prediction of protein binding residues. Results: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFams yielded consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Varying the threshold for how many proteins had to agree in the consensus prediction provided convenient control over the trade-off between accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% at a stringent threshold. Conclusions: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we expect that our proof of principle through the prediction of protein binding residues will be relevant for many other solutions that use FunFams to infer functional information at the residue level.

Columbia University Academic Commons
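
The FunFam consensus and its agreement threshold can be sketched as a per-column vote over aligned family members. This is a hypothetical illustration, not the authors' implementation: it assumes binary per-residue predictions already mapped onto shared alignment columns, a threshold expressed as the fraction of aligned members that must agree, and scoring on the binding class, where columns without a consensus count as missed binding residues. `consensus`, `precision_recall_f1` and the toy data are illustrative names.

```python
from typing import List, Optional, Tuple

Vote = Optional[int]  # 1 = binding, 0 = non-binding, None = gap / no call

def consensus(columns: List[List[Vote]], threshold: float) -> List[Vote]:
    """Per alignment column, call binding (1) or non-binding (0) only when at
    least `threshold` of the aligned members agree; otherwise make no call.
    Assumes threshold > 0.5 so that both calls cannot qualify at once."""
    calls: List[Vote] = []
    for votes in columns:
        present = [v for v in votes if v is not None]  # ignore gapped members
        if not present:
            calls.append(None)
            continue
        frac_binding = sum(present) / len(present)
        if frac_binding >= threshold:
            calls.append(1)
        elif 1 - frac_binding >= threshold:
            calls.append(0)
        else:
            calls.append(None)  # no consensus: lowers coverage
    return calls

def precision_recall_f1(calls: List[Vote], truth: List[int]) -> Tuple[float, float, float]:
    """Scores for the binding class; uncalled binding residues count as misses."""
    tp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 1)
    fp = sum(1 for c, t in zip(calls, truth) if c == 1 and t == 0)
    fn = sum(1 for c, t in zip(calls, truth) if c != 1 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: 4 alignment columns, 3 hypothetical FunFam members each.
columns = [[1, 1, 1], [1, 0, 1], [0, 0, None], [1, 0, 0]]
truth = [1, 0, 0, 1]
for thr in (0.6, 0.9):
    calls = consensus(columns, thr)
    print(thr, calls, precision_recall_f1(calls, truth))
```

In this toy example, raising the threshold from 0.6 to 0.9 leaves two columns uncalled and raises precision from 0.5 to 1.0 while recall stays at 0.5. Because a stricter threshold can only remove binding calls, it can only keep or lower recall, which is the precision/coverage trade-off the abstract describes.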

Novel machine learning approaches revolutionize protein knowledge

Author: Bordin Nicola
Dallago Christian
Heinzinger Michael
Kim Stephanie
Littmann Maria
Orengo Christine
Rauer Clemens
Rost Burkhard
Steinegger Martin
Publication venue: 'Elsevier BV'
Publication date: 08/12/2022

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked in the Critical Assessment of Structure Prediction (CASP), AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advances in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.

UCL Discovery

Novel machine learning approaches revolutionize protein knowledge

Author: Bordin Nicola
Dallago Christian
Heinzinger Michael
Kim Stephanie
Littmann Maria
Orengo Christine
Rauer Clemens
Rost Burkhard
Steinegger Martin
Publication venue: 'Elsevier BV'
Publication date: 01/04/2023

Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked in the Critical Assessment of Structure Prediction (CASP), AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advances in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.

Biblos-e Archivo

Correction to: Detailed prediction of protein sub-nuclear localization

Author: Bodén Mikael
Goldberg Tatyana
Littmann Maria
Rost Burkhard
Seitz Sebastian
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Following publication of the original article [1], the author reported that an incorrect figure had been published as Figure 2. The correct Figure 2 is shown below.

Columbia University Academic Commons

Detailed prediction of protein sub-nuclear localization

Author: Bodén Mikael
Goldberg Tatyana
Littmann Maria
Rost Burkhard
Seitz Sebastian
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Background: Sub-nuclear structures or locations are associated with various nuclear processes. Proteins localized in these substructures are important for understanding interior nuclear mechanisms. Despite advances in high-throughput methods, experimental protein annotations remain limited. Predictions of cellular compartments have become very accurate, largely at the expense of leaving out substructures inside the nucleus, which makes a fine-grained analysis impossible. Results: Here, we present a new method (LocNuclei) that predicts nuclear substructures from sequence alone. LocNuclei uses a string-based Profile Kernel with Support Vector Machines (SVMs). It distinguishes sub-nuclear localization among 13 distinct substructures and distinguishes between nuclear proteins confined to the nucleus and those that are also native to other compartments (traveler proteins). High performance was achieved by implicitly leveraging a large biological knowledge base through homology-based inference with BLAST. Using this approach, the performance reached AUC = 0.70–0.74 and Q13 = 59–65%. Traveler proteins (nucleus and other) were identified at Q2 = 70–74%. A Gene Ontology (GO) analysis of the enrichment of biological processes revealed that the predicted sub-nuclear compartments matched the expected functionality. Analysis of protein-protein interactions (PPI) shows that the formation of compartments and the functionality of proteins in these compartments rely heavily on interactions between proteins. This suggests that the LocNuclei predictions carry important information about function. The source code and data sets are available through GitHub: https://github.com/Rostlab/LocNuclei. Conclusions: LocNuclei accurately predicts sub-nuclear compartments and traveler proteins. These predictions carry important information about functionality and PPIs.

Columbia University Academic Commons
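
LocNuclei's Profile Kernel is a string kernel: it compares sequences through their short subsequences (k-mers) rather than through an alignment. The sketch below swaps in the simpler k-mer spectrum kernel, which counts only identical k-mers, whereas the profile kernel also credits similar k-mers as judged by a sequence profile; the SVM training and the BLAST-based homology inference are not shown. It also assumes that Q13 and Q2 denote the percentage of proteins assigned to the correct one of 13 or 2 classes. Function names and class labels are hypothetical.

```python
from collections import Counter
from typing import Sequence

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """k-mer spectrum kernel: dot product of the two k-mer count vectors.
    Simpler stand-in for the profile kernel, which also scores k-mers that
    are similar (not only identical) according to a sequence profile."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())

def q_n(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Assumed Q_n definition: percentage of proteins assigned to the
    correct one of n classes."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(spectrum_kernel("MKVLA", "MKVLG"))  # 2 (shared 3-mers: MKV, KVL)
print(q_n(["nucleolus", "PML body", "nucleolus"], ["nucleolus"] * 3))
```

A kernel value like this one, computed for every pair of training sequences, is what an SVM consumes in place of explicit feature vectors.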

CATHe: Detection of remote homologues for CATH superfamilies using embeddings from protein language models

Author: Bordin Nicola
Heinzinger Michael
Littmann Maria
Nallapareddy Vamsi
Orengo Christine
Rost Burkhard
Sen Neeladri
Sillitoe Ian
Waman Vaishali P
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2023

MOTIVATION: CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art HMM-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein Language Models. It was assessed using a dataset of remote homologues with less than 20% sequence identity to any domain in the training set. RESULTS: The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had accuracies of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of CATHe's ability to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset of protein domains that had annotations in Pfam but not in CATH. Using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold 2 structures with structures from the CATH superfamilies to which they were assigned. AVAILABILITY AND IMPLEMENTATION: The code for the developed models is available at https://github.com/vam-sin/CATHe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

UCL Discovery

PubMed Central
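
The CATHe abstract restricts annotation transfer to "highly reliable CATHe predictions (expected error rate <0.5%)" without saying how that reliability cutoff was chosen. One plausible approach, shown here as a hypothetical sketch rather than CATHe's actual procedure, is to calibrate a confidence cutoff on held-out data: pick the lowest cutoff at which the accepted predictions still err less often than the target rate, then apply it to new domains. `pick_cutoff`, `reliable` and the toy data are illustrative; the toy target of 20% stands in for 0.5%, which would need a much larger validation set to estimate reliably.

```python
from typing import List, Optional, Tuple

def pick_cutoff(validation: List[Tuple[float, bool]], max_error: float) -> Optional[float]:
    """Hypothetical calibration: return the lowest confidence cutoff at which
    predictions with confidence >= cutoff have an error rate strictly below
    `max_error` on held-out data (lowest cutoff = highest coverage), else None."""
    for cutoff in sorted({conf for conf, _ in validation}):
        accepted = [ok for conf, ok in validation if conf >= cutoff]
        error_rate = accepted.count(False) / len(accepted)
        if error_rate < max_error:
            return cutoff
    return None

def reliable(predictions: List[Tuple[str, str, float]], cutoff: float) -> List[Tuple[str, str]]:
    """Keep (domain_id, superfamily) pairs whose confidence reaches the cutoff."""
    return [(dom, sf) for dom, sf, conf in predictions if conf >= cutoff]

# Toy validation set: (classifier confidence, was the prediction correct?)
validation = [(0.99, True), (0.95, True), (0.90, True),
              (0.80, False), (0.70, True), (0.60, False)]
cutoff = pick_cutoff(validation, max_error=0.2)
print(cutoff)  # 0.9
print(reliable([("dom1", "SF_X", 0.97), ("dom2", "SF_Y", 0.75)], cutoff))
```

Because the accepted sets shrink as the cutoff rises, scanning from low to high and stopping at the first cutoff that meets the target keeps as many predictions as possible.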

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Author: Bordin Nicola
Heinzinger Michael
Kim Stephanie
Lam Su Datt
Littmann Maria
Nallapareddy Vamsi
Orengo Christine
Rauer Clemens
Rost Burkhard
Sen Neeladri
Sillitoe Ian
Steinegger Martin
Velankar Sameer
Waman Vaishali P
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023

Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, revealed extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67%, increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise the evolutionary changes driving functional divergence.

UCL Discovery
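
The headline counts in the AlphaFold2 abstract can be checked with a few lines of arithmetic. The sketch below only restates numbers given in that abstract; because the total of ~370,000 models is rounded, the derived counts are approximate as well.

```python
# Back-of-the-envelope numbers from the abstract; 370,000 is itself approximate.
total_models = 370_000
assigned = round(total_models * 0.92)  # models assigned to 3253 existing superfamilies
remaining = total_models - assigned    # models clustered into 2367 putative novel superfamilies
confirmed_share = 25 / 618             # novel superfamilies confirmed after manual analysis

print(f"assigned ~ {assigned:,}, remaining ~ {remaining:,}")
print(f"confirmed: {confirmed_share:.1%} of the 618 manually analysed superfamilies")
```

So roughly 30,000 models fall outside existing superfamilies, and only about 4% of the manually analysed putative novel superfamilies held up as genuinely new.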

Expansion of soybean cultivation in Germany through breeding adaptation and through optimization of crop production and processing technology

The work in the soybean research project was successful and provided important impetus for expanding soybean cultivation in Germany. The strains and cross progeny developed form a basis for building an independent German soybean breeding program. The varieties Korus and Protibus proved particularly suitable for tofu production. The laboratory-scale tofu production unit (Labortofurei) developed in the project is a breeding tool for identifying promising genotypes and can also support the further development of early-maturing tofu soybean varieties. Pot experiments showed that the response to cool stress during the pod-setting phase varies between varieties, with tolerant, compensating, and sensitive varieties. Practical selection for cold tolerance was successful, and a system was established for selection for weed tolerance. With the exception of the product Radicin, the available commercial Bradyrhizobium inoculants can be recommended for use in practice. The hypothesis that selecting the symbiotic partner for cool tolerance is worthwhile was confirmed. Variety trials across Germany showed that soybean is generally well suited for cultivation and was unsuitable at only a few of the sites tested. The 00 variety ES-Mentor delivered the highest relative yields overall as well as the highest crude protein yield; among the 000 varieties, Sultana performed particularly well. Varying the sowing date and various techniques for advancing crop development did not prove relevant to yield. There were no differences in weed control success between the torsion weeder, the finger weeder, and the flat ridger. Soybeans can be grown on ridges with good weed control. During processing, unnecessarily high heating of the beans should be avoided, because heating reduces not only trypsin inhibitor activity but also protein digestibility.
With exclusively indirect, longer-acting, dry heat (e.g. waste heat from biogas plants), it is difficult to achieve good processing quality. Knowledge transfer through field days and the website www.sojainfo.de was important and successful in increasing interest in domestic soybean cultivation.

Organic Eprints

Decision of the STIKO on the 7th update of the COVID-19 vaccination recommendation and the associated scientific justification

Author: Bogdan Christian
Heininger Ulrich
Koch Judith
Littmann Martina
Meerpohl Joerg
Mertens Thomas
Meyer Heidi
Schmid-Küpke Nora
Scholz Stefan
Steffen Annika
Terhardt Martin
van der Sande Marianne
von Kries Rüdiger
Vygen-Bonnet Sabine
Waize Maria
Wichmann Ole
Wicker Sabine
Wiedermann Ursula
Wild Verina
Überla Klaus
Publication venue: Robert Koch-Institut
Publication date: 24/06/2021

The STIKO recommends vaccination against COVID-19 with one of the two approved mRNA vaccines (Comirnaty from BioNTech/Pfizer, COVID-19-Vaccine from Moderna) or one of the two approved vector-based vaccines (Vaxzevria from AstraZeneca, COVID-19 Vaccine Janssen from Janssen-Cilag International). Vaccination against COVID-19 should be offered to all persons aged 18 years and older. Owing to the progress of the vaccination campaign and the increasing availability of COVID-19 vaccines, a stepwise approach (prioritization recommendation) at the national level is no longer necessary.

Peer reviewed

Publikationsserver des Robert Koch-Instituts