PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

Liu, Runcheng; Lu, Jiarui; Ma, Chang; Tang, Jian; Xu, Minghao; Zhang, Yangtian; Zhang, Zuobai; Zhu, Zhaocheng

Publication date: 19 September 2022

Abstract

Deep learning methods are making significant progress on a variety of protein tasks and datasets. However, the lack of a standard benchmark for evaluating different methods hinders the progress of deep learning in this field. In this paper, we propose such a benchmark, PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks, including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. For each task, we evaluate different types of sequence-based methods, including traditional feature engineering approaches, different sequence encoding methods, and large-scale pre-trained protein language models. We also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance on most individual tasks, and that jointly training multiple tasks further boosts performance. The datasets and source code of this benchmark are available at https://github.com/DeepGraphLearning/PEER_Benchmark

Comment: Accepted by NeurIPS 2022 Dataset and Benchmark Track. arXiv v2: source code released; arXiv v1: all benchmark results released.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2206.02096

Last time updated on 16/08/2022