The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

Wilke Andreas; Harrison Travis; Wilkening Jared; Field Dawn; Glass Elizabeth M; Kyrpides Nikos; Mavrommatis Konstantinos; Meyer Folker

oai:doaj.org/article:d3751400d4a545e08bc25c41b5a867d7

Publication date: 1 June 2012
Publisher: 'Springer Science and Business Media LLC'

Abstract

Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.

Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.

Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
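The idea of computing similarities once against a common reference and then translating the hits into several annotation namespaces can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each distinct protein sequence is keyed by a checksum of its normalized residues (MD5 is used here purely for illustration), and the function names `sequence_key`, `build_nr`, and `translate_hits`, the record layout, and the toy accessions in the usage note are all hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(seq: str) -> str:
    """Strip whitespace and upper-case so identical proteins compare equal."""
    return "".join(seq.split()).upper()


def sequence_key(seq: str) -> str:
    """Checksum of the normalized sequence, used as the non-redundant ID.

    Assumption: MD5 is an illustrative choice, not confirmed by the abstract.
    """
    return hashlib.md5(normalize(seq).encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse records from several sources into one entry per sequence.

    records: iterable of (source, accession, annotation, sequence) tuples.
    Returns (sequences, annotations), where sequences maps key -> sequence
    and annotations maps key -> {source: [(accession, annotation), ...]}.
    """
    sequences = {}
    annotations = defaultdict(dict)
    for source, accession, annotation, seq in records:
        key = sequence_key(seq)
        sequences.setdefault(key, normalize(seq))
        annotations[key].setdefault(source, []).append((accession, annotation))
    return sequences, annotations


def translate_hits(hits, annotations, namespace):
    """Re-express similarity hits in one annotation namespace.

    hits: iterable of (query_id, key) pairs from a single similarity search
    against the non-redundant set. No new search is needed per namespace.
    """
    translated = []
    for query_id, key in hits:
        for accession, annotation in annotations.get(key, {}).get(namespace, []):
            translated.append((query_id, accession, annotation))
    return translated
```

With made-up records such as `("KEGG", "K_toy_1", "toy kinase", "MKTAYI")` and `("GenBank", "GB_toy_1", "hypothetical protein", "mktayi")`, `build_nr` keeps a single sequence entry, and one hit against that entry can be passed to `translate_hits` once with `"KEGG"` and once with `"GenBank"` to obtain both views from the same computation.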


Last updated on 17/12/2014

This paper was published in Directory of Open Access Journals.
