
3 research outputs found

A Gene Ontology Tutorial in Python.

Authors: Dessimoz C., Vesztrocy A.W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This chapter is a tutorial on using Gene Ontology (GO) resources in the Python programming language. It covers querying the Gene Ontology graph, retrieving Gene Ontology annotations, performing gene enrichment analyses, and computing basic semantic similarity between GO terms. An interactive version of the tutorial, including solutions, is available at http://gohandbook.org.
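
As a rough illustration of what querying an ontology graph and computing a basic semantic similarity can involve, here is a minimal sketch on a tiny hand-made graph. The term IDs, the `PARENTS` mapping, and the choice of Jaccard similarity over ancestor sets are all assumptions made for this example; they are not taken from the chapter, which may rely on different tools and measures.

```python
# Toy is_a graph: child term -> set of parent terms.
# The IDs are hypothetical placeholders, not real GO terms.
PARENTS = {
    "T:root": set(),
    "T:A": {"T:root"},
    "T:B": {"T:root"},
    "T:C": {"T:A", "T:B"},  # a term may have several parents (DAG, not a tree)
    "T:D": {"T:A"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def jaccard_similarity(a, b):
    """Shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


if __name__ == "__main__":
    print(sorted(ancestors("T:C")))
    print(jaccard_similarity("T:C", "T:D"))
```

Including the term itself in its ancestor set keeps the similarity of a term with itself at 1 and avoids an empty union for the root.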

Serveur académique lausannois

UCL Discovery

Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python.

Authors: Altenhoff A., Dessimoz C., Kaleb K., Vesztrocy A.W.
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2019

The Orthologous Matrix (OMA) is a well-established resource for identifying orthologs among many genomes. Here, we present two recent additions to its programmatic interface: a REST API, and user-friendly R and Python packages called OmaDB. These should further facilitate the incorporation of OMA data into computational scripts and pipelines. The REST API can be freely accessed at https://omabrowser.org/api. The R OmaDB package is available as part of Bioconductor at http://bioconductor.org/packages/OmaDB/, and the omadb Python package is available from the Python Package Index (PyPI) at https://pypi.org/project/omadb/.

Repository for Publications and Research Data

Serveur académique lausannois

Directory of Open Access Journals

UCL Discovery

OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches.

Authors: Dessimoz C., Robinson-Rechavi M., Rossier V., Vesztrocy A.W.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 31/03/2021

Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, even though it is neither. To overcome this problem, phylogeny-driven tools have emerged, but they rely on gene trees, whose inference is computationally expensive. Here, we first show that in multiple animal and plant datasets, 18 to 62% of closest-sequence assignments are wrong, typically pointing to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer uses evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. We show that OMAmer, whilst able to reject non-homologous family-level assignments, provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. Supplementary data are available at Bioinformatics online.
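
The over-specific assignment problem described above can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the sequences and subfamily names are made up, k-mer Jaccard similarity stands in for a real closest-sequence search such as BLAST, and the threshold rule is a deliberately simple placeholder rather than OMAmer's actual scoring.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical toy data: one family with two subfamilies, made-up sequences.
SUBFAMILIES = {
    "sub_alpha": ["MKVLAAGHWT", "MKVLAAGHWS"],
    "sub_beta": ["MKVLCCPQRT", "MKVLCCPQRS"],
}


def closest_sequence_assignment(query):
    """Baseline: subfamily of the single most similar reference sequence."""
    q = kmers(query)
    best_label, best_score = None, -1.0
    for label, members in SUBFAMILIES.items():
        for seq in members:
            ref = kmers(seq)
            score = len(q & ref) / len(q | ref)  # k-mer Jaccard similarity
            if score > best_score:
                best_label, best_score = label, score
    return best_label


def conservative_assignment(query, threshold=0.5):
    """Stay at family level unless enough subfamily-specific k-mers match."""
    q = kmers(query)
    pooled = {
        label: set().union(*(kmers(s) for s in members))
        for label, members in SUBFAMILIES.items()
    }
    best_label, best_frac = "family", 0.0
    for label, own in pooled.items():
        others = set().union(*(v for o, v in pooled.items() if o != label))
        specific = own - others  # k-mers seen only in this subfamily
        frac = len(q & specific) / len(q)
        if frac >= threshold and frac > best_frac:
            best_label, best_frac = label, frac
    return best_label


if __name__ == "__main__":
    early_branching = "MKVLADDEEF"  # shares only the conserved start
    print(closest_sequence_assignment(early_branching))
    print(conservative_assignment(early_branching))
```

In this toy setup the early-branching query is nearest to a `sub_alpha` sequence by chance, so the baseline labels it `sub_alpha`, whereas the conservative rule leaves it at the family level because too few of its k-mers are specific to either subfamily.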

Serveur académique lausannois

PubMed Central

UCL Discovery