26,403 research outputs found

A study of hierarchical and flat classification of proteins

Author: Buchwald Fabian
Frank Eibe
Kramer Stefan
Zimek Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area

Research Commons@Waikato

AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

Author: Bessières P.
Bossy R.
Bryson K.
Chaillou S.
Gibrat J.-F.
Hoebeke M.
Loux V.
Maguin E.
Nicolas P.
Penaud S.
van de Guchte M.
Publication date: 01/07/2006

We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license

UCL Discovery

Integration of Biological Sources: Exploring the Case of Protein Homology

Author: Boerman Tjeerd W.
Keulen Maurice van
Severing Edouard I.
Vet Paul van der
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2011

Data integration is a key issue in the domain of bioinformatics, which deals with huge amounts of heterogeneous biological data that grows and changes rapidly. This paper serves as an introduction to the field of bioinformatics and the biological concepts it deals with, and an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioinformatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Uncertain databases are able to contain several possible worlds, and it has been proposed that they can be used to significantly reduce initial integration efforts. We propose several directions for future work where uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration

University of Twente Research Information

TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

Author: Deforce Dieter
Proost Sebastian
Van Bel Michiel
Van de Peer Yves
Van Neste Christophe
Vandepoele Klaas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regard to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection and frameshift correction, and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveal the efficiency and unique features of the TRAPID system

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

UPSpace at the University of Pretoria

Topological descriptors for 3D surface analysis

Author: A Othmani
AE Johnson
C Seiffert
DG Lowe
H Edelsbrunner
HJ Poincaré
J Wohlfeil
M Juda
RM Haralick
S Belongie
T Ojala
U Bauer
V López
Y Freund
Z Guo
Publication date: 01/01/2016

We investigate topological descriptors for 3D surface analysis, i.e. the classification of surfaces according to their geometric fine structure. On a dataset of high-resolution 3D surface reconstructions we compute persistence diagrams for a 2D cubical filtration. In the next step we investigate different topological descriptors and measure their ability to discriminate structurally different 3D surface patches. We evaluate their sensitivity to different parameters and compare the performance of the resulting topological descriptors to alternative (non-topological) descriptors. We present a comprehensive evaluation that shows that topological descriptors are (i) robust, (ii) yield state-of-the-art performance for the task of 3D surface analysis and (iii) improve classification performance when combined with non-topological descriptors. Comment: 12 pages, 3 figures, CTIC 201

arXiv.org e-Print Archive

Crossref

Jagiellonian University Repository