11 research outputs found

Word correlation matrices for protein sequence analysis and remote homology detection

Author: A Ben-Hur
A Krogh
AG Murzin
C Leslie
C Leslie
CS Leslie
G Cohen
H Rangwala
H Saigo
J Park
L Liao
O Chapelle
Peter Meinicke
QW Dong
R Finn
R Kuang
SF Altschul
T Jaakkola
T Lingner
TF Smith
Thomas Lingner
UniProtConsortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive. Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection. Conclusion: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CoMet—a web server for comparative functional profiling of metagenomes

Author: Aoki-Kinoshita
Ashburner
Daniel
Dinsdale
Fabian Schreiber
Finn
Gerlach
Goll
Hoff
Kathrin Petra Aßhauer
Kottmann
Kunin
Li
Lingner
Marchler-Bauer
Markowitz
Meinicke
Meyer
Meyer
Peter Meinicke
Rodriguez-Brito
Schreiber
Seshadri
Tatusov
Thomas Lingner
Torgerson
Tringe
Turnbaugh
Publication venue: Oxford University Press

Analyzing the functional potential of newly sequenced genomes and metagenomes has become a common task in biomedical and biological research. With the advent of high-throughput sequencing technologies, comparative metagenomics opens the way to elucidate the genetically determined similarities and differences of complex microbial communities. We developed the web server ‘CoMet’ (http://comet.gobics.de), which provides an easy-to-use comparative metagenomics platform that is well suited for the analysis of large collections of metagenomic short read data. CoMet combines the ORF finding and subsequent assignment of protein sequences to Pfam domain families with a comparative statistical analysis. Besides comprehensive tabular data files, the CoMet server also provides visually interpretable output in terms of hierarchical clustering and multi-dimensional scaling plots and thus allows a quick overview of a given set of metagenomic samples.

Crossref

PubMed Central

UFO: a web server for ultra-fast functional profiling of whole genome protein sequences

Author: Meinicke Peter
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description: Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion: For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Protein Remote Homology Detection Based on an Ensemble Learning Approach

Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref

MotifCluster: an interactive online tool for clustering and visualizing sequences using shared motifs

Author: Hamady Micah
Jeremy Widmann
Micah Hamady
Rob Knight
Shelley D Copley
Publication venue: BioMed Central
Publication date: 01/01/2008

MotifCluster finds related motifs in a set of sequences and clusters the sequences into families using the motifs they contain.

Crossref

Springer

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Author: Beckstette Michael
Giegerich Robert
Homann Robert
Kurtz Stefan
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive.

CiteSeerX

PubMed Central

Publications at Bielefeld University

Estimating evolutionary distances between genomic sequences from spaced-word matches

Publication venue: BioMed Central
Publication date: 11/02/2015

Springer - Publisher Connector

Protein Remote Homology Detection Based on an Ensemble Learning Approach

Author: Bingquan Liu
Dong Huang
Junjie Chen
Publication date: 06/03/2020

Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, is proposed with a weighted voting strategy. SVM-Ensemble combines three basic classifiers based on different feature spaces: Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that the proposed SVM-Ensemble clearly improves predictive performance for protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods.

CiteSeerX

Alignment-free Phylogeny Reconstruction Based On Quartet Trees

Author: Dencker Thomas
Publication date: 04/03/2020

Georg-August-University Göttingen

Word correlation matrices for protein sequence analysis and remote homology detection-0

Author: Peter Meinicke (18061)
Thomas Lingner (93705)

Um method and the word correlation method (WCM) using word length = 1, ..., 6. Copyright information: Taken from "Word correlation matrices for protein sequence analysis and remote homology detection", http://www.biomedcentral.com/1471-2105/9/259 BMC Bioinformatics 2008;9():259-259. Published online 3 Jun 2008. PMCID: PMC2438326.

FigShare