Directory of Open Access Journals

University of Queensland eSpace

Application of compression-based distance measures to protein sequence classification: a methodological study

Author: András Kocsor
Attila Kertész-Farkas
László Kaján
Sándor Pongor
Publication date: 29/11/2005

Motivation: Distance measures built on the notion of text compression have been used for the comparison and classification of entire genomes and mitochondrial genomes. The present study was undertaken to explore their utility in the classification of protein sequences. Results: We constructed compression-based distance measures (CBMs) using the Lempel-Ziv and the PPMZ compression algorithms and compared their performance with that of the Smith–Waterman algorithm and BLAST, using nearest neighbour or support vector machine classification schemes. The datasets included a subset of the SCOP protein structure database to test distant protein similarities, 3-phosphoglycerate kinase sequences selected from archaean, bacterial and eukaryotic species, as well as low- and high-complexity sequence segments of the human proteome. CBM values show a dependence on the length and the complexity of the sequences compared. In classification tasks CBMs performed especially well on distantly related proteins, where the performance of a combined measure, constructed from a CBM and a BLAST score, approached or even slightly exceeded that of the Smith–Waterman algorithm and two hidden Markov model-based algorithms.
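To illustrate the idea of a compression-based distance combined with nearest neighbour classification, here is a minimal Python sketch. It is not the authors' implementation: it assumes the widely used normalized compression distance form, and it uses zlib (a DEFLATE compressor from the Lempel-Ziv family) as a stand-in for the compressors named in the abstract. The function names and the toy sequences are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Assumed form: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_neighbour_label(query: str, labelled: list[tuple[str, str]]) -> str:
    """Return the label of the labelled sequence closest to the query."""
    _, label = min(labelled, key=lambda item: ncd(query, item[0]))
    return label


if __name__ == "__main__":
    # Toy, artificial sequences; real protein sequences are far less regular.
    family_1 = "ACDEFGHIKL" * 20
    family_2 = "MNPQRSTVWY" * 20
    query = "ACDEFGHIKL" * 15 + "ACDEFGHIKA"
    training = [(family_1, "family_1"), (family_2, "family_2")]
    print(nearest_neighbour_label(query, training))
```

As the abstract notes, such values depend on sequence length and complexity, and a general-purpose compressor adds fixed overhead on short inputs, so this sketch should be read as a demonstration of the principle only.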

Open Access Repository

Virus–Host Coevolution with a Focus on Animal and Human DNA Viruses

Author: Doszpoly Andor
Kaján Győző
Papp Tibor
Tarján Zoltán László
Vidovszky Márton
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Repository of the Academy's Library

FreeContact: fast and free software for protein contact prediction from residue co-evolution

Author: Hopf Thomas A
Kaján László
Kalaš Matúš
Marks Debora S
Rost Burkhard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Twenty years of improved technology and growing sequence data now render residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library “libfreecontact”, complete with command line tool “freecontact”, as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

University of Bergen

Harvard University - DASH

NORA - Norwegian Open Research Archives

Application of a simple likelihood ratio approximant to protein sequence classification

Author: András Kocsor
Attila Kertész-Farkas
Dino Franklin
László Kaján
Neli Ivanova
Sándor Pongor
Publication date: 01/12/2006

Motivation: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken to explore their utility as a scoring (ranking) function in the classification of protein sequences. Results: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith–Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
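One plausible reading of the scoring described above can be sketched in Python as follows. This is an assumption-laden illustration, not the published method: it takes the best similarity score per class for a query, then reports the ratio of the top class score to the runner-up class score. The function names, the class labels and the scores are hypothetical, and positive similarity scores are assumed.

```python
def class_max_scores(scores):
    """Best (maximal) similarity per class.

    `scores` is an iterable of (class_label, similarity) pairs for one query.
    """
    best = {}
    for label, similarity in scores:
        if label not in best or similarity > best[label]:
            best[label] = similarity
    return best


def lra_score(scores):
    """Return (top_class, ratio of top class score to runner-up class score).

    Assumes similarity scores are positive; for distances, the ratio
    would be taken the other way round.
    """
    best = class_max_scores(scores)
    if len(best) < 2:
        raise ValueError("need scores for at least two classes")
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top), (_, runner_up) = ranked[0], ranked[1]
    if runner_up <= 0:
        raise ValueError("similarity scores must be positive")
    return top_label, top / runner_up


if __name__ == "__main__":
    # Hypothetical similarity scores of one query against labelled sequences.
    hits = [("kinase", 120.0), ("kinase", 80.0),
            ("globin", 40.0), ("protease", 30.0)]
    print(lra_score(hits))
```

A ratio well above 1 indicates that the best class clearly separates from the second best, which is the intuition behind ranking by a ratio instead of by the raw score.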

Open Access Repository

Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

Author: Angermüller Christof
Böhm Ariane
Domke Simon
Ertl Julia
Kaján László
Mertes Christian
Mirdita Milot
Reisinger Eva
Rost Burkhard
Staniewski Cedric
Steinegger Martin
Vicedo Esmeralda
Yachdav Guy
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully use PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

SNU Open Repository and Archive

Directory of Open Access Journals

Aberdeen University Research

Community-driven development for computational biology at Sprints, Hackathons and Codefests

Author: Afgan Enis
Banck Michael
Bonnal Raoul JP
Booth Timothy
Chapman Brad A
Chilton John
Cock Peter JA
Guimera Roman Valls
Gumbel Markus
Harris Nomi
Holland Richard
Kaján László
Kalaš Matúš
Katayama Toshiaki
Kibukawa Eri
Möller Steffen
Powel David R
Prins Pjotr
Quinn Jacqueline
Sallou Olivier
Seemann Torsten
Sloggett Clare
Soiland-Reyes Stian
Spooner William
Steinbiss Sascha
Strozzi Francesco
Tille Andreas
Travis Anthony J
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

University of Bergen

Harvard University - DASH

The University of Manchester - Institutional Repository

NORA - Norwegian Open Research Archives

University of Melbourne Institutional Repository

NERC Open Research Archive

Evaluation of 3D-Jury on CASP7 models

Author: A Heger
A Marin
A Vullo
A Yamaguchi
AA Canutescu
AE Torda
B Wallner
D Fischer
DE Kim
J Cheng
J Shi
J Soding
J Xu
JM Bujnicki
K Bryson
K Ginalski
K Karplus
K Tomii
KW DeRonne
L Jaroszewski
L Rychlewski
LA Kelley
Leszek Rychlewski
LH Hung
LJ McGuffin
László Kaján
MA Kurowski
N Siew
O Lund
O Teodorescu
PA Bates
S Liu
S Wu
W Jaśkowski
Y Zhang
Publication venue: BioMed Central
Publication date: 01/08/2007

Background: 3D-Jury, the structure prediction consensus method publicly available in the Meta Server (http://meta.bioinfo.pl/), was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. Results: The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. Conclusion: The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature (http://meta.bioinfo.pl/compare_your_model_example.pl) available in the Meta Server.

Directory of Open Access Journals