
Protein classification in a machine learning framework

Author: Kertész-Farkas Attila
Publication date: 25/11/2009

SZTE Doktori Értekezések Repozitórium (SZTE Repository of Dissertations)

Application of compression-based distance measures to protein sequence classification: a methodological study

Author: András Kocsor
Attila Kertész-Farkas
László Kaján
Sándor Pongor
Publication date: 29/11/2005

Abstract Motivation: Distance measures built on the notion of text compression have been used for the comparison and classification of entire genomes and mitochondrial genomes. The present study was undertaken in order to explore their utility in the classification of protein sequences. Results: We constructed compression-based distance measures (CBMs) using the Lempel–Ziv and the PPMZ compression algorithms and compared their performance with that of the Smith–Waterman algorithm and BLAST, using nearest neighbour or support vector machine classification schemes. The datasets included a subset of the SCOP protein structure database to test distant protein similarities, a set of 3-phosphoglycerate kinase sequences selected from archaeal, bacterial and eukaryotic species, and low- and high-complexity sequence segments of the human proteome. CBM values show a dependence on the length and the complexity of the sequences compared. In classification tasks, CBMs performed especially well on distantly related proteins, where the performance of a combined measure, constructed from a CBM and a BLAST score, approached or even slightly exceeded that of the Smith–Waterman algorithm and two hidden Markov model-based algorithms.

Open Access Repository
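The compression-based distance idea in the abstract above can be illustrated with one well-known CBM, the normalized compression distance (NCD). This is a minimal sketch, assuming Python's `zlib` (DEFLATE, an LZ77-family compressor) as a stand-in for the Lempel–Ziv compressor; PPMZ has no standard-library implementation, and the exact CBM formulas used in the study may differ. The function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9).

    zlib (DEFLATE) is used here only as a stand-in for a Lempel-Ziv compressor.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two protein sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length. Values near 0 suggest that the
    sequences share much structure; values near 1 suggest little.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = "MKVLAAGIVGLLLAGCSSHKEEQ" * 4
    b = "MKVLAAGIVGLLLAGCSAHKEEQ" * 4  # one substitution per repeat
    c = "WYPRTHDNNQEFCMGGSTWPRKL" * 4  # unrelated toy sequence
    print(round(ncd(a, b), 3), round(ncd(a, c), 3))
```

The compressor adds a fixed header and checksum to every output, so very short sequences give noisy values; this is one plausible contributor to the length dependence of CBM values noted in the abstract.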

Chemical rule-based filtering of MS/MS spectra

Author: Attila Kertész-Farkas
Beáta Reiz
Michael P. Myers
Sándor Pongor
Publication date: 15/02/2013

Abstract Motivation: Identification of proteins by mass spectrometry–based proteomics requires automated interpretation of peptide tandem mass spectrometry (MS/MS) spectra. The effectiveness of peptide identification can be greatly improved by filtering out extraneous noise peaks before the subsequent database searching steps. Results: Here we present a novel chemical rule-based filtering algorithm, termed CRF, which makes use of the predictable patterns (rules) of collision-induced peptide fragmentation. The algorithm selects peak pairs that obey the common fragmentation rules within plausible limits of mass tolerance and peak intensity, and produces spectra that can subsequently be submitted to any search engine. CRF increases the positive predictive value and decreases the number of random matches, and thus improves peptide annotation performance by 15–20% with search engines such as X!Tandem. Importantly, the algorithm also achieves data compression rates of ∼75%. Availability: The MATLAB source code and a web server are available at http://hydrax.icgeb.trieste.it/CRFilter/ Supplementary information: Supplementary data are available at Bioinformatics online.

Open Access Repository
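The peak-pair idea can be illustrated with a deliberately simplified filter: keep a peak only if it is reasonably intense and differs from another peak by exactly one amino-acid residue mass (within a tolerance), as consecutive singly charged b- or y-ion fragments would. This is a hypothetical sketch, not the CRF rule set; the function `filter_peaks` and the parameters `tol` and `min_rel_intensity` are assumptions, and the authors' actual implementation is the MATLAB code linked above.

```python
# Monoisotopic amino-acid residue masses in daltons (Leu and Ile share one mass).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def filter_peaks(peaks, tol=0.02, min_rel_intensity=0.05):
    """Keep peaks that pair with another peak via a single residue-mass gap.

    peaks: list of (m/z, intensity) tuples, assumed singly charged.
    tol: absolute mass tolerance in daltons (hypothetical default).
    min_rel_intensity: drop peaks below this fraction of the base peak
    (hypothetical default).
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    # Intensity filter first, then sort by m/z so gaps are positive.
    candidates = sorted(
        (p for p in peaks if p[1] >= min_rel_intensity * base),
        key=lambda p: p[0],
    )
    keep = set()
    for i, (mz_i, _) in enumerate(candidates):
        for j in range(i + 1, len(candidates)):
            gap = candidates[j][0] - mz_i
            if any(abs(gap - m) <= tol for m in RESIDUE_MASSES.values()):
                keep.update((i, j))
    return [candidates[k] for k in sorted(keep)]
```

For example, `filter_peaks([(100.0, 50.0), (171.03711, 40.0), (300.0, 30.0)])` keeps only the first two peaks, which are one alanine residue apart. Restricting the rules to single-residue gaps keeps the sketch short; a practical filter would also have to handle other ion types and charge states.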

Compact representation of Hungarian vocabulary with nondeterministic finite automata

Author: Fülöp Zoltán
Kertész-Farkas Attila
Kocsor András
Publication date: 01/01/2003

University of Szeged

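Compact representation of Hungarian-language dictionaries with nondeterministic automata (Hungarian title: Magyar nyelvű szótárak tömör reprezentációja nemdeterminisztikus automatákkal)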

Author: Fülöp Zoltán
Kertész-Farkas Attila
Kocsor András
Publication date: 01/01/2003

University of Szeged

Application of a simple likelihood ratio approximant to protein sequence classification

Author: András Kocsor
Attila Kertész-Farkas
Dino Franklin
László Kaján
Neli Ivanova
Sándor Pongor
Publication date: 01/12/2006

Abstract Motivation: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences. Results: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith–Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.

Open Access Repository
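The abstract above does not give the LRA formula, so the following is only one plausible reading, assuming positive similarity scores (such as Smith–Waterman scores) and a ratio form: take each class's maximal similarity to the query, rank the classes, and score the top class by the ratio of its maximal score to that of the runner-up class. The function name `lra_score` is hypothetical.

```python
def lra_score(scores_by_class):
    """Approximate likelihood ratio from the two top-ranking classes.

    Assumed form (not taken from the paper): ratio of the best class's
    maximal similarity to the runner-up class's maximal similarity.

    scores_by_class: dict mapping class label -> list of similarity scores
    between the query and that class's members (higher = more similar,
    assumed > 0). Returns (best_class, ratio).
    """
    if len(scores_by_class) < 2:
        raise ValueError("need at least two classes")
    # Rank classes by their single best similarity to the query.
    ranked = sorted(
        ((max(scores), label) for label, scores in scores_by_class.items()),
        reverse=True,
    )
    (top_score, top_label), (second_score, _) = ranked[0], ranked[1]
    return top_label, top_score / second_score
```

For example, `lra_score({"a": [1.0, 5.0], "b": [2.0, 4.0], "c": [1.0]})` ranks class `a` first with a ratio of 5.0 / 4.0. A ratio close to 1 flags a query that two classes match almost equally well, which a raw top score alone would not reveal. For a distance measure, one would presumably use each class's minimal distance and invert the ratio.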

A Protein Classification Benchmark collection for machine learning

Author: Dhir Somdutta
Gáspári Zoltán
Kertész-Farkas Attila
Kocsor András
Leunissen Jack A.M.
Pacurar Mircea
Pongor Sándor
Sonego Paolo
Publication venue: Oxford University Press
Publication date: 01/01/2006

Protein classification by machine learning algorithms is now widely used in the structural and functional annotation of proteins. The Protein Classification Benchmark collection was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative and training/test sets in several ways. There are 6405 classification tasks in total: 3297 on protein sequences, 3095 on protein structures and 10 on protein-coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds and superfamilies). For each dataset, distance matrices are available that contain all-vs-all comparisons of the data based on various sequence or structure comparison methods, as well as a set of classification performance measures computed with various classifier algorithms.

ELTE Digital Institutional Repository (EDIT)
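Because the benchmark collection described above provides precomputed all-vs-all distance matrices, a nearest-neighbour classifier needs no further sequence or structure comparisons. The sketch below assumes a test-by-training distance matrix and 0/1 labels, and uses ROC AUC (computed as the Mann–Whitney probability that a positive outranks a negative) as an example performance measure. The function names are hypothetical, and the collection's own file formats and measure definitions are not reproduced here.

```python
def nearest_neighbour_predict(dist, train_labels):
    """1-NN: label each test item with the label of its closest training item.

    dist[i][j] is the precomputed distance between test item i and
    training item j.
    """
    predictions = []
    for row in dist:
        j = min(range(len(row)), key=row.__getitem__)  # index of smallest distance
        predictions.append(train_labels[j])
    return predictions


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

To rank test items for `roc_auc`, one simple choice of score is the negative distance of each test item to its nearest positive training item.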

Emergence of Collective Territorial Defense in Bacterial Communities: Horizontal Gene Transfer Can Stabilize Microbiomes

Author: A Vrieze
Attila Kertész-Farkas
B Stecher
BP Willing
DM Cornforth
Dóra Szabó
FJ Richards
G Hardin
GF Gause
JN Thompson
JR Chandler
János Juhász
L Liu
LD McDaniel
LJ Brandt
ME Hibbing
Miklos S. Kellermayer
P Csermely
R Kellermayer
R Mendes
SP Diggle
Sándor Pongor
T Akiba
TD Lawley
TW Schoener
V Bucci
EV Koonin
V Venturi
Á Kerényi
Publication venue: Public Library of Science (PLoS)
Publication date: 22/04/2014

Multispecies bacterial communities, such as the microbiota of the gastrointestinal tract, can be remarkably stable and resilient even though they consist of cells and species that compete for resources and also produce a large number of antimicrobial agents. Computational modeling suggests that horizontal transfer of resistance genes may greatly contribute to the formation of stable and diverse communities capable of protecting themselves with a battery of antimicrobial agents while preserving a varied metabolic repertoire of the constituent species. In other words, horizontal transfer of resistance genes makes a community compatible in terms of exoproducts and capable of maintaining a varied and mature metagenome. The same property may allow microbiota to protect a host organism or, if used as a microbial therapy, to purge pathogens and restore a protective environment.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Semmelweis Repository

GaIn: Human Gait Inference for Lower Limbic Prostheses for Patients Suffering from Double Trans-Femoral Amputation

Author: Attila Kertész-Farkas
Roman Chereshnev
Publication venue: MDPI AG
Publication date: 01/11/2018

Several studies have analyzed human gait data obtained from inertial gyroscope and accelerometer sensors mounted on different parts of the body. In this article, we take a step further in gait analysis and provide a methodology for predicting the movements of the legs, which can be applied in prostheses to imitate the missing part of the leg during walking. In particular, we propose a method, called GaIn, to control non-invasive, robotic, prosthetic legs. Using biologically inspired recurrent neural networks, GaIn can infer the movements of both missing shanks and feet of patients suffering from double trans-femoral amputation. Predictions are made from thigh movement for casual walking-related activities such as walking, taking stairs and running. In our experimental tests, GaIn achieved an average prediction error of 4.55° for shank movements. However, a patient's intention to stand up or sit down cannot be inferred from thigh movements; in fact, the intention causes thigh movements while the shanks and feet remain roughly still. The GaIn system can be triggered by thigh muscle activity measured with electromyography (EMG) sensors to make robotic prosthetic legs perform standing-up and sitting-down actions. The GaIn system has low prediction latency and is fast and computationally inexpensive enough to be deployed on mobile platforms and portable devices.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals
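The GaIn abstract does not specify its network, so the sketch below only illustrates the general shape of the inference task with a plain (Elman) recurrent network, not GaIn's biologically inspired architecture: a sequence of thigh angles goes in and one shank-angle estimate comes out per time step. Accuracy is summarized as a mean absolute error in degrees, assuming that this is the kind of average error the 4.55° figure reports. The weights are untrained placeholders and all function names are hypothetical.

```python
import math


def elman_rnn(inputs, w_in, w_rec, b_hidden, w_out, b_out):
    """Run a single-layer Elman RNN over a sequence of scalar inputs.

    h_t = tanh(w_in * x_t + w_rec @ h_{t-1} + b_hidden)
    y_t = w_out . h_t + b_out
    w_in, b_hidden, w_out: lists of length H; w_rec: H x H nested list.
    Returns one output per input step (e.g. a shank angle per thigh angle).
    """
    hidden = [0.0] * len(w_in)  # initial hidden state h_0 = 0
    outputs = []
    for x in inputs:
        # The comprehension reads the previous hidden state before reassignment.
        hidden = [
            math.tanh(
                w_in[k] * x
                + sum(w_rec[k][m] * hidden[m] for m in range(len(hidden)))
                + b_hidden[k]
            )
            for k in range(len(hidden))
        ]
        outputs.append(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return outputs


def mean_absolute_error(predicted, actual):
    """Average absolute difference, e.g. in degrees of shank angle."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

In practice the weights would be fitted to recorded thigh and shank sensor data, and a vectorized library would replace the pure-Python loops, which are kept here only for readability.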