
42,061 research outputs found

On Prediction Using Variable Order Markov Models

Author: Begleiter R.
El-Yaniv R.
Yona G.
Publication venue: 'AI Access Foundation'
Publication date: 30/06/2011

This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
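The average log-loss used as the quality measure in this abstract can be illustrated with a small sketch. The predictor below is a plain fixed-order Markov model with add-one smoothing, chosen only for brevity; it is an assumption for illustration and is not one of the six algorithms the paper compares. All function names are hypothetical.

```python
import math
from collections import defaultdict


def train_counts(seq, order):
    """Count how often each symbol follows each context of up to `order` symbols."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        ctx = seq[max(0, i - order):i]
        counts[ctx][seq[i]] += 1
    return counts


def prob(counts, ctx, sym, alphabet):
    """Add-one (Laplace) smoothed estimate of P(sym | ctx)."""
    c = counts.get(ctx, {})  # unseen context: fall back to a uniform estimate
    total = sum(c.values())
    return (c.get(sym, 0) + 1) / (total + len(alphabet))


def average_log_loss(counts, test, order, alphabet):
    """Average of -log2 P(x_i | preceding context) over the test sequence, in bits."""
    total = 0.0
    for i in range(len(test)):
        ctx = test[max(0, i - order):i]
        total -= math.log2(prob(counts, ctx, test[i], alphabet))
    return total / len(test)


if __name__ == "__main__":
    model = train_counts("abababab", order=1)
    print(average_log_loss(model, "abab", order=1, alphabet="ab"))
```

A lower value means the model assigned higher probability to the symbols that actually occurred; an untrained model over a two-symbol alphabet scores exactly 1 bit per symbol.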

arXiv.org e-Print Archive

Crossref

Diagnosing serious infections in acutely ill children in ambulatory care (ERNIE 2 study protocol, part A): diagnostic accuracy of a clinical decision tree and added value of a point-of-care C-reactive protein test and oxygen saturation

Author: Aertgeerts B.
Bullens D.M.A.
Buntinx F.
de Burghgraeve T.
de Sutter A.
The ERNIE 2 Collaboration
Lemiengre M.B.
Verbakel J.Y.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Acute illness is the most common presentation of children to ambulatory care. In contrast, serious infections are rare and often present at an early stage. To avoid complications or death, early recognition and adequate referral are essential. In a recent large study, children were included prospectively to construct a symptom-based decision tree with a sensitivity and negative predictive value of nearly 100%. To reduce the number of false positives, point-of-care tests might be useful, providing an immediate result at bedside. The most probable candidates are a C-reactive protein test and pulse oximetry. Methods: This is a diagnostic accuracy study of signs, symptoms and point-of-care tests for serious infections. Acutely ill children presenting to a family physician or paediatrician will be included consecutively in Flanders, Belgium. Children testing positive on the decision tree will get a point-of-care C-reactive protein test. Children testing negative will randomly either receive a point-of-care C-reactive protein test or usual care. The outcome of interest is hospital admission for more than 24 hours with a serious infection within 10 days. Aiming to include over 6500 children, we will report the diagnostic accuracy of the decision tree (+/- the point-of-care C-reactive protein test or pulse oximetry) in sensitivity, specificity, positive and negative likelihood ratios, and positive and negative predictive values. New diagnostic algorithms will be constructed through classification and regression tree and multiple logistic regression analysis. Discussion: We aim to improve detection of serious infections, and present a practical tool for diagnostic triage of acutely ill children in primary care. We also aim to reduce the number of investigations and admissions in children with non-serious infections.
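The accuracy measures named in this protocol all follow from a 2x2 table of test result against outcome. A minimal sketch using the standard definitions is given below; the function name is hypothetical and the counts in the usage example are invented for illustration, not data from the study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    tp/fn: children with the outcome who test positive/negative.
    fp/tn: children without the outcome who test positive/negative.
    Assumes no denominator is zero (e.g. specificity is neither 0 nor 1).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in diagnostic_accuracy(tp=90, fp=50, fn=10, tn=850).items():
        print(f"{name}: {value:.3f}")
```

With a rare outcome, a test can combine high sensitivity and a near-perfect negative predictive value with a modest positive predictive value, which is why the protocol looks to additional point-of-care tests to cut false positives.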

Maastricht University Research Portal

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Oxford University Research Archive

Diagnosis of Ovarian Cancer Using Decision Tree Classification of Mass Spectral Data

Author: Coleman Robert L.
Gregory Betsy W.
Schorge John O.
Vlahou Antonia
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2003

Recent reports from our laboratory and others support the SELDI ProteinChip technology as a potential clinical diagnostic tool when combined with n-dimensional analysis algorithms. The objective of this study was to determine if the commercially available classification algorithm biomarker patterns software (BPS), which is based on a classification and regression tree (CART), would be effective in discriminating ovarian cancer from benign diseases and healthy controls. Serum protein mass spectrum profiles from 139 subjects (patients with ovarian cancer, patients with benign pelvic diseases, and healthy women) were analyzed using the BPS software. A decision tree, using five protein peaks, resulted in an accuracy of 81.5% in the cross-validation analysis and 80% in a blinded set of samples in differentiating the ovarian cancer from the control groups. The potential, advantages, and drawbacks of the BPS system as a bioinformatic tool for the analysis of the SELDI high-dimensional proteomic data are discussed.

Crossref

Directory of Open Access Journals

PubMed Central

NcPred for accurate nuclear protein prediction using n-mer statistics with various classification algorithms

Author: A. Ganesh
A. Garg
A. Pierleoni
A. Reinhardt
B. Alberts
B. Chan
B. Mathews
D. Xie
E. Marcotte
G. Hutchinson
H. Bannai
M. Hall
M. Kumar
O. Emanuelson
W. Jassem
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Prediction of nuclear proteins is one of the major challenges in genome annotation. A method, NcPred, is described for predicting nuclear proteins with higher accuracy by exploiting n-mer statistics with different classification algorithms, namely Alternating Decision (AD) Tree, Best First (BF) Tree, Random Tree and Adaptive (Ada) Boost. On the BaCello dataset [1], NcPred improves accuracy by about 20% with Random Tree and sensitivity by about 10% with Ada Boost for animal proteins compared to existing techniques. It also increases the accuracy of fungal protein prediction by 20% and recall by 4% with AD Tree. For human proteins, accuracy is improved by about 25% and sensitivity by about 10% with BF Tree. Performance analysis of NcPred clearly demonstrates its suitability over contemporary in-silico nuclear protein classification research.
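As a rough illustration of what n-mer statistics over a protein sequence can look like, the sketch below computes relative frequencies of overlapping n-mers. The abstract does not specify NcPred's exact feature definition, so the overlapping-window counting and the normalization here are assumptions, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def nmer_frequencies(sequence, n):
    """Relative frequency of each overlapping n-mer in an amino acid sequence."""
    total = len(sequence) - n + 1  # number of overlapping windows of length n
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + n] for i in range(total))
    return {nmer: count / total for nmer, count in counts.items()}


if __name__ == "__main__":
    # Toy sequence, not taken from any dataset.
    print(nmer_frequencies("MKKLKKM", 2))
```

Frequency vectors of this kind could then be passed as features to a tree-based classifier, which is the general setup the abstract describes.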

Northumbria Research Link

Crossref

AmorProt: Amino Acid Molecular Fingerprints Repurposing based Protein Fingerprint

Author: Lee Myeonghun
Min Kyoungmin
Publication venue
Publication date: 27/03/2023

As protein therapeutics play an important role in almost all medical fields, numerous studies have been conducted on proteins using artificial intelligence. Artificial intelligence has enabled data-driven predictions without the need for expensive experiments. Nevertheless, unlike the various molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we proposed the amino acid molecular fingerprints repurposing based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to 20 amino acids. Subsequently, the performances of the tree-based machine learning and artificial neural network models were compared using (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature-based methods; (4) feature importance analysis; and (5) protein space analysis. Consequently, the significantly improved model performance and dataset-independent versatility of the AmorProt fingerprint were verified. The results revealed that the current protein representation method can be applied to various fields related to proteins, such as predicting their fundamental properties or interaction with ligands.

arXiv.org e-Print Archive

Exploiting grammatical relations for protein relation extraction and role labeling

Author: Cornelis Chris
De Cock Martine
Fayruzov Timur
Hoste Veronique
Publication venue: Oce-Nederland
Publication date: 01/01/2008

Ghent University Academic Bibliography

On the hierarchical classification of G Protein-Coupled Receptors

Author: A. A. Freitas
A. Secker
Attwood
Bhasin
Bhasin
Bissantz
Cardoso
Christopoulos
D. R. Flower
Das
Davies
Flower
Flower
Foord
Gether
Gloriam
Guo
Horn
Hébert
J. Timmis
Karchin
Keerthi
Klabunde
Kolakowski
Lapinsh
M. Mendao
M. N. Davies
Milligan
Papasaikas
Prabhu
Sandberg
Schiöth
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2007

Motivation: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. Results: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.

CiteSeerX

Crossref

Aberystwyth Research Portal

Kent Academic Repository

Random forests with random projections of the output space for high dimensional multi-label classification

Author: D. Achlioptas
D. Kocev
E.J. Candes
F. Pedregosa
G. Madjarov
G. Tsoumakas
J. Read
J.L. Faulon
L. Breiman
P. Geurts
W.B. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used in order to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
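The projection step alone can be sketched in a few lines. The example below compresses a high-dimensional binary label vector with a Gaussian random matrix and compares squared distances before and after projection. It assumes a Gaussian projection with 1/sqrt(m) scaling, one common choice, and it does not reproduce the paper's tree ensembles or its decoding of predictions; all names and sizes are hypothetical.

```python
import random


def gaussian_projection(d, m, seed=0):
    """m x d matrix of N(0, 1) entries scaled by 1/sqrt(m).

    The scaling makes squared norms match the original in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(matrix, y):
    """Map a d-dimensional label vector to m dimensions."""
    return [sum(p * v for p, v in zip(row, y)) for row in matrix]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    d, m = 1000, 200                     # hypothetical label count and target dimension
    y1 = [0] * d
    y2 = [1] * 20 + [0] * (d - 20)       # differs from y1 in 20 labels
    P = gaussian_projection(d, m, seed=1)
    print(sq_dist(y1, y2), sq_dist(project(P, y1), project(P, y2)))
```

Because distances between label vectors are roughly preserved, a learner can in principle work with m output dimensions instead of d, which is the source of the reduced learning cost the abstract refers to.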

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège