265 research outputs found

    An Enhanced Random Linear Oracle Ensemble Method using Feature Selection Approach based on Naïve Bayes Classifier

    Get PDF
    Random Linear Oracle (RLO) ensemble replaced each classifier with two mini-ensembles, allowing base classifiers to be trained using different data set, improving the variety of trained classifiers. Naïve Bayes (NB) classifier was chosen as the base classifier for this research due to its simplicity and computational inexpensive. Different feature selection algorithms are applied to RLO ensemble to investigate the effect of different sized data towards its performance. Experiments were carried out using 30 data sets from UCI repository, as well as 6 learning algorithms, namely NB classifier, RLO ensemble, RLO ensemble trained with Genetic Algorithm (GA) feature selection using accuracy of NB classifier as fitness function, RLO ensemble trained with GA feature selection using accuracy of RLO ensemble as fitness function, RLO ensemble trained with t-test feature selection, and RLO ensemble trained with Kruskal-Wallis test feature selection. The results showed that RLO ensemble could significantly improve the diversity of NB classifier in dealing with distinctively selected feature sets through its fusionselection paradigm. Consequently, feature selection algorithms could greatly benefit RLO ensemble, with properly selected number of features from filter approach, or GA natural selection from wrapper approach, it received great classification accuracy improvement, as well as growth in diversity

    Prediction of glycosylation sites using random forests

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Post translational modifications (PTMs) occur in the vast majority of proteins and are essential for function. Prediction of the sequence location of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM, and is implicated in protein folding, transport and function.</p> <p>Results</p> <p>We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity of association with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. This is significantly better than current glycosylation predictors. We use the trepan algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types.</p> <p>Conclusion</p> <p>We have created an accurate predictor of glycosylation sites and used this to extract comprehensible rules about the glycosylation process. GPP is available online at <url>http://comp.chem.nottingham.ac.uk/glyco/</url>.</p

    Classifier ensembles for f MRI data analysis: an experiment

    Get PDF
    Abstract Functional magnetic resonance imaging (fMRI) is becoming a forefront brain-computer interface tool. To decipher brain patterns, fast, accurate and reliable classifier methods are needed. The support vector machine (SVM) classifier has been traditionally used. Here we argue that state-of-the-art methods from pattern recognition and machine learning, such as classifier ensembles, offer more accurate classification. This study compares 18 classification methods on a publicly available real data set due to Haxby et al. [Science 293 (2001[Science 293 ( ) 2425[Science 293 ( -2430. The data comes from a single-subject experiment, organized in 10 runs where eight classes of stimuli were presented in each run. The comparisons were carried out on voxel subsets of different sizes, selected through seven popular voxel selection methods. We found that, while SVM was robust, accurate and scalable, some classifier ensemble methods demonstrated significantly better performance. The best classifiers were found to be the random subspace ensemble of SVM classifiers, rotation forest and ensembles with random linear and random spherical oracle

    Real-time data mining models for predicting length of stay in intensive care units

    Get PDF
    Nowadays the efficiency of costs and resources planning in hospitals embody a critical role in the management of these units. Length Of Stay (LOS) is a good metric when the goal is to decrease costs and to optimize resources. In Intensive Care Units (ICU) optimization assumes even a greater importance derived from the high costs associated to inpatients. This study presents two data mining approaches to predict LOS in an ICU. The first approach considered the admission variables and some other physiologic variables collected during the first 24 hours of inpatient. The second approach considered admission data and supplementary clinical data of the patient (vital signs and laboratory results) collected in real-time. The results achieved in the first approach are very poor (accuracy of 73 %). However, when the prediction is made using the data collected in real-time, the results are very interesting (sensitivity of 96.104%). The models induced in second experiment are sensitive to the patient clinical situation and can predict LOS according to the monitored variables. Models for predicting LOS at admission are not suited to the ICU particularities. Alternatively, they should be induced in real-time, using online-learning and considering the most recent patient condition when the model is induced.(undefined
    corecore