
265 research outputs found

Histogram Transform Ensembles for Large-scale Regression

Author: Hang Hanyuan
Lin Zhouchen
Liu Xiaoyu
Wen Hongwei
Publication date: 01/04/2021

University of Twente Research Information

An Enhanced Random Linear Oracle Ensemble Method using Feature Selection Approach based on Naïve Bayes Classifier

Author: Abdul Rahim Norasmadi
Abdul Shukor Shazmin Aniza
Masnan Maz Jamilah
Ooi Boon Pin
Zakaria Ammar
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 30/12/2017

The Random Linear Oracle (RLO) ensemble replaces each classifier with two mini-ensembles, allowing the base classifiers to be trained on different data sets and thereby increasing the variety of the trained classifiers. The Naïve Bayes (NB) classifier was chosen as the base classifier for this research because it is simple and computationally inexpensive. Different feature selection algorithms are applied to the RLO ensemble to investigate the effect of differently sized data on its performance. Experiments were carried out using 30 data sets from the UCI repository and 6 learning algorithms: the NB classifier; the RLO ensemble; the RLO ensemble trained with Genetic Algorithm (GA) feature selection using the accuracy of the NB classifier as the fitness function; the RLO ensemble trained with GA feature selection using the accuracy of the RLO ensemble as the fitness function; the RLO ensemble trained with t-test feature selection; and the RLO ensemble trained with Kruskal-Wallis test feature selection. The results showed that the RLO ensemble could significantly improve the diversity of the NB classifier in dealing with distinctively selected feature sets through its fusion-selection paradigm. Consequently, feature selection algorithms could greatly benefit the RLO ensemble: with a properly selected number of features from the filter approach, or GA natural selection from the wrapper approach, it achieved a large improvement in classification accuracy as well as growth in diversity.
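The oracle idea in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the random hyperplane is the perpendicular bisector of two randomly chosen training points, each side of the hyperplane gets a single sub-classifier rather than a mini-ensemble, and a nearest-centroid rule stands in for the Naïve Bayes base classifier to keep the example short. All function and class names (`centroids`, `nearest`, `OracleMember`, `fit_rlo`, `predict_rlo`) are hypothetical.

```python
import random
from collections import Counter


def centroids(X, y):
    """Per-class mean vectors; a nearest-centroid stand-in for Naive Bayes."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def nearest(cents, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))


class OracleMember:
    """One ensemble member: a random hyperplane plus one model per side."""

    def fit(self, X, y, rng):
        # Assumed construction: hyperplane bisecting two random training points.
        a, b = rng.sample(X, 2)
        self.w = [ai - bi for ai, bi in zip(a, b)]
        midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        self.c = sum(wi * mi for wi, mi in zip(self.w, midpoint))
        self.models = {}
        for side in (0, 1):
            idx = [i for i, x in enumerate(X) if self.side(x) == side]
            if not idx:  # degenerate split (duplicate points): use all data
                idx = range(len(X))
            self.models[side] = centroids([X[i] for i in idx], [y[i] for i in idx])
        return self

    def side(self, x):
        """Oracle decision: 1 if x lies strictly beyond the hyperplane, else 0."""
        return int(sum(wi * xi for wi, xi in zip(self.w, x)) > self.c)

    def predict(self, x):
        # The oracle selects which sub-classifier labels x.
        return nearest(self.models[self.side(x)], x)


def fit_rlo(X, y, n_members=11, seed=0):
    """Train n_members oracle members, each with its own random hyperplane."""
    rng = random.Random(seed)
    return [OracleMember().fit(X, y, rng) for _ in range(n_members)]


def predict_rlo(members, x):
    """Combine member predictions by majority vote."""
    return Counter(m.predict(x) for m in members).most_common(1)[0][0]


# Toy data: two well-separated clusters (illustrative only).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_rlo(X, y)
print(predict_rlo(ensemble, [0.5, 0.5]), predict_rlo(ensemble, [10.5, 10.5]))
```

Because every member draws its own hyperplane, the sub-classifiers see different subsets of the training data, which is the source of the diversity the abstract refers to.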

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System

Prediction of glycosylation sites using random forests

Author: Hamby Stephen E
Hirst Jonathan D
Publication venue: BioMed Central
Publication date: 01/11/2008

Abstract. Background: Post-translational modifications (PTMs) occur in the vast majority of proteins and are essential for function. Prediction of the sequence location of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM, and is implicated in protein folding, transport and function. Results: We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity of association with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. This is significantly better than current glycosylation predictors. We use the trepan algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types. Conclusion: We have created an accurate predictor of glycosylation sites and used this to extract comprehensible rules about the glycosylation process. GPP is available online at http://comp.chem.nottingham.ac.uk/glyco/.
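The abstract above mentions weighting pairwise patterns by an odds ratio. The following is a minimal sketch of a generic odds ratio computed from a 2x2 contingency table; it is not the paper's exact weighting scheme, which the abstract does not specify. The function name `odds_ratio`, its parameters, the optional `smoothing` term, and all counts are illustrative assumptions.

```python
def odds_ratio(mod_with, mod_without, unmod_with, unmod_without, smoothing=0.0):
    """Odds of seeing a pattern near modified sites relative to unmodified sites.

    mod_with / mod_without: modified sites with / without the pattern.
    unmod_with / unmod_without: unmodified sites with / without the pattern.
    smoothing: optional constant added to every cell (0.5 is a common choice)
    to avoid division by zero when a cell is empty.
    """
    a = mod_with + smoothing
    b = mod_without + smoothing
    c = unmod_with + smoothing
    d = unmod_without + smoothing
    if b == 0 or c == 0:
        raise ValueError("empty cell in the denominator; use smoothing > 0")
    return (a * d) / (b * c)


# Hypothetical counts: the pattern appears near 30 of 100 modified sites
# and near 10 of 100 unmodified sites.
ratio = odds_ratio(30, 70, 10, 90)
print(ratio)  # a value above 1 means the pattern is enriched near modified sites
```

A ratio above 1 indicates that the pattern is associated with modified residues, and a ratio below 1 indicates the opposite.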

Springer - Publisher Connector

Directory of Open Access Journals

Classifier ensembles for fMRI data analysis: an experiment

Author: Juan J Rodríguez
Ludmila I Kuncheva
Publication date: 24/04/2020

Abstract. Functional magnetic resonance imaging (fMRI) is becoming a forefront brain-computer interface tool. To decipher brain patterns, fast, accurate and reliable classifier methods are needed. The support vector machine (SVM) classifier has been traditionally used. Here we argue that state-of-the-art methods from pattern recognition and machine learning, such as classifier ensembles, offer more accurate classification. This study compares 18 classification methods on a publicly available real data set due to Haxby et al. [Science 293 (2001) 2425-2430]. The data comes from a single-subject experiment, organized in 10 runs where eight classes of stimuli were presented in each run. The comparisons were carried out on voxel subsets of different sizes, selected through seven popular voxel selection methods. We found that, while SVM was robust, accurate and scalable, some classifier ensemble methods demonstrated significantly better performance. The best classifiers were found to be the random subspace ensemble of SVM classifiers, rotation forest and ensembles with random linear and random spherical oracle.

CiteSeerX

Mitigating concept drift in data mining applications for intrusion detection systems

Author: Koutrouki Evgenia
Publication date: 29/05/2018

Real-time data mining models for predicting length of stay in intensive care units

Author: Abelha António
Machado José Manuel
Portela Filipe
Rua Fernando
Santos Manuel
Silva Álvaro
Veloso Rui
Publication venue: INSTICC Press
Publication date: 01/11/2014

Efficient planning of costs and resources now plays a critical role in hospital management. Length of Stay (LOS) is a good metric when the goal is to decrease costs and optimize resources. In Intensive Care Units (ICUs), optimization is even more important because of the high costs associated with inpatients. This study presents two data mining approaches to predict LOS in an ICU. The first approach considered the admission variables and some other physiological variables collected during the first 24 hours of the inpatient stay. The second approach considered admission data and supplementary clinical data of the patient (vital signs and laboratory results) collected in real time. The results achieved with the first approach are very poor (accuracy of 73%). However, when the prediction is made using the data collected in real time, the results are very interesting (sensitivity of 96.104%). The models induced in the second experiment are sensitive to the patient's clinical situation and can predict LOS according to the monitored variables. Models for predicting LOS at admission are not suited to the particularities of the ICU. Instead, they should be induced in real time, using online learning and considering the most recent patient condition when the model is induced.
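The abstract above reports accuracy for the first approach and sensitivity for the second, which are different metrics. As a small illustration of how the two are computed from a confusion matrix and why they can diverge, here is a sketch using entirely hypothetical counts (not taken from the study). The function names `sensitivity` and `accuracy` are illustrative.

```python
def sensitivity(tp, fn):
    """Share of actual positives that the model labels positive: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Share of all cases labelled correctly: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion-matrix counts for a binary LOS prediction
# (for example, "long stay" as the positive class).
tp, fn, fp, tn = 48, 2, 30, 20
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")
```

With these made-up counts the model finds nearly all positive cases yet mislabels many negatives, so high sensitivity alone does not imply high accuracy.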