Search CORE

220 research outputs found

Authentication of tequilas using pattern recognition and supervised classification

Author: Andrade-Garda José Manuel
Durán J.J.
Fernández-Lozano Carlos
Jiménez I.
Miguel-Cruz F.
Molina Y.
Olmos P.
Pérez-Caballero G.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

[Abstract] Sales of reputed Mexican tequila have grown substantially in recent years and, as a result, counterfeiting is increasing steadily. Hence, methodologies intended to characterize and authenticate commercial beverages are a real need. They require a combination of analytical characterization and chemometric tools. This work reports concisely on the former and focuses on the chemometric tools employed so far in connection with them. Further, a practical case study presents the classification capabilities of nine supervised classification methods to differentiate white, rested, aged and extra-aged tequilas. The largest set of certified tequilas employed so far was considered. In general, nonlinear methods performed better than linear ones (accuracy higher than 94% in both training and validation). The case study demonstrates that it is possible to develop fast, cheap, easy-to-implement and reliable analytical methodologies to authenticate and classify samples of tequilas. Xunta de Galicia; GRC2013-047. Ministerio de Industria, Energía y Competitividad; FJCI-2015-2607

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

Author: Boulesteix Anne-Laure
Janitza Silke
Kruppa Jochen
König Inke R.
Publication date: 25/07/2012

The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research.

Open Access LMU

Random forest application on cognitive level classification of E-learning content

Author: J. Chandra
Thomas Benny
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2020

E-learning is the primary method of learning for most learners after regular academic studies. Knowledge delivery through e-learning technologies has increased exponentially over the years because of advances in internet and e-learning technologies. Knowledge delivery to some people would never have been possible without e-learning technologies. Most working professionals undertake focused studies for career advancement, promotion, or to improve their domain knowledge. These learners can easily find many free e-learning web sites on the internet in their domain of interest. However, it is quite difficult to find the best e-learning content suited to their domain knowledge level. Users spend most of their time figuring out the right content from a plethora of available content and end up learning nothing. An intelligent framework using machine learning algorithms with a Random Forest classifier is proposed to address this issue; it classifies e-learning content by difficulty level and provides the learner with the content best suited to their knowledge level. The framework is trained with a data set collected from multiple popular e-learning web sites. The model was tested with real-time e-learning web site links, and it was found that the e-content in the web sites is recommended to the user based on its difficulty level: beginner, intermediate, or advanced.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Predicting disease risks from highly imbalanced data using random forest

Author: AP Bradley
C Chen
D Palmer
DA Davis
DH Mantzaris
E Cohen
F Provost
HCUP Project
J Mingers
JR Quinlan
L Breiman
M Skubic
Mihail Popescu
Mohammed Khalilia
N Japkowicz
P Hebert
Sounak Chakraborty
ST Moturu
T Hastie
T Yi
V Fuster
W Yu
W Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: We present a method that uses the Healthcare Cost and Utilization Project (HCUP) dataset to predict the disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare. Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random forest classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases. Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process. Conclusions: In combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
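The balanced sub-sampling idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the function name `balanced_subsamples`, the binary 0/1 label encoding, and the choice to keep every minority record while drawing an equally sized random set of majority records are all assumptions made for illustration. In a full pipeline, one classifier (for example a random forest) would presumably be trained on each sub-sample and the outputs combined.

```python
import random


def balanced_subsamples(labels, n_subsamples, seed=0):
    """Return lists of record indices in which both classes are equally frequent.

    Assumptions (not taken from the paper): labels are binary (0 or 1),
    every sub-sample keeps all indices of the smaller class, and an equally
    sized set of indices from the larger class is drawn at random without
    replacement for each sub-sample.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # The shorter index list is the minority class, the longer the majority.
    minority, majority = sorted(by_class.values(), key=len)
    subsamples = []
    for _ in range(n_subsamples):
        drawn = rng.sample(majority, len(minority))
        subsamples.append(sorted(minority + drawn))
    return subsamples


# Hypothetical toy data: 95 negative records and 5 positive records.
labels = [0] * 95 + [1] * 5
for subsample in balanced_subsamples(labels, n_subsamples=3, seed=42):
    positives = sum(labels[i] for i in subsample)
    print(len(subsample), positives)  # each sub-sample: 10 records, 5 positive
```

Because the majority records are redrawn for every sub-sample, different sub-samples see different negative examples, which is one plausible way for an ensemble to use most of the majority class without training on imbalanced data.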

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central