
    Authentication of tequilas using pattern recognition and supervised classification

    Sales of reputed Mexican tequila have grown substantially in recent years and, as a result, counterfeiting is increasing steadily. Methodologies to characterize and authenticate commercial beverages are therefore genuinely needed. They require a combination of analytical characterization and chemometric tools. This work briefly reviews the former and focuses on the chemometric tools used so far in connection with them. Further, a practical case study presents the classification capabilities of nine supervised classification methods for differentiating white, rested, aged and extra-aged tequilas. The largest set of certified tequilas used to date was considered. In general, nonlinear methods performed better than linear ones (accuracy above 94% in both training and validation). The case study demonstrates that it is possible to develop fast, cheap, easy-to-implement and reliable analytical methodologies to authenticate and classify tequila samples.
    Xunta de Galicia; GRC2013-047. Ministerio de Industria, Energía y Competitividad; FJCI-2015-2607

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics, as well as some representative examples of RF applications in this context and possible directions for future research.
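To make the idea of a permutation variable importance measure concrete, here is a minimal, dependency-free sketch in Python. It is not one of the RF implementations the paper surveys: each "tree" is simplified to a single-split decision stump, each stump is fit on a bootstrap sample using a random subset of `mtry` candidate features, predictions are majority votes, and a feature's importance is the drop in accuracy after shuffling that feature's column. For simplicity it measures importance on a held-out set, whereas Breiman's original permutation measure uses out-of-bag samples. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    _, f, t, left, right = best
    fallback = majority(y, None)
    return f, t, majority(left, fallback), majority(right, fallback)


def fit_forest(X, y, n_trees=50, mtry=2, seed=0):
    """Bagging plus a random feature subset for every stump."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), mtry)          # mtry candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(lo if row[f] <= t else hi for f, t, lo, hi in forest)
    return votes.most_common(1)[0][0]


def accuracy(forest, X, y):
    return sum(predict(forest, row) == yi for row, yi in zip(X, y)) / len(y)


def permutation_importance(forest, X, y, seed=0):
    """Drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    base = accuracy(forest, X, y)
    importances = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        rng.shuffle(column)
        X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
        importances.append(base - accuracy(forest, X_perm, y))
    return importances


# Toy data: 4 uniform features, but only feature 0 determines the label.
data_rng = random.Random(42)


def make_data(n):
    X = [[data_rng.random() for _ in range(4)] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
forest = fit_forest(X_train, y_train, n_trees=50, mtry=3, seed=1)
importances = permutation_importance(forest, X_test, y_test, seed=2)
print([round(v, 3) for v in importances])
```

On this toy data only feature 0 carries signal, so shuffling it should cost far more accuracy than shuffling any of the three noise features.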

    Random forest application on cognitive level classification of E-learning content

    E-learning is the primary method of learning for most learners after their regular academic studies. Knowledge delivery through e-learning technologies has increased exponentially over the years because of advances in internet and e-learning technologies, and for some people it would never have been possible without them. Most working professionals pursue focused studies for career advancement, promotion or to improve their domain knowledge. These learners can easily find many free e-learning websites in their domain of interest on the internet. However, it is quite difficult to find the e-learning content best suited to their level of domain knowledge. Users spend most of their time working out which content is right for them from a plethora of available material and end up learning nothing. To address this issue, an intelligent framework using machine learning with a Random Forest classifier is proposed; it classifies e-learning content by difficulty level and recommends to the learner the content best suited to their knowledge level. The framework is trained on a data set collected from multiple popular e-learning websites. The model was tested with real-time e-learning website links, and the e-content on those websites was recommended to the user according to its difficulty level: beginner, intermediate or advanced.

    Predicting disease risks from highly imbalanced data using random forest

    Background: We present a method that uses the Healthcare Cost and Utilization Project (HCUP) dataset to predict the disease risk of individuals from their medical diagnosis history. The methodology may be incorporated into a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.
    Methods: We used the National Inpatient Sample (NIS) data, which are publicly available through HCUP, to train random forest (RF) classifiers for disease prediction. Because the HCUP data are highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF in predicting the risk of eight chronic diseases.
    Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.
    Conclusions: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
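The repeated random sub-sampling idea described in the Methods can be sketched in dependency-free Python. This is a minimal illustration, not the authors' code, and it rests on assumptions: each sub-sample keeps every minority (positive) row and draws an equal number of majority rows without replacement, and the sub-models' scores are averaged. A hypothetical nearest-centroid scorer (`fit_centroid_scorer`) stands in for the random forest trained on each sub-sample, and the data are synthetic. The AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve.

```python
import random


def balanced_subsamples(X, y, n_subsamples, seed=0):
    """Yield balanced (X_sub, y_sub) pairs: every minority row plus an
    equal-size random draw (without replacement) of majority rows."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsamples):
        idx = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in idx], [y[i] for i in idx]


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in for the per-sub-sample random forest:
    scores a row by how much closer it lies to the positive centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos = centroid([r for r, yi in zip(X, y) if yi == 1])
    c_neg = centroid([r for r, yi in zip(X, y) if yi == 0])

    def score(row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
        return d_neg - d_pos  # higher means more likely positive

    return score


def ensemble_score(models, row):
    """Average the sub-models' scores for one row."""
    return sum(m(row) for m in models) / len(models)


def auc(scores, labels):
    """ROC AUC as the probability that a random positive scores above a
    random negative (ties count one half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic imbalanced data: negatives around (0, 0), positives around (1.5, 1.5).
data_rng = random.Random(7)


def sample(n, mean, label):
    X = [[data_rng.gauss(mean, 1.0), data_rng.gauss(mean, 1.0)] for _ in range(n)]
    return X, [label] * n


X_neg, y_neg = sample(950, 0.0, 0)
X_pos, y_pos = sample(50, 1.5, 1)
X_train, y_train = X_neg + X_pos, y_neg + y_pos

models = [fit_centroid_scorer(Xs, ys)
          for Xs, ys in balanced_subsamples(X_train, y_train, n_subsamples=10)]

X_neg_t, y_neg_t = sample(300, 0.0, 0)
X_pos_t, y_pos_t = sample(30, 1.5, 1)
X_test, y_test = X_neg_t + X_pos_t, y_neg_t + y_pos_t
test_auc = auc([ensemble_score(models, r) for r in X_test], y_test)
print(round(test_auc, 3))
```

Each sub-model sees a 50/50 class mix, so no single model is dominated by the 950 negatives, while the ensemble as a whole still draws on many different negative rows.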