294 research outputs found

    Comparative study of standalone classifier and ensemble classifier

    Ensemble learning is a machine learning method that can address performance-measurement problems. Standalone classifiers often perform poorly, which is why combining them through ensemble methods can improve their performance scores. Ensemble learning comprises several methods; in this study, three of them are compared with the standalone classifiers support vector machine, naïve Bayes, and decision tree. Bagging, AdaBoost, and voting are the ensemble methods that are applied and then compared to the standalone classifiers. On a dataset of 1,670 Twitter mentions of a tourist attraction, the ensemble methods showed no specific improvement in accuracy and precision, generating the same results as the standalone decision tree. The bagging method showed a significant improvement in recall, f-measure, and area under the curve (AUC). Overall, the standalone decision tree and the decision tree with AdaBoost achieved the highest accuracy and precision, while the support vector machine with bagging achieved the highest recall, f-measure, and AUC.
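The voting scheme compared in the study can be sketched as a plurality vote over the standalone classifiers' predicted labels. This is a minimal illustration, not the study's code; the example labels are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists column-wise by plurality vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical sentiment labels from three standalone classifiers on 5 tweets
svm  = ["pos", "neg", "pos", "neg", "pos"]
nb   = ["pos", "pos", "pos", "neg", "neg"]
tree = ["neg", "pos", "pos", "neg", "pos"]

print(majority_vote([svm, nb, tree]))  # ['pos', 'pos', 'pos', 'neg', 'pos']
```

With an odd number of voters and binary labels, ties cannot occur; bagging and AdaBoost differ only in how the voting members are trained and weighted.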

    Q-learning with online trees

    Reinforcement learning is one of the major areas of artificial intelligence and has been studied rigorously in recent years. Among numerous methodologies, Q-learning is one of the most fundamental model-free reinforcement learning algorithms, and it has inspired many researchers. Several studies have shown great results by approximating the action-value function, one of the essential elements of Q-learning, with non-linear supervised learning models such as deep neural networks. This combination has surpassed human-level performance in complex problems such as the Atari games and Go, which have been difficult to solve with standard tabular Q-learning. However, both Q-learning and the deep neural networks typically used as function approximators require very large computational resources to train. To mitigate this, we propose using the online random forest method as the function approximator for the action-value function. We grow one online random forest for each possible action in a Markov decision process (MDP) environment. Each forest approximates the action-value function for its action, and the agent chooses the action in the succeeding state according to the resulting approximated action-value functions. When the agent executes an action, an observation consisting of the state, action, reward, and subsequent state is stored in an experience replay. The observations are then randomly sampled to participate in the growth of the online random forests. For each sample, the terminal nodes of the trees in the corresponding random forest randomly generate tests for the decision tree splits, and the test that gives the lowest residual sum of squares after splitting is selected. The trees grown in this way age each time they take in a sample observation; a tree older than a certain age is then selected at random and replaced by a new tree according to its out-of-bag error.
In our study, forest size plays an important role. Our algorithm constitutes an adaptation of previously developed online random forests to reinforcement learning. To reduce computational costs, we first grow a small forest and then expand it after a certain number of episodes. In our experiments, this forest-size expansion led to better performance in later episodes. Furthermore, our method outperformed some deep neural networks in simple MDP environments. We hope this study will serve as a medium to promote research on combining reinforcement learning with tree-based methods.
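The replay-and-update loop described above can be sketched as follows. A running-average regressor stands in for each online random forest, since growing actual online trees is beyond a short example; the one-state toy MDP and all names are hypothetical:

```python
import random

GAMMA, ACTIONS = 0.9, [0, 1]

class RunningMean:
    """Stand-in for one online random forest: a running-average regressor
    keyed by (discretized) state. The real method grows online trees."""
    def __init__(self):
        self.sums, self.counts = {}, {}
    def predict(self, s):
        n = self.counts.get(s, 0)
        return self.sums.get(s, 0.0) / n if n else 0.0
    def update(self, s, target):
        self.sums[s] = self.sums.get(s, 0.0) + target
        self.counts[s] = self.counts.get(s, 0) + 1

forests = {a: RunningMean() for a in ACTIONS}  # one approximator per action
replay = []                                    # experience replay buffer

def store(s, a, r, s_next):
    replay.append((s, a, r, s_next))

def train_step(batch_size=4):
    # Sample observations from replay; each updates the model of its action
    for s, a, r, s_next in random.sample(replay, min(batch_size, len(replay))):
        target = r + GAMMA * max(forests[b].predict(s_next) for b in ACTIONS)
        forests[a].update(s, target)

def greedy_action(s):
    return max(ACTIONS, key=lambda a: forests[a].predict(s))

# Toy one-state MDP: action 1 yields reward 1, action 0 yields reward 0
store(0, 1, 1.0, 0)
store(0, 0, 0.0, 0)
for _ in range(20):
    train_step()
print(greedy_action(0))  # → 1
```

The per-action model dictionary and the bootstrapped target `r + γ·max_a' Q(s', a')` are the structural points the paper describes; everything tree-specific (split tests, aging, out-of-bag replacement) is abstracted away here.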

    Classification Techniques In Blood Donors Sector – A Survey

    This paper focuses on classification and the recent trends associated with it. It presents a survey of classification systems and clarifies how classification and data mining relate to each other. Classification arranges the blood donor dataset into predefined groups and helps predict group membership for data instances. This makes it easier for users to find target donors, since blood stocks must constantly be replaced as they expire and are needed for emergency demands such as surgery and blood transfusion. The paper also seeks to identify gaps in classification research where further work can be carried out.
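Arranging donor records into predefined groups can be illustrated with a toy rule-based classifier. The rules and thresholds below are invented for illustration, apart from the common 56-day whole-blood deferral interval:

```python
def classify_donor(blood_type, days_since_donation, age):
    """Toy rule-based classifier assigning a donor record to a
    predefined group (hypothetical rules, for illustration only)."""
    if not (18 <= age <= 65):
        return "ineligible"
    if days_since_donation < 56:
        return "deferred"        # too soon since the last donation
    if blood_type == "O-":
        return "priority"        # universal donor for emergency demand
    return "eligible"

print(classify_donor("O-", 120, 30))  # → priority
```

A learned classifier (e.g. a decision tree) would induce rules of this shape from labeled donor data instead of hand-coding them.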

    Predicting protein disorder by analyzing amino acid sequence

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physiochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), DisEMBL (also based on neural networks), and GlobPlot (based on disorder propensity). Conclusion: We find that augmenting features derived from physiochemical properties of amino acids (such as hydrophobicity and complexity) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP regions and proteins.
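Extracting physiochemical features from a sequence might look like the following sliding-window sketch, using the standard Kyte-Doolittle hydropathy scale; the window size and the crude complexity proxy are assumptions, not the paper's actual feature set:

```python
# Kyte-Doolittle hydropathy scale (standard published values)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def window_features(seq, w=5):
    """Per-window mean hydropathy and a crude complexity proxy
    (fraction of distinct residues in the window)."""
    feats = []
    for i in range(len(seq) - w + 1):
        win = seq[i:i + w]
        mean_hydro = sum(KD[aa] for aa in win) / w
        complexity = len(set(win)) / w
        feats.append((mean_hydro, complexity))
    return feats

print(len(window_features("MKLVFFAED", w=5)))  # → 5
```

Feature vectors of this kind, one per residue window, would then feed the ensemble classifier that labels each region as ordered or disordered.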

    A New Efficiency Improvement of Ensemble Learning for Heart Failure Classification by Least Error Boosting

    Heart failure is a very common disease and often a silent threat, and it is costly to treat and detect; its incidence rate is also steadily rising. Although researchers have developed classification algorithms and various ensemble learning methods have been applied to cardiovascular disease data, classification efficiency has not been high enough, owing to the cumulative error that any weak learner can introduce and to the accuracy of the vote-predicted class label. The objective of this research is to develop a new algorithm that improves the efficiency of classifying patients with heart failure. This paper proposes Least Error Boosting (LEBoosting), a new algorithm that improves on AdaBoost.M1 for higher classification accuracy. The learning algorithm finds the lowest error among various weak learners and uses it to update the distribution and create the best final hypothesis. Our experiments use the heart failure clinical records dataset, which contains 13 features of cardiac patients. Performance is measured through precision, recall, f-measure, accuracy, and the ROC curve. The experiments found that the proposed method performed well compared to naïve Bayes, k-NN, and decision tree, and outperformed other ensembles including bagging, LogitBoost, LPBoost, and AdaBoost.M1, with an accuracy of 98.89%; it also accurately classified patients who died, whereas decision tree and bagging could not distinguish them at all. The findings show that LEBoosting maximizes error reduction in the weak learners' training process, maximizing the effectiveness of cardiology classifiers and providing theoretical guidance for developing a model for the analysis and prediction of heart disease.
The novelty of this research lies in improving the original ensemble learning by finding the weak learner with the lowest error in order to update the best distribution for the final hypothesis, which gives LEBoosting the highest classification efficiency. DOI: 10.28991/ESJ-2023-07-01-010
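The core idea, choosing the weak learner with the lowest weighted error each round and then reweighting samples as AdaBoost.M1 does, can be sketched like this. It is a simplified illustration with threshold stumps and ±1 labels, not the authors' implementation:

```python
import math

def weighted_error(h, X, y, w):
    """Weighted 0-1 error of weak learner h on labels y in {-1, +1}."""
    return sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)

def le_boost(X, y, candidates, rounds=10):
    """Each round, pick the candidate weak learner with the LOWEST weighted
    error, then reweight the samples exactly as AdaBoost.M1 does."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = min(candidates, key=lambda c: weighted_error(c, X, y, w))
        err = weighted_error(h, X, y, w)
        if err >= 0.5:               # no candidate better than chance: stop
            break
        if err == 0.0:               # perfect learner: take it and stop
            ensemble.append((1.0, h))
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha if h(xi) == yi else alpha)
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy data: threshold stumps form the pool of candidate weak learners
X, y = [0.0, 1.0, 2.0, 3.0], [-1, -1, 1, 1]
stumps = [lambda x, t=t: 1 if x > t else -1 for t in (0.5, 1.5, 2.5)]
ens = le_boost(X, y, stumps)
print([predict(ens, x) for x in X])  # → [-1, -1, 1, 1]
```

Standard AdaBoost.M1 trains one fixed weak-learner type per round; the `min` over a candidate pool is the distinguishing step the abstract describes.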

    Data Mining Techniques for Fraud Detection

    The paper presents the application of data mining techniques to fraud analysis. We present some classification and prediction data mining techniques that we consider important for handling fraud detection. Among the many existing data mining algorithms, we present a statistics-based algorithm, a decision-tree-based algorithm, and a rule-based algorithm. We present a Bayesian classification model to detect fraud in automobile insurance; naïve Bayesian visualization is selected to analyze and interpret the classifier predictions. We illustrate how ROC curves can be deployed for model assessment to provide a more intuitive analysis of the models. Keywords: Data Mining, Decision Tree, Bayesian Network, ROC Curve, Confusion Matrix
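A categorical naïve Bayes classifier of the kind applied to automobile-insurance fraud can be sketched as follows. The claim records and features are invented for illustration, and the add-one smoothing is a crude simplification:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit class priors and per-(feature, class) value counts."""
    priors = Counter(labels)
    cond = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return priors, cond

def posterior(model, row):
    """Normalized class posteriors with crude add-one smoothing."""
    priors, cond = model
    n = sum(priors.values())
    scores = {}
    for c, pc in priors.items():
        p = pc / n
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            p *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Invented claim records: (claim size, prior suspicious claims?)
rows = [("high", "yes"), ("high", "yes"), ("low", "no"),
        ("low", "no"), ("high", "no")]
labels = ["fraud", "fraud", "legit", "legit", "legit"]
model = train_nb(rows, labels)
post = posterior(model, ("high", "yes"))
print(max(post, key=post.get))  # → fraud
```

Sweeping a threshold over the posterior fraud probability and recording true/false positive rates at each setting is exactly what produces the ROC curves used for model assessment.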