
    Practical feature subset selection for machine learning

    Machine learning algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success is usually dependent on the quality of the data they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation-based heuristic to determine the "goodness" of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.
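    As a rough illustration of the correlation-based heuristic described above, the sketch below scores a candidate subset by rewarding high feature-class correlation and penalizing feature-feature redundancy, then searches greedily forward. It is a minimal sketch only: Pearson correlation stands in for whatever association measure the paper actually uses, numeric class labels are assumed, and the function names are hypothetical.

```python
import numpy as np

def subset_merit(X, y, subset):
    """Correlation-based "goodness" of a feature subset: high mean
    feature-class correlation, low mean feature-feature correlation."""
    k = len(subset)
    # mean absolute feature-class correlation (y assumed numeric)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # mean absolute feature-feature correlation over all pairs in the subset
    if k > 1:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    else:
        r_ff = 0.0  # unused when k == 1
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(X, y):
    """Greedy forward search: keep adding the feature that most improves merit."""
    remaining, selected, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, j = max((subset_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        best, selected = score, selected + [j]
        remaining.remove(j)
    return selected
```

    The selected subset can then be handed to any downstream learner; the point of the heuristic is that subset quality is judged without running the learner itself.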

    Improving stacking methodology for combining classifiers: applications to cosmetic industry

    Stacking (Wolpert (1992), Breiman (1996)) is known to be a successful way of linearly combining several models. We modify the usual stacking methodology for the case where the response is binary and the predictions are highly correlated, by combining predictions with PLS-Discriminant Analysis instead of ordinary least squares. For small data sets we develop a strategy based on repeated split samples in order to select relevant variables and ensure the robustness of the final model. Five base (or level-0) classifiers are combined in order to get an improved rule, which is applied to a classical benchmark from the UCI Machine Learning Repository. Our methodology is then applied to the prediction of the dangerousness of 165 chemicals used in the cosmetic industry, described by 35 in vitro and in silico characteristics, since, faced with safety constraints, one cannot rely on a single prediction method, especially when the sample size is low.
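    A minimal sketch of the modified stacking idea, assuming scikit-learn: out-of-fold class probabilities from five level-0 classifiers form the (highly correlated) level-1 inputs, and scikit-learn's PLSRegression on the 0/1 response stands in for PLS-Discriminant Analysis. The choice of base classifiers, the number of PLS components, and the 0.5 decision threshold are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def pls_stack(X, y, n_components=2, cv=5):
    """Stack five level-0 classifiers with a PLS combiner instead of OLS."""
    level0 = [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(),
        KNeighborsClassifier(),
        GaussianNB(),
        RandomForestClassifier(n_estimators=200),
    ]
    # Level-1 inputs: out-of-fold P(y=1) from each base model; these columns
    # are typically strongly correlated, which is what hurts an OLS combiner.
    Z = np.column_stack([
        cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
        for m in level0
    ])
    # PLS-DA stand-in: regress the binary response on the correlated
    # predictions through a few latent components.
    combiner = PLSRegression(n_components=n_components).fit(Z, y)
    fitted = [m.fit(X, y) for m in level0]

    def predict(X_new):
        Z_new = np.column_stack([m.predict_proba(X_new)[:, 1] for m in fitted])
        return (combiner.predict(Z_new).ravel() >= 0.5).astype(int)

    return predict
```

    The repeated split-sample strategy mentioned for small data sets is not sketched here; it would sit around this whole procedure, keeping only variables selected consistently across splits.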

    Analysis of Data mining based Software Defect Prediction Techniques

    Software bug repositories are the main resource for identifying fault-prone modules, and different data mining algorithms are used to extract fault-prone modules from these repositories. Software development teams try to increase software quality by decreasing the number of defects as much as possible. In this paper, different data mining techniques for identifying fault-prone modules are discussed, and the algorithms are compared to find the best one for defect prediction.

    Phishing Detection using Base Classifier and Ensemble Technique

    Phishing attacks continue to pose a significant threat in today's digital landscape, with both individuals and organizations falling victim to these attacks on a regular basis. One of the primary methods used to carry out phishing attacks is phishing websites, which are designed to look like legitimate sites in order to trick users into giving away their personal information, including sensitive data such as credit card details and passwords. This research paper proposes a model that utilizes several benchmark classifiers, including LR, Bagging, RF, K-NN, DT, SVM, and AdaBoost, to identify and classify phishing websites, evaluated on accuracy, precision, recall, F1-score, and the confusion matrix. Additionally, a meta-learner and a stacking model were combined to identify phishing websites in existing systems. The proposed ensemble learning approach using stack-based meta-learners proved to be highly effective in identifying both legitimate and phishing websites, achieving an accuracy rate of up to 97.19%, with precision, recall, and F1 scores of 97%, 98%, and 98%, respectively. Thus, it is recommended that ensemble learning, particularly stacking and its meta-learner variations, be implemented to detect and prevent phishing attacks and other digital cyber threats.
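    The stack-based meta-learner idea maps directly onto scikit-learn's StackingClassifier, sketched below with the seven base classifiers named above. This is an illustrative sketch only: the hyperparameters, the logistic-regression meta-learner, and the synthetic placeholder data are assumptions, not the paper's exact configuration or dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Base (level-0) classifiers named in the abstract; hyperparameters are illustrative.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("bagging", BaggingClassifier()),
    ("rf", RandomForestClassifier(n_estimators=200)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier()),
    ("svm", SVC(probability=True)),
    ("adaboost", AdaBoostClassifier()),
]

# Meta-learner trained on out-of-fold predictions of the base classifiers.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Placeholder data; replace with the actual phishing-website feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y,
                                                    random_state=0)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # accuracy, precision, recall, F1
```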

    Student Behaviour Analysis To Detect Learning Styles Using Decision Tree, Naïve Bayes, And K-Nearest Neighbor Method In Moodle Learning Management System

    A learning management system (LMS) manages online learning and facilitates interaction in the teaching and learning processes. Teachers can use an LMS to determine student activities or interactions with their courses. Everyone learns uniquely, so it is necessary to understand each student's learning style and apply it in their learning activities. One factor contributing to learning success is the use of an appropriate learning style, which allows the information received to be properly conveyed and clearly understood. As a result, we require a mechanism to identify learning styles. This study develops a learning style detection system based on learning behavior in the LMS of Christian Vocational School Petra Surabaya for the subject of Network System Administration, using the Decision Tree, Naïve Bayes, and K-Nearest Neighbor methods. The results showed that the Decision Tree method could best detect and predict learning styles: using the 80:20 train-test split it obtained an accuracy of 0.96 with a processing time of 0.000998 seconds, while the 10-fold cross-validation test obtained an accuracy of 0.98 with a processing time of 0.04033 seconds.
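    The evaluation protocol described above (an 80:20 hold-out split versus 10-fold cross-validation, with accuracy and processing time recorded per classifier) can be sketched as follows, assuming scikit-learn; the feature matrix X and learning-style labels y would come from the Moodle activity logs and are not reproduced here.

```python
import time
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

MODELS = {
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
}

def compare(X, y):
    """Report accuracy and processing time under an 80:20 split and 10-fold CV."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=0)
    for name, model in MODELS.items():
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        split_acc = accuracy_score(y_te, model.predict(X_te))
        split_time = time.perf_counter() - start

        start = time.perf_counter()
        cv_acc = cross_val_score(model, X, y, cv=10).mean()
        cv_time = time.perf_counter() - start

        print(f"{name}: 80:20 accuracy={split_acc:.2f} ({split_time:.5f}s), "
              f"10-fold accuracy={cv_acc:.2f} ({cv_time:.5f}s)")
```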