
    Practical feature subset selection for machine learning

    Machine learning algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success is usually dependent on the quality of the data they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation-based heuristic to determine the "goodness" of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.
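    As a rough illustration of the correlation-based heuristic described above, the sketch below scores a candidate subset by rewarding high feature-class correlation and penalizing feature-feature redundancy, then searches greedily forward. It is a minimal sketch only: Pearson correlation stands in for whatever association measure the paper actually uses, numeric class labels are assumed, and the function names are hypothetical.

```python
import numpy as np

def subset_merit(X, y, subset):
    """Correlation-based "goodness" of a feature subset: high mean
    feature-class correlation, low mean feature-feature correlation."""
    k = len(subset)
    # mean absolute feature-class correlation (y assumed numeric)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # mean absolute feature-feature correlation over all pairs in the subset
    if k > 1:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    else:
        r_ff = 0.0  # unused when k == 1
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(X, y):
    """Greedy forward search: keep adding the feature that most improves merit."""
    remaining, selected, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, j = max((subset_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        best, selected = score, selected + [j]
        remaining.remove(j)
    return selected
```

    The selected subset can then be handed to any downstream learner; the point of the heuristic is that subset quality is judged without running the learner itself.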

    Improving stacking methodology for combining classifiers: applications to cosmetic industry

    Stacking (Wolpert (1992), Breiman (1996)) is known to be a successful way of linearly combining several models. We modify the usual stacking methodology for the case where the response is binary and the predictions are highly correlated, by combining predictions with PLS-Discriminant Analysis instead of ordinary least squares. For small data sets we develop a strategy based on repeated split samples in order to select relevant variables and ensure the robustness of the final model. Five base (or level-0) classifiers are combined in order to get an improved rule, which is applied to a classical benchmark from the UCI Machine Learning Repository. Our methodology is then applied to the prediction of the dangerousness of 165 chemicals used in the cosmetic industry, described by 35 in vitro and in silico characteristics, since, faced with safety constraints, one cannot rely on a single prediction method, especially when the sample size is low.
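    A minimal sketch of the modified stacking idea, assuming scikit-learn: out-of-fold class probabilities from five level-0 classifiers form the (highly correlated) level-1 inputs, and scikit-learn's PLSRegression on the 0/1 response stands in for PLS-Discriminant Analysis. The choice of base classifiers, the number of PLS components, and the 0.5 decision threshold are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def pls_stack(X, y, n_components=2, cv=5):
    """Stack five level-0 classifiers with a PLS combiner instead of OLS."""
    level0 = [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(),
        KNeighborsClassifier(),
        GaussianNB(),
        RandomForestClassifier(n_estimators=200),
    ]
    # Level-1 inputs: out-of-fold P(y=1) from each base model; these columns
    # are typically strongly correlated, which is what hurts an OLS combiner.
    Z = np.column_stack([
        cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
        for m in level0
    ])
    # PLS-DA stand-in: regress the binary response on the correlated
    # predictions through a few latent components.
    combiner = PLSRegression(n_components=n_components).fit(Z, y)
    fitted = [m.fit(X, y) for m in level0]

    def predict(X_new):
        Z_new = np.column_stack([m.predict_proba(X_new)[:, 1] for m in fitted])
        return (combiner.predict(Z_new).ravel() >= 0.5).astype(int)

    return predict
```

    The repeated split-sample strategy mentioned for small data sets is not sketched here; it would sit around this whole procedure, keeping only variables selected consistently across splits.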

    Analysis of Data mining based Software Defect Prediction Techniques

    Software bug repositories are the main resource for identifying fault-prone modules, and different data mining algorithms are used to extract fault-prone modules from these repositories. Software development teams try to increase software quality by decreasing the number of defects as much as possible. In this paper, different data mining techniques for identifying fault-prone modules are discussed, and the algorithms are compared to find the best one for defect prediction.

    Phishing Detection using Base Classifier and Ensemble Technique

    Phishing attacks continue to pose a significant threat in today's digital landscape, with both individuals and organizations falling victim to these attacks on a regular basis. One of the primary methods used to carry out phishing attacks is phishing websites, which are designed to look like legitimate sites in order to trick users into giving away their personal information, including sensitive data such as credit card details and passwords. This research paper proposes a model that utilizes several benchmark classifiers, including LR, Bagging, RF, K-NN, DT, SVM, and AdaBoost, to identify and classify phishing websites, evaluated on accuracy, precision, recall, F1-score, and the confusion matrix. Additionally, a meta-learner and a stacking model were combined to identify phishing websites in existing systems. The proposed ensemble learning approach using stack-based meta-learners proved to be highly effective in identifying both legitimate and phishing websites, achieving an accuracy rate of up to 97.19%, with precision, recall, and F1 scores of 97%, 98%, and 98%, respectively. Thus, it is recommended that ensemble learning, particularly stacking and its meta-learner variations, be implemented to detect and prevent phishing attacks and other digital cyber threats.
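    The stack-based meta-learner idea maps directly onto scikit-learn's StackingClassifier, sketched below with the seven base classifiers named above. This is an illustrative sketch only: the hyperparameters, the logistic-regression meta-learner, and the synthetic placeholder data are assumptions, not the paper's exact configuration or dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Base (level-0) classifiers named in the abstract; hyperparameters are illustrative.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("bagging", BaggingClassifier()),
    ("rf", RandomForestClassifier(n_estimators=200)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier()),
    ("svm", SVC(probability=True)),
    ("adaboost", AdaBoostClassifier()),
]

# Meta-learner trained on out-of-fold predictions of the base classifiers.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Placeholder data; replace with the actual phishing-website feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y,
                                                    random_state=0)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # accuracy, precision, recall, F1
```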

    Student Behaviour Analysis To Detect Learning Styles Using Decision Tree, Naïve Bayes, And K-Nearest Neighbor Method In Moodle Learning Management System

    A learning management system (LMS) manages online learning and facilitates interaction in the teaching and learning processes. Teachers can use an LMS to determine student activities or interactions with their courses. Everyone learns uniquely, so it is necessary to understand each student's learning style and apply it in their learning activities. One factor contributing to learning success is the use of an appropriate learning style, which allows the information received to be properly conveyed and clearly understood. As a result, we require a mechanism to identify learning styles. This study develops a learning style detection system based on learning behavior in the LMS of Christian Vocational School Petra Surabaya for the subject of Network System Administration, using the Decision Tree, Naïve Bayes, and K-Nearest Neighbor methods. The results showed that the Decision Tree method could best detect and predict learning styles: using the 80:20 train-test split it obtained an accuracy of 0.96 with a processing time of 0.000998 seconds, while the 10-fold cross-validation test obtained an accuracy of 0.98 with a processing time of 0.04033 seconds.
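    The evaluation protocol described above (an 80:20 hold-out split versus 10-fold cross-validation, with accuracy and processing time recorded per classifier) can be sketched as follows, assuming scikit-learn; the feature matrix X and learning-style labels y would come from the Moodle activity logs and are not reproduced here.

```python
import time
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

MODELS = {
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
}

def compare(X, y):
    """Report accuracy and processing time under an 80:20 split and 10-fold CV."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=0)
    for name, model in MODELS.items():
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        split_acc = accuracy_score(y_te, model.predict(X_te))
        split_time = time.perf_counter() - start

        start = time.perf_counter()
        cv_acc = cross_val_score(model, X, y, cv=10).mean()
        cv_time = time.perf_counter() - start

        print(f"{name}: 80:20 accuracy={split_acc:.2f} ({split_time:.5f}s), "
              f"10-fold accuracy={cv_acc:.2f} ({cv_time:.5f}s)")
```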