
    A COMPARATIVE STUDY ON PERFORMANCE OF BASIC AND ENSEMBLE CLASSIFIERS WITH VARIOUS DATASETS

    Classification plays a critical role in machine learning (ML) systems for processing images, text, and high-dimensional data. Predicting class labels from training data is the primary goal of classification. An optimal model for a particular classification problem is chosen on the basis of the model's performance and execution time. This paper compares and analyses the performance of basic as well as ensemble classifiers using 10-fold cross-validation and also discusses their essential concepts, advantages, and disadvantages. In this study, five basic classifiers, namely Naïve Bayes (NB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), and the ensemble of all five classifiers, along with a few more combinations, are compared on five University of California Irvine (UCI) ML Repository datasets and a Diabetes Health Indicators dataset from the Kaggle repository. To analyse and compare the performance of the classifiers, evaluation metrics such as Accuracy, Recall, Precision, Area Under Curve (AUC), and F-Score are used. Experimental results showed that SVM performs best on two of the six datasets (Diabetes Health Indicators and Waveform), RF performs best on the Arrhythmia, Sonar, and Tic-tac-toe datasets, and the best ensemble combination is found to be DT+SVM+RF on the Ionosphere dataset, with respective accuracies of 72.58%, 90.38%, 81.63%, 73.59%, 94.78%, and 94.01%; the proposed ensemble combinations outperformed the conventional models on a few datasets.
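    As a hedged sketch of the comparison described above (the dataset and hyperparameters here are illustrative stand-ins, not the paper's), base classifiers and a majority-vote ensemble can be scored under 10-fold cross-validation with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the paper uses UCI and Kaggle datasets.
X, y = load_breast_cancer(return_X_y=True)

base = {
    "NB": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
# Majority-vote ensemble of three base classifiers (DT+SVM+RF),
# the combination the abstract reports as best on Ionosphere.
models = dict(base)
models["DT+SVM+RF"] = VotingClassifier(
    [(name, base[name]) for name in ("DT", "SVM", "RF")]
)

# Mean accuracy under 10-fold cross-validation for each model.
results = {
    name: cross_val_score(model, X, y, cv=10).mean()
    for name, model in models.items()
}
for name, acc in results.items():
    print(f"{name}: mean accuracy {acc:.3f}")
```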

    Spectrally-normalized margin bounds for neural networks

    This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized "spectral complexity": their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity. Comment: Comparison to arXiv v1: 1-norm in main bound refined to (2,1)-group-norm. Comparison to NIPS camera ready: typo fixed.
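    The Lipschitz-constant factor of the bound's spectral complexity — the product of the spectral norms (largest singular values) of the weight matrices — is straightforward to compute. The sketch below omits the bound's (2,1)-group-norm correction factor and uses random matrices purely for illustration:

```python
import numpy as np

def spectral_complexity(weights):
    """Product of the spectral norms (largest singular values) of the
    weight matrices -- the Lipschitz-constant factor in the bound.
    The (2,1)-group-norm correction factor is not included here."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

# Illustrative random layers, not a trained network.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
print(spectral_complexity(layers))
```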

    Multiclass Classification of Brain MRI through DWT and GLCM Feature Extraction with Various Machine Learning Algorithms

    This study delves into the domain of medical diagnostics, focusing on the crucial task of accurately classifying brain tumors to facilitate informed clinical decisions and optimize patient outcomes. Employing a diverse ensemble of machine learning algorithms, the paper addresses the challenge of multiclass brain tumor classification. The investigation centers on two distinct datasets: the BraTS dataset, encompassing cases of High-Grade Glioma (HGG) and Low-Grade Glioma (LGG), and the Sartaj dataset, comprising instances of Glioma, Meningioma, and No Tumor. Through the strategic deployment of Discrete Wavelet Transform (DWT) and Gray-Level Co-occurrence Matrix (GLCM) features, coupled with Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest, and Gradient Boosting algorithms, the research comprehensively explores avenues for achieving precise tumor classification. Before classification, the datasets undergo pre-processing and the extraction of salient features: frequency-domain characteristics derived from the DWT and texture insights harnessed from the GLCM. A detailed exposition of the selected algorithms and their pertinent hyperparameters is then provided. The study's outcomes reveal noteworthy performance disparities across algorithms and datasets. The SVM and Random Forest algorithms exhibit commendable accuracy rates on the BraTS dataset, while the Gradient Boosting algorithm demonstrates superior performance on the Sartaj dataset. The evaluation encompasses precision, recall, and F1-score metrics, providing a comprehensive assessment of the classification performance of the employed algorithms.
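    A minimal, dependency-free sketch of the two feature extractors named above: a one-level 2-D Haar DWT (a concrete instance of a DWT) and the contrast statistic of a horizontal-offset GLCM. Real pipelines would typically use PyWavelets and scikit-image instead; the wavelet choice, offset, and number of grey levels here are assumptions:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform on an even-sized
    image: returns the approximation (LL) and detail (LH, HL, HH)
    subbands, each half the size of the input."""
    a = (img[0::2, :] + img[1::2, :]) / 2  # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2  # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def glcm_contrast(img, levels=8):
    """Contrast of a grey-level co-occurrence matrix with a
    one-pixel horizontal offset, for intensities in [0, 1]."""
    q = np.minimum((img * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return float(((i - j) ** 2 * glcm).sum())
```

    The subband statistics and GLCM texture measures would then be concatenated into a feature vector for the classifiers.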

    Web Page Multiclass Classification

    As the internet age evolves, the volume of content hosted on the Web is rapidly expanding, and the ability to accurately categorize web pages is an ongoing challenge across many use cases. This paper proposes a variation on the text preprocessing pipeline in which noun phrase extraction is performed first, followed by lemmatization, contraction expansion, removal of special characters, removal of extra white space, lower-casing, and removal of stop words. The initial noun phrase extraction step aims to reduce the set of terms to those that best describe what the web pages are about, improving the categorization capabilities of the model. Separately, a text preprocessing approach using keyword extraction is evaluated. In addition to these text preprocessing techniques, feature reduction techniques are applied to optimize model performance. Several modeling techniques are examined under the two approaches and compared to a baseline model: a Support Vector Machine with a linear kernel whose text preprocessing and feature reduction include neither noun phrase extraction nor keyword extraction and use stemming rather than lemmatization. The recommended SVM One-Versus-One model based on noun phrase extraction and lemmatization during text preprocessing improves accuracy over the baseline model by nearly 1% and achieves a 5-fold reduction in misclassification of web pages into undesirable categories.
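    The deterministic tail of such a pipeline can be sketched in plain Python. The noun phrase extraction and lemmatization steps that precede it would use an NLP library such as spaCy and are omitted here; the contraction and stop-word tables below are tiny illustrative stand-ins, not the paper's:

```python
import re

# Illustrative stand-ins; real pipelines use much larger tables.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Contraction expansion, special-character removal, white-space
    collapsing, lower-casing, and stop-word removal, in that order.
    Noun phrase extraction and lemmatization would come first."""
    for contraction, expansion in CONTRACTIONS.items():
        text = re.sub(re.escape(contraction), expansion, text,
                      flags=re.IGNORECASE)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)   # special characters
    text = re.sub(r"\s+", " ", text).strip().lower()
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```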

    Stock market prediction using machine learning classifiers and social media, news

    Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors, such as microblogs and news, that make it hard to predict a stock market index from historical data alone. This enormous volatility emphasizes the need to effectively assess the role of external factors in stock prediction. Stock markets can be predicted using machine learning algorithms applied to information contained in social media and financial news, as this data can change investors' behavior. In this paper, we apply such algorithms to social media and financial news data to discover the impact of this data on stock market prediction accuracy over ten subsequent days. To improve the performance and quality of predictions, feature selection and spam tweet reduction are performed on the data sets. Moreover, we conduct experiments to identify stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, deep learning is used and some classifiers are ensembled. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be the most consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
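    The feature-selection-plus-classifier idea can be sketched on synthetic "sentiment" features. Everything below — the data, feature counts, labels, and the choice of univariate feature selection — is a fabricated stand-in for illustration, not the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for sentiment features extracted from social
# media / news, plus irrelevant (spam-like) noise columns.
rng = np.random.default_rng(0)
n_days = 300
sentiment = rng.normal(size=(n_days, 5))
noise = rng.normal(size=(n_days, 20))
X = np.hstack([sentiment, noise])
# Fabricated up/down market label driven by the sentiment features.
y = (sentiment[:, 0] + 0.5 * sentiment[:, 1] > 0).astype(int)

# Feature selection followed by a random forest, the classifier the
# abstract reports as most consistent.
model = make_pipeline(
    SelectKBest(f_classif, k=5),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
acc = cross_val_score(model, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.3f}")
```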