Search CORE

22,085 research outputs found

An Ensemble Framework Approach to Crop Type Prediction Using Feature Selection and Multiclass Classification

Author: A. Tamilmani et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/10/2023
Field of study

Crop type classification plays a crucial role in modern agriculture, aiding in yield prediction, resource management, and land-use planning. This paper presents a comprehensive framework for crop type classification utilizing a combination of feature selection techniques, robust classification Algorithm, and a Support Vector Machine (SVM)-based multiclass classification approach. The proposed framework begins with a novel feature selection process that identifies the most relevant attributes from the Agricultural Data and Rainfall data. This feature selection step is essential for reducing data dimensionality, enhancing classification accuracy, and improving model interpretability. Following feature selection, a state-of-the-art multiclass classification strategy based on Support Vector Machines is employed. SVMs are known for their capability to handle high-dimensional data and have demonstrated superior performance in various classification tasks. In this framework, SVMs are adapted to handle multiclass crop type classification efficiently. The model is trained on the selected features and optimized using hyperparameter tuning techniques to ensure robust performance

International Journal on Recent and Innovation Trends in Computing and Communication

Recommended from our members

Comparison of Feature Selection and Classification for MALDI-MS Data

Author: Chen Zhongxue
Deng Youping
Huang Xudong
Liu Qingzhong
Qiao Mengyu
Sung Andrew H
Yang Jack Y
Yang Mary Qu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/03/2011
Field of study

Introduction: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. Results: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS) that effectively performs microarray data analysis. We also compared several learning classifiers including K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), uncorrelated normal based quadratic Bayes Classifier recorded as UDC, Support Vector Machines, and a distance metric learning for Large Margin Nearest Neighbor classifier (LMNN) based on Mahanalobis distance. To compare, we conducted a comprehensive experimental study using three types of MALDI-MS data. Conclusion: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning LMNN outperformed SVMs and other classifiers on evaluating the best testing. In such cases, the optimum classification model based on LMNN is worth investigating for future study

Harvard University - DASH

Comparison of feature selection and classification for MALDI-MS data

Author: Andrew H Sung
Jack Y Yang
Mary Yang
Mengyu Qiao
Qingzhong Liu
Xudong Huang
Youping Deng
Zhongxue Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

INTRODUCTION: In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. RESULTS: We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS) that effectively performs microarray data analysis. We also compared several learning classifiers including K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), uncorrelated normal based quadratic Bayes Classifier recorded as UDC, Support Vector Machines, and a distance metric learning for Large Margin Nearest Neighbor classifier (LMNN) based on Mahanalobis distance. To compare, we conducted a comprehensive experimental study using three types of MALDI-MS data. CONCLUSION: Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing accuracy. However, the distance metric learning LMNN outperformed SVMs and other classifiers on evaluating the best testing. In such cases, the optimum classification model based on LMNN is worth investigating for future study

Crossref

Scholarly Works @ SHSU (Sam Houston State University)

Springer - Publisher Connector

PubMed Central

An Evaluation of Text Classification Methods for Literary Study

Author: Yu Bei
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/2008
Field of study

This article presents an empirical evaluation of text classification methods in literary domain. This study compared the performance of two popular algorithms, naı¨ve Bayes and support vector machines (SVMs) in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were also combined with three text pre-processing tools, namely stemming, stopword removal, and statistical feature selection, to study the impact of these tools on the classifiers’ performance in the literary setting. Existing studies outside the literary domain indicated that SVMs are generally better than naı¨ve Bayes classifiers. However, in this study SVMs were not all winners. Both algorithms achieved high accuracy in sentimental chapter classification, but the naı¨ve Bayes classifier outperformed the SVM classifier in erotic poem classification. Self-feature selection helped both algorithms improve their performance in both tasks. However, the two algorithms selected relevant features in different frequency ranges, and therefore captured different characteristics of the target classes. The evaluation results in this study also suggest that arbitrary featurereduction steps such as stemming and stopword removal should be taken very carefully. Some stopwords were highly discriminative features for Dickinson’s erotic poem classification. In sentimental chapter classification, stemming undermined subsequent feature selection by aggressively conflating and neutralizing discriminative features

CiteSeerX

Syracuse University Research Facility and Collaborative Environment

Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

Author: A Yakushiji
AC McCallum
B Cui
C Blaschke
C Friedman
CJC Burges
D Zhou
F Fung
G Erkan
G Schohn
G Tur
Hwanjo Yu
J Lafferty
J Pustejovsky
JM Temkin
KP Bennett
L Smith
M Huang
M Song
MC Jenkins
Min Song
O Chapelle
O Chapelle
O Chapelle
R Bunescu
S Kim
S Pyysalo
S Pyysalo
T Joachims
T Luo
T Mitsumori
T Ono
TK Jenssen
V Sindhwani
Wook-Shin Han
WW Chapman
X Zhu
Y Miyao
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. Methods: We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. Results: By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Conclusions: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.X116sciescopu

Crossref

Springer - Publisher Connector

PubMed Central

포항공과대학교

Assessment of SVM Reliability for Microarray Data Analysis

Author: Blanzieri Enrico
Malossini Andrea
Ng Raymond
Publication venue
Publication date: 01/12/2004
Field of study

The goal of our research is to provide techniques that can assess and validate the results of SVM-based analysis of microarray data. We present preliminary results of the effect of mislabeled training samples. We conducted several systematic experiments on artificial and real medical data using SVMs. We systematically flipped the labels of a fraction of the training data. We show that a relatively small number of mislabeled examples can dramatically decrease the performance as visualized on the ROC graphs. This phenomenon persists even if the dimensionality of the input space is drastically decreased, by using for example feature selection. Moreover we show that for SVM recursive feature elimination, even a small fraction of mislabeled samples can completely change the resulting set of genes. This work is an extended version of the previous paper [MBN04]

Unitn-eprints Research

Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk

Author: Sariev Eduard
Publication venue: UCL (University College London)
Publication date: 28/02/2021
Field of study

This thesis investigates the application of non-linear supervised machine learning algorithms for estimating Probability of Default (PD) of corporate clients. To achieve this, the thesis is separated into three different experiments: 1. The first experiment investigates a wrapper feature selection method and its application on the support vector machines (SVMs) and logistic regression (LR). The logistic regression model is the most popular approach used for estimating PD in a rich default portfolio. However, other alternatives to PD estimation are available. SVMs method is compared to the logistic regression model using the proposed feature selection method. 2. The second experiment investigates the application of artificial neural networks (ANNs) for estimating PD of corporate clients. In particular ANNs are regularized and trained both with classical and Bayesian approach. Furthermore, different network architectures are explored and specifically the Bayesian estimation and regularization is compared to the classical estimation and regularization. 3. The third experiment investigates the k-Nearest Neighbours algorithm (KNNs). This algorithm is trained using both Bayesian and classical methods. KNNs could be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms such as Decision trees (DTs), Linear discriminant analysis (LDA) and Naive Bayes (NB) were applied and their performance summarized and compared to that of the SVMs, ANNs, KNNs and logistic regression. The contribution of this thesis to science is to provide efficient and at the same time applicable methods for estimating PD of corporate clients. This thesis contributes to the existing literature in a number of ways. 1. First, this research proposes an innovative feature selection method for SVMs. 2. Second, this research proposes an innovative Bayesian estimation methods to regularize ANNs. 3. Third, this research proposes an innovative Bayesian approaches to the estimation of KNNs. Nonetheless, the objective of the research is to promote the use of the Bayesian non-linear supervised machine learning methods that are currently not heavily applied in the industry for PD estimation of corporate clients

UCL Discovery

Ensembles of wrappers for automated feature selection in fish age classification

Author: Bermejo Sánchez Sergi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

In feature selection, the most important features must be chosen so as to decrease the number thereof while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied for automatically select features in fish age classification. The effectiveness of this procedure using an Atlantic cod database has been tested for different powerful statistical learning classifiers. The subsets based on few features selected, e.g. otolith weight and fish weight, are particularly noticeable given current biological findings and practices in fishery research and the classification results obtained with them outperforms those of previous studies in which a manual feature selection was performed.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC