
    Feature subset selection in large dimensionality domains

    Searching for an optimal feature subset in a high-dimensional feature space is known to be an NP-complete problem. We present a hybrid algorithm, SAGA, for this task. SAGA combines the ability of Simulated Annealing to avoid becoming trapped in a local minimum, the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local-search ability of greedy algorithms, and the high computational efficiency of Generalized Regression Neural Networks. We compare the performance over time of SAGA and well-known algorithms on synthetic and real datasets. The results show that SAGA outperforms the existing algorithms.
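    The simulated-annealing side of such a hybrid search can be sketched as follows. This is a minimal illustration, not the authors' SAGA: the per-feature relevance scores, the size penalty, and the cooling schedule are made-up stand-ins for the paper's classifier-based fitness.

```python
import math
import random

random.seed(0)

# Toy relevance scores: features 0-2 are informative, the rest are noise.
# (Hypothetical stand-in for the paper's classifier-based fitness.)
RELEVANCE = [0.9, 0.8, 0.7] + [0.05] * 7
PENALTY = 0.1  # cost per selected feature, to favour small subsets

def score(subset):
    """Higher is better: summed relevance minus a size penalty."""
    return sum(RELEVANCE[i] for i in subset) - PENALTY * len(subset)

def anneal(n_features=10, steps=2000, t0=1.0, cooling=0.995):
    current = set(random.sample(range(n_features), 5))
    best = set(current)
    t = t0
    for _ in range(steps):
        # Neighbour move: flip membership of one random feature.
        cand = set(current)
        cand.symmetric_difference_update({random.randrange(n_features)})
        delta = score(cand) - score(current)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(delta / t), which shrinks as t cools.
        if delta > 0 or random.random() < math.exp(delta / t):
            current = cand
            if score(current) > score(best):
                best = set(current)
        t *= cooling
    return best

selected = anneal()
print(sorted(selected))
```

    With the toy scores above, the search settles on the three informative features; in the full algorithm this acceptance rule is what lets the search escape local minima before the crossover and greedy stages refine the subset.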

    Identification of disease-causing genes using microarray data mining and gene ontology

    Background: One of the best and most accurate approaches for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. A shortcoming of microarray data is that it provides only a small number of samples relative to the number of genes, which reduces classification accuracy; gene selection is therefore essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods, but its performance can be degraded by the small sample size, noisy data, and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection that uses the advantageous features of conventional methods and addresses their weaknesses. Specifically, we combine the Fisher method and SVM-RFE to exploit the advantages of both a filter method and an embedded method, and we add a redundancy reduction stage to address a weakness shared by the Fisher method and SVM-RFE. In addition to gene expression values, the proposed method uses the Gene Ontology, a reliable source of information on genes. The use of the Gene Ontology can compensate, in part, for the limitations of microarrays, such as the small number of samples and erroneous measurements. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method improves classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of the selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for cancer cures.
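    The recursive elimination loop at the heart of SVM-RFE can be sketched as follows. As an assumption for illustration, a crude centroid-difference weight replaces the SVM coefficients, and the tiny "expression" matrix is invented; the real method retrains an SVM at every iteration.

```python
# RFE sketch: rank features by |weight| from a simple linear scorer and
# recursively drop the weakest (a stand-in for SVM-RFE's per-iteration
# SVM weights; the data and scorer are assumptions, not the paper's setup).

def fit_weights(X, y, features):
    """Per-feature weight: mean feature value in class 1 minus class 0
    (a crude centroid-difference stand-in for SVM coefficients)."""
    w = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        w[f] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return w

def rfe(X, y, n_keep):
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = fit_weights(X, y, features)
        # Eliminate the feature with the smallest absolute weight,
        # then re-fit on the survivors.
        weakest = min(features, key=lambda f: abs(w[f]))
        features.remove(weakest)
    return sorted(features)

# Toy "expression" data: feature 0 separates the classes, 1-2 are noise.
X = [[5.0, 1.0, 2.0], [4.8, 1.1, 2.2], [1.0, 1.0, 2.1], [1.2, 0.9, 1.9]]
y = [1, 1, 0, 0]
kept = rfe(X, y, n_keep=1)
print(kept)
```

    The framework in the abstract would run a Fisher filter and a redundancy reduction stage around this loop, and weight the ranking with Gene Ontology information.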

    A Review of Feature Selection and Classification Approaches for Heart Disease Prediction

    Cardiovascular disease has been the world's leading cause of death for years. As information technology develops, many researchers have conducted studies on computer-assisted diagnosis of heart disease. Predicting heart disease with a computer-assisted system can reduce time and costs. Feature selection can be used to choose the variables most relevant to heart disease; approaches include filter, wrapper, embedded, and hybrid methods. The filter method excels in computation speed, the wrapper and embedded methods consider feature dependencies and interact with classifiers, and the hybrid method takes advantage of several methods. Classification is a data mining technique used to predict heart disease; approaches include traditional machine learning, ensemble learning, hybrid methods, and deep learning. Traditional machine learning uses a single specific algorithm. Ensemble learning combines the predictions of multiple classifiers to improve on the performance of a single classifier. The hybrid approach combines several techniques and takes advantage of each. Deep learning does not require predetermined feature engineering. This research provides an overview of feature selection and classification methods for the prediction of heart disease over the last ten years, so it can serve as a reference when choosing a method for heart disease prediction in future research.
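    The filter/wrapper distinction in the review can be made concrete with a toy sketch: a filter scores each feature on its own, while a wrapper asks the classifier itself to judge candidate subsets. The data, the Pearson-correlation filter, and the stand-in accuracy function below are illustrative assumptions, not any specific surveyed method.

```python
# Toy heart-disease-style features (hypothetical values).
data = {
    "chol":   [200, 240, 180, 260],
    "age":    [40, 55, 38, 60],
    "smoker": [0, 1, 0, 1],
}
labels = [0, 1, 0, 1]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Filter: rank each feature by its own correlation with the label,
# independently of any classifier -- fast, but blind to interactions.
ranking = sorted(data, key=lambda f: abs(pearson(data[f], labels)), reverse=True)
print(ranking)

# Wrapper: evaluate whole candidate subsets with the model itself.
def model_accuracy(features):
    # Stand-in for cross-validated classifier accuracy (an assumption).
    return 0.6 + 0.1 * len([f for f in features if f != "chol"])

best = max([["age"], ["smoker"], ["age", "smoker"]], key=model_accuracy)
print(best)
```

    Embedded methods fold the second step into model training itself, and hybrid methods chain a cheap filter pass before an expensive wrapper pass.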

    Improving the Anomaly Detection by Combining PSO Search Methods and J48 Algorithm

    Feature selection techniques are used to find the most important and relevant features in a dataset, so in this study a feature selection technique was used to improve the performance of anomaly detection. Many feature selection techniques have been developed and implemented on the NSL-KDD dataset. However, with the rapid growth of network traffic, in which more applications, devices, and protocols participate, traffic data have become complex and heterogeneous, which contributes to security issues and makes the NSL-KDD dataset no longer representative. The detection model must also be able to recognize novel attack types in complex network datasets, so a robust analysis technique for larger and more complex datasets is required to cope with the increase in security issues in big-data networks. This study proposes Particle Swarm Optimization (PSO) search as a feature selection method. To contribute to feature analysis knowledge, combinations of PSO search with other search methods are examined in the experiments. To overcome the limitations of the NSL-KDD dataset, the experiments use the CICIDS2017 dataset. The J48 classification algorithm is used to validate the features selected by the proposed technique. The detection performance of the combination of the PSO search method with J48 is examined and compared with other feature selection methods and a previous study. The proposed technique successfully finds the important features of the dataset, improving detection performance to 99.89% accuracy. Compared with the previous study, the proposed technique achieves better accuracy, TPR, and FPR.
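    PSO-based feature selection of this kind is usually run in its binary form, where each particle is a feature mask. The sketch below is a minimal binary PSO under stated assumptions: the fitness function is a made-up relevance/cost score standing in for J48 accuracy on CICIDS2017, and the swarm parameters are generic defaults.

```python
import math
import random

random.seed(1)

# Toy fitness: a stand-in for the paper's J48 accuracy on CICIDS2017.
# Features 0-1 are "important"; every selected feature also pays a cost.
GAIN = [1.0, 0.8, 0.05, 0.05, 0.05, 0.05]
COST = 0.1

def fitness(mask):
    return sum(g for g, m in zip(GAIN, mask) if m) - COST * sum(mask)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def binary_pso(n_features=6, n_particles=8, iters=40, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [list(p) for p in pos]                 # per-particle best masks
    gbest = list(max(pos, key=fitness))            # swarm-wide best mask
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Binary PSO: velocity sets the probability the bit is 1.
                pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = list(pos[i])
            if fitness(pos[i]) > fitness(gbest):
                gbest = list(pos[i])
    return gbest

best_mask = binary_pso()
print(best_mask)
```

    In the study's pipeline, the fitness call would train and score a J48 tree on the masked features, and the final mask would be validated against other search methods.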

    Selection of contributing factors for predicting landslide susceptibility using machine learning and deep learning models

    Landslides are a common natural disaster that can cause casualties, threats to property, and economic losses. Therefore, it is important to understand or predict the probability of landslide occurrence at potentially risky sites. A commonly used means is to carry out a landslide susceptibility assessment based on a landslide inventory and a set of landslide contributing factors. This can be readily achieved using machine learning (ML) models such as logistic regression (LR), support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost), or deep learning (DL) models such as convolutional neural network (CNN) and long short-term memory (LSTM). As the input data for these models, landslide contributing factors have varying influences on landslide occurrence. Therefore, it is logically feasible to select the more important contributing factors and eliminate the less relevant ones, with the aim of increasing the prediction accuracy of these models. However, selecting the more important factors is still a challenging task and there is no generally accepted method. Furthermore, the effects of factor selection using various methods on the prediction accuracy of ML and DL models are unclear. In this study, the impact of the selection of contributing factors on the accuracy of landslide susceptibility predictions using ML and DL models was investigated. Five methods for selecting contributing factors were considered for all the aforementioned ML and DL models: Information Gain Ratio (IGR), Recursive Feature Elimination (RFE), Particle Swarm Optimization (PSO), the Least Absolute Shrinkage and Selection Operator (LASSO) and Harris Hawks Optimization (HHO). In addition, autoencoder-based factor selection methods for the DL models were also investigated. To assess their performances, an exhaustive approach was adopted,... Comment: Stochastic Environmental Research and Risk Assessment
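    The Information Gain Ratio criterion mentioned above can be illustrated on a toy landslide inventory. The slope classes and labels below are made-up values, not the paper's data; IGR is the entropy reduction a factor gives, normalized by the factor's own split entropy.

```python
import math

# Toy landslide inventory: slope class per site and whether a landslide
# occurred there (illustrative values only).
slope = ["steep", "steep", "gentle", "gentle", "steep", "gentle"]
slide = [1, 1, 0, 0, 1, 0]

def entropy(labels):
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def info_gain_ratio(factor, labels):
    n = len(labels)
    gain = entropy(labels)          # start from the label entropy...
    split_info = 0.0
    for v in set(factor):
        subset = [l for f, l in zip(factor, labels) if f == v]
        p = len(subset) / n
        gain -= p * entropy(subset)  # ...subtract conditional entropy
        split_info -= p * math.log2(p)
    # Normalizing by split_info penalizes many-valued factors.
    return gain / split_info

igr = info_gain_ratio(slope, slide)
print(round(igr, 3))
```

    Here the slope class predicts the label perfectly, so the ratio is 1.0; in a factor-selection pass, factors would be ranked by this score and the weakest dropped before training the ML/DL models.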

    Performance Analysis of a new Filter and Wrapper Sequence for the Survivability Prediction of Breast Cancer Patients

    Feature selection is an essential preprocessing step for removing redundant or irrelevant features from multidimensional data to improve predictive performance. Medical clinical datasets are increasingly large and multidimensional, and not every feature helps with the necessary predictions, so feature selection techniques are used to determine a relevant feature set that can improve the performance of a learning algorithm. This study presents a performance analysis of a new filter and wrapper sequence: the intersection of the filter methods Mutual Information and Chi-Square, followed by one of the wrapper methods Sequential Forward Selection or Sequential Backward Selection, to obtain a more informative feature set for improved prediction of the survivability of breast cancer patients from the clinical breast cancer dataset SEER. The improvement in performance due to this filter and wrapper sequence, in terms of Accuracy, False Positive Rate, False Negative Rate and Area under the Receiver Operating Characteristic curve, is tested using the machine learning algorithms Logistic Regression, K-Nearest Neighbour, Decision Tree, Random Forest, Support Vector Machine and Multilayer Perceptron. The performance analysis supports the Sequential Backward Selection variant of the new filter and wrapper sequence over Sequential Forward Selection for the SEER dataset.
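    The filter-then-wrapper sequence can be sketched in a few lines. Everything below is a toy under stated assumptions: the feature names, the Mutual Information and Chi-Square scores, and the wrapper's accuracy function are invented stand-ins for values that would really be computed on SEER with a trained classifier.

```python
# Hypothetical per-feature scores standing in for Mutual Information and
# Chi-Square statistics on SEER features (names and values are made up).
mi_scores  = {"stage": 0.40, "nodes": 0.35, "size": 0.20, "age": 0.05, "grade": 0.15}
chi_scores = {"stage": 30.0, "nodes": 25.0, "size": 4.0,  "age": 6.0,  "grade": 12.0}

def top_k(scores, k):
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Filter stage: intersect the top-k features of both filters, so only
# features that both criteria agree on reach the wrapper.
candidates = top_k(mi_scores, 3) & top_k(chi_scores, 3)

def cv_accuracy(features):
    # Stand-in for cross-validated classifier accuracy (an assumption);
    # the small per-feature cost mimics overfitting on extra features.
    base = {"stage": 0.10, "nodes": 0.08, "grade": 0.03}
    return 0.70 + sum(base.get(f, 0.0) for f in features) - 0.01 * len(features)

def backward_selection(features):
    """Sequential Backward Selection: drop one feature at a time while
    doing so improves the wrapper score."""
    current = set(features)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in sorted(current):
            trial = current - {f}
            if cv_accuracy(trial) > cv_accuracy(current):
                current = trial
                improved = True
                break
    return current

selected = backward_selection(candidates)
print(sorted(selected))
```

    Sequential Forward Selection is the mirror image, growing the set from empty; the study's finding is that the backward variant paired better with this filter intersection on SEER.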

    A REVIEW OF FEATURE SELECTION METHODS USED IN MELANOMA DIAGNOSIS

    Currently, a large number of feature selection methods are in use, and they are attracting growing interest among researchers; some methods are, of course, used more frequently than others. The article describes the basics of selection-based algorithms. Feature selection (FS) methods fall into three categories: filter methods, wrapper methods, and embedded methods. Particular attention was paid to finding examples of applications of the described methods in the diagnosis of skin melanoma.