186 research outputs found

    A Review of Feature Selection and Classification Approaches for Heart Disease Prediction

    Cardiovascular disease has been the leading cause of death worldwide for years. As information technology has developed, many researchers have studied computer-assisted diagnosis of heart disease. Predicting heart disease with a computer-assisted system can reduce time and costs. Feature selection can be used to choose the most relevant variables for heart disease prediction. Its methods fall into four categories: filter, wrapper, embedded, and hybrid. The filter method excels in computation speed. The wrapper and embedded methods consider feature dependencies and interact with classifiers. The hybrid method combines the advantages of several methods. Classification is a data mining technique used to predict heart disease. Its approaches include traditional machine learning, ensemble learning, hybrid, and deep learning. Traditional machine learning uses a single specific algorithm. Ensemble learning combines the predictions of multiple classifiers to improve on the performance of a single classifier. The hybrid approach combines several techniques and takes advantage of each. Deep learning does not require predetermined feature engineering. This research provides an overview of feature selection and classification methods for heart disease prediction over the last ten years, so it can serve as a reference for choosing a method in future research.

    An improved wrapper-based feature selection method for machinery fault diagnosis

    A major issue in machinery fault diagnosis using vibration signals is its over-reliance on personnel knowledge and experience in interpreting the signal. Machine learning has therefore been adopted for machinery fault diagnosis. The quantity and quality of the input features, however, influence fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. However, the wrapper-based feature selection (WFS) method faces an inevitable trade-off between its capability to select the best feature subset and its computational effort. This paper proposes an improved WFS technique, integrated with a support vector machine (SVM) classifier, as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was processed using the proposed WFS, and its performance is analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with lower computational effort by eliminating redundant re-evaluation. The proposed WFS has therefore been found to be capable and efficient in carrying out feature selection tasks.

    Ensemble classification and signal image processing for genus Gyrodactylus (Monogenea)

    This thesis investigates Gyrodactylus species recognition using machine learning classification and feature selection techniques, and explores image feature extraction to demonstrate proof of concept for an envisaged rapid, consistent and secure initial identification of pathogens by field workers and non-expert users. The proposed cognitively inspired framework is designed to discriminate the pathogenic species confidently from its non-pathogenic congeners, which is sought in order to assist diagnostics during periods of a suspected outbreak. Accurate identification of pathogens is key to their control in an aquaculture context, and the monogenean worm genus Gyrodactylus provides an ideal test-bed for the selected techniques. In the proposed algorithm, the concept of classification using a single model is extended to include more than one model. To classify multiple species of Gyrodactylus, experiments were performed using 557 specimens of nine different species, two classifiers and three feature sets. To combine these models, an ensemble-based majority voting approach has been adopted. Experimental results with a database of Gyrodactylus species show the superior performance of the ensemble system. Comparison with single classification approaches indicates that the proposed framework produces a marked improvement in classification performance. The second contribution of this thesis is the exploration of image processing techniques. Active Shape Model (ASM) and Complex Network methods are applied to images of the attachment hooks of several species of Gyrodactylus to classify each specimen according to its true species. ASM is used to provide landmark points to segment the contour of the image, while the Complex Network model is used to extract information from that contour. The current system aims to classify the species that is a notifiable pathogen of Atlantic salmon to its true class with a high degree of accuracy. Finally, some concluding remarks are made, along with proposals for future work.
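The majority voting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `majority_vote`, the input layout (one list of predicted labels per model) and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-model label predictions by majority vote.

    `predictions[m][i]` is the label that model m assigns to sample i
    (a hypothetical layout, not taken from the source). Ties go to the
    label that appears first in model order.
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # most_common keeps first-seen order among equal counts
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical models voting on two specimens
ensemble_labels = majority_vote([
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
])
```

Here the first specimen receives label "a" (two votes to one) and the second receives "c". With an even number of models, ties are possible, so the tie-breaking rule matters in practice.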

    Hybrid filter-wrapper approaches for feature selection

    Over the last couple of decades, more business sectors than ever have embraced digital technologies, storing all the information they generate in databases. Moreover, with the rise of machine learning and data science, it has become economically profitable to use this data to solve real-world problems. However, as datasets grow larger, it has become increasingly difficult to determine exactly which variables are valuable for solving a given problem. This project studies the problem of feature selection, which tries to select a subset of relevant variables for a specific prediction task from the complete set of attributes. In particular, we focus on hybrid filter-wrapper algorithms, a relatively new branch of study that has seen great success on high-dimensional datasets because such algorithms offer a good trade-off between speed and accuracy. The project starts by explaining several important filter and wrapper methods and moves on to illustrate how several authors have combined them to form new hybrid algorithms. We also introduce a new algorithm called BWRR, which uses the popular ReliefF filter to guide a backward wrapper search. The key novelty we propose is to recompute the ReliefF rankings at several points to better guide the search. In addition, we introduce several variations of this algorithm. We have also performed extensive experimentation to test this algorithm. In the first phase, we experimented with synthetic datasets to see which factors affected its performance. After that, we compared the new algorithm against the state of the art on real-world datasets.
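The general idea of a filter ranking guiding a backward wrapper search, with the ranking recomputed along the way, can be sketched as follows. This is not the BWRR algorithm itself, whose details the abstract does not give. The function name, the `recompute_every` parameter, the acceptance rule and the two callables (`rank_fn` standing in for a filter such as ReliefF, `score_fn` standing in for a classifier's validation score) are all assumptions.

```python
from typing import Callable, Dict, List, Sequence


def filter_guided_backward_search(
    features: Sequence[str],
    rank_fn: Callable[[List[str]], Dict[str, float]],  # filter weights (assumed interface)
    score_fn: Callable[[List[str]], float],            # wrapper score (assumed interface)
    recompute_every: int = 2,                          # hypothetical parameter
) -> List[str]:
    """Drop low-ranked features one at a time while the wrapper score holds."""
    selected = list(features)
    best_score = score_fn(selected)
    weights = rank_fn(selected)
    removed_since_rank = 0
    rejected = set()  # features whose removal hurt the score under this ranking

    while len(selected) > 1:
        candidates = [f for f in selected if f not in rejected]
        if not candidates:
            break
        worst = min(candidates, key=lambda f: weights[f])
        trial = [f for f in selected if f != worst]
        trial_score = score_fn(trial)
        if trial_score >= best_score:
            selected, best_score = trial, trial_score
            removed_since_rank += 1
            if removed_since_rank >= recompute_every:
                # Re-rank on the reduced subset so the filter reflects it
                weights = rank_fn(selected)
                removed_since_rank = 0
                rejected.clear()
        else:
            rejected.add(worst)
    return selected


# Toy setup: only "a" and "b" matter; smaller subsets score slightly higher
toy_weights = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}
kept = filter_guided_backward_search(
    ["a", "b", "c", "d"],
    rank_fn=lambda feats: {f: toy_weights[f] for f in feats},
    score_fn=lambda feats: (1.0 if {"a", "b"} <= set(feats) else 0.0) - 0.01 * len(feats),
)
```

In the toy run the search removes "d" and "c" and keeps "a" and "b". The filter decides which feature to try removing next, so the expensive wrapper score is evaluated once per step rather than once per remaining feature.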

    An Efficient High-Dimensional Gene Selection Approach based on Binary Horse Herd Optimization Algorithm for Biological Data Classification

    The Horse Herd Optimization Algorithm (HOA) is a new meta-heuristic algorithm based on the behaviors of horses at different ages. The HOA was introduced recently to solve complex and high-dimensional problems. This paper proposes a binary version of the Horse Herd Optimization Algorithm (BHOA) to solve discrete problems and select prominent feature subsets. Moreover, this study provides a novel hybrid feature selection framework based on the BHOA and a minimum Redundancy Maximum Relevance (MRMR) filter method. This hybrid feature selection, which is more computationally efficient, produces a beneficial subset of relevant and informative features. Since feature selection is a binary problem, we have applied a new Transfer Function (TF), called X-shape TF, which transforms continuous problems into binary search spaces. Furthermore, the Support Vector Machine (SVM) is utilized to examine the efficiency of the proposed method on ten microarray datasets, namely Lymphoma, Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL. In comparison to other state-of-the-art methods, such as the Gray Wolf (GW), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA), the proposed hybrid method (MRMR-BHOA) demonstrates superior performance in terms of accuracy and minimum number of selected features. The experimental results also show that the X-shaped BHOA approach outperforms other methods.
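How a transfer function maps a continuous search position to a binary feature mask can be sketched as follows. The abstract does not define the X-shape TF, so this example swaps in the widely used S-shaped (sigmoid) transfer function instead; the function names and the sampling rule are assumptions, not the paper's method.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize_position(position: Sequence[float], rng: random.Random) -> List[int]:
    """Turn a continuous position into a 0/1 feature mask.

    Each coordinate is squashed to a probability with the S-shaped
    transfer function (a stand-in for the paper's X-shape TF), then the
    bit is set to 1 with that probability.
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


rng = random.Random(0)
# Extreme coordinates give probabilities of exactly 1.0 and 0.0
mask = binarize_position([1000.0, -1000.0, 1000.0], rng)
```

A bit value of 1 means the corresponding feature is kept. Coordinates near zero give probabilities near 0.5, so the mask stays exploratory there and becomes nearly deterministic as coordinates grow in magnitude.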

    Self-adaptive parameter and strategy based particle swarm optimization for large-scale feature selection problems with multiple classifiers

    This work was partially supported by the National Natural Science Foundation of China (61403206, 61876089, 61876185), the Natural Science Foundation of Jiangsu Province (BK20141005), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (14KJB520025), the Engineering Research Center of Digital Forensics, Ministry of Education, and the Priority Academic Program Development of Jiangsu Higher Education Institutions. Peer reviewed. Postprint.

    A New Feature Selection Method Based on Class Association Rule

    Feature selection is a key process for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which often uses mathematical models to compute the relevance of each feature in the training dataset and then sorts the features in descending order of their computed scores. However, most Filtering methods face several challenges, including, but not limited to, considering only feature-class correlation when defining a feature's relevance and not recommending which subset of features to retain. Leaving this decision to the end-user may be impractical for multiple reasons, such as the experience required in the application domain, care, accuracy, and time. In this research, we propose a new hybrid Filtering method called Class Association Rule Filter (CARF) that deals with the aforementioned issues by identifying relevant features through the Class Association Rule Mining approach and then using these rules to define weights for the available features in the training dataset. More crucially, we propose a new procedure based on mutual information within the CARF method that suggests the subset of features to be retained by the end-user, hence reducing time and effort. Empirical evaluation using small, medium, and large datasets from various dissimilar domains reveals that CARF was able to reduce the dimensionality of the search space when contrasted with other common Filtering methods. More importantly, the classification models that different machine learning algorithms derived from the feature subsets selected by CARF were highly competitive in terms of various performance measures. These results reflect the quality of the feature subsets selected by CARF and show the impact of the new cut-off procedure proposed.
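The generic Filtering step described above (score each feature, then sort in descending order) can be sketched with mutual information as the relevance score. This is not CARF, whose rule-mining and cut-off procedures the abstract does not specify; the function names and the toy data are assumptions, and the features are assumed to be discrete.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information (in bits) between two discrete variables."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


def rank_features(
    columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Score every feature against the class label and sort descending."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: "copy" mirrors the label, "noise" is independent of it
labels = [0, 0, 1, 1]
ranking = rank_features({"noise": [0, 1, 0, 1], "copy": [0, 0, 1, 1]}, labels)
```

The ranking puts "copy" first with 1 bit of mutual information and "noise" last with 0 bits. A plain ranking like this still leaves the cut-off choice to the user, which is the gap the abstract says CARF addresses.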

    GAdaboost: Accelerating adaboost feature selection with genetic algorithms

    In recent years, Machine Learning has attracted attention owing to the abundance of data. As a result, techniques for reducing the dimensionality of data have been under continuous development. Object detection is one of the Machine Learning techniques that suffers from high dimensionality. For example, one of the most famous object detection frameworks, the Viola-Jones Rapid Object Detector, suffers from a lengthy training process because of its vast search space, which can reach more than 160,000 features for a 24×24 image. The Viola-Jones Rapid Object Detector also uses Adaboost, a brute-force method that must pass over the set of all possible features in order to train the classifiers. Consequently, ways were devised to reduce the whole feature set to a smaller representative one by eliminating features that carry irrelevant information. The most commonly used technique for this is Feature Selection, with its three categories: Filters, Wrappers and Embedded. Feature Selection has proven successful in providing fast and accurate classifiers. Wrapper methods harness the power of evolutionary computing, most commonly Genetic Algorithms, to find the set of representative features, mostly because Genetic Algorithms find adequate solutions more efficiently. In this thesis we propose GAdaboost: a Genetic Algorithm to accelerate the training procedure of the Viola-Jones Rapid Object Detector through Feature Selection. Specifically, we propose to limit the Adaboost search to a subset of the huge feature space, while evolving this subset with a Genetic Algorithm. Experiments demonstrate that our proposed GAdaboost is up to 3.7 times faster than Adaboost. We also demonstrate that the price of this speedup is a small decrease (3% and 4%) in detection accuracy when tested on the FDDB benchmark face detection set and Caltech Web Faces, respectively.
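The "more than 160,000 features for a 24×24 image" figure can be checked with a short count. The abstract does not list the feature shapes, so this sketch assumes the five commonly cited upright Haar-like base shapes; `HAAR_BASE_SHAPES` and `count_haar_features` are illustrative names.

```python
from typing import Iterable, Tuple

# Assumed base (width, height) shapes: two-rectangle horizontal and
# vertical, three-rectangle horizontal and vertical, four-rectangle.
HAAR_BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]


def count_haar_features(
    window_w: int,
    window_h: int,
    shapes: Iterable[Tuple[int, int]] = HAAR_BASE_SHAPES,
) -> int:
    """Count every scaled and shifted placement of each base shape."""
    total = 0
    for base_w, base_h in shapes:
        # Scale each shape in integer multiples of its base size
        for w in range(base_w, window_w + 1, base_w):
            for h in range(base_h, window_h + 1, base_h):
                # Number of top-left positions where a w-by-h feature fits
                total += (window_w - w + 1) * (window_h - h + 1)
    return total


n_features = count_haar_features(24, 24)  # 162336 under these assumptions
```

Under these assumptions the count is 162,336, consistent with the abstract's figure. Each boosting round in an exhaustive search has to consider every one of these candidates, which is the cost that searching within an evolving subset aims to reduce.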