603 research outputs found

    Improved Texture Feature Extraction and Selection Methods for Image Classification Applications

    Classification is an important process in image processing applications, and image texture is a preferred source of information for image classification, especially in real-world applications. However, the output of a typical texture feature descriptor often does not represent a wide range of texture characteristics. Many research studies have contributed different descriptors to improve the extraction of features from texture. Among the various descriptors, the Local Binary Patterns (LBP) descriptor extracts powerful texture information through a simple comparison between a central pixel and its neighbouring pixels. In addition, to obtain sufficient information from texture, many studies have proposed solutions that combine complementary features. Although feature-level fusion produces satisfactory results for certain applications, it suffers from an inherent and well-known problem called "the curse of dimensionality". Feature selection deals with this problem effectively by reducing the feature dimensions and selecting only the relevant features. However, large feature spaces often complicate the search for optimum features. This research introduces improved feature extraction methods based on two new texture descriptors, Local Zone Binary Patterns (LZBP) and Local Multiple Patterns (LMP), both of which build on the LBP descriptor. The resulting feature descriptors are combined with other complementary features to yield a unified vector. Furthermore, the combined features are processed by a new hybrid selection approach based on the Artificial Bee Colony and Neighbourhood Rough Set (ABC-NRS) to efficiently reduce the dimensionality of the features produced by the feature fusion stage.
    Comprehensive experimental testing and evaluation is carried out for the different components of the proposed approach, and the novelty and limitations of the approach are demonstrated. The evaluation results demonstrate the ability of the LZBP and LMP texture descriptors to improve feature extraction compared to the conventional LBP descriptor. In addition, the hybrid ABC-NRS selection method applied to the proposed combined features is shown to improve classification performance while achieving the shortest feature length. The overall approach is demonstrated to provide improved texture-based image classification performance compared to previous methods on benchmarks based on outdoor scene images. These research contributions thus represent significant advances in the field of texture-based image classification.
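
As a rough illustration of the pixel comparison that the abstract attributes to the conventional LBP descriptor, here is a minimal sketch of the basic 3x3 variant. It does not implement LZBP or LMP, which the abstract does not specify. The function names, the clockwise bit order and the `>=` threshold convention are assumptions made for this example.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    # (the bit order is an assumption; other orders are equally valid).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a 2-D grey image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The 256-bin histogram is the kind of feature vector that could then be fused with other features and passed to a selection stage.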

    Heuristic-based feature selection for rough set approach

    The paper presents a research methodology dedicated to the application of greedy heuristics as a way of gathering information about available features. The discovered knowledge, represented in the form of generated decision rules, was employed to support the feature selection and reduction process for the induction of decision rules with the classical rough set approach. Experiments were run on input data sets discretised by several methods. Experimental results show that eliminating less relevant attributes through the proposed methodology led to rule sets with reduced cardinalities, while maintaining the rule quality necessary for satisfactory classification.
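
For orientation, the sketch below shows one common greedy heuristic from classical rough set theory: forward selection driven by the dependency degree (the fraction of objects whose equivalence class is consistent with the decision). It is not the rule-based methodology the abstract describes. The function names and the row-as-dictionary data layout are assumptions for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Fraction of rows whose equivalence class (on `features`) is decision-pure."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    # Rows in classes with a single decision value form the positive region.
    consistent = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, features, decision):
    """Greedy forward selection: add the feature that raises dependency most."""
    target = dependency_degree(rows, features, decision)
    selected = []
    while dependency_degree(rows, selected, decision) < target:
        best = max(
            (f for f in features if f not in selected),
            key=lambda f: dependency_degree(rows, selected + [f], decision),
        )
        selected.append(best)
    return selected
```

The loop stops as soon as the selected subset preserves the dependency degree of the full feature set, so attributes that add no discriminating power are never included.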

    Gene selection and classification in autism gene expression data

    Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been linked to the disorder, no single gene is applicable to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in dealing with autism gene expression datasets include the limited number of samples and the high level of noise caused by experimental measurement errors and natural variation. In this study, a systematic combination of three filters, namely the t-test (TT), Wilcoxon Rank Sum (WRS) and Feature Correlation (COR), is applied along with an efficient wrapper algorithm based on geometric binary particle swarm optimization with a support vector machine (GBPSO-SVM), with the aim of selecting and classifying the genes most strongly associated with autism. A new approach based on median ratio, mean ratio and variance deviation criteria is also applied to reduce the initial dataset before it enters the selection pipeline. The most discriminative genes identified in the first and last selection steps shared one recurring gene (CAPS2), which was designated the strongest ASD risk gene. The fused gene subset selected by the GBPSO-SVM algorithm raised the classification accuracy to about 92.10%, which is higher than the accuracies reported in the literature for the same autism dataset. Notably, an ensemble using random forest (RF) performed better than previous studies. However, the ensemble approach that used SVM as an integrator of the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator.
    The overall improvement was ascribed to the selection strategies used to reduce the dataset and to the use of the efficient wrapper-based GBPSO-SVM algorithm.
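
To make the filter stage concrete, the following is a minimal sketch of ranking genes by a two-sample t-statistic. It covers only a t-test filter and uses Welch's unequal-variance form, which is an assumption because the abstract does not say which variant was used. The WRS and COR filters and the GBPSO-SVM wrapper are not shown, and the function names and data layout are hypothetical.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (needs n >= 2 each)."""
    va, vb = variance(group_a), variance(group_b)
    # Standard error of the difference in means; zero if both groups are constant.
    se = (va / len(group_a) + vb / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, labels, top_k):
    """Rank genes by |t| between the two label groups; return the top_k names."""
    scores = {}
    for gene, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Here `expression` maps a gene name to one value per sample and `labels` marks each sample as case (1) or control (0). A filter like this would typically shrink the candidate set before a costlier wrapper search is run.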

    From fuzzy-rough to crisp feature selection

    A central problem in machine learning and pattern recognition is identifying the most important features in a dataset. This process plays a decisive role in big data processing by reducing the size of datasets. One major drawback of existing feature selection methods is the high chance of redundant features appearing in the final subset; in most cases, finding and removing them can greatly improve the resulting classification accuracy. To tackle this problem on two different fronts, we employed fuzzy-rough sets and perturbation theories. On one side, we used three strategies to improve the performance of fuzzy-rough set-based feature selection methods. The first strategy was to code both features and samples in one binary vector and use a shuffled frog leaping algorithm to choose the best combination, using the fuzzy dependency degree as the fitness function. In the second strategy, we designed a measure that evaluates features based on the fuzzy-rough dependency degree in such a way that redundant features are given lower priority for selection. In the last strategy, we designed a new binary version of the shuffled frog leaping algorithm that employs a fuzzy positive region as its similarity measure to work in complete harmony with the fitness function (i.e. the fuzzy-rough dependency degree). To extend the applicability of fuzzy-rough set-based feature selection to multi-party medical datasets, we designed a privacy-preserving version of the original method. In addition, we studied the feasibility and applicability of perturbation theory to feature selection, which to the best of our knowledge has not previously been researched. We introduced a new feature selection method based on perturbation theory that is not only capable of detecting and discarding redundant features but is also very fast and flexible in accommodating the special needs of the application.
    It employs a clustering algorithm to group similarly behaving features based on the sensitivity of each feature to perturbation, the angle of each feature to the outcome and the effect of removing each feature on the outcome; it then chooses the feature closest to the centre of each cluster and returns those features as the final subset. To assess the effectiveness of the proposed methods, we compared the results of each method with well-known feature selection methods on a series of artificially generated datasets and on biological, medical and cancer datasets taken from the University of California Irvine machine learning repository, the Arizona State University repository and the Gene Expression Omnibus repository.

    Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques

    Wind turbine condition monitoring can detect anomalies in turbine performance that have the potential to result in unexpected failure and financial loss. This study examines common Supervisory Control And Data Acquisition (SCADA) data over a period of 20 months for 21 pitch-regulated 2.3 MW turbines and is presented in three manuscripts. First, power curve monitoring is targeted by applying various types of Artificial Neural Networks to increase modeling accuracy. It is shown how the proposed method can significantly improve network reliability compared with existing models. Then, an advanced technique is used to create a smoother dataset for network training, followed by the establishment of a dynamic ANFIS network. At this stage, the designed network aims to predict power generation in future hours. Finally, a recursive principal component analysis is performed to extract significant features to be used as input parameters of the network. A novel fusion technique is then employed to build an advanced model that predicts turbine performance with favorably low errors.

    Improving Energy Efficiency through Data-Driven Modeling, Simulation and Optimization

    In October 2014, EU leaders agreed upon three key targets for the year 2030: a reduction of at least 40% in greenhouse gas emissions, a share of at least 27% for renewable energy, and an improvement of at least 27% in energy efficiency. The increase in computational power combined with advanced modeling and simulation tools makes it possible to derive new technological solutions that can enhance the energy efficiency of systems and reduce their ecological footprint. This book compiles 10 novel research works from a Special Issue focused on data-driven approaches, machine learning, or artificial intelligence for the modeling, simulation, and optimization of energy systems.