1,268 research outputs found

    Aco-based feature selection algorithm for classification

    Datasets with a small number of records but a large number of attributes exhibit a phenomenon called the “curse of dimensionality”. Classifying this type of dataset requires Feature Selection (FS) methods to extract useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method that was developed based on grouping highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset, arising from its clustering method, its parameter sensitivity, and its final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve these three problems. The proposed improvements include: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method uses mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection enables the parameter to change adaptively based on feedback from the search space. The genetic method determines the final subset automatically, based on crossover and subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University of California Irvine (UCI) repository and nine deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms.
On the UCI datasets, the EGCACO algorithm outperforms the other benchmark optimisation algorithms in terms of the number of selected features for 16 of the 18 datasets (88.89%) and achieves the best classification accuracy on eight of them (44.44%). Further experiments on the nine DNA microarray datasets showed that the EGCACO algorithm is superior to the benchmark algorithms in classification accuracy (first rank) for seven datasets (77.78%) and selects the fewest features in six datasets (66.67%). The proposed EGCACO algorithm can be used for FS in DNA microarray classification tasks that involve large datasets in various application domains.

    Evolving machine learning and deep learning models using evolutionary algorithms

    Despite their great success in data mining, machine learning (ML) and deep learning (DL) models still face substantial obstacles when tackling real-life challenges, such as feature selection, initialization sensitivity, and hyperparameter optimization. The prevalence of these obstacles has severely constrained conventional ML and DL methods from fulfilling their potential. In this research, three evolving machine learning models and one evolving deep learning model are proposed to address these bottlenecks by improving model initialization, enhancing feature representation, and optimizing model configuration through hybridization of advanced evolutionary algorithms with conventional ML and DL methods. First, two Firefly Algorithm based evolutionary clustering models are proposed to optimize cluster centroids in K-means and overcome initialization sensitivity as well as local stagnation. Secondly, a Particle Swarm Optimization based evolving feature selection model is developed to automatically identify the most effective feature subset and reduce feature dimensionality for classification problems. Lastly, a Grey Wolf Optimizer based evolving Convolutional Neural Network-Long Short-Term Memory method is devised to automatically generate the optimal topological and learning configurations of Convolutional Neural Network-Long Short-Term Memory networks for multivariate time series prediction problems. Moreover, a variety of tailored search strategies are proposed to overcome the intrinsic limitations of the search mechanisms of the three employed evolutionary algorithms, namely the dictation of the global best signal in Particle Swarm Optimization, the constraint of diagonal movement in the Firefly Algorithm, and the acute contraction of the search territory in the Grey Wolf Optimizer.
The remedy strategies include diversification of guiding signals, adaptive nonlinear search parameters, hybrid position updating mechanisms, and enhancement of population leaders. As such, the enhanced Particle Swarm Optimization, Firefly Algorithm, and Grey Wolf Optimizer variants are more likely to attain global optimality on the complex search landscapes of data mining problems, owing to increased search diversity and improved trade-offs between exploration and exploitation.
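
To illustrate what "the dictation of the global best signal" in the abstract above refers to, here is a minimal sketch of the textbook Particle Swarm Optimization update, in which every particle is pulled towards a single swarm-wide best position. It is not the enhanced variant proposed in the thesis. The function names (`sphere`, `pso`), the test objective, and all parameter values (`w`, `c1`, `c2`, swarm size, bounds) are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective (an assumption for this sketch): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO; returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position (personal best).
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]

    # A single global best guides the whole swarm.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]                           # inertia
                    + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull towards personal best
                    + c2 * r2 * (gbest[d] - pos[i][d])      # pull towards global best
                )
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    best, best_val, hist = pso(sphere)
    print("best value found:", best_val)
```

Because every velocity update contains the same `gbest` term, a poor early global best can draw the whole swarm into one region. That is the kind of limitation the diversified guiding signals mentioned in the abstract are meant to address.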

    Network models in neuroimaging: a survey of multimodal applications

    Mapping the structure and function of the brain is one of the hardest problems in science. Different imaging modalities, in particular those based on magnetic resonance imaging (MRI), can shed more light on how the brain is organised and how its functions unfold, but a theoretical framework is needed. In recent years, using network models and graph theory to represent brain structure and function has become a major trend in neuroscience. In this review, we outline how network modelling has been used in neuroimaging, clarifying the underlying mathematical concepts and the consequent methodological choices. The major findings are then presented for structural, functional and multimodal applications. We conclude by outlining the issues that remain open and the perspectives for the immediate future.

    Synchronization Inspired Data Mining

    Advances in modern technologies produce huge amounts of data in various fields, increasing the need for efficient and effective data mining tools to uncover the information contained implicitly in the data. This thesis mainly aims to propose innovative and solid algorithms for data mining from a novel perspective: synchronization. Synchronization is a prevalent phenomenon in nature in which a group of events spontaneously comes into co-occurrence with a common rhythm through mutual interactions. The mechanism of synchronization allows complex processes to be controlled by simple operations based on interactions between objects. The first main part of this thesis focuses on developing innovative algorithms for data mining. Inspired by the concept of synchronization, this thesis presents Sync (Clustering by Synchronization), a novel approach to clustering. In combination with the Minimum Description Length (MDL) principle, it allows the intrinsic clusters to be discovered without any data distribution assumptions or parameter settings. In addition, relying on the different dynamic behaviors of objects during the process towards synchronization, the algorithm SOD (Synchronization-based Outlier Detection) is further proposed. Outlier objects can be naturally flagged by the definition of the Local Synchronization Factor (LSF). To cure the curse of dimensionality in clustering, a subspace clustering algorithm, ORSC, is introduced, which automatically detects clusters in subspaces of the original feature space. This approach proposes a weighted local interaction model to ensure that all objects in a common cluster, which resides in an arbitrarily oriented subspace, naturally move together. In order to reveal the underlying patterns in graphs, a graph partitioning approach, RSGC (Robust Synchronization-based Graph Clustering), is presented. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization.
Building on the powerful concept of synchronization, RSGC shows several desirable properties that do not exist in competing methods. For all presented algorithms, their efficiency and effectiveness are thoroughly analyzed. The benefits over traditional approaches are further demonstrated by evaluating them on synthetic as well as real-world data sets. Beyond theoretical research on novel data mining algorithms, the second main part of the thesis focuses on brain network analysis based on Diffusion Tensor Images (DTI). A new framework for automated clustering of white matter tracts is first proposed to identify the meaningful fiber bundles in the human brain by combining ideas from time series mining with density-based clustering. Subsequently, enhancements and variations of this approach are discussed, allowing for a more robust, efficient, or effective way to find hierarchies of fiber bundles. Based on the structural connectivity network, an automated prediction framework is proposed to analyze and understand the abnormal patterns in patients with Alzheimer's disease.
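
The idea of clustering as a dynamic process towards synchronization, described in the abstract above, can be sketched with a Kuramoto-style local interaction: each object repeatedly moves towards the objects in its neighbourhood until groups collapse onto common locations. This is a minimal illustration under stated assumptions, not the thesis's implementation. The function names (`sync_step`, `sync_cluster`), the fixed radius `eps`, the shift-based stopping rule, and the merge threshold are all hypothetical; in particular, the sketch does not include the MDL-based selection mentioned in the abstract.

```python
import math


def sync_step(points, eps):
    """One synchronous update: every point moves towards its eps-neighbours."""
    updated = []
    for p in points:
        # Neighbourhood by Euclidean distance; p itself is included and
        # contributes sin(0) = 0 to the interaction term.
        neigh = [q for q in points if math.dist(p, q) <= eps]
        moved = tuple(
            p[d] + sum(math.sin(q[d] - p[d]) for q in neigh) / len(neigh)
            for d in range(len(p))
        )
        updated.append(moved)
    return updated


def sync_cluster(points, eps, max_iter=100, tol=1e-6, merge_dist=1e-3):
    """Iterate until points stop moving, then label points sharing a location."""
    pts = [tuple(p) for p in points]
    for _ in range(max_iter):
        nxt = sync_step(pts, eps)
        shift = max(math.dist(a, b) for a, b in zip(pts, nxt))
        pts = nxt
        if shift < tol:  # assumed stopping rule for this sketch
            break

    labels, centres = [], []
    for p in pts:
        for i, c in enumerate(centres):
            if math.dist(p, c) < merge_dist:
                labels.append(i)
                break
        else:
            centres.append(p)
            labels.append(len(centres) - 1)
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (2.0, 2.0), (2.1, 2.0), (2.0, 2.1)]
    print(sync_cluster(data, eps=0.5))
```

In this sketch the result depends on the choice of `eps`: too small and no objects interact, too large and distinct groups merge. A parameter-free method has to select that scale automatically, which is the role the abstract assigns to the MDL principle.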

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held at the Technological University Dublin, Dublin, Ireland, from the 29th to the 31st of August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).

    Insights on Reticulate Evolution in Ferns, with Special Emphasis on the Genus Ceratopteris

    The history of life is often viewed as an evenly branching tree; however, in reality it is more like a tangled hedgerow. Many groups of organisms are known to have such a net-like or reticulate evolutionary history, but it is particularly common in ferns and lycophytes (also known as pteridophytes). This dissertation investigates how net-like evolution affects different groups of ferns, with a special emphasis on the model species C-fern (Ceratopteris richardii, also called the antler or water sprite fern). Genomic data are used to understand hybridization, cryptic species and reticulate evolution in two groups of ferns. The C-fern is shown to be a potential hybrid species, which has important implications for future research using this model organism.

    Approche de prédiction par télésurveillance à base de Data Mining (A Data Mining-Based Prediction Approach Using Remote Monitoring)

    Following technological advances, particularly in mobile computing, scientific research has turned towards exploiting these advances for remote predictive decision support. This research interest has had a great impact in the medical field, where it benefits patient care by supporting assistance to the patient and reducing deaths caused by inadequate follow-up and delays in emergency treatment. Telemedicine has thus become an issue of great importance; it is based on the manipulation and analysis of large volumes of medical data. The aim of this thesis is, first, to exploit a new approach to data analysis, namely Symbiotic Organisms Search (SOS), for data classification in Data Mining and, second, to propose improvements to this metaheuristic. The improvement relies on integrating speed into SOS as a new parameter in order to explore the search space efficiently and avoid premature convergence. We also develop a conceptual and practical telemedicine architecture for decision support in identifying the type of breast cancer (benign or malignant). This study achieved excellent results in terms of data classification.