2,931 research outputs found

    Layered genetic programming for feature extraction in classification problems

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Genetic programming has proven to be a successful technique for feature extraction in various applications. In this thesis, we present a Layered Genetic Programming system which implements a genetic programming-based feature extraction mechanism. The proposed system uses a layered structure in which, instead of evolving just one population of individuals, several populations are evolved sequentially. Each population transforms the input data received from the previous population into a lower-dimensional space with the aim of improving classification performance. The performance of the proposed system was experimentally tested on 5 real-world problems using different dimensionality reduction step sizes and different classifiers. The proposed method was able to outperform a simple classifier applied directly to the original data on two problems; on the remaining problems, the classifier performed better using the original data. The best solutions were often obtained in the first few layers, which implied that increasing the size of the system, i.e. adding more layers, was not useful. However, the layered structure allowed control of the size of individuals.
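At a high level, the layered scheme described in the abstract can be sketched as follows. This is a simplified illustration, not the thesis's implementation: linear feature combinations stand in for the evolved genetic-programming expression trees, a Fisher-style separation score stands in for the actual fitness, and all names and parameters are hypothetical.

```python
import random

def fisher_score(values, labels):
    """Between-class separation of one derived feature (two-class case)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((x - mb) ** 2 for x in b) / len(b)
    return abs(ma - mb) / ((va + vb) ** 0.5 + 1e-9)

def random_individual(n_in):
    """An 'individual' here is a random weighted sum of the input features
    (a stand-in for a GP expression tree)."""
    return [random.uniform(-1, 1) for _ in range(n_in)]

def apply(ind, row):
    return sum(w * x for w, x in zip(ind, row))

def evolve_layer(X, y, n_out, pop_size=40, generations=20):
    """Evolve one layer and return its n_out best feature constructors."""
    n_in = len(X[0])
    pop = [random_individual(n_in) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: -fisher_score([apply(ind, r) for r in X], y))
        parents = pop[: pop_size // 2]
        # Mutate each surviving parent to produce the next half-population.
        children = [[w + random.gauss(0, 0.1) for w in p] for p in parents]
        pop = parents + children
    pop.sort(key=lambda ind: -fisher_score([apply(ind, r) for r in X], y))
    return pop[:n_out]

def layered_transform(X, y, layer_sizes):
    """Stack layers; each maps the previous output into fewer features."""
    data = X
    for n_out in layer_sizes:
        layer = evolve_layer(data, y, n_out)
        data = [[apply(ind, row) for ind in layer] for row in data]
    return data
```

Each entry in `layer_sizes` fixes the output dimensionality of one layer, which is how the layered structure bounds the size of the evolved representation.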

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
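As a concrete illustration of one of the five challenges named above, class imbalance, the simplest remedy is random oversampling of the minority class. The sketch below is generic and not drawn from any specific method in the review.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw (with replacement) enough extra copies to reach the target.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

More sophisticated alternatives (e.g. synthetic-sample generation or cost-sensitive losses) follow the same interface: rebalance or reweight before, or during, model fitting.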

    Feature Selection for Classification with Artificial Bee Colony Programming

    Feature selection and classification are among the most widely applied machine learning processes. Feature selection aims to find useful attributes that carry class information by eliminating noisy and unnecessary features from the data sets, thereby easing the task of the classifiers. Classification is used to distribute data among the various classes defined on the resulting feature set. In this chapter, artificial bee colony programming (ABCP) is proposed and applied to feature selection for classification problems on four different data sets. The best models are obtained using a sensitivity fitness function defined according to the total number of classes in the data sets, and are compared with the models obtained by genetic programming (GP). The experimental results show that the proposed technique is accurate and efficient compared with GP in terms of critical feature selection and classification accuracy on well-known benchmark problems.
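ABCP, as described above, evolves programs; the sketch below illustrates only the simpler underlying idea of an artificial-bee-colony search over binary feature masks. The fitness function is a caller-supplied assumption (e.g. cross-validated accuracy of a classifier on the masked data), and the colony parameters are hypothetical.

```python
import random

def abc_feature_select(n_features, fitness, colony=20, cycles=50, limit=10, seed=0):
    """Artificial-bee-colony search over binary feature masks.
    `fitness(mask)` must return a score to maximize."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]

    sources = [random_mask() for _ in range(colony)]   # food sources
    scores = [fitness(m) for m in sources]
    trials = [0] * colony                              # stagnation counters
    best, best_score = max(zip(sources, scores), key=lambda p: p[1])
    for _ in range(cycles):
        for i in range(colony):            # employed/onlooker phase (merged)
            cand = sources[i][:]
            j = rng.randrange(n_features)  # flip one feature in or out
            cand[j] = 1 - cand[j]
            s = fitness(cand)
            if s > scores[i]:
                sources[i], scores[i], trials[i] = cand, s, 0
            else:
                trials[i] += 1
            if s > best_score:
                best, best_score = cand, s
        for i in range(colony):            # scout phase: abandon stale sources
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
    return best, best_score
```

On a toy separable fitness (reward features 0 and 2, penalize 1 and 3), the search recovers the mask `[1, 0, 1, 0]`.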

    Genetic algorithm-neural network: feature extraction for bioinformatics data.

    With the advance of gene expression data in the bioinformatics field, the questions which frequently arise, for both computer and medical scientists, are which genes are significantly involved in discriminating cancer classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing techniques to microarray data has jeopardised its quality. As a result, the integrity of the findings has been compromised by the improper use of techniques and ill-conceived study objectives. This research proposes an innovative hybridised model, based on genetic algorithms (GAs) and artificial neural networks (ANNs), to extract the most highly differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract informative genes from the original data set, reducing the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, it uses an ANN to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets were used to investigate the prominent genes expressed in tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results in synthetic data sets. In addition, two bioassay data sets were used to examine the efficiency of the proposed model in extracting significant features from large, imbalanced bioassay data with multiple data representations.
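The GA half of the hybrid described above can be sketched as a search over binary gene masks. In this simplified illustration the ANN-based fitness is abstracted as a caller-supplied callable (e.g. the validation accuracy of a small network trained on the selected genes); the operators and parameters shown are hypothetical, not those of the thesis.

```python
import random

def ga_select_genes(n_genes, fitness, pop_size=30, generations=40,
                    mut_rate=0.05, seed=0):
    """Genetic-algorithm gene selection over binary masks.
    `fitness(mask)` is assumed to be supplied by the caller,
    e.g. an ANN's validation accuracy on the selected genes."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = a[:cut] + b[cut:]
            # Per-bit mutation toggles genes in or out of the subset.
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = elite + children                   # elitism keeps the best masks
    return max(pop, key=fitness)
```

In the actual hybrid, each `fitness` call would train and evaluate the ANN on the masked expression matrix, which is why keeping the population and generation counts modest matters for runtime.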

    Hybrid Models of Fuzzy ARTMAP and Q-learning for Pattern Classification

    Pattern classification is one of the primary issues in various data mining tasks. In this study, the main research focus is on the design and development of hybrid models, combining the supervised Adaptive Resonance Theory (ART) neural network and Reinforcement Learning (RL) models for pattern classification. Specifically, the Fuzzy ARTMAP (FAM) network and Q-learning are adopted as the backbone for designing and developing the hybrid models. A new QFAM model is first introduced to improve the classification performance of the FAM network. A pruning strategy is incorporated to reduce the complexity of QFAM. To overcome the opaqueness issue, a Genetic Algorithm (GA) is used to extract fuzzy if-then rules from QFAM. The resulting model, i.e. QFAM-GA, is able to provide predictions with explanations using only a few antecedents. To further improve the robustness of QFAM-based models, the notion of multi-agent systems is employed. As a result, an agent-based QFAM ensemble model with a new trust measurement and negotiation method is proposed. A variety of benchmark problems are used for evaluation of individual and ensemble QFAM-based models. The results are analysed and compared with those from FAM as well as other models reported in the literature. In addition, two real-world problems are used to demonstrate the practicality of the hybrid models. The outcomes indicate the effectiveness of QFAM-based models in tackling pattern classification tasks.
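The RL ingredient of the QFAM hybrid is the standard tabular Q-learning update; the sketch below shows that update in isolation. How QFAM derives states, actions, and rewards from FAM category matches is not detailed in the abstract, so everything beyond the update rule itself is an assumption.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict of dicts: Q[state][action] -> value."""
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    target = reward + gamma * best_next
    Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]
```

In a classification setting, a natural (assumed) reward signal is +1 for a correct prediction and a penalty otherwise, so Q-values come to rank the network's category choices.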

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and its accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytical tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm following a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
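The divide-and-conquer decomposition that both cooperative co-evolution and MapReduce rely on can be sketched in a few lines. This is a toy single-machine illustration, not a distributed implementation: the per-group selector (keep features scoring at or above the group mean) stands in for a real sub-population optimizer, and `score` is a caller-supplied assumption.

```python
from functools import reduce

def map_select(feature_group, score):
    """Map step: select within one feature partition. Here a feature is
    kept if its individual score reaches the group mean (a stand-in for
    a co-evolved sub-population's selection)."""
    scores = {f: score(f) for f in feature_group}
    mean = sum(scores.values()) / len(scores)
    return [f for f, s in scores.items() if s >= mean]

def mapreduce_select(features, score, n_groups=4):
    """Decompose the feature set into sub-problems, select per group in
    the map phase, then merge the partial selections in the reduce phase."""
    groups = [features[i::n_groups] for i in range(n_groups)]
    partials = [map_select(g, score) for g in groups if g]   # map phase
    return reduce(lambda a, b: a + b, partials, [])          # reduce phase
```

In a real MapReduce deployment, each `map_select` call would run on a separate worker over its partition of the (wide) feature matrix, and the reduce phase would merge and possibly re-rank the surviving features.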