Layered genetic programming for feature extraction in classification problems
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
Genetic programming has proven to be a successful technique for feature extraction in various
applications. In this thesis, we present a Layered Genetic Programming system that implements a
genetic programming-based feature extraction mechanism. The proposed system uses a layered
structure where instead of evolving just one population of individuals, several populations are evolved
sequentially. Each such population transforms the input data received from the previous population
into a lower dimensional space with the aim of improving classification performance.
The performance of the proposed system was experimentally tested on five real-world problems using
different dimensionality reduction step sizes and different classifiers. The proposed method was able
to outperform a simple classifier applied directly to the original data on two problems. On the
remaining problems, the classifier performed better using the original data. The best solutions were
often obtained in the first few layers, which implied that increasing the size of the system, i.e.,
adding more layers, was not useful. However, the layered structure allowed control of the size of individuals.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Feature Selection for Classification with Artificial Bee Colony Programming
Feature selection and classification are among the most widely applied machine learning processes. Feature selection aims to find useful features containing class information by eliminating noisy and unnecessary features in the data sets, thereby facilitating the classifiers. Classification is used to distribute data among the various classes defined on the resulting feature set. In this chapter, artificial bee colony programming (ABCP) is proposed and applied to feature selection for classification problems on four different data sets. The best models are obtained using a sensitivity fitness function defined according to the total number of classes in the data sets, and are compared with models obtained by genetic programming (GP). The experimental results show that the proposed technique is accurate and efficient compared with GP in terms of critical feature selection and classification accuracy on well-known benchmark problems.
Genetic algorithm-neural network: feature extraction for bioinformatics data.
With the advance of gene expression data in the bioinformatics field, the questions which frequently arise,
for both computer and medical scientists, are which genes are significantly involved in discriminating cancer
classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to
misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing
techniques to the microarray data has jeopardised its quality. As a result, the
integrity of the findings has been compromised by the improper use of techniques and the ill-conceived
objectives of the study. This research proposes an innovative hybridised model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the highly differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract the informative genes from the original data set, which has
reduced the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, it uses an ANN to compute the fitness function of the GA, which is rare in the context
of feature extraction. Two benchmark microarray data sets have been used to investigate the prominent genes expressed in tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e., different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the
proposed model is validated based on the expected results in the synthetic data sets. In addition, two bioassay data sets have been used to examine the efficiency of the proposed model in extracting significant features from the large, imbalanced, multiple-data-representation bioassay data.
Hybrid Models of Fuzzy ARTMAP and Q-learning for Pattern Classification
Pattern classification is one of the primary issues in various data mining
tasks. In this study, the main research focus is on the design and
development of hybrid models, combining the supervised Adaptive
Resonance Theory (ART) neural network and Reinforcement Learning (RL)
models for pattern classification. Specifically, the Fuzzy ARTMAP (FAM)
network and Q-learning are adopted as the backbone for designing and
developing the hybrid models. A new QFAM model is first introduced to
improve the classification performance of the FAM network. A pruning strategy
is incorporated to reduce the complexity of QFAM. To overcome the
opaqueness issue, a Genetic Algorithm (GA) is used to extract fuzzy if-then
rules from QFAM. The resulting model, i.e. QFAM-GA, is able to provide
predictions with explanations using only a few antecedents. To further
improve the robustness of QFAM-based models, the notion of multi-agent
systems is employed. As a result, an agent-based QFAM ensemble model
with a new trust measurement and negotiation method is proposed. A variety
of benchmark problems are used for the evaluation of individual and ensemble
QFAM-based models. The results are analyzed and compared with those
from FAM as well as other models reported in the literature. In addition, two
real-world problems are used to demonstrate the practicality of the hybrid
models. The outcomes indicate the effectiveness of QFAM-based models in
tackling pattern classification tasks.
Artificial Immune Systems - Models, algorithms and applications
Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund. Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted a lot of interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms, with a focus on the last five years.
Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and MapReduce perspectives
The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytical tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem to find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.