A new genetic algorithm for multi-label correlation-based feature selection.
This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS). This GA performs a global search in the space of candidate feature subsets in order to select a high-quality feature subset, which is then used by a multi-label classification algorithm (in this work, the Multi-Label k-NN algorithm). We compare the results of GA-ML-CFS with the results of the previously proposed Hill-Climbing for Multi-Label Correlation-Based Feature Selection (HC-ML-CFS) across 10 multi-label datasets
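The abstract does not give the fitness function or the GA operators, so the following is only a minimal sketch of the general idea: a binary-encoded GA searching feature subsets, scored with the standard single-label CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff). The function names, the tournament selection, uniform crossover, and the assumption that correlations are precomputed absolute values in [0, 1] are all illustrative choices, not the paper's method. A multi-label variant would presumably aggregate the feature-label correlation over all labels.

```python
import math
import random


def cfs_merit(mask, feat_label_corr, feat_feat_corr):
    """Standard CFS merit of the subset encoded by a 0/1 mask.

    Assumes correlations are absolute values in [0, 1] (illustrative input).
    """
    idx = [i for i, bit in enumerate(mask) if bit]
    k = len(idx)
    if k == 0:
        return 0.0
    # Mean feature-label correlation of the selected features.
    r_cf = sum(feat_label_corr[i] for i in idx) / k
    # Mean pairwise feature-feature correlation (redundancy).
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    r_ff = sum(feat_feat_corr[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(feat_label_corr, feat_feat_corr, pop_size=30, generations=40,
              mutation_rate=0.05, seed=0):
    """Hypothetical GA: tournament selection, uniform crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(feat_label_corr)

    def fitness(mask):
        return cfs_merit(mask, feat_label_corr, feat_feat_corr)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the current best
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Toy correlations: features 0 and 1 are relevant but highly redundant.
FEAT_LABEL = [0.9, 0.85, 0.1, 0.05]
FEAT_FEAT = [[1.0, 0.95, 0.1, 0.1],
             [0.95, 1.0, 0.1, 0.1],
             [0.1, 0.1, 1.0, 0.1],
             [0.1, 0.1, 0.1, 1.0]]
best_mask = ga_select(FEAT_LABEL, FEAT_FEAT)
```

On the toy data the merit penalises keeping both redundant features: selecting feature 0 alone scores higher than selecting features 0 and 1 together.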
Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data
Background: DNA microarray gene expression classification poses a challenging task to the machine learning domain. Typically, the dimensionality of gene expression data sets can range from several thousand to over 10,000 genes. A potential solution to this issue is using feature selection to reduce the dimensionality. Aim: The aim of this paper is to investigate how we can use feature quality information to improve the precision of microarray gene expression classification tasks. Method: We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first one, which we call FS-XCS, uses feature selection for feature reduction purposes. The second model is GRD-XCS, which uses feature ranking to bias the rule discovery process of XCS. Results: The results indicate that the use of feature selection/ranking methods is essential for tackling high-dimensional classification tasks, such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than using the feature reduction method. In other words, using feature quality information to develop a smarter learning procedure is more efficient than reducing the feature set. Conclusion: Our findings have shown that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on the feature quality information might potentially decrease the classification performance (e.g., using feature reduction). Therefore, we recommend a hybrid approach that uses feature quality information to direct the learning process by highlighting the more informative features, but at the same time not restricting the learning process from exploring other features
Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data
Biomarkers that predict patient survival can play an important role in medical diagnosis and treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study the one-dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of the data, but it captures features that the DWT misses and is complementary to the DWT. A genetic algorithm was applied to the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of the optimized features. The Kaplan-Meier curve and the Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time
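The Kaplan-Meier curve mentioned above is the product-limit estimate of the survival function: at each observed event time, the running survival probability is multiplied by (1 - deaths / number at risk). The sketch below is a plain illustration of that estimator under assumed inputs (a list of follow-up times and a list of 0/1 event flags); the function name and the toy data are hypothetical and are not taken from the paper.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Toy cohort: five patients, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Comparing such curves between patients with high and low levels of a candidate marker is one common way to judge whether the marker separates survival groups.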
Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data
Background: High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy.
Methods: Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby allowing researchers to isolate features that may have special significance.
Results: This technique was applied to publicly available datasets, where it substantially reduced the number of features used for classification while maintaining high accuracies.
Conclusion: The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers
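As a rough illustration of how a binary mask can be coupled to a classifier, the sketch below scores a mask by the leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked data. The choice of classifier, the small per-feature penalty, and all function names are assumptions made for the example; the abstract does not specify them. A binary GA would then search for the mask that maximises this score.

```python
def apply_mask(sample, mask):
    """Keep only the features whose mask bit is 1."""
    return [x for x, bit in zip(sample, mask) if bit]


def loo_1nn_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of 1-NN (squared Euclidean distance) on masked data."""
    if not any(mask):
        return 0.0
    masked = [apply_mask(s, mask) for s in samples]
    correct = 0
    for i, query in enumerate(masked):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(masked):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(query, other))
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(samples)


def mask_fitness(samples, labels, mask, size_penalty=0.01):
    """Hypothetical fitness: accuracy minus a small penalty per selected feature."""
    return loo_1nn_accuracy(samples, labels, mask) - size_penalty * sum(mask)


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
SAMPLES = [[0.0, 5.0], [0.1, -4.0], [1.0, 4.8], [1.1, -4.2]]
LABELS = [0, 0, 1, 1]
```

On the toy data, masking out the noisy second feature raises the leave-one-out accuracy from 0.0 to 1.0, which is the effect the abstract attributes to removing non-contributing dimensions.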
ClgR regulation of chaperone and protease systems is essential for Mycobacterium tuberculosis parasitism of the macrophage
Chaperone and protease systems play essential roles in cellular homeostasis and have vital functions in controlling the abundance of specific cellular proteins involved in processes such as transcription, replication, metabolism and virulence. Bacteria have evolved accurate regulatory systems to control the expression and function of chaperones and potentially destructive proteases. Here, we have used a combination of transcriptomics, proteomics and targeted mutagenesis to reveal that the clp gene regulator (ClgR) of Mycobacterium tuberculosis activates the transcription of at least ten genes, including four that encode protease systems (ClpP1/C, ClpP2/C, PtrB and HtrA-like protease Rv1043c) and three that encode chaperones (Acr2, ClpB and the chaperonin Rv3269). Thus, M. tuberculosis ClgR controls a larger network of protein homeostatic and regulatory systems than ClgR in any other bacterium studied to date. We demonstrate that ClgR-regulated transcriptional activation of these systems is essential for M. tuberculosis to replicate in macrophages. Furthermore, we observe that this defect is manifest early in infection, as M. tuberculosis lacking ClgR is deficient in the ability to control phagosome pH 1 h post-phagocytosis
Triclustering on Temporary Microarray Data using the TriGen Algorithm
The analysis of microarray data is a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested. Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data in which the genes are evaluated under certain conditions at several time points. In this paper, we propose the TriGen algorithm, which finds triclusters that take into account the experimental conditions and the time points, using evolutionary computation, in particular genetic algorithms, enabling the evaluation of the gene’s behavior under subsets of conditions and of time points
Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks
Gene and protein networks are very important to model complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the enormously large scale of the unknowns in a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data
A lexicographic multi-objective genetic algorithm for multi-label correlation-based feature selection
This paper proposes a new Lexicographic multi-objective Genetic Algorithm for Multi-Label Correlation-based Feature Selection (LexGA-ML-CFS), which is an extension of the previous single-objective Genetic Algorithm for Multi-label Correlation-based Feature Selection (GA-ML-CFS). This extension uses a LexGA as a global search method for generating candidate feature subsets. In our experiments, we compare the results obtained by LexGA-ML-CFS with the results obtained by the original hill climbing-based ML-CFS, the single-objective GA-ML-CFS and a baseline Binary Relevance method, using ML-kNN as the multi-label classifier. The results from our experiments show that LexGA-ML-CFS improved predictive accuracy, by comparison with other methods, in some cases, but in general there was no statistically significant difference between the results of LexGA-ML-CFS and other methods
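A lexicographic approach ranks the objectives by priority and only consults a lower-priority objective when two candidates are effectively tied on all higher-priority ones. The sketch below shows that comparison in its generic form. The abstract does not say which objectives LexGA-ML-CFS uses or how ties are defined, so the example objectives (a merit score first, then the negated subset size) and the tolerance values are assumptions.

```python
def lex_better(a, b, tolerances):
    """Return True if objective vector a beats b lexicographically.

    All objectives are maximised and listed from highest to lowest priority.
    Two values closer than the matching tolerance count as a tie, in which
    case the next objective decides.
    """
    for x, y, tol in zip(a, b, tolerances):
        if abs(x - y) > tol:
            return x > y
    return False  # tied on every objective


# Hypothetical objective vectors: (merit, -number_of_features).
TOLERANCES = (0.02, 0)
small_subset = (0.80, -3)
large_subset = (0.79, -5)
```

Here `lex_better(small_subset, large_subset, TOLERANCES)` holds because the two merits are within the tolerance and the smaller subset wins on the second objective. A clearly higher merit would win regardless of subset size.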