Innovative Hybridisation of Genetic Algorithms and Neural Networks in Detecting Marker Genes for Leukaemia Cancer
Methods for extracting marker genes that trigger the growth of cancerous cells from highly complex microarray data are of great interest to the computing community. Through the identified genes, the pathology of cancerous cells can be revealed and early precautions can be taken to prevent their further proliferation. In this paper, we propose an innovative hybridised gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia. Our approach confirms that high classification accuracy does not ensure that the optimal set of genes has been identified, and our model delivers a more promising set of genes even with lower classification accuracy.
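The abstract does not spell out the hybridisation, but the general GA-plus-classifier wrapper it refers to can be sketched as follows. Everything here is illustrative: the data are synthetic, and a nearest-centroid classifier stands in for the neural network so the sketch stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "microarray": 40 samples, 50 genes; genes 0-4 carry the class signal.
n, p, n_informative = 40, 50, 5
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :n_informative] += 2.0

def fitness(mask):
    """Training accuracy of a nearest-centroid classifier on the selected
    genes, minus a small penalty for selecting many genes."""
    if mask.sum() == 0:
        return -1.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) <
            np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean() - 0.01 * mask.sum()

# Plain generational GA: tournament selection, uniform crossover, bit-flip mutation.
pop = rng.random((30, p)) < 0.2              # binary chromosomes = gene subsets
for _ in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    children = []
    for _ in range(len(pop)):
        i, j = rng.integers(len(pop), size=2)
        a = pop[i] if scores[i] >= scores[j] else pop[j]
        i, j = rng.integers(len(pop), size=2)
        b = pop[i] if scores[i] >= scores[j] else pop[j]
        child = np.where(rng.random(p) < 0.5, a, b)   # uniform crossover
        child ^= rng.random(p) < 0.02                 # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
selected = np.flatnonzero(best)
print("selected genes:", selected)
```

The fitness rewards training accuracy while penalising large gene sets, which illustrates the abstract's point: several quite different subsets can reach similar accuracy, so accuracy alone does not single out the optimal gene set.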
Knowledge-based gene expression classification via matrix factorization
Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently explored to analyze gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). These extracted features are considered indicative of underlying regulatory processes. They can as well be applied to the classification of gene expression datasets by grouping samples into different categories for diagnostic purposes or group genes into functional categories for further investigation of related metabolic pathways and regulatory networks.
Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and to extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories, corresponding either to monocytes versus macrophages or to healthy versus Niemann-Pick C disease patients. Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).
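As a minimal sketch of the NMF side of this approach (not the authors' sparse NMF variant), the classic Lee-Seung multiplicative updates can factor a synthetic non-negative expression matrix into metagenes, after which candidate marker genes are read off the factor with the largest weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic non-negative "expression matrix" V: 100 genes x 20 samples,
# generated from k = 2 hidden metagenes plus a little noise.
W_true = rng.random((100, 2))
H_true = rng.random((2, 20))
V = W_true @ H_true + 0.01 * rng.random((100, 20))

# Lee-Seung multiplicative updates for V ~ W H (Frobenius objective);
# non-negativity is preserved because each update multiplies by a ratio
# of non-negative quantities.
k = 2
W = rng.random((100, k))
H = rng.random((k, 20))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")

# "Marker genes" for a metagene: the genes with the largest weights in the
# corresponding column of W.
markers = np.argsort(W[:, 0])[::-1][:10]
```

The columns of W play the role of metagenes; a set of top-weighted genes per column, rather than any single gene, carries the discriminative signal, which is the point the abstract makes.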
Increasing stability and interpretability of gene expression signatures
Motivation: Molecular signatures for diagnosis or prognosis estimated from large-scale gene expression data often lack robustness and stability, rendering their biological interpretation challenging. Increasing the signature's interpretability and stability across perturbations of a given dataset and, if possible, across datasets, is urgently needed to ease the discovery of important biological processes and, eventually, new drug targets. Results: We propose a new method to construct signatures with increased stability and easier interpretability. The method uses a gene network as side information and enforces large connectivity among the genes in the signature, leading to signatures typically made of genes clustered in a few subnetworks. It combines the recently proposed graph Lasso procedure with a stability selection procedure. We evaluate its relevance for the estimation of a prognostic signature in breast cancer, and highlight in particular the increase in interpretability and stability of the signature.
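A compact illustration of the stability-selection half of this recipe (the graph Lasso itself is omitted; a simple univariate correlation filter stands in as the base selector, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 200 samples, 200 genes; only genes 0-2 influence y.
n, p = 200, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 3.0
y = X @ beta + rng.normal(size=n)

# Stability selection: rerun a base selector on many random half-subsamples
# and keep only the genes chosen in a large fraction of runs. The base
# selector here keeps the q genes most correlated with y -- a simple
# stand-in for the graph Lasso used in the paper.
q, runs = 10, 50
counts = np.zeros(p)
for _ in range(runs):
    idx = rng.choice(n, size=n // 2, replace=False)
    score = np.abs(X[idx].T @ (y[idx] - y[idx].mean()))
    counts[np.argsort(score)[::-1][:q]] += 1

stable = np.flatnonzero(counts / runs >= 0.9)   # selection frequency >= 90%
print("stable genes:", stable)
```

Genes that survive this frequency cut are robust to perturbations of the dataset, which is exactly the stability property the abstract argues a signature should have.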
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data
Gene expression analysis aims at identifying the genes able to accurately
predict biological parameters like, for example, disease subtyping or
progression. While accurate prediction can be achieved by means of many
different techniques, gene identification, due to gene correlation and the
limited number of available samples, is a much more elusive problem. Small
changes in the expression values often produce different gene lists, and
solutions which are both sparse and stable are difficult to obtain. We propose
a two-stage regularization method able to learn linear models characterized by
a high prediction performance. By varying a suitable parameter, these linear
models allow one to trade sparsity for the inclusion of correlated genes and to
produce gene lists which are almost perfectly nested. Experimental results on
synthetic and microarray data confirm the interesting properties of the
proposed method and its potential as a starting point for further biological
investigations.
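The two-stage method itself is not reproduced here, but the trade-off it describes, varying an l2-type parameter to pull correlated genes into an l1-sparse model, can be sketched with a plain elastic net solved by proximal gradient descent on synthetic data (all sizes and parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two groups of five strongly correlated informative genes (0-4 and 5-9)
# among 60 genes; y depends on the shared group signals.
n, p = 100, 60
base = rng.normal(size=(n, 2))
X = rng.normal(scale=0.3, size=(n, p))
X[:, 0:5] += base[:, [0]]
X[:, 5:10] += base[:, [1]]
y = base[:, 0] + base[:, 1] + 0.1 * rng.normal(size=n)

def elastic_net(l1, l2, iters=2000):
    """Proximal gradient (ISTA) for
    (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2."""
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n + l2   # Lipschitz constant of the smooth part
    for _ in range(iters):
        g = X.T @ (X @ w - y) / n + l2 * w
        w = w - g / L
        w = np.sign(w) * np.maximum(np.abs(w) - l1 / L, 0.0)   # soft-threshold
    return w

# Growing the l2 weight at a fixed l1 pulls more of the correlated genes
# into the model; the supports tend to grow and to be (almost) nested.
supports = [set(np.flatnonzero(np.abs(elastic_net(0.1, l2)) > 1e-8))
            for l2 in (0.0, 0.5, 2.0)]
print([sorted(s) for s in supports])
```

At l2 = 0 the pure l1 penalty tends to keep only one or two representatives of each correlated group; increasing l2 spreads weight across the whole group, trading sparsity for the inclusion of correlated genes as the abstract describes.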
DNA expression microarrays may be the wrong tool to identify biological pathways
DNA microarray expression signatures are expected to provide new insights into pathophysiological pathways. Numerous variant statistical methods have been described for each step of the signal analysis. We applied five similar statistical tests to the same data set at the level of gene selection. Inter-test agreement on the identification of biological pathways in BioCarta, KEGG and Reactome was calculated using Cohen's kappa score. The identification of specific biological pathways showed only moderate agreement (0.30 < kappa < 0.79) between the analysis methods used. Pathways identified by microarrays must therefore be treated cautiously, as they vary according to the statistical method used.
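Cohen's kappa for two binary "pathway identified" calls is straightforward to compute directly; the pathway decisions below are hypothetical and only illustrate the statistic:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters: observed agreement
    corrected for the agreement expected by chance."""
    a, b = np.asarray(a), np.asarray(b)
    po = (a == b).mean()                        # observed agreement
    pe = (a.mean() * b.mean()                   # chance agreement
          + (1 - a.mean()) * (1 - b.mean()))
    return (po - pe) / (1 - pe)

# Hypothetical example: two statistical tests decide, for each of 12
# pathways, whether it is "identified" (1) or not (0).
test_A = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
test_B = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
k = cohens_kappa(test_A, test_B)
print(f"kappa = {k:.2f}")   # -> kappa = 0.66, i.e. only moderate agreement
```

Even with ten of twelve pathway calls matching, chance correction pulls the score down into the moderate band, mirroring the 0.30 < kappa < 0.79 range reported above.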
A novel dimensionality reduction technique based on independent component analysis for modeling microarray gene expression data
DNA microarray experiments, generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies is that the number n of samples collected is small relative to the number p of genes per sample, which usually runs into the thousands. In statistical terms, this very large number of predictors compared to the small number of samples makes the classification problem difficult; this is known as the "curse of dimensionality". An efficient way to address this problem is to use dimensionality reduction techniques. Principal Component Analysis (PCA) is a leading method for dimensionality reduction of gene expression data and is optimal in the least-squares sense. In this paper we propose a new dimensionality reduction technique for specific bioinformatics applications based on Independent Component Analysis (ICA). By exploiting higher-order statistics to identify a linear model, this ICA-based dimensionality reduction technique outperforms PCA in terms of both statistical and biological significance. We present experiments on the NCI 60 dataset to show this result.
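A self-contained sketch of the PCA-versus-ICA comparison on synthetic data (a minimal symmetric FastICA with a tanh nonlinearity, not the authors' exact algorithm; the sources, mixing matrix, and sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# 500 "genes" described by two independent, non-Gaussian (Laplace) sources
# mixed linearly -- exactly ICA's generative model.
n_genes = 500
S = rng.laplace(size=(n_genes, 2))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T                                  # observed data, shape (500, 2)

# PCA: directions of maximal variance via SVD of the centred data.
Xc = X - X.mean(axis=0)
U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt.T                              # principal-component scores

# Minimal symmetric FastICA (tanh nonlinearity) on whitened data.
Z = U * np.sqrt(n_genes)                     # whitened: sample covariance = I
W = rng.normal(size=(2, 2))
for _ in range(200):
    G = np.tanh(Z @ W.T)
    W_new = G.T @ Z / n_genes - np.diag((1 - G ** 2).mean(axis=0)) @ W
    u, _, vt = np.linalg.svd(W_new)
    W = u @ vt                               # symmetric decorrelation
ics = Z @ W.T                                # independent-component scores

def max_abs_corr(est, true):
    """For each estimated component, |correlation| with its best-matching source."""
    c = np.corrcoef(est.T, true.T)[:2, 2:]
    return np.abs(c).max(axis=1)

print("PCA |corr| with sources:", max_abs_corr(pcs, S))
print("ICA |corr| with sources:", max_abs_corr(ics, S))
```

PCA only decorrelates the data (second-order statistics), so its components remain mixtures of the two sources; ICA uses the non-Gaussianity of the sources (higher-order statistics) and recovers them almost exactly, which is the statistical advantage the abstract claims.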