4 research outputs found

    Specific Tuning Parameter for Directed Random Walk Algorithm Cancer Classification

    Accurate cancer classification from gene expression is a central challenge in clinical cancer research. Microarray-based gene biomarkers have shown better performance than traditional clinical parameters. However, individual gene biomarkers are less robust because of their limited reproducibility across different patient cohorts. Several methods that incorporate pathway information, such as the directed random walk (DRW), have been proposed to infer pathway activity. This paper discusses the implementation of a group-specific tuning parameter in the DRW algorithm, referred to as sDRW. Gene expression data and pathway data are used as input. In this experiment, more significant pathway activities are identified, which increases the accuracy of cancer classification. Lung cancer gene expression data are used as the experimental dataset, on which sDRW is applied to determine significant pathways. More risk-active pathways are identified in this experiment.
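
The abstract assumes familiarity with how DRW turns gene-level statistics into pathway activities. The sketch below is a minimal illustration, not the paper's sDRW: it assumes the standard random-walk-with-restart update w ← (1 − r)·Mᵀw + r·w₀ on a row-normalised directed gene graph and treats the restart probability `r` as the tuning parameter. How sDRW assigns group-specific values is not described here, so `r` is simply an argument. The function names, the dictionary-based graph and the activity score (walk weight × sign of the t-statistic × z-scored expression, divided by the norm of the weights) are illustrative assumptions.

```python
from math import sqrt


def directed_random_walk(edges, w0, r=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed gene graph (illustrative sketch).

    edges: dict gene -> list of downstream genes; every gene must appear in w0.
    w0: dict gene -> non-negative initial weight (e.g. |t|-statistic).
    r: restart probability, standing in for the walk's tuning parameter.
    """
    genes = list(w0)
    total = sum(w0.values())
    start = {g: w0[g] / total for g in genes}  # normalise initial weights to sum to 1
    w = dict(start)
    for _ in range(max_iter):
        # Each gene passes its current weight equally to its downstream genes.
        spread = {g: 0.0 for g in genes}
        for g in genes:
            targets = edges.get(g, [])
            for t in targets:
                spread[t] += w[g] / len(targets)
        new_w = {g: (1 - r) * spread[g] + r * start[g] for g in genes}
        if sum(abs(new_w[g] - w[g]) for g in genes) < tol:  # L1 convergence check
            return new_w
        w = new_w
    return w


def pathway_activity(z_expr, weights, t_stats, members):
    """Per-sample activity of one pathway (hypothetical DRW-style score).

    z_expr: dict gene -> list of z-scored expression values, one per sample.
    """
    norm = sqrt(sum(weights[g] ** 2 for g in members))
    n_samples = len(z_expr[members[0]])
    activity = []
    for j in range(n_samples):
        total = 0.0
        for g in members:
            sign = (t_stats[g] > 0) - (t_stats[g] < 0)  # sign of the t-statistic
            total += weights[g] * sign * z_expr[g][j]
        activity.append(total / norm)
    return activity


edges = {"A": ["B"], "B": ["C"]}
w = directed_random_walk(edges, {"A": 1.0, "B": 0.0, "C": 0.0}, r=0.5)
print(w)  # {'A': 0.5, 'B': 0.25, 'C': 0.125}
```

Weight that starts on gene A decays by a factor of (1 − r) at each downstream step, so a larger restart probability keeps the walk closer to the initial gene statistics.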

    Analysis of microarray and next generation sequencing data for classification and biomarker discovery in relation to complex diseases

    This PhD thesis presents an investigation into gene expression profiling, using microarray and next generation sequencing (NGS) datasets, in relation to multi-category diseases such as cancer. It is well established that a mutation in a gene's sequence can result in unscheduled protein production, leading to cancer. However, identifying the molecular signature of different cancers amongst thousands of genes is complex. This thesis investigates tools that can aid the study of gene expression and infer information useful for personalised medicine. For microarray data analysis, this study proposes two new techniques to increase the accuracy of cancer classification. In the first method, a novel optimisation algorithm, COA-GA, was developed by synchronising the Cuckoo Optimisation Algorithm and the Genetic Algorithm for data clustering in a shuffle setup, to choose the most informative genes for classification. A Support Vector Machine (SVM) and a Multilayer Perceptron (MLP) artificial neural network are used for the classification step. Results suggest this method can significantly increase classification accuracy compared with other methods. A second method, based on a two-stage gene selection process, was also developed. In this method, a subset of the most informative genes is first selected by the Minimum Redundancy Maximum Relevance (MRMR) method. In the second stage, optimisation algorithms are used in a wrapper setup with SVM to minimise the number of selected genes whilst maximising classification accuracy. A comparative performance assessment suggests that the proposed algorithm significantly outperforms other methods at selecting fewer genes that are highly relevant to the cancer type, while maintaining high classification accuracy.
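
The first stage of the two-stage selection uses MRMR. As a rough sketch of that idea, assuming expression values have already been discretised, the greedy loop below adds at each step the gene with the highest relevance to the class (mutual information) minus its mean redundancy with the genes already chosen, the "difference" form of MRMR. The thesis does not say which MRMR variant it uses, the second-stage wrapper (an optimisation algorithm paired with SVM) is not shown, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedily select k feature names by relevance minus mean redundancy.

    features: dict name -> list of discretised values, one per sample.
    labels: list of class labels, one per sample.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
        return relevance[f] - redundancy / len(selected)

    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# g1 and g2 are identical; g3 carries complementary information about the class.
labels = [0, 1, 2, 3]
features = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
print(mrmr(features, labels, 2))  # ['g1', 'g3']
```

All three genes are equally relevant on their own, but once g1 is chosen its duplicate g2 adds nothing, so the redundancy penalty favours g3.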
In the case of NGS, a state-of-the-art pipeline for the analysis of RNA-Seq data is investigated to discover differentially expressed genes and differential exon usage between normal and AIP-positive Drosophila datasets, which were produced in house at Queen Mary, University of London. Functional genomic analysis showed the differentially expressed genes to be relevant to the case study under investigation. Finally, after the RNA-Seq data were normalised, machine learning approaches similar to those used for the microarray data were successfully applied to these datasets.
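
The abstract does not say which normalisation was applied to the RNA-Seq counts before machine learning. One common choice is log counts per million (log-CPM), which corrects for differences in sequencing depth between libraries; the minimal sketch below assumes that choice, with a pseudo-count of 1 to avoid taking the log of zero. It is not necessarily what the thesis used, and `log_cpm` is a hypothetical helper.

```python
from math import log2


def log_cpm(counts, prior=1.0):
    """log2 counts per million for one library.

    counts: dict gene -> raw read count in a single sample.
    prior: pseudo-count added to every gene so zero counts stay finite.
    """
    library_size = sum(counts.values())
    return {g: log2((c + prior) * 1e6 / library_size) for g, c in counts.items()}


# Two libraries with the same relative composition but a tenfold depth difference
shallow = log_cpm({"geneA": 100, "geneB": 900})
deep = log_cpm({"geneA": 1000, "geneB": 9000})
# shallow["geneA"] and deep["geneA"] differ by roughly 0.01, despite a tenfold gap in raw counts
```

Putting samples on a depth-independent scale like this is what lets classifiers designed for microarray intensities be reused on count data.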

    Integrative gene selection for classification of microarray data

    Microarray data classification, which aims to discover hidden patterns in gene expression profiles, is a major area of interest in health informatics. The main challenge in building such classification systems is the curse of dimensionality. Consequently, many studies have investigated gene selection methods for building effective classification models. However, most approaches consider only gene expression values, so the selected genes may not be biologically meaningful. This paper presents an integrative gene selection approach for improving microarray data classification performance. The proposed approach employs association analysis to integrate both gene expression and biological data when identifying informative genes. The experimental results show that the proposed gene selection outperformed the traditional method in terms of both accuracy and the number of selected genes.
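
The abstract does not detail how association analysis combines expression and biological data. One minimal reading, sketched below under that assumption, discretises each sample into "gene:up"/"gene:down" items plus its class label, scores rules of the form "gene:up → class" by support and confidence, and uses biological knowledge only as a filter: a hypothetical set `annotated` of genes with relevant annotation (for example, membership in a pathway). The helper names, the zero threshold and the minimum support are illustrative choices, not the paper's.

```python
def to_transaction(sample_expr, label, threshold=0.0):
    """Turn one sample into a set of items: 'gene:up'/'gene:down' plus its class label."""
    items = {f"{g}:{'up' if v > threshold else 'down'}" for g, v in sample_expr.items()}
    items.add(label)
    return items


def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent."""
    with_a = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_a if consequent <= t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def rank_genes(transactions, genes, target_class, annotated, min_support=0.2):
    """Rank annotated genes by confidence (then support) of 'gene:up -> target_class'."""
    scored = []
    for g in genes:
        if g not in annotated:  # biological knowledge used as a filter (assumption)
            continue
        support, confidence = rule_stats(transactions, {f"{g}:up"}, {target_class})
        if support >= min_support:
            scored.append((confidence, support, g))
    return [g for _, _, g in sorted(scored, reverse=True)]


samples = [
    ({"g1": 1.2, "g2": 0.8}, "tumour"),
    ({"g1": 0.9, "g2": -0.3}, "tumour"),
    ({"g1": -0.5, "g2": 0.4}, "normal"),
    ({"g1": -1.1, "g2": 0.7}, "normal"),
]
transactions = [to_transaction(expr, label) for expr, label in samples]
print(rank_genes(transactions, ["g1", "g2"], "tumour", annotated={"g1", "g2"}))  # ['g1', 'g2']
```

Here g1 is up-regulated in every tumour sample (confidence 1.0), while g2 is up in only one of the three samples where it is up-regulated overall, so g1 ranks first; dropping g1 from `annotated` would remove it from consideration regardless of its expression pattern.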

    Integrative Gene Selection for Classification of Microarray Data
