34 research outputs found

    An Algorithm for Finding Biologically Significant Features in Microarray Data Based on <i>A Priori</i> Manifold Learning

    <div><p>Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, which causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed <i>a priori</i> manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data is projected onto it and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG pathway database is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.</p></div>
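
The abstract contrasts PCA with manifold learning methods such as Isomap. As a rough illustration of what sets Isomap apart, the sketch below covers only its first two steps: building a k-nearest-neighbour graph and taking shortest paths through it as approximate geodesic distances. It is not the <i>a priori</i> manifold learning method described above and uses no KEGG data. The function name `geodesic_distances`, the `n_neighbors` parameter and the half-circle toy data are assumptions made for this example.

```python
import math

def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances: k-NN graph plus all-pairs shortest paths."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    # Start with no edges: infinite distance everywhere except the diagonal.
    graph = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Indices of the n_neighbors points closest to i, excluding i itself.
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: euclid[i][j])[:n_neighbors]
        for j in nearest:
            # Store each edge in both directions so the graph is undirected.
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Floyd-Warshall: relax every pair of points through every intermediate point.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][k] + graph[k][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

# Hypothetical toy data: 21 points evenly spaced along a unit half-circle.
arc = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]
geo = geodesic_distances(arc, n_neighbors=2)

chord = math.dist(arc[0], arc[-1])  # straight line between the two endpoints
along = geo[0][-1]                  # graph distance that follows the curve
print(f"Euclidean: {chord:.3f}, graph geodesic: {along:.3f}")
```

On this toy curve the graph distance between the endpoints is close to the arc length, while the Euclidean distance cuts straight across. A full Isomap would then embed these distances with multidimensional scaling, a step omitted here.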

    10 Fold Cross Validation Variance On Sample-by-Sample Transformation using <i>k</i>-Nearest Neighbours.

    <p>The results show that the variance of the cross validation is very small and thus we can safely compare the methods tested.</p>
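
For readers unfamiliar with the quantity reported here, the following is a minimal sketch of how per-fold accuracies and their variance might be computed for a <i>k</i>-nearest-neighbours classifier under 10-fold cross-validation. The synthetic two-class data, the helper names `knn_predict` and `cross_validate`, and the choice of three neighbours are all assumptions for illustration; they are not taken from the study.

```python
import math
import random
import statistics

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def cross_validate(data, k_neighbors=3, n_folds=10, seed=0):
    """Return one accuracy per fold for a k-NN classifier."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled samples into n_folds roughly equal folds.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test_fold)
        accuracies.append(correct / len(test_fold))
    return accuracies

# Hypothetical data: two well-separated Gaussian clusters, 50 samples each.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), 1) for _ in range(50)]

scores = cross_validate(data)
mean_acc = statistics.mean(scores)
var_acc = statistics.pvariance(scores)  # spread of accuracy across the folds
print(f"mean accuracy: {mean_acc:.3f}, variance across folds: {var_acc:.5f}")
```

A small variance across folds suggests that the mean accuracy does not depend much on how the data happened to be split, which is the property the caption relies on when comparing methods.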

    10 Fold Cross Validation Variance On Gene-by-Gene Transformation using Linear Discriminant Analysis.

    <p>The results show that the variance of the cross validation is very small and thus we can safely compare the methods tested.</p>

    Pathway Robustness (Ovary).

    <p>A plot of ROC curves with different percentages of pathways.</p>

    Pathway Robustness (Prostate).

    <p>A plot of the Dunn Index with different percentages of pathways.</p>
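
The Dunn Index mentioned in this caption is a standard cluster-validity measure: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so higher values indicate compact, well-separated clusters. A minimal sketch follows, using Euclidean distance and made-up toy clusters. The function name `dunn_index` and the data are assumptions, and the study may have used a different variant of the index.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest within-cluster diameter."""
    # Largest distance between any two points that share a cluster.
    max_diameter = max(
        math.dist(p, q) for cluster in clusters for p, q in combinations(cluster, 2)
    )
    # Smallest distance between any two points from different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Hypothetical toy clusters: compact and far apart versus spread out and close.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]

print(dunn_index(tight))  # 10.0
print(dunn_index(loose))  # 1.25
```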

    Pathway Robustness (Kidney).

    <p>A plot of ROC curves with different percentages of pathways.</p>

    10 Fold Cross Validation Accuracy On Gene-by-Gene Transformation using <i>k</i>-Nearest Neighbours.

    <p>The results of 10-fold cross-validation on the dataset using gene-by-gene affinity matrices for PCA and Isomap. The <i>a priori</i> manifold learning method clearly outperforms the other two. We have emphasised in bold the cases in which <i>a priori</i> manifold learning outperforms the other methods.</p>

    Leukaemia cell.

    <p>Two-dimensional manifold of the three different leukaemia cell types. The cell types form clusters that are easily distinguished in the lower-dimensional space.</p>