
    Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGLs), small subsets of genes selected from the very large pool of candidates available in DNA microarray experiments, is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations in sparse model selection in high-dimensional spaces. In this work we outline a different approach based on an unsupervised, patient-specific nonlinear topographic projection of predictive gene lists.
    Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques are used to construct two-dimensional projective visualisation plots of the 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient from the resulting projections and to investigate whether, a posteriori, the two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study based on the projections derived from the original dataset.
    Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results.
    Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty preclude any attempt at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of the provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.

    Multi-class pairwise linear dimensionality reduction using heteroscedastic schemes

    Linear dimensionality reduction (LDR) techniques have become increasingly important in pattern recognition (PR) because they permit a relatively simple mapping of the problem onto a lower-dimensional subspace, leading to simple and computationally efficient classification strategies. Although the field has been well developed for the two-class problem, the corresponding issues encountered when dealing with multiple classes are far from trivial. In this paper, we argue that, as opposed to the traditional LDR multi-class schemes, if we are dealing with multiple classes, it is not expedient to treat it as a multi-class problem per se. Rather, we shall show that it is better to treat it as an ensemble of Chernoff-based two-class reductions onto different subspaces, whence the overall solution is achieved by resorting to either Voting, Weighting, or to a Decision Tree strategy. The experimental results obtained on benchmark datasets demonstrate that the proposed methods are not only efficient, but that they also yield accuracies comparable to those obtained by the optimal Bayes classifier.
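    The pairwise-plus-voting idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: the Chernoff-based reduction is replaced here by a much simpler stand-in (projection onto the difference of the two class means), and the names `fit_pair`, `fit_pairwise` and `predict`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pair(xs_a, xs_b):
    """Fit a 1-D two-class rule by projecting onto the mean-difference axis.

    Assumption: this is a simple stand-in for the Chernoff-based two-class
    reduction named in the abstract, which is not reproduced here.
    """
    mu_a, mu_b = mean(xs_a), mean(xs_b)
    w = [a - b for a, b in zip(mu_a, mu_b)]
    # Threshold at the midpoint of the two projected class means.
    threshold = (dot(w, mu_a) + dot(w, mu_b)) / 2.0
    return w, threshold


def fit_pairwise(data):
    """data maps each class label to a list of feature vectors."""
    return {
        (a, b): fit_pair(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def predict(models, x):
    """Each pairwise model casts one vote; the most-voted label wins."""
    votes = Counter()
    for (a, b), (w, threshold) in models.items():
        votes[a if dot(w, x) > threshold else b] += 1
    return votes.most_common(1)[0][0]


# Toy three-class data (hypothetical, for illustration only).
data = {
    "A": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]],
    "B": [[3.0, 0.0], [3.1, 0.2], [2.9, 0.1]],
    "C": [[0.0, 3.0], [0.2, 3.1], [0.1, 2.8]],
}
models = fit_pairwise(data)
print(predict(models, [2.8, 0.3]))  # B
```

    With K classes this trains K(K-1)/2 two-class models, each on its own one-dimensional subspace. The Weighting and Decision Tree strategies mentioned in the abstract would replace the plain vote count in `predict`.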

    Stellar classification from single-band imaging using machine learning

    Information on the spectral types of stars is of great interest in view of the exploitation of space-based imaging surveys. In this article, we investigate the classification of stars into spectral types using only the shape of their diffraction pattern in a single broad-band image. We propose a supervised machine learning approach to this endeavour, based on principal component analysis (PCA) for dimensionality reduction, followed by artificial neural networks (ANNs) estimating the spectral type. Our analysis is performed with image simulations mimicking the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) in the F606W and F814W bands, as well as the Euclid VIS imager. We first demonstrate this classification in a simple context, assuming perfect knowledge of the point spread function (PSF) model and the possibility of accurately generating mock training data for the machine learning. We then analyse its performance in a fully data-driven situation, in which the training would be performed with a limited subset of bright stars from a survey, and an unknown PSF with spatial variations across the detector. We use simulations of main-sequence stars with flat distributions in spectral type and in signal-to-noise ratio, and classify these stars into 13 spectral subclasses, from O5 to M5. Under these conditions, the algorithm achieves a high success rate both for Euclid and HST images, with typical errors of half a spectral class. Although more detailed simulations would be needed to assess the performance of the algorithm on a specific survey, this shows that stellar classification from single-band images is feasible. Comment: 10 pages, 9 figures, 2 tables, accepted in A&
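    As a rough illustration of the first stage of the pipeline described above, the sketch below reduces small image stamps to a handful of PCA coefficients using NumPy. The stamp size, the number of components and the random data are assumptions chosen for the example, not values from the paper, and the neural-network stage that would take these coefficients as input is not shown.

```python
import numpy as np


def pca_fit(images, n_components):
    """Return the mean image, the leading principal axes, and their variances."""
    flat = images.reshape(len(images), -1)
    mean = flat.mean(axis=0)
    # SVD of the centred data; rows of vt are principal axes sorted by variance.
    _, singular_values, vt = np.linalg.svd(flat - mean, full_matrices=False)
    explained_variance = singular_values**2 / (len(flat) - 1)
    return mean, vt[:n_components], explained_variance[:n_components]


def pca_transform(images, mean, axes):
    """Project flattened, centred images onto the principal axes."""
    flat = images.reshape(len(images), -1)
    return (flat - mean) @ axes.T


# Synthetic stand-ins for star stamps (hypothetical 16x16 pixels, pure noise).
rng = np.random.default_rng(0)
stamps = rng.normal(size=(200, 16, 16))

mean, axes, variance = pca_fit(stamps, n_components=10)
coeffs = pca_transform(stamps, mean, axes)
print(coeffs.shape)  # (200, 10)
```

    Each star is then represented by 10 numbers instead of 256 pixels, which is the kind of compact input a small classifier can be trained on.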

    Deformation Correlations and Machine Learning: Microstructural inference and crystal plasticity predictions

    The present thesis makes a connection between spatially resolved strain correlations and material processing history. Such correlations can be used to infer and classify prior deformation history of a sample at various strain levels with the use of Machine Learning approaches. A simple and concrete example of uniaxially compressed crystalline thin films of various sizes, generated by two-dimensional discrete dislocation plasticity simulations is examined. At the nanoscale, thin films exhibit yield-strength size effects with noisy mechanical responses which create an interesting challenge for the application of Machine Learning techniques. Moreover, this thesis demonstrates the prediction of the average mechanical responses of thin films based on the classified prior deformation history and discusses the possible ramifications for modelling crystal plasticity behavior in extreme settings

    The 2dF Galaxy Redshift Survey: spectral types and luminosity functions

    We describe the 2dF Galaxy Redshift Survey (2dFGRS) and the current status of the observations. In this exploratory paper, we apply a principal component analysis to a preliminary sample of 5869 galaxy spectra and use the two most significant components to split the sample into five spectral classes. These classes are defined by considering visual classifications of a subset of the 2dF spectra, and also by comparison with high-quality spectra of local galaxies. We calculate a luminosity function for each of the different classes and find that later-type galaxies have a fainter characteristic magnitude, and a steeper faint-end slope. For the whole sample we find M*=−19.7 (for Ω=1, H₀=100 km s⁻¹ Mpc⁻¹), α=−1.3, φ*=0.017. For class 1 ('early-type') we find M*=−19.6, α=−0.7, while for class 5 ('late-type') we find M*=−19.0, α=−1.7. The derived 2dF luminosity functions agree well with other recent luminosity function estimates
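    The abstract does not name the functional form being fitted. Assuming the quoted M*, α and φ* are the parameters of a Schechter function, which is the usual convention for a characteristic magnitude, faint-end slope and normalisation, the luminosity function in absolute magnitudes reads:

```latex
\phi(M)\,\mathrm{d}M
  = 0.4\,\ln(10)\;\phi^{*}
    \left[10^{\,0.4\,(M^{*}-M)}\right]^{\alpha+1}
    \exp\!\left[-10^{\,0.4\,(M^{*}-M)}\right]\mathrm{d}M
```

    Under that assumption, a more negative α gives a steeper rise in counts towards faint magnitudes, consistent with the late-type value α=−1.7 being described as a steeper faint-end slope than the early-type α=−0.7.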

