6 research outputs found

    Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

    Abstract

    Background: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they remain far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methods because they combine a small number of samples with a large or even huge number of variables (genes). This calls for effective feature selection in microarray data classification.

    Methods: We propose a novel feature selection method for large-scale gene expression data: multi-resolution independent component analysis (MICA). The method overcomes the weak points of widely used transform-based feature selection methods, such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF), by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces.

    Results: We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities but also show strong performance stability relative to their peers. Software implementing the major algorithm, and the data sets on which this paper focuses, are freely available at https://sites.google.com/site/heyaumapbc2011/.

    Conclusions: This work suggests a new direction for moving microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. Multi-resolution data analysis, which suppresses redundant global features and extracts local features effectively, should also have a positive impact on large-scale 'omics' data mining.

    Non-negative matrix factorisation methods for the spectral decomposition of MRS data from human brain tumours

    Abstract

    Background: In-vivo single voxel proton magnetic resonance spectroscopy (SV 1H-MRS), coupled with supervised pattern recognition (PR) methods, has been widely used in clinical studies to discriminate brain tumour types and to follow up patients bearing abnormal brain masses. SV 1H-MRS provides useful biochemical information about the metabolic state of tumours and can be performed at short (< 45 ms) or long (> 45 ms) echo time (TE), each with particular advantages. Short-TE spectra are better suited to detecting lipids, while long-TE spectra provide a much flatter signal baseline between peaks but also negative signals for metabolites such as lactate. Lipids and lactate are each indicative of specific metabolic processes. Ideally, the information provided by both TEs should be of use for clinical purposes. In this study, we characterise the performance of a range of Non-negative Matrix Factorisation (NMF) methods in two respects. First, we assess how well they derive sources correlated with the mean spectra of known tissue types (tumours and normal tissue). Second, taking the best-performing NMF method for source separation, we compare its accuracy for class assignment when the mixing matrix is used directly as a basis for classification against its accuracy when the method is used for dimensionality reduction (DR). For this, we used SV 1H-MRS data with positive and negative peaks from a widely tested SV 1H-MRS human brain tumour database.

    Results: The results reported in this paper reveal the advantage of using a recently described variant of NMF, namely Convex-NMF, as an unsupervised method of source extraction from SV 1H-MRS. Most of the sources extracted in our experiments closely correspond to the mean spectra of some of the analysed tumour types. This similarity allows accurate diagnostic predictions to be made both in fully unsupervised mode and when Convex-NMF is used as a DR step prior to standard supervised classification. The results obtained are comparable to, or more accurate than, those obtained with supervised techniques.

    Conclusions: The unsupervised properties of Convex-NMF place this approach one step ahead of classical label-requiring supervised methods for the discrimination of brain tumour types, as it accounts for their increasingly recognised molecular subtype heterogeneity. The application of Convex-NMF in computer-assisted decision support systems is expected to facilitate further improvements in the uptake of MRS-derived information by clinicians.
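    As an illustration of the idea in this abstract, the following is a minimal sketch of Convex-NMF using the commonly cited multiplicative update rules for the factorisation X ≈ (XW)Gᵀ, which accepts mixed-sign data such as spectra with negative peaks. It is not the authors' implementation: the function name `convex_nmf`, the random initialisation, the iteration count, and the synthetic data are all assumptions made here for demonstration.

```python
import numpy as np

def convex_nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Sketch of Convex-NMF: X (features x samples, any sign) ~ F @ G.T.

    F = X @ W holds the sources as non-negative combinations of the data
    columns; G holds the non-negative mixing coefficients per sample.
    Initialisation here is a simple random one (an assumption).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((n, k)) / n
    G = rng.random((n, k))
    Y = X.T @ X
    Yp = (np.abs(Y) + Y) / 2   # positive part of X^T X
    Yn = (np.abs(Y) - Y) / 2   # negative part of X^T X
    for _ in range(n_iter):
        # Multiplicative update for the mixing matrix G
        num = Yp @ W + G @ (W.T @ Yn @ W)
        den = Yn @ W + G @ (W.T @ Yp @ W) + eps
        G *= np.sqrt(num / den)
        # Multiplicative update for the convex-combination weights W
        GtG = G.T @ G
        num = Yp @ G + Yn @ W @ GtG
        den = Yn @ G + Yp @ W @ GtG + eps
        W *= np.sqrt(num / den)
    return X @ W, G, W

if __name__ == "__main__":
    # Synthetic stand-in for spectra: two mixed-sign sources, 30 samples
    rng = np.random.default_rng(1)
    sources = rng.normal(size=(50, 2))
    X = sources @ rng.random((2, 30))
    F, G, W = convex_nmf(X, k=2)
    # Unsupervised class assignment: dominant source per sample
    labels = G.argmax(axis=1)
    print(labels)
```

    In this sketch, taking the largest entry in each row of `G` corresponds to using the mixing matrix directly for class assignment, whereas passing the rows of `G` to a separate supervised classifier would correspond to the dimensionality-reduction use described above.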

    Multivariate methods for interpretable analysis of magnetic resonance spectroscopy data in brain tumour diagnosis

    Malignant brain tumours are among the most difficult cancers to treat because of the sensitive organ they affect. Clinical management becomes even more intricate as the tumour mass grows through proliferation, so an early and accurate diagnosis is vital for halting its natural course of development. Standard clinical practice for diagnosis includes invasive techniques that may harm the patient, which has fostered intensive research into alternative non-invasive methods of measuring brain tissue, such as nuclear magnetic resonance. One of its variants, magnetic resonance imaging, is already used on a regular basis to locate and delimit the brain tumour. A complementary variant, magnetic resonance spectroscopy, lags behind in clinical use despite its higher spatial resolution and its ability to identify biochemical metabolites that might serve as tumour biomarkers within a delimited area, mainly because it is difficult to interpret. The interpretation of magnetic resonance spectra of brain tissue is therefore an interesting field of research for automated methods of knowledge extraction such as machine learning, always understood as secondary to the decision making of human medical experts. This thesis aims to contribute to the state of the art in this domain by providing novel techniques to assist radiology experts, focusing on complex problems and delivering interpretable solutions. In this respect, an ensemble learning technique has been designed to discriminate accurately between the most aggressive brain tumours, namely glioblastomas and metastases; in addition, a strategy based on instance weighting is provided to increase the stability of biomarker identification in the spectra.
    From a different analytical perspective, a tool based on signal source separation, guided by tumour type-specific information, has been developed to assess the existence of different tissues in the tumoural mass and to quantify their influence in the vicinity of tumoural areas. This development has led to a probabilistic interpretation of some source separation techniques, which supports uncertainty handling and provides strategies for estimating the most accurate number of differentiated tissues within the analysed tumour volumes. The strategies provided should assist human experts through automated decision support tools, tackling interpretability and accuracy from different angles.