    Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery

    Background: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use has not yet been fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, principal component analysis (PCA) is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This can make high-performance proteomic pattern discovery difficult, because only features that explain global data behavior are used to train a learning machine.
    Methods: In this study, we develop a nonnegative principal component analysis algorithm and present a support vector machine algorithm with sparse coding, based on nonnegative principal component analysis, for high-performance proteomic pattern classification. We also propose a filter-wrapper biomarker capturing algorithm for mass spectral serum profiles that is based on nonnegative principal component analysis.
    Results: We demonstrate the superiority of the proposed algorithm by comparing it with six peer algorithms on four benchmark datasets. We also show that nonnegative principal component analysis can be used effectively to capture meaningful biomarkers.
    Conclusion: Our analysis suggests that nonnegative principal component analysis effectively conducts local feature selection for mass spectral profiles and contributes both to higher sensitivity and specificity in subsequent classification and to meaningful biomarker discovery.
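
    The abstract does not spell out the nonnegative PCA algorithm itself. As a hedged illustration of the general idea only, the Python/NumPy sketch below computes nonnegative loading vectors by maximizing the projected variance w^T C w subject to ||w|| = 1 and w >= 0 with a projected power iteration, deflating between components. The function name `nonnegative_pcs`, the deflation scheme, and the random stand-in data are assumptions, not the authors' method.

```python
import numpy as np


def nonnegative_pcs(X, k, n_iter=500, tol=1e-10, seed=0):
    """Hypothetical helper: k nonnegative, unit-norm loading vectors (features x k).

    Each column approximately maximizes w^T C w subject to ||w|| = 1, w >= 0,
    where C is the sample covariance of X (samples x features).
    """
    Xc = X - X.mean(axis=0)                   # center each feature
    C = Xc.T @ Xc / (Xc.shape[0] - 1)         # sample covariance (PSD)
    d = C.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((d, k))
    for j in range(k):
        w = rng.random(d)
        w /= np.linalg.norm(w)                # nonnegative unit-norm start
        for _ in range(n_iter):
            g = np.maximum(C @ w, 0.0)        # power step, projected onto w >= 0
            norm = np.linalg.norm(g)
            if norm == 0.0:                   # no feasible ascent direction; keep w
                break
            w_new = g / norm
            converged = np.linalg.norm(w_new - w) < tol
            w = w_new
            if converged:
                break
        W[:, j] = w
        P = np.eye(d) - np.outer(w, w)        # projection deflation keeps C PSD
        C = P @ C @ P
    return W


# Usage on random nonnegative stand-in data (40 spectra x 10 m/z features).
rng = np.random.default_rng(1)
X = np.abs(rng.standard_normal((40, 10)))
W = nonnegative_pcs(X, k=3)
scores = (X - X.mean(axis=0)) @ W             # reduced features for a downstream classifier
```

    Because C is positive semidefinite, each projected step cannot decrease w^T C w, and because every loading is nonnegative, each component is an additive weighting of the original spectral features.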

    Spectral Separation of Quantum Dots within Tissue Equivalent Phantom Using Linear Unmixing Methods in Multispectral Fluorescence Reflectance Imaging

    Introduction: Non-invasive Fluorescence Reflectance Imaging (FRI) is used to assess physiological and molecular processes in biological media. The aim of this article is to separate the overlapping emission spectra of quantum dots within a tissue-equivalent phantom in FRI mode using SVD, Jacobi SVD, and NMF methods.
    Materials and Methods: A tissue-like phantom and an optical setup in reflectance mode were developed, and the multispectral imaging algorithm was then implemented in MATLAB. The setup included diode-pumped solid-state lasers at 479 nm, 533 nm, and 798 nm, achromatic telescopic optics, a mirror, high-pass and low-pass filters, and an EMCCD camera. FRI images were acquired with a CCD camera using a band-pass filter centered at 600 nm and a high-pass filter with its maximum at 615 nm for the first region, and a high-pass filter with its maximum at 810 nm for the second region. The SVD and Jacobi SVD algorithms were written in MATLAB, compared with Non-negative Matrix Factorization (NMF), and applied to the acquired images.
    Results: The PSNR, SNR, and CNR of the SVD and NMF methods were 39 dB, 30.1 dB, and 0.7 dB, respectively. The difference between the PSNR of Jacobi SVD and the PSNR of the NMF and modified NMF algorithms was significant (p < 0.0001), and the statistical results showed that Jacobi SVD was more accurate than modified NMF.
    Conclusion: In this study, Jacobi SVD was introduced as a powerful method for obtaining unmixed FRI images. An experimental evaluation of the algorithm is planned for the near future.
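
    The abstract compares SVD with Jacobi SVD but does not describe either implementation. For readers unfamiliar with the Jacobi variant, here is a minimal one-sided (Hestenes) Jacobi SVD in Python/NumPy, assuming a tall matrix (more rows than columns, e.g. pixels x spectral channels). The function name and tolerances are illustrative assumptions, and since the abstract does not say how the decomposition was turned into unmixed images, no unmixing step is shown.

```python
import numpy as np


def jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """One-sided (Hestenes) Jacobi SVD of a tall matrix A (m >= n).

    Returns U (m x n), singular values s in descending order, and V (n x n)
    such that A is approximately U @ np.diag(s) @ V.T.
    """
    U = np.array(A, dtype=float)              # working copy; stays equal to A @ V
    m, n = U.shape
    if m < n:
        raise ValueError("expected m >= n; decompose A.T and swap U and V")
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                scale = np.sqrt(alpha * beta)
                if scale == 0.0 or abs(gamma) <= tol * scale:
                    continue                  # columns p, q already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                # smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0 (tangent of the angle)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.sqrt(1.0 + zeta**2))
                c = 1.0 / np.sqrt(1.0 + t**2)
                sn = c * t
                for M in (U, V):              # apply the same plane rotation to U and V
                    col_p = M[:, p].copy()
                    M[:, p] = c * col_p - sn * M[:, q]
                    M[:, q] = sn * col_p + c * M[:, q]
        if not rotated:
            break
    s = np.linalg.norm(U, axis=0)             # column norms are the singular values
    order = np.argsort(s)[::-1]
    s, U, V = s[order], U[:, order], V[:, order]
    nz = s > 0
    U[:, nz] /= s[nz]                         # normalize columns to get left singular vectors
    return U, s, V


# Usage: compare with numpy.linalg.svd on a random 6 x 4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, V = jacobi_svd(A)
matches_lapack = np.allclose(s, np.linalg.svd(A, compute_uv=False))
```

    Repeated plane rotations orthogonalize every column pair; once no pair needs rotating, the column norms are the singular values and the accumulated rotations form V.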

    Extraction of pure components from overlapped signals in gas chromatography-mass spectrometry (GC-MS)

    Gas chromatography-mass spectrometry (GC-MS) is a widely used analytical technique for identifying and quantifying trace chemicals in complex mixtures. When complex samples are analyzed by GC-MS, two or more components commonly co-elute, so their signal peaks overlap in the total ion chromatogram. In such situations manual signal analysis is often the most reliable means of extracting pure component signals; however, systematic manual analysis across many samples is both tedious and error-prone. Over the past 30 years a number of computational approaches have been proposed to assist in extracting pure signals from co-eluting GC-MS components. These include empirical methods, comparison with library spectra, eigenvalue analysis, regression, and others. However, to date no approach has been recognized as the best or accepted as standard. This situation hampers general GC-MS capabilities and, in particular, has implications for the development of the robust, high-throughput GC-MS analytical protocols required in metabolic profiling and biomarker discovery. Here we first discuss the nature of GC-MS data and then review some of the approaches proposed for extracting pure signals from co-eluting components. We summarize and classify the different approaches to this problem and examine why so many of those proposed in the past have failed to live up to their full promise. Finally, we offer some thoughts on future developments in this field and suggest that the progress in general computing capabilities over the past two decades has opened new horizons for tackling this important problem.
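
    Of the approaches listed above, regression against library spectra is the easiest to show compactly. The Python/NumPy sketch below assumes the pure spectra of all co-eluting components are already known, models the overlapped region as D ≈ C S, and recovers the elution profiles C by ordinary least squares. The helper name is hypothetical, clipping negative values is a crude stand-in for proper nonnegative least squares, and the spectra and peak shapes are synthetic.

```python
import numpy as np


def resolve_coeluting(D, S):
    """Hypothetical helper: elution profiles of known components in an overlapped region.

    D: (scans x m/z channels) measured intensities.
    S: (components x m/z channels) pure reference spectra, e.g. library spectra.
    Solves D ~ C @ S for C (scans x components) by least squares, then clips
    negative abundances to zero.
    """
    C_t, *_ = np.linalg.lstsq(S.T, D.T, rcond=None)   # S.T @ C.T ~ D.T
    return np.clip(C_t.T, 0.0, None)


# Synthetic example: two Gaussian elution peaks that overlap in time.
scans = np.arange(50)
profiles = np.column_stack([
    np.exp(-0.5 * ((scans - 22) / 4.0) ** 2),
    np.exp(-0.5 * ((scans - 27) / 4.0) ** 2),
])
S = np.array([[0.0, 10.0, 3.0, 0.0, 7.0, 1.0],    # hypothetical spectrum, component 1
              [4.0, 0.0, 6.0, 9.0, 0.0, 2.0]])    # hypothetical spectrum, component 2
D = profiles @ S                                   # noiseless overlapped signal, scans x m/z
C_hat = resolve_coeluting(D, S)
```

    With noise, baseline drift, or components absent from the library, this simple model degrades, so treat it only as an illustration of the regression idea.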

    Non-negative matrix factorisation methods for the spectral decomposition of MRS data from human brain tumours

    Background: In vivo single-voxel proton magnetic resonance spectroscopy (SV ¹H-MRS), coupled with supervised pattern recognition (PR) methods, has been widely used in clinical studies to discriminate brain tumour types and to follow up patients bearing abnormal brain masses. SV ¹H-MRS provides useful biochemical information about the metabolic state of tumours and can be performed at short (< 45 ms) or long (> 45 ms) echo time (TE), each with particular advantages. Short-TE spectra are better suited to detecting lipids, while long-TE spectra have a much flatter baseline between peaks but also show negative signals for metabolites such as lactate. Lipids and lactate are each indicative of specific ongoing metabolic processes. Ideally, the information provided at both TEs should be used for clinical purposes. In this study, we characterise the performance of a range of Non-negative Matrix Factorisation (NMF) methods in two respects: first, in deriving sources correlated with the mean spectra of known tissue types (tumours and normal tissue); second, taking the best-performing NMF method for source separation, we compare its class-assignment accuracy when the mixing matrix is used directly as a basis for classification against its accuracy when the method is used for dimensionality reduction (DR). For this, we used SV ¹H-MRS data with positive and negative peaks from a widely tested SV ¹H-MRS human brain tumour database.
    Results: The results reported in this paper reveal the advantage of using a recently described variant of NMF, namely Convex-NMF, as an unsupervised method of source extraction from SV ¹H-MRS. Most of the sources extracted in our experiments correspond closely to the mean spectra of some of the analysed tumour types. This similarity allows accurate diagnostic predictions to be made both in fully unsupervised mode and when Convex-NMF is used as a DR step prior to standard supervised classification. The results obtained are comparable to, or more accurate than, those obtained with supervised techniques.
    Conclusions: The unsupervised properties of Convex-NMF place this approach one step ahead of classical label-requiring supervised methods for discriminating brain tumour types, as it accounts for their increasingly recognised molecular subtype heterogeneity. The application of Convex-NMF in computer-assisted decision support systems is expected to further improve the uptake of MRS-derived information by clinicians.
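
    Convex-NMF factorizes X ≈ X W Gᵀ with nonnegative W and G while allowing X itself to contain negative values, which is what makes it usable on spectra with negative peaks such as long-TE lactate. Below is a minimal Python/NumPy sketch using multiplicative updates of the form proposed by Ding, Li and Jordan for Convex-NMF, obtained by splitting the Gram matrix XᵀX into its positive and negative parts. The function names, random initialization, iteration count, and random stand-in data are assumptions; this is not the pipeline used in the study.

```python
import numpy as np


def convex_nmf(X, k, n_iter=300, eps=1e-12, seed=0):
    """Hypothetical helper: X (features x samples, any sign) ~ X @ W @ G.T, W, G >= 0.

    W and G are (samples x k). Each column of X @ W is a nonnegative combination
    of observed spectra; each row of G gives a sample's weights on the k sources.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    Y = X.T @ X                               # Gram matrix, may have negative entries
    Yp = (np.abs(Y) + Y) / 2                  # positive part of Y
    Yn = (np.abs(Y) - Y) / 2                  # negative part of Y (Y = Yp - Yn)
    W = rng.uniform(0.1, 1.0, (n, k))         # random init (a simplification)
    G = rng.uniform(0.1, 1.0, (n, k))
    for _ in range(n_iter):
        G *= np.sqrt((Yp @ W + G @ (W.T @ Yn @ W))
                     / (Yn @ W + G @ (W.T @ Yp @ W) + eps))
        GtG = G.T @ G
        W *= np.sqrt((Yp @ G + Yn @ W @ GtG)
                     / (Yn @ G + Yp @ W @ GtG + eps))
    return W, G


def reconstruction_error(X, W, G):
    """Frobenius norm of the residual X - X W G^T."""
    return np.linalg.norm(X - X @ W @ G.T)


# Usage on random mixed-sign stand-in data: 60 spectral points x 25 spectra.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 25))
W, G = convex_nmf(X, k=3)
sources = X @ W                               # k extracted source spectra (60 x 3)
```

    The columns of X @ W play the role of extracted sources (each a nonnegative combination of observed spectra, so they can carry negative peaks), and the rows of G can serve as per-case features for a classifier.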

    Cancer Molecular Pattern Discovery by Subspace Consensus Kernel Classification

    Structure-revealing data fusion

    BACKGROUND: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures that are otherwise difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging because data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) composed of both shared and unshared components. To address these challenges, in this paper we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors.
    RESULTS: The traditional formulation of coupled matrix and tensor factorizations, which models only shared factors, fails to capture the underlying structures in the presence of both shared and unshared factors, whereas the proposed data fusion model has the potential to reveal shared and unshared components automatically through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and accurately extract their relative concentrations, (ii) provide promising results in identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.
    CONCLUSIONS: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components, and we have demonstrated its promising performance, as well as its potential limitations, on both simulated and real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-239) contains supplementary material, which is available to authorized users.

    A review of blind source separation in NMR spectroscopy

    27 pages. International audience.
    The Fourier transform (FT) is the data processing method naturally associated with most NMR experiments. Notable exceptions are pulsed field gradient and relaxation analyses, whose data structure is only partially suited to FT. With the revival of NMR of complex mixtures, fueled by analytical challenges such as metabolomics, alternative and more apt mathematical methods for data processing have been sought, with the aim of decomposing the NMR signal into simpler components. Blind source separation is a very broad term covering several classes of mathematical methods for complex signal decomposition that make no hypothesis about the form of the data. Developed outside NMR, these algorithms have increasingly been tested on spectra of mixtures. In this review, we provide a historical overview of the application of blind source separation methodologies to NMR, including methods designed specifically for the particularities of this spectroscopy.
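
    As one concrete example of a blind source separation method applied to mixture spectra, the sketch below runs non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates in Python/NumPy. It assumes phased, nonnegative spectra of several mixtures of the same compounds in different proportions; the Lorentzian line shapes, peak positions, and amounts are synthetic, the function names are hypothetical, and NMF is chosen only for illustration, not as the review's recommendation.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-12, seed=0):
    """Hypothetical helper: factor nonnegative V (mixtures x points) as W @ H.

    Rows of H estimate source spectra; W holds their amounts in each mixture.
    Lee-Seung multiplicative updates for the Frobenius-norm loss.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, (m, k))
    H = rng.uniform(0.1, 1.0, (k, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def lorentzian(ppm, center, width=0.05):
    """Unit-height Lorentzian line centered at `center` (ppm)."""
    return width**2 / ((ppm - center) ** 2 + width**2)


ppm = np.linspace(0.0, 10.0, 400)
sources = np.vstack([
    lorentzian(ppm, 1.3) + lorentzian(ppm, 4.1),        # hypothetical compound 1
    lorentzian(ppm, 2.2) + 0.5 * lorentzian(ppm, 7.0),  # hypothetical compound 2
])
amounts = np.array([[1.0, 0.2], [0.6, 0.8], [0.1, 1.0], [0.4, 0.4]])  # 4 mixtures
V = amounts @ sources                                   # observed mixture spectra
W, H = nmf(V, k=2)
```

    NMF recovers sources only up to scaling and ordering, so rows of H should be normalized before being compared with reference spectra.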