3,383 research outputs found

    Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data

    Get PDF
    Biomarkers which predict patient’s survival can play an important role in medical diagnosis and treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers ofsurvival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier. One dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study one dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. One dimensional CWT has no ability to reduce dimensionality of data, but captures the missing features of DWT, and is complementary part of DWT. Genetic algorithm was performed on extracted wavelet coefficients to select the optimized features, using Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of optimized features. Kaplan-Meier curve and Cox regression model 2 were used to evaluate the performance of selected biomarkers. Experiments were conducted on colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker CD46 was found to significantly associate with survival time

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Mass spectral imaging of clinical samples using deep learning

    Get PDF
    A better interpretation of tumour heterogeneity and variability is vital for the improvement of novel diagnostic techniques and personalized cancer treatments. Tumour tissue heterogeneity is characterized by biochemical heterogeneity, which can be investigated by unsupervised metabolomics. Mass Spectrometry Imaging (MSI) combined with Machine Learning techniques have generated increasing interest as analytical and diagnostic tools for the analysis of spatial molecular patterns in tissue samples. Considering the high complexity of data produced by the application of MSI, which can consist of many thousands of spectral peaks, statistical analysis and in particular machine learning and deep learning have been investigated as novel approaches to deduce the relationships between the measured molecular patterns and the local structural and biological properties of the tissues. Machine learning have historically been divided into two main categories: Supervised and Unsupervised learning. In MSI, supervised learning methods may be used to segment tissues into histologically relevant areas e.g. the classification of tissue regions in H&E (Haemotoxylin and Eosin) stained samples. Initial classification by an expert histopathologist, through visual inspection enables the development of univariate or multivariate models, based on tissue regions that have significantly up/down-regulated ions. However, complex data may result in underdetermined models, and alternative methods that can cope with high dimensionality and noisy data are required. Here, we describe, apply, and test a novel diagnostic procedure built using a combination of MSI and deep learning with the objective of delineating and identifying biochemical differences between cancerous and non-cancerous tissue in metastatic liver cancer and epithelial ovarian cancer. The workflow investigates the robustness of single (1D) to multidimensional (3D) tumour analyses and also highlights possible biomarkers which are not accessible from classical visual analysis of the H&E images. The identification of key molecular markers may provide a deeper understanding of tumour heterogeneity and potential targets for intervention.Open Acces

    A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high dimensional data. A large number of local maxima (a.k.a. <it>peaks</it>) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With this kind of data dimensions and samples size the risk of over-fitting and selection bias is pervasive. Therefore the development of bio-informatics methods based on unsupervised feature extraction can lead to general tools which can be applied to several fields of predictive proteomics.</p> <p>Results</p> <p>We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 samples spectra divided in 115 cancer and 91 control samples. The overall accuracy averaged over a large cross validation study is 98.18. The area under the ROC curve of the best selected model is 0.9962.</p> <p>Conclusion</p> <p>We improved previous known results on the problem on the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from <url>http://medeaserver.isa.cnr.it/dacierno/spectracode.htm</url></p

    DiviK: Divisive intelligent K-means for hands-free unsupervised clustering in biological big data

    Full text link
    Investigation of molecular heterogeneity provides insights about tumor origin and metabolomics. Increasing amount of data gathered makes manual analyses infeasible. Automated unsupervised learning approaches are exercised for this purpose. However, this kind of analysis requires a lot of experience with setting its hyperparameters and usually an upfront knowledge about the number of expected substructures. Moreover, numerous measured molecules require additional step of feature engineering to provide valuable results. In this work we propose DiviK: a scalable auto-tuning algorithm for segmentation of high-dimensional datasets, and a method to assess the quality of the unsupervised analysis. DiviK is validated on two separate high-throughput datasets acquired by Mass Spectrometry Imaging in 2D and 3D. Proposed algorithm could be one of the default choices to consider during initial exploration of Mass Spectrometry Imaging data. With comparable clustering quality, it brings the possibility of focusing on different levels of dataset nuance, while requires no number of expected structures specified upfront. Finally, due to its simplicity, DiviK is easily generalizable to even more flexible framework, with other clustering algorithm used instead of k-means. Generic implementation is freely available under Apache 2.0 license at https://github.com/gmrukwa/divik.Comment: 8 pages, 7 figure

    A Comprehensive Analysis of MALDI-TOF Spectrometry Data

    Get PDF

    A Review on Data Fusion of Multidimensional Medical and Biomedical Data

    Get PDF
    Data fusion aims to provide a more accurate description of a sample than any one source of data alone. At the same time, data fusion minimizes the uncertainty of the results by combining data from multiple sources. Both aim to improve the characterization of samples and might improve clinical diagnosis and prognosis. In this paper, we present an overview of the advances achieved over the last decades in data fusion approaches in the context of the medical and biomedical fields. We collected approaches for interpreting multiple sources of data in different combinations: image to image, image to biomarker, spectra to image, spectra to spectra, spectra to biomarker, and others. We found that the most prevalent combination is the image-to-image fusion and that most data fusion approaches were applied together with deep learning or machine learning methods
    • …
    corecore