3,005 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Data-Adaptive Kernel Support Vector Machine

    Get PDF
    In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Followed by the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges the magnification effect directly on the Riemannian manifold within the feature space rather than the input space. The proposed data-adaptive kernel SVM technique is applied in the binary classification, and is extended to the multi-class situations when imbalance is a main concern. We conduct extensive simulation studies to assess the performance of the proposed methods, and the prostate cancer image study is employed as an illustration. The data-adaptive kernel is further applied in feature selection process. We propose the data-adaptive kernel-penalized SVM, a new method of simultaneous feature selection and classification by penalizing data-adaptive kernels in SVMs. Instead of penalizing the standard cost function of SVMs in the usual way, the penalty will be directly added to the dual objective function that contains the data-adaptive kernel. Classification results with sparse features selected can be obtained simultaneously. Different penalty terms in the data-adaptive kernel-penalized SVM will be compared. The oracle property of the estimator is examined. We conduct extensive simulation studies to assess the performance of all the proposed methods, and employ the method on a breast cancer data set as an illustration. The data-adaptive kernel is further applied in feature selection process. We propose the data-adaptive kernel-penalized SVM, a new method of simultaneous feature selection and classification by penalizing data-adaptive kernels in SVMs. Instead of penalizing the standard cost function of SVMs in the usual way, the penalty will be directly added to the dual objective function that contains the data-adaptive kernel. Classification results with sparse features selected can be obtained simultaneously. Different penalty terms in the data-adaptive kernel-penalized SVM will be compared. The oracle property of the estimator is examined. We conduct extensive simulation studies to assess the performance of all the proposed methods, and employ the method on a breast cancer data set as an illustration
    • …
    corecore