
    Bioinformatics and Machine Learning for Cancer Biology

    Cancer is a leading cause of death worldwide, claiming millions of lives each year. Cancer biology is an essential research field for understanding how cancer develops, evolves, and responds to therapy. By taking advantage of a series of “omics” technologies (e.g., genomics, transcriptomics, and epigenomics), computational methods in bioinformatics and machine learning can help researchers decipher the complexity of cancer heterogeneity, tumorigenesis, and anticancer drug discovery. In particular, bioinformatics enables the systematic interrogation and analysis of cancer from various perspectives, including genetics, epigenetics, signaling networks, cellular behavior, clinical manifestation, and epidemiology. Moreover, thanks to the influx of next-generation sequencing (NGS) data in the postgenomic era and multiple landmark cancer-focused projects, such as The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), machine learning has a uniquely advantageous role in boosting data-driven cancer research and uncovering novel methods for the prognosis, prediction, and treatment of cancer.

    A review on recent progress in machine learning and deep learning methods for cancer classification on gene expression data

    Data-driven models with predictive ability are important in medicine and healthcare. However, the most challenging task in predictive modeling is constructing the prediction model itself, which can be addressed using machine learning (ML) methods. These methods learn and train the model from a gene expression dataset without being explicitly programmed. Due to the vast amount of gene expression data, this task becomes complex and time-consuming. This paper reviews recent progress in ML and deep learning (DL) for cancer classification, which has received increasing attention in bioinformatics and computational biology. The review focuses mainly on the development of cancer classification methods based on ML and DL. Although many methods have been applied to the cancer classification problem, recent progress shows that most of the successful techniques are based on supervised and DL methods. In addition, the sources of the healthcare datasets are described. The development of many ML methods for insight analysis in cancer classification has brought considerable improvement in healthcare. Currently, there appears to be strong demand for further development of efficient classification methods to address the expansion of healthcare applications.

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and a strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension and to process non-vectorial data, together with the natural framework they provide for integrating heterogeneous data, is particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
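
    The point about processing non-vectorial data can be illustrated with a string kernel. The sketch below is a minimal k-mer (spectrum) kernel on DNA sequences, chosen here only as one well-known example; the abstract does not say which kernels the chapter covers, and the function names `kmer_counts`, `spectrum_kernel`, and `gram_matrix` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences.

    The sequences are never turned into explicit fixed-length vectors;
    only k-mers that actually occur are stored and compared.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


def gram_matrix(seqs, k=3):
    """Pairwise kernel values for a list of sequences."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    sequences = ["ACGTACGT", "ACGTTTTT", "GGGGCCCC"]
    for row in gram_matrix(sequences, k=3):
        print(row)
```

    A Gram matrix of this kind could, for instance, be handed to a support vector machine that accepts a precomputed kernel, which is one way the modularity mentioned above shows up in practice.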

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    We present a novel regularization scheme called the Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway-agnostic approach.
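
    The abstract does not state the penalty itself, so the following is only a sketch of one plausible form of a network-guided elastic net: a per-feature weighted L1 term plus a quadratic term built from a gene-gene matrix. Everything here is an assumption made for illustration: the squared-error loss without intercept, the weights `d`, the symmetric positive semi-definite matrix `P` (for example a graph Laplacian of a pathway network), the proximal-gradient solver, and the names `gelnet_like_fit` and `objective`. It is not the authors' implementation.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of a weighted L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(X, y, w, P, d, lam1, lam2):
    """Assumed objective:
    (1/2n)||y - Xw||^2 + lam1 * sum_j d_j |w_j| + (lam2/2) * w^T P w
    """
    n = X.shape[0]
    resid = y - X @ w
    return (resid @ resid) / (2 * n) + lam1 * np.sum(d * np.abs(w)) \
        + 0.5 * lam2 * (w @ P @ w)


def gelnet_like_fit(X, y, P, d, lam1=0.1, lam2=0.1, n_iter=500):
    """Proximal gradient descent on the assumed objective above.

    P must be symmetric positive semi-definite; d must be non-negative.
    """
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, with L the largest eigenvalue of the smooth part's Hessian.
    hessian = X.T @ X / n + lam2 * P
    L = np.linalg.eigvalsh(hessian).max()
    step = 1.0 / L
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n + lam2 * (P @ w)
        w = soft_threshold(w - step * grad, step * lam1 * d)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=60)
    # Hypothetical pathway: a chain graph linking gene j to gene j + 1.
    A = np.zeros((6, 6))
    for j in range(5):
        A[j, j + 1] = A[j + 1, j] = 1.0
    laplacian = np.diag(A.sum(axis=1)) - A
    w = gelnet_like_fit(X, y, laplacian, np.ones(6), lam1=0.05, lam2=0.05)
    print(np.round(w, 3))
```

    With `P` set to the identity and `d` to all ones, this assumed penalty reduces to the ordinary elastic net; a Laplacian `P` instead penalizes differences between coefficients of linked genes, which is one way a solver can be steered toward interlinked gene sets.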

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered; only recently has this issue received growing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic as a convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.
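
    Stability with respect to sampling variations can be made concrete with a small measurement. The sketch below repeatedly subsamples the data, selects the top-k features on each subsample, and averages the pairwise Jaccard overlap of the selected sets. The selector (absolute Pearson correlation), the Jaccard index as the stability score, and all function names are assumptions chosen for illustration; the abstract does not specify which measures the review uses.

```python
from itertools import combinations

import numpy as np


def top_k_by_correlation(X, y, k):
    """Select the k features with the largest absolute correlation to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, 1.0, denom)  # constant columns score 0
    score = np.abs(Xc.T @ yc) / denom
    return frozenset(np.argsort(score)[-k:].tolist())


def jaccard(a, b):
    """Overlap of two feature sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k=10, n_resamples=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of feature sets chosen on random subsamples.

    Values near 1 mean the same features are picked regardless of which
    samples are drawn; values near 0 mean the selection is unstable.
    Requires n_resamples >= 2 and k >= 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(2, int(frac * n))
    subsets = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        subsets.append(top_k_by_correlation(X[idx], y[idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 200))  # synthetic data: few samples, many features
    y = X[:, 0] + X[:, 1] + rng.normal(size=100)
    print(selection_stability(X, y, k=10))
```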

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.