527 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

    Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors are shown to be not informative enough to predict accurate DTIs. Thus, in this study, we employ a convolutional neural network (CNN) on raw protein sequences to capture local residue patterns participating in DTIs. With CNN on protein sequences, our model performs better than previous protein descriptor-based models. In addition, our model performs better than the previous deep learning model for massive prediction of DTIs. By examining the pooled convolution results, we found that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Comment: 26 pages, 7 figures
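The idea of convolving over a raw protein sequence and then pooling can be illustrated with a minimal, dependency-free sketch. It is not the DeepConv-DTI implementation: the one-hot encoding, the single hand-set filter, and the names `one_hot`, `motif_kernel`, and `conv1d_max_pool` are all assumptions made for illustration. A trained model would learn many filters of several widths instead of using one fixed motif.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[index[residue]] = 1.0
        encoded.append(vec)
    return encoded


def motif_kernel(motif):
    """Hypothetical hand-set filter that responds most strongly to one motif."""
    return one_hot(motif)


def conv1d_max_pool(encoded, kernel):
    """Slide one filter along the sequence, then apply global max pooling.

    Returns the maximum activation and the start index of the window
    that produced it, i.e. the local residue pattern the filter detected.
    """
    width = len(kernel)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(encoded) - width + 1):
        score = sum(
            kernel[k][c] * encoded[start + k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos


# Toy sequence (assumed, not from the paper); the motif "YIA" starts at index 4.
score, position = conv1d_max_pool(one_hot("MKTAYIAKQR"), motif_kernel("YIA"))
```

Keeping the position of the maximum alongside its value is what makes the pooled result inspectable: the window that wins the pooling step marks a candidate local pattern in the sequence.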

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to better understand the variability between tumors and disease models such as cell lines. To maximize the translatability and clinical relevance of in vitro studies, the selection of optimal cancer models is imperative. We have developed a deep learning-based method to measure the similarity between CRC tumors and disease models such as cancer cell lines. Our method efficiently leverages multi-omics data sets containing copy number alterations, gene expression, and point mutations and learns latent factors that describe the data in lower dimensions. These latent factors represent the patterns that are clinically relevant and explain the variability of molecular profiles across tumors and cell lines. Using these, we propose refined CRC subtypes and provide best-matching cell lines for the different subtypes. These findings are relevant to patient stratification and selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification.

    Evaluation of colorectal cancer subtypes and cell lines using deep learning

    Colorectal cancer (CRC) is a common cancer with a high mortality rate and a rising incidence rate in the developed world. Molecular profiling techniques have been used to study the variability between tumors as well as cancer models such as cell lines, but their translational value is incomplete with current methods. Moreover, first-generation computational methods for subtype classification do not make full use of multi-omics data. Drug discovery programs use cell lines as a proxy for human cancers to characterize their molecular makeup and drug response, identify relevant indications, and discover biomarkers. To maximize the translatability and clinical relevance of in vitro studies, selection of optimal cancer models is imperative. We present a novel subtype classification method based on deep learning and apply it to classify CRC tumors using multi-omics data, and further to measure the similarity between tumors and disease models such as cancer cell lines. Multi-omics Autoencoder Integration (maui) efficiently leverages data sets containing copy number alterations, gene expression, and point mutations, and learns clinically important patterns (latent factors) across these data types. Using these latent factors, we propose a refinement of the gold-standard CRC subtypes and propose best-matching cell lines for the different subtypes. These findings are relevant for patient stratification and selection of cell lines for drug discovery pipelines, biomarker discovery, and target identification.
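Once tumors and cell lines share a latent space, matching cell lines to subtypes reduces to a similarity search. The sketch below assumes the latent factors have already been learned and uses cosine similarity to subtype centroids. Both the similarity measure and all names and values (`best_matching_cell_lines`, `subtype_A`, `line_1`, and so on) are illustrative assumptions, not the maui procedure or its API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two latent vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def best_matching_cell_lines(tumor_latents, cell_line_latents):
    """Pick, for each subtype, the cell line closest to the subtype centroid.

    tumor_latents: {subtype: [latent vector, ...]}
    cell_line_latents: {cell line name: latent vector}
    """
    matches = {}
    for subtype, vectors in tumor_latents.items():
        center = centroid(vectors)
        matches[subtype] = max(
            cell_line_latents,
            key=lambda name: cosine(center, cell_line_latents[name]),
        )
    return matches


# Made-up latent factors for two hypothetical subtypes and two cell lines.
tumors = {
    "subtype_A": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "subtype_B": [[0.0, 0.2, 1.0], [0.1, 0.1, 0.8]],
}
cell_lines = {"line_1": [0.8, 0.1, 0.0], "line_2": [0.1, 0.0, 0.9]}
matches = best_matching_cell_lines(tumors, cell_lines)
```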

    Predicting potential drugs and drug-drug interactions for drug repositioning

    The purpose of drug repositioning is to predict novel treatments for existing drugs. It saves time and reduces cost in drug discovery, especially in preclinical procedures. In drug repositioning, the challenging objective is to identify reasonable drugs with strong evidence. Recently, benefiting from various types of data and computational strategies, many methods have been proposed to predict potential drugs. Signature-based methods use signatures to describe a specific disease condition and match it with drug-induced transcriptomic profiles. For a disease signature, a list of potential drugs is produced based on matching scores. In many studies, the top drugs on the list are identified as potential drugs and verified in various ways. However, existing methods have a few limitations: (1) For many diseases, especially cancers, the tissue samples are often heterogeneous and multiple subtypes are involved. It is challenging to identify a signature from such a group of profiles. (2) Genes are treated as independent elements in many methods, while they may be associated with each other in the given condition. (3) The disease signatures cannot identify potential drugs for personalized treatments. To address those limitations, I propose three strategies in this dissertation. (1) I employ clustering methods to identify sub-signatures from the heterogeneous dataset, then use a weighting strategy to concatenate them. (2) I utilize human protein complex (HPC) information to reflect the dependencies among genes and identify an HPC signature to describe a specific type of cancer. (3) I use an HPC strategy to identify signatures for drugs, then predict a list of potential drugs for each patient. Besides predicting potential drugs directly, more indications are essential to enhance my understanding of drug repositioning studies.
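Signature-based matching, as described above, can be sketched as scoring how strongly each drug-induced profile reverses a disease signature and ranking drugs by that score. The scoring rule here (mean change of disease-down genes minus mean change of disease-up genes) is a deliberately simplified assumption, not the dissertation's matching score, and the gene names, drug names, and values are hypothetical.

```python
def reversal_score(up_genes, down_genes, drug_profile):
    """Score how strongly a drug profile reverses a disease signature.

    up_genes / down_genes: genes over- / under-expressed in the disease.
    drug_profile: {gene: drug-induced expression change (e.g. log fold change)}.
    A drug that lowers the up genes and raises the down genes scores high.
    """
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    mean_up = sum(up) / len(up) if up else 0.0
    mean_down = sum(down) / len(down) if down else 0.0
    return mean_down - mean_up


def rank_drugs(up_genes, down_genes, drug_profiles):
    """Return drug names sorted from best to worst reversal score."""
    scores = {
        drug: reversal_score(up_genes, down_genes, profile)
        for drug, profile in drug_profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical disease signature and drug-induced profiles.
disease_up = ["G1", "G2"]
disease_down = ["G3", "G4"]
profiles = {
    "drug_x": {"G1": -2.0, "G2": -1.0, "G3": 1.5, "G4": 1.0},
    "drug_y": {"G1": 1.0, "G2": 0.5, "G3": -1.0, "G4": -0.5},
    "drug_z": {"G1": -0.5, "G3": 0.5},
}
ranked = rank_drugs(disease_up, disease_down, profiles)
```

In this toy data `drug_x` reverses the signature most strongly and `drug_y` mimics the disease, so it ranks last. The same function applies per patient if a patient profile replaces the disease signature.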
    The interactions between biological and biomedical entities, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), help in studying the mechanisms behind repurposed drugs. Machine learning (ML) methods, especially deep learning (DL), are at the frontier of predicting those interactions. Network strategies, such as constructing a network from interactions and studying its topological properties, are commonly combined with other methods to make predictions. However, the interactions may have different functions, and merging them in a single network may introduce bias. To address this, I construct two networks for two types of DDIs and employ a graph convolutional network (GCN) model to concatenate them. In this dissertation, the first chapter introduces background information, objectives of the studies, and the structure of the dissertation. After that, a comprehensive review is provided in Chapter 2. Biological databases, methods and applications in drug repositioning studies, and evaluation metrics are discussed. I summarize three application scenarios in Chapter 2. The first method, proposed in Chapter 3, considers the issue of identifying a cancer gene signature and predicting potential drugs. The k-means clustering method is used to identify highly reliable gene signatures. The identified signature is used to match drug profiles and identify potential drugs for the given disease. The second method, proposed in Chapter 4, uses human protein complex (HPC) information to identify a protein complex signature instead of a gene signature. This strategy improves the prediction accuracy in the cancer experiments. Chapter 5 introduces the signature-based method in personalized cancer medicine. The profiles of a given drug are used to identify a drug signature under the HPC strategy. Each patient has a profile, which is matched with the drug signature, so each patient receives a different list of potential drugs.
    Chapter 6 proposes a graph convolutional network with multiple kernels to predict DDIs. This method constructs two DDI kernels and concatenates them in the GCN model. It achieves higher performance in predicting DDIs than three state-of-the-art methods. In summary, this dissertation proposes several computational algorithms for drug repositioning. Experimental results show that the proposed methods can achieve very good performance.
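The idea of keeping two DDI networks separate and concatenating their graph-convolution outputs can be sketched in plain Python. This is only an illustration under stated assumptions: it uses one layer per network, mean aggregation with self-loops as the normalization, fixed toy weights instead of trained ones, and hypothetical names (`gcn_layer`, `two_network_embedding`). The dissertation's actual kernels and architecture are not specified here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_with_self_loops(adj):
    """Add self-loops, then row-normalize (mean aggregation; an assumed choice)."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return [[v / sum(row) for v in row] for row in with_loops]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(normalized_adj @ features @ weights)."""
    propagated = matmul(normalize_with_self_loops(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


def two_network_embedding(adj_a, adj_b, features, w_a, w_b):
    """Run one layer per DDI network and concatenate the per-drug outputs."""
    h_a = gcn_layer(adj_a, features, w_a)
    h_b = gcn_layer(adj_b, features, w_b)
    return [row_a + row_b for row_a, row_b in zip(h_a, h_b)]


# Three hypothetical drugs; each network encodes a different DDI type.
adj_type_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # drugs 0 and 1 interact
adj_type_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # drugs 1 and 2 interact
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_a = [[1, 0], [0, 1], [1, 1]]   # toy weights, not trained
w_b = [[1, -1], [0, 1], [1, 0]]  # toy weights, not trained
embedding = two_network_embedding(adj_type_a, adj_type_b, features, w_a, w_b)
```

Each drug ends up with a four-dimensional vector whose first half reflects its neighbors under one interaction type and whose second half reflects the other, so the two interaction types are never merged into a single adjacency matrix.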