5,763 research outputs found

    Transcriptomic data integration for precision medicine in leukemia

    Get PDF
    This thesis is comprised of three studies demonstrating the application of different statistical and bioinformatic approaches to address distinct challenges of implementing precision medicine strategies for hematological malignancies. The approaches focus on the analysis of next-generation sequencing data, including both genomic and transcriptomics, to deconvolute disease biology and underlying mechanisms of drug sensitivities and resistance. The outcomes of the studies have clinical implications for advancing current diagnosis and treatment paradigms in patients with hematological diseases. Study I, RNA sequencing has not been widely adopted in a clinical diagnostic setting due to continuous development and lack of standardization. Here, the aim was to evaluate the efficiency of two different RNA-seq library preparation protocols applied to cells collected from acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) patients. The poly-A-tailed mRNA selection (PA) and ribo- depletion (RD) based RNA-seq library preparation protocols were compared and evaluated for detection of gene fusions, variant calling and gene expression profiling. Overall, both protocols produced broadly consistent results and similar outcomes. However, the PA protocol was more efficient in quantifying expression of leukemia marker genes and drug targets. It also provided higher sensitivity and specificity for expression-based classification of leukemia. In contrast, the RD protocol was more suitable for gene fusion detection and captured a greater number of transcripts. Importantly, high technical variations were observed in samples from two leukemia patient cases suggesting further development of strategies for transcriptomic quantification and data analysis. Study II, the BCL-2 inhibitor venetoclax is an approved and effective agent in combination with hypomethylating agents or low dose cytarabine for AML patients, unfit for intensive induction chemotherapy. However, a limited number of patients responding to venetoclax and development of resistance to the treatment presents a challenge for using the drug to benefit the majority of the AML patients. The aim was to investigate genomic and transcriptomic biomarkers for venetoclax sensitivity and enable identification of the patients who are most responsive to venetoclax treatment. We found that venetoclax sensitive samples are enriched with WT1 and IDH1/IDH2 mutations. Intriguingly, HOX family genes, including HOXB9, HOXA5, HOXB3, HOXB4, were found to be significantly overexpressed in venetoclax sensitive patients. Thus, these HOX-cluster genes expression biomarkers can be explored in a clinical trial setting to stratify AML patients responding to venetoclax based therapies. Study III, venetoclax treatment does not benefit all AML patients that demands identifying biomarkers to exclude the patients from venetoclax based therapies. The aim was to investigate transcriptomic biomarkers for ex vivo venetoclax resistance in AML patients. The correlation of ex vivo venetoclax response with gene expression profiles using a machine learning approach revealed significant overexpression of S100 family genes, S100A8 and S100A9. Moreover, high expression ofS100A9was found to be associated with birabresib (BET inhibitor) sensitivity. The overexpression of S100A8 and S100A9 could potentially be used to detect and monitor venetoclax resistance. The combination of BCL-2 and BET inhibitors may sensitize AML cells to venetoclax upon BET inhibition and block leukemic cell survival.In this thesis, the aim was to utilize gene expression information for advanced precision medicine outcomes in patients with hematological malignancies. In the study, I, the contemporary mainstream library preparation protocols, Ribo-depletion and PolyA enrichment used for RNA sequencing, were compared in order to select the protocol that suffices the goal of the experiment, especially in patients with acute leukemias. In study II, we applied bioinformatics approaches to identify IDH1/2 mutation and HOX family gene expression correlated with ex vivo sensitivity to BCL-2 inhibitor venetoclax in acute myeloid leukemia (AML) patients. In study III, statistical and machine learning methods were implemented to identify S100A8/A9 gene expression biomarkers for ex vivo resistance to venetoclax in AML patients. In summary, this thesis addresses the challenges of utilizing gene expression information to stratify patients based on biomarkers to promote precision medicine practice in hematological malignancies

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Evidence-Based Detection of Pancreatic Canc

    Get PDF
    This study is an effort to develop a tool for early detection of pancreatic cancer using evidential reasoning. An evidential reasoning model predicts the likelihood of an individual developing pancreatic cancer by processing the outputs of a Support Vector Classifier, and other input factors such as smoking history, drinking history, sequencing reads, biopsy location, family and personal health history. Certain features of the genomic data along with the mutated gene sequence of pancreatic cancer patients was obtained from the National Cancer Institute (NIH) Genomic Data Commons (GDC). This data was used to train the SVC. A prediction accuracy of ~85% with a ROC AUC of 83.4% was achieved. Synthetic data was assembled in different combinations to evaluate the working of evidential reasoning model. Using this, variations in the belief interval of developing pancreatic cancer are observed. When the model is provided with an input of high smoking history and family history of cancer, an increase in the evidential reasoning interval in belief of pancreatic cancer and support in the machine learning model prediction is observed. Likewise, decrease in the quantity of genetic material and an irregularity in the cellular structure near the pancreas increases support in the machine learning classifier’s prediction of having pancreatic cancer. This evidence-based approach is an attempt to diagnose the pancreatic cancer at a premalignant stage. Future work includes using the real sequencing reads as well as accurate habits and real medical and family history of individuals to increase the efficiency of the evidential reasoning model. Next steps also involve trying out different machine learning models to observe their performance on the dataset considered in this study

    The Translational Status of Cancer Liquid Biopsies

    Get PDF
    Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. This can be achieved by leveraging omics information for accurate molecular characterization of tumors. Tumor tissue biopsies are currently the main source of information for molecular profiling. However, biopsies are invasive and limited in resolving spatiotemporal heterogeneity in tumor tissues. Alternative non-invasive liquid biopsies can exploit patient’s body fluids to access multiple layers of tumor-specific biological information (genomes, epigenomes, transcriptomes, proteomes, metabolomes, circulating tumor cells, and exosomes). Analysis and integration of these large and diverse datasets using statistical and machine learning approaches can yield important insights into tumor biology and lead to discovery of new diagnostic, predictive, and prognostic biomarkers. Translation of these new diagnostic tools into standard clinical practice could transform oncology, as demonstrated by a number of liquid biopsy assays already entering clinical use. In this review, we highlight successes and challenges facing the rapidly evolving field of cancer biomarker research. Lay Summary: Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. The discovery of biomarkers for precision oncology has been accelerated by high-throughput experimental and computational methods, which can inform fine-grained characterization of tumors for clinical decision-making. Moreover, advances in the liquid biopsy field allow non-invasive sampling of patient’s body fluids with the aim of analyzing circulating biomarkers, obviating the need for invasive tumor tissue biopsies. In this review, we highlight successes and challenges facing the rapidly evolving field of liquid biopsy cancer biomarker research

    Machine Learning Approaches for Cancer Analysis

    Get PDF
    In addition, we propose many machine learning models that serve as contributions to solve a biological problem. First, we present Zseq, a linear time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into the account other factors, such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Studying the abundance of select mRNA species throughout prostate cancer progression may provide some insight into the molecular mechanisms that advance the disease. In the second contribution of this dissertation, we reveal that the combination of proper clustering, distance function and Index validation for clusters are suitable in identifying outlier transcripts, which show different trending than the majority of the transcripts, the trending of the transcript is the abundance throughout different stages of prostate cancer. We compare this model with standard hierarchical time-series clustering method based on Euclidean distance. Using time-series profile hierarchical clustering methods, we identified stage-specific mRNA species termed outlier transcripts that exhibit unique trending patterns as compared to most other transcripts during disease progression. This method is able to identify those outliers rather than finding patterns among the trending transcripts compared to the hierarchical clustering method based on Euclidean distance. A wet-lab experiment on a biomarker (CAM2G gene) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with cancer, and in particular, prostate cancer. Further investigation of these outlier transcripts in prostate cancer may identify them as potential stage-specific biomarkers that can predict the progression of the disease. Breast cancer, on the other hand, is a widespread type of cancer in females and accounts for a lot of cancer cases and deaths in the world. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that is used to predict the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are utilized to find discriminative genes. Our proposed model achieves 100% accuracy for predicting the breast cancer subtypes using the same or even fewer genes. Studying breast cancer survivability among different patients who received various treatments may help understand the relationship between the survivability and treatment therapy based on gene expression. In the fourth contribution, we have built a classifier system that predicts whether a given breast cancer patient who underwent some form of treatment, which is either hormone therapy, radiotherapy, or surgery will survive beyond five years after the treatment therapy. Our classifier is a tree-based hierarchical approach that partitions breast cancer patients based on survivability classes; each node in the tree is associated with a treatment therapy and finds a predictive subset of genes that can best predict whether a given patient will survive after that particular treatment. We applied our tree-based method to a gene expression dataset that consists of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%. We have further investigated the roles of many biomarkers through the literature. Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. Discovery of gene indicators can be a crucial step in predicting survivability and handling of breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability. In the last contribution, we introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have significant value in guiding treatment decisions. Our study also supports PTGFR, NREP, scaRNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease
    • …
    corecore