    Machine Learning Approaches for Cancer Analysis

    In addition, we propose many machine learning models that serve as contributions to solve a biological problem. First, we present Zseq, a linear-time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq scores the complexity of each sequence by counting its unique k-mers, and it also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in k-mers. Zseq then sweeps through the sequences again and filters out those with a z-score below a user-defined threshold. Zseq provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning the reads and assembling the transcripts, using both the reference genome and de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Studying the abundance of select mRNA species throughout prostate cancer progression may provide some insight into the molecular mechanisms that advance the disease. In the second contribution of this dissertation, we show that a combination of proper clustering, a distance function, and an index validation for clusters is suitable for identifying outlier transcripts, which trend differently from the majority of the transcripts; here, the trend of a transcript is its abundance across the stages of prostate cancer. We compare this model with a standard hierarchical time-series clustering method based on Euclidean distance.
Using time-series profile hierarchical clustering methods, we identified stage-specific mRNA species, termed outlier transcripts, that exhibit unique trending patterns compared to most other transcripts during disease progression. Compared to the hierarchical clustering method based on Euclidean distance, this method identifies those outliers rather than finding patterns among the trending transcripts. A wet-lab experiment on a biomarker (CAM2G gene) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with cancer and, in particular, prostate cancer. Further investigation of these outlier transcripts in prostate cancer may identify them as potential stage-specific biomarkers that can predict the progression of the disease. Breast cancer, on the other hand, is a widespread cancer in women and accounts for a large share of cancer cases and deaths worldwide. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that predicts the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are used to find discriminative genes. Our proposed model achieves 100% accuracy for predicting the breast cancer subtypes using the same or even fewer genes. Studying breast cancer survivability among patients who received various treatments may help clarify the relationship between survivability and treatment therapy based on gene expression. In the fourth contribution, we built a classifier system that predicts whether a given breast cancer patient who underwent hormone therapy, radiotherapy, or surgery will survive beyond five years after the treatment.
Our classifier is a tree-based hierarchical approach that partitions breast cancer patients based on survivability classes; each node in the tree is associated with a treatment therapy and finds a predictive subset of genes that can best predict whether a given patient will survive after that particular treatment. We applied our tree-based method to a gene expression dataset that consists of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%. We have further investigated the roles of many biomarkers through the literature. Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. Discovery of gene indicators can be a crucial step in predicting survivability and in the management of breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability. In the last contribution, we introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have significant value in guiding treatment decisions. Our study also supports PTGFR, NREP, scaRNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.
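
The k-mer scoring and z-score filtering described for Zseq in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_complexity` and `zscore_filter`, the default `k`, and the default threshold are hypothetical. The sketch handles ambiguous bases simply by ignoring k-mers that contain `N`, and it omits the GC-content factor entirely.

```python
from statistics import mean, pstdev


def kmer_complexity(seq, k=3):
    """Count distinct k-mers in seq, ignoring k-mers with an ambiguous base (N)."""
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def zscore_filter(seqs, k=3, threshold=-1.0):
    """Keep sequences whose complexity z-score is at or above the threshold.

    First pass: score every sequence. Second pass: drop the sequences
    whose z-score falls below the user-defined threshold.
    """
    scores = [kmer_complexity(s, k) for s in seqs]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All sequences have the same score, so nothing can be separated.
        return list(seqs)
    return [s for s, sc in zip(seqs, scores) if (sc - mu) / sigma >= threshold]


reads = ["ACGTACGTAC", "AAAAAAAAAA", "ACGTTGCAAG", "NNNNNNNNNN"]
kept = zscore_filter(reads, k=3, threshold=-0.5)
```

With these toy reads, the low-complexity homopolymer and the all-`N` read score lowest and are the ones removed.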

    Machine Learning Approaches for Identifying Cancer Biomarkers Using Next Generation Sequencing

    Identifying biomarkers that can be used to classify certain disease stages or predict when a disease becomes more aggressive is one of the most important applications of machine learning. Next generation sequencing (NGS) is a state-of-the-art method that enables fast sequencing of DNA or RNA samples. The output is usually a very large file that consists of base pairs of DNA or RNA. The generated data can be analyzed to provide, for example, gene expression, chromosome counts, detection of mutations in genes, and levels of copy number variations or alterations in specific genes. NGS is leading the way to explore the human genome, enabling the future of personalized medicine. In this thesis, we demonstrate how machine learning is used extensively to identify genes that can predict prostate cancer stages with very high accuracy, using gene expression. We have also been successful in predicting the location of prostate tumors based on gene expression. In addition, traditional biomarker identification approaches typically use machine learning techniques to identify a number of genes and macromolecules as biomarkers that can be used to diagnose specific diseases or states of diseases with very high accuracy, using molecular measurements such as mutations, gene expression, copy number variations, and others. However, experts' opinions and knowledge are required to validate such findings. We, therefore, also introduce a new machine learning model that incorporates a knowledge-assisted system used to integrate the findings of the DisGeNET database, which is a framework that contains proven relationships among diseases and genes. The machine learning pipeline starts by reducing the number of features using a filter-based feature selection method. The DisGeNET database is used to score each gene related to the given cancer name.
Then, a wrapper-based feature-selection algorithm picks the set of genes with the highest classification accuracy. The method has been able to retrieve key genes from multiple data sets that classify with very high accuracy while being biologically relevant, with no human intervention needed. Initial results provide a high area-under-the-curve with a handful of genes that are already proven to be related to the relevant disease and state based on the latest published medical findings. The proposed method's results provide biomarkers that can be verified in wet lab environments and can then be further analyzed and studied for diagnostic purposes.
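
The three-stage pipeline described in the abstract above (filter, knowledge-based scoring, wrapper) can be sketched roughly as follows. This is a toy illustration under several assumptions, not the thesis's method: the filter statistic, the nearest-centroid classifier, the leave-one-out evaluation, and the greedy forward search are all stand-ins chosen for brevity. The `prior` dictionary is a hypothetical placeholder for gene-disease scores such as those DisGeNET provides; no real DisGeNET API is called, and using the scores only to order the candidates is itself an assumption.

```python
from statistics import mean, pstdev


def filter_score(values, labels):
    """Absolute difference of the two class means, scaled by the overall spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(values) or 1.0  # avoid dividing by zero for constant genes
    return abs(mean(a) - mean(b)) / spread


def loo_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given gene indices."""
    correct = 0
    for i in range(len(labels)):
        dist = {}
        for cls in (0, 1):
            rows = [X[j] for j in range(len(labels)) if j != i and labels[j] == cls]
            centroid = [mean(r[g] for r in rows) for g in genes]
            dist[cls] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        correct += min(dist, key=dist.get) == labels[i]
    return correct / len(labels)


def select_genes(X, labels, names, prior, keep=3):
    """Filter, order by prior knowledge score, then greedy forward wrapper selection."""
    n = len(names)
    # Step 1 (filter): keep the `keep` genes with the largest filter statistic.
    stats = [filter_score([row[g] for row in X], labels) for g in range(n)]
    top = sorted(range(n), key=lambda g: stats[g], reverse=True)[:keep]
    # Step 2 (knowledge): try genes with stronger prior evidence first (assumed use).
    top.sort(key=lambda g: prior.get(names[g], 0.0), reverse=True)
    # Step 3 (wrapper): add a gene only if it improves leave-one-out accuracy.
    chosen, best = [], 0.0
    for g in top:
        acc = loo_accuracy(X, labels, chosen + [g])
        if acc > best:
            chosen, best = chosen + [g], acc
    return [names[g] for g in chosen], best


# Toy expression matrix: rows are samples, columns are genes G1..G3.
X = [[1.0, 5.0, 2.0], [1.2, 3.0, 2.0], [0.9, 4.0, 2.0],
     [3.0, 4.5, 2.0], [3.1, 3.5, 2.0], [2.9, 4.0, 2.0]]
labels = [0, 0, 0, 1, 1, 1]
names = ["G1", "G2", "G3"]
prior = {"G1": 0.8, "G2": 0.1}  # hypothetical knowledge scores
genes, accuracy = select_genes(X, labels, names, prior, keep=2)
```

On this toy data only `G1` separates the two classes, so the wrapper stops after selecting it.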

    Identification and validation of novel prostate cancer biomarkers

    Prostate cancer (PCa) has emerged as the most commonly diagnosed lethal cancer in European men. PCa is a heterogeneous cancer that in the majority of cases is slow growing; consequently, these patients would not need any medical treatment. Currently, the measurement of prostate-specific antigen (PSA) from blood by immunoassay, followed by digital rectal examination and a pathological examination of prostate tissue biopsies, are the most widely used methods in the diagnosis of PCa. These methods suffer from a lack of sensitivity and specificity that may cause either missed cancers or overtreatment as a consequence of over-diagnosis. Therefore, more reliable biomarkers are needed for better discrimination between indolent and potentially aggressive cancers. The aim of this thesis was the identification and validation of novel biomarkers for PCa. The mRNA expression level of 14 genes, namely AMACR, AR, PCA3, SPINK1, TMPRSS2-ERG, KLK3, ACSM1, CACNA1D, DLX1, LMNB1, PLA2G7, RHOU, SPON2, and TDRD1, was measured by a truly quantitative reverse transcription PCR in different prostate tissue samples from men with and without PCa. For the last eight genes, the function of the genes in PCa progression was studied by specific siRNA knockdown in PC-3 and VCaP cells. The results from radical prostatectomy and cystoprostatectomy samples showed statistically significant overexpression for all the target genes, except for KLK3, in men with PCa compared with men without PCa. Statistically significant differences were also observed in low versus high Gleason grade tumors (for PLA2G7), PSA relapse versus no relapse (for SPON2), and low versus high TNM stages (for CACNA1D and DLX1). Functional studies and siRNA silencing results revealed a cytotoxic effect for the knockdown of DLX1, PLA2G7, and RHOU, and altered tumor cell invasion for PLA2G7, RHOU, ACSM1, and CACNA1D knockdown in 3D conditions.
In addition, effects on tumor cell motility were observed after silencing PLA2G7 and RHOU in 2D monolayer cultures. Altogether, these findings indicate the possibility of utilizing these new markers as diagnostic and prognostic markers, and they may also represent therapeutic targets for PCa.

    Prostate cancer radiogenomics—from imaging to molecular characterization

    Radiomics and genomics represent two of the most promising fields of cancer research, designed to improve the risk stratification and disease management of patients with prostate cancer (PCa). Radiomics involves the conversion of images into quantitative features using manual or automated algorithms, enhancing existing data through mathematical analysis. This could increase the clinical value in PCa management. When features are extracted from imaging methods such as magnetic resonance imaging (MRI), the empirical nature of the analysis using machine learning and artificial intelligence could help make the best clinical decisions. Genomics information can be explained or decoded by radiomics. The development of methodologies can create more efficient predictive models and can better characterize the molecular features of PCa. Additionally, the identification of new imaging biomarkers can overcome the known heterogeneity of PCa through non-invasive radiological assessment of the whole organ. In the future, the validation of recent findings in large, randomized cohorts of PCa patients can establish the role of radiogenomics. Briefly, we aimed to review the current literature of highly quantitative and qualitative results from well-designed studies for the diagnosis, treatment, and follow-up of prostate cancer, based on radiomics, genomics, and radiogenomics research.

    Urological Cancer 2020

    This Urological Cancer 2020 collection contains a set of multidisciplinary contributions on the extraordinary heterogeneity of tumor mechanisms, diagnostic approaches, and therapies of renal, urinary tract, and prostate cancers, with the intention of offering interested readers a representative snapshot of the status of urological research.

    The Translational Status of Cancer Liquid Biopsies

    Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. This can be achieved by leveraging omics information for accurate molecular characterization of tumors. Tumor tissue biopsies are currently the main source of information for molecular profiling. However, biopsies are invasive and limited in resolving spatiotemporal heterogeneity in tumor tissues. Alternative non-invasive liquid biopsies can exploit patients’ body fluids to access multiple layers of tumor-specific biological information (genomes, epigenomes, transcriptomes, proteomes, metabolomes, circulating tumor cells, and exosomes). Analysis and integration of these large and diverse datasets using statistical and machine learning approaches can yield important insights into tumor biology and lead to discovery of new diagnostic, predictive, and prognostic biomarkers. Translation of these new diagnostic tools into standard clinical practice could transform oncology, as demonstrated by a number of liquid biopsy assays already entering clinical use. In this review, we highlight successes and challenges facing the rapidly evolving field of cancer biomarker research. Lay Summary: Precision oncology aims to tailor clinical decisions specifically to patients with the objective of improving treatment outcomes. The discovery of biomarkers for precision oncology has been accelerated by high-throughput experimental and computational methods, which can inform fine-grained characterization of tumors for clinical decision-making. Moreover, advances in the liquid biopsy field allow non-invasive sampling of patients’ body fluids with the aim of analyzing circulating biomarkers, obviating the need for invasive tumor tissue biopsies. In this review, we highlight successes and challenges facing the rapidly evolving field of liquid biopsy cancer biomarker research.

    Identification of prostate cancer diagnostic and prognostic biomarkers in urine expression data with a focus on extracellular vesicles

    Prostate cancer (PCa) is a major clinical problem worldwide with considerable variability in the clinical outcome of patients. PCa diagnostics and prognostics currently lack specific and sensitive clinical biomarkers, and treatment is not well individualised. The PCA3 test, amongst others, highlights the utility of urine in PCa diagnostics and prognostics. Urine contains cells and extracellular vesicles (EV) that originate in the prostate. There are many areas of the PCa clinical process that could be aided with an expression-based urine test, including diagnosis, prognosis and response to therapy. NanoString data (167 transcripts) from 485 EV RNA samples were collected from PCa patients and used to build models that would aid in PCa diagnosis and prognosis, namely: i) PCa (low- (L), intermediate- (I), and high-risk (H)) vs CB (clinically benign/no evidence of cancer), ii) high-risk PCa vs CB, and iii) the trend in expression across CB>L>I>H. These models were validated in 235 samples, with AUCs of i) 0.851, ii) 0.897 and iii) 0.709, respectively. The potential of using urine EVs to predict patient response to treatments was also investigated. In a pilot data set, a signature of seven transcripts was identified that could optimally predict progression of patients on hormone therapy (p = 2.3 × 10⁻⁵; HR = 0.04288). Models were also built using NanoString data from 92 cell RNA samples. Intercomparing expression data from matched cell and EV fractions of urine showed that transcripts significantly higher in the EV samples were associated with the prostate, PCa and cancer in general, supporting them as a viable source of biomarkers in the clinical management of PCa. In conclusion, my analyses have demonstrated the utility of examining urine RNA for the diagnosis and prognosis of PCa. My studies have formed the basis of the production of a Prostate Urine Risk test that is currently under development at UEA.
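
For readers unfamiliar with the AUC values reported in the abstract above, the metric can be computed for a binary comparison (for example, high-risk PCa vs CB) as the fraction of positive/negative pairs in which the positive sample receives the higher model score, with ties counted as half. The sketch below is a generic illustration with made-up scores; it is not the study's validation code, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (label 1 = positive class)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative model scores only; these are not data from the study.
example = auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.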

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages, ChimeraScan and AssemblyLine, designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses, AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs, PCAT-1 and SChLAP1, revealed cancer-promoting roles for these lncRNAs. PCAT-1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality.
Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd