5,784 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Bioinformatic Analysis for the Validation of Novel Biomarkers for Cancer Diagnosis and Drug Sensitivity

    Background: The genetic control of tumour progression presents the opportunity for bioinformatics and gene expression data to be used as a basis for tumour grading. The development of a genetic signature based on microarray data allows for the development of personalised chemotherapeutic regimes. Method: ONCOMINE was utilised to create a genetic signature for ovarian serous adenocarcinoma and to compare the expression of genes between normal ovarian and cancerous cells. Ingenuity Pathways Analysis was also utilised to develop molecular pathways and observe interactions with exogenous molecules. Results: The gene signature demonstrated 98.6% predictive capability for the differentiation between borderline ovarian serous neoplasm and ovarian serous adenocarcinoma. The data demonstrated that many genes were related to angiogenesis. Thymidylate synthase, GLUT-3 and HSP90AA1 were related to tanespimycin sensitivity (p=0.005). Conclusions: Genetic profiling with the gene signature demonstrated potential for clinical use. The use of tanespimycin alongside overexpression of thymidylate synthase, GLUT-3 and HSP90AA1 is a novel consideration for ovarian cancer treatment.

    Breast Cancer Biomarker Discovery in the Functional Genomic Age: A Systematic Review of 42 Gene Expression Signatures

    In this review we provide a systematic analysis of transcriptomic signatures derived from 42 breast cancer gene expression studies, in an effort to identify the most relevant breast cancer biomarkers using a meta-analysis method. Meta-data revealed a set of 117 genes that were the most commonly affected, with overlap among breast cancer gene expression studies ranging from 12% to 36%. Data mining analysis of transcripts and protein-protein interactions of these commonly modulated genes indicates three functional modules significantly affected among signatures: one module related to the response to steroid hormone stimulus, and two modules related to the cell cycle. Analysis of publicly available gene expression data showed that the obtained meta-signature is capable of predicting overall survival (P < 0.0001) and relapse-free survival (P < 0.0001) in patients with early-stage breast carcinomas. In addition, the identified meta-signature improves breast cancer patient stratification independently of traditional prognostic factors in a multivariate Cox proportional-hazards analysis.

    Identification of a minimum number of genes to predict triple-negative breast cancer subgroups from gene expression profiles

    Background: Triple-negative breast cancer (TNBC) is a very heterogeneous disease. Several gene expression and mutation profiling approaches have been used to classify it, and all converged on the identification of distinct molecular subtypes, with some overlap across different approaches. However, a standardised tool to routinely classify TNBC in the clinic and guide personalised treatment is lacking. We aimed to define a specific gene signature for each of the six TNBC subtypes proposed by Lehman et al. in 2011 (basal-like 1 (BL1); basal-like 2 (BL2); mesenchymal (M); immunomodulatory (IM); mesenchymal stem-like (MSL); and luminal androgen receptor (LAR)), to be able to accurately predict them. Methods: Lehman’s TNBCtype subtyping tool was applied to RNA-sequencing data from 482 TNBC (GSE164458), and a minimal subtype-specific gene signature was defined by combining two class comparison techniques with seven attribute selection methods. Several machine learning algorithms for subtype prediction were used, and the best classifier was applied on microarray data from 72 Italian TNBC and on the TNBC subset of the BRCA-TCGA data set. Results: We identified two signatures with the 120 and 81 top up- and downregulated genes that define the six TNBC subtypes, with prediction accuracy ranging from 88.6 to 89.4%, and even improving after removal of the least important genes. Network analysis was used to identify highly interconnected genes within each subgroup. Two druggable matrix metalloproteinases were found in the BL1 and BL2 subsets, and several druggable targets were complementary to androgen receptor or aromatase in the LAR subset. Several secondary drug–target interactions were found among the upregulated genes in the M, IM and MSL subsets. Conclusions: Our study took full advantage of available TNBC data sets to stratify samples and genes into distinct subtypes, according to gene expression profiles. The development of a data mining approach to acquire a large amount of information from several data sets has allowed us to identify a well-determined minimal number of genes that may help in the recognition of TNBC subtypes. These genes, most of which have been previously found to be associated with breast cancer, have the potential to become novel diagnostic markers and/or therapeutic targets for specific TNBC subsets.

    Long non-coding RNAs in cutaneous melanoma: clinical perspectives

    Metastatic melanoma of the skin has a high mortality despite the recent introduction of targeted therapy and immunotherapy. Long non-coding RNAs (lncRNAs) are defined as transcripts of more than 200 nucleotides in length that lack protein-coding potential. There is growing evidence that lncRNAs play an important role in gene regulation, including oncogenesis. We present 13 lncRNA genes involved in the pathogenesis of cutaneous melanoma through a variety of pathways and molecular interactions. Some of these lncRNAs are possible biomarkers or therapeutic targets for malignant melanoma.

    Computational study of cancer

    In my thesis, I focused on integrative analysis of high-throughput oncogenomic data. This was done in two parts. In the first part, I describe IntOGen, an integrative data mining tool for the study of cancer. This system collates, annotates, pre-processes and analyzes large-scale data for transcriptomic, copy number aberration and mutational profiling of a large number of tumors in multiple cancer types. All oncogenomic data is annotated with ICD-O terms. We perform analysis at different levels of complexity: at the level of genes, at the level of modules, at the level of studies and finally at the level of combinations of studies. The results are publicly available in a web service. I also present the Biomart interface of IntOGen for bulk download of data. In the final part, I propose a methodology based on sample-level enrichment analysis to identify patient subgroups from high-throughput profiling of tumors. I also apply this approach to a specific biological problem and characterize properties of worse-prognosis tumors in multiple cancer types. This methodology can be used in the translational version of IntOGen.

    Cellular interactions in the tumor microenvironment: the role of secretome

    Over the past years, it has become evident that cancer initiation and progression depend on several components of the tumor microenvironment, including inflammatory and immune cells, fibroblasts, endothelial cells, adipocytes, and extracellular matrix. These components of the tumor microenvironment and the neoplastic cells interact with each other, providing pro- and antitumor signals. The tumor-stroma communication occurs directly between cells or via a variety of secreted molecules, such as growth factors, cytokines, chemokines and microRNAs. This secretome, which derives not only from tumor cells but also from cancer-associated stromal cells, is an important source of key regulators of the tumorigenic process. Their screening and characterization could provide useful biomarkers to improve cancer diagnosis, prognosis, and monitoring of treatment responses. Funding agencies: Fundação de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP), 10/51168-0, 12/06048-2, 13/03839-1; National Council for Scientific and Technological Development (CNPq), 306216/2010-8; Fundacao para a Ciencia e a Tecnologia (FCT), UID/BIM/04773/2013 CBMR 1334.