
    BcCluster: a bladder cancer database at the molecular level

    Background: Bladder cancer (BC) has two clearly distinct phenotypes. Non-muscle invasive BC has a good prognosis and is treated with tumor resection and intravesical therapy, whereas muscle invasive BC has a poor prognosis and usually requires systemic cisplatin-based chemotherapy either prior to or after radical cystectomy. Neoadjuvant chemotherapy is not often used for patients undergoing cystectomy. High-throughput analytical omics techniques are now available that allow the identification of individual molecular signatures to characterize the invasive phenotype. However, a large amount of the data produced by omics experiments is not easily accessible, since it is often scattered over many publications or stored in supplementary files. Objective: To develop a novel open-source database, BcCluster (http://www.bccluster.org/), dedicated to the comprehensive molecular characterization of muscle invasive bladder carcinoma. Materials: A database was created containing all reported molecular features significant in invasive BC. The query interface was developed in the Ruby programming language (version 1.9.3) using the web framework Rails (version 4.1.5) (http://rubyonrails.org/). Results: BcCluster contains the data from 112 published references, providing 1,559 statistically significant features relative to BC invasion. The database also holds 435 protein-protein interaction data and 92 molecular pathways significant in BC invasion. The database can be used to retrieve binding partners and pathways for any protein of interest. We illustrate this possibility using survivin, a known BC biomarker. Conclusions: BcCluster is an online database for retrieving molecular signatures relative to BC invasion. This application offers a comprehensive view of BC invasiveness at the molecular level and allows formulation of research hypotheses relevant to this phenotype.

    Systems Analysis of miRNA Biomarkers to Inform Drug Safety

    MicroRNAs (miRNAs or miRs) are short non-coding RNA molecules which have been shown to be dysregulated and released into the extracellular milieu as a result of many drug-induced and non-drug-induced pathologies in different organ systems. Consequently, circulating miRs have been proposed as useful biomarkers of many disease states, including drug-induced tissue injury. miRs have shown potential to support or even replace the existing traditional biomarkers of drug-induced toxicity in terms of sensitivity and specificity, and there is some evidence for their improved diagnostic and prognostic value. However, several pre-analytical and analytical challenges, mainly associated with assay standardization, require solutions before circulating miRs can be successfully translated into the clinic. This review considers the value and potential of circulating miRs in drug-safety assessment and describes a systems approach to the analysis of the miRNAome in the discovery setting, as well as highlighting the standardization issues that at this stage prevent their clinical use as biomarkers. Highlighting these challenges will hopefully drive future research into finding appropriate solutions, and eventually circulating miRs may be translated to the clinic, where their undoubted biomarker potential can be used to benefit patients in rapid, easy-to-use, point-of-care test systems.

    Closing the circle: current state and perspectives of circular RNA databases

    Circular RNAs (circRNAs) are covalently closed RNA molecules that have been linked to various diseases, including cancer. However, a precise function and working mechanism are lacking for the large majority. Following many different experimental and computational approaches to identify circRNAs, multiple circRNA databases were developed as well. Unfortunately, there are several major issues with the current circRNA databases, which substantially hamper progress in the field. First, as the overlap in content is limited, a true reference set of circRNAs is lacking. This results from the low abundance and highly specific expression of circRNAs, and from varying sequencing methods, data-analysis pipelines, and circRNA detection tools. A second major issue is the use of ambiguous nomenclature. Thus, redundant or even conflicting names for circRNAs across different databases contribute to the reproducibility crisis. Third, circRNA databases, in essence, rely on the position of the circRNA back-splice junction, whereas alternative splicing could result in circRNAs with different length and sequence. To uniquely identify a circRNA molecule, the full circular sequence is required. Fourth, circRNA databases annotate circRNAs' microRNA binding and protein-coding potential, but these annotations are generally based on presumed circRNA sequences. Finally, several databases are not regularly updated, contain incomplete data, or suffer from connectivity issues. In this review, we present a comprehensive overview of the current circRNA databases and their content, features, and usability. In addition to discussing the current issues regarding circRNA databases, we offer important suggestions to streamline further research in this growing field.

    Identifying potential circulating miRNA biomarkers for the diagnosis and prediction of ovarian cancer using machine-learning approach: application of Boruta

    Introduction: In gynecologic oncology, ovarian cancer is a great clinical challenge. Because of the lack of typical symptoms and effective biomarkers for noninvasive screening, most patients have advanced-stage ovarian cancer by the time of diagnosis. MicroRNAs (miRNAs) are a type of non-coding RNA molecule that has been linked to human cancers. Specifying diagnostic biomarkers that distinguish non-cancer from cancer samples is difficult. Methods: Using Boruta, a novel random forest-based feature selection method, we aimed to identify biomarkers associated with ovarian cancer using cancerous and non-cancer samples from the Gene Expression Omnibus (GEO) data set GSE106817. We used two independent GEO data sets, GSE113486 and GSE113740, as external validation. We utilized five state-of-the-art machine-learning algorithms for classification: logistic regression, random forest, decision trees, artificial neural networks, and XGBoost. Results: Four models discovered in GSE113486 had an AUC of 100%, three in GSE113740 with AUC of over 94%, and four in GSE113486 with AUC of over 94%. We identified 10 miRNAs to distinguish ovarian cancer cases from normal controls: hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100. Our findings suggest that miRNAs could be used as possible biomarkers for ovarian cancer screening and for possible intervention.

    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    iGPSe: A Visual Analytic System for Integrative Genomic Based Cancer Patient Stratification

    Background: Cancers are highly heterogeneous, with different subtypes. These subtypes often possess different genetic variants, present different pathological phenotypes, and, most importantly, show various clinical outcomes such as varied prognosis, response to treatment, and likelihood of recurrence and metastasis. Recently, integrative genomics (or panomics) approaches have often been adopted with the goal of combining multiple types of omics data to identify integrative biomarkers for stratification of patients into groups with different clinical outcomes. Results: In this paper we present a visual analytic system called Interactive Genomics Patient Stratification explorer (iGPSe), which significantly reduces the computing burden for biomedical researchers in the process of exploring complicated integrative genomics data. Our system integrates unsupervised clustering with graph and parallel sets visualization and allows direct comparison of clinical outcomes via survival analysis. Using a breast cancer dataset obtained from The Cancer Genome Atlas (TCGA) project, we are able to quickly explore different combinations of gene expression (mRNA) and microRNA features and identify potential combined markers for survival prediction. Conclusions: Visualization plays an important role in the process of stratifying a given patient population. Visual tools allowed for the selection of possible features across various datasets for the given patient population. We essentially made a case for visualization for a very important problem in translational informatics. Comment: BioVis 2014 conference

    Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential


    Prognostic Methods for Integrating Data from Complex Diseases

    Statistics in medical research has surged with the development of high-throughput biotechnologies that provide thousands of measurements for each patient. These multi-layered data have clear potential to improve disease prognosis. Data integration is increasingly becoming essential in this context, to address problems such as increasing power, resolving inconsistencies between studies, obtaining more reliable biomarkers, and gaining a broader understanding of the disease. This thesis focuses on addressing the challenges in the development of statistical methods while contributing to the methodological advancements in this field. We propose a clinical data analysis framework to obtain a model with good prediction accuracy, addressing missing data and model instability. A detailed pre-processing pipeline is proposed for miRNA data that removes unwanted noise and offers improved concordance with qRT-PCR data. Platform-specific models are developed to uncover biomarkers using mRNA, protein, and miRNA data, and to identify the source with the most important prognostic information. This thesis explores two types of data integration: horizontal, the integration of the same type of data, and vertical, the integration of data from different platforms for the same patient. We use multiple miRNA datasets to develop a meta-analysis framework addressing the challenges in horizontal data integration using a multi-step validation protocol. In the vertical data integration, we extend the pre-validation principle and derive platform-dependent weights to utilise the weighted Lasso. Our study revealed that integration of multi-layered data is instrumental in improving the prediction accuracy and in obtaining more biologically relevant biomarkers. A novel visualisation technique for examining prediction accuracy at the patient level revealed vital findings with translational impact in personalised medicine.