
    DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association

    microRNAs (miRNAs) are small endogenous RNA molecules that are implicated in many biological processes through post-transcriptional regulation of gene expression. The DIANA-microT Web server provides a user-friendly interface for comprehensive computational analysis of miRNA targets in human and mouse. The server has now been extended to support predictions for two widely studied species: Drosophila melanogaster and Caenorhabditis elegans. In the updated version, the Web server enables the association of miRNAs with diseases through bibliographic analysis and provides insights into the potential involvement of miRNAs in biological processes. The nomenclature used to describe mature miRNAs across different miRBase versions has been extensively analyzed, and the naming history of each miRNA has been extracted. This enables the identification of miRNA publications regardless of possible nomenclature changes. User interaction has been further refined, allowing users to save results that they wish to analyze further. A connection to the UCSC genome browser is now provided, enabling users to easily preview predicted binding sites in comparison to a wide array of genomic tracks, such as single nucleotide polymorphisms. The Web server is publicly accessible at www.microrna.gr/microT-v4

    Integrated analysis of microRNA and mRNA expression and association with HIF binding reveals the complexity of microRNA expression regulation under hypoxia.

    BACKGROUND: In mammals, HIF is a master regulator of hypoxia gene expression through direct binding to DNA, while its role in regulating microRNA expression, which is critical in the hypoxia response, has not been elucidated genome-wide. Our aim was to investigate in depth the regulation of microRNA expression by hypoxia in the breast cancer cell line MCF-7 and to establish the relationship between microRNA expression and HIF binding sites, pri-miRNA transcription and the expression of microRNA-processing genes. METHODS: MCF-7 cells were incubated at 1% oxygen for 16, 32 and 48 h. siRNA knockdowns of HIF-1α and HIF-2α were performed as previously published. MicroRNA and mRNA expression were assessed using microRNA microarrays, small RNA sequencing, gene expression microarrays and real-time PCR. The Kraken pipeline was applied for microRNA-seq analysis along with Bioconductor packages. Microarray data were analysed using Limma (Bioconductor), ChIP-seq data were analysed using Gene Set Enrichment Analysis, and multiple testing correction was applied in all analyses. RESULTS: Analysis of hypoxia time-course microRNA sequencing data identified 41 microRNAs significantly up-regulated and 28 significantly down-regulated, including hsa-miR-4521, hsa-miR-145-3p and hsa-miR-222-5p, which are reported in conjunction with hypoxia for the first time. Integration of HIF-1α and HIF-2α ChIP-seq data with expression data showed an overall association between binding sites and microRNA up-regulation, with hsa-miR-210-3p and the microRNAs of the miR-27a/23a/24-2 and miR-30b/30d clusters as predominant examples. Moreover, the expression of hsa-miR-27a-3p and hsa-miR-24-3p was found to be positively associated with a hypoxia gene signature in breast cancer. Gene expression analysis showed no full coordination between pri-miRNA and microRNA expression, pointing towards additional levels of regulation. Several transcripts involved in microRNA processing were found to be regulated by hypoxia, of which DICER (down-regulated) and AGO4 (up-regulated) were HIF-dependent. DICER expression was found to be inversely correlated with hypoxia in breast cancer. CONCLUSIONS: Integrated analysis of microRNA, mRNA and ChIP-seq data in a model cell line supports the hypothesis that microRNA expression under hypoxia is regulated at the transcriptional and post-transcriptional levels, with the presence of HIF binding sites at microRNA genomic loci associated with up-regulation. The identification of hypoxia- and HIF-regulated microRNAs relevant to breast cancer is important for our understanding of disease development and for the design of therapeutic interventions
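The abstract states that multiple testing correction was applied but does not name the procedure. As an illustration only, assuming the widely used Benjamini-Hochberg false discovery rate adjustment, adjusted p-values can be computed as in the sketch below. This is not the study's pipeline, and the input p-values are hypothetical.

```python
from typing import List, Sequence

def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    # and never exceed 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four microRNAs.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```

Taking the running minimum from the largest rank downward is what makes the adjusted values monotone in the raw p-values.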

    DIANA-microT web server: elucidating microRNA functions through target prediction

    Computational microRNA (miRNA) target prediction is one of the key means of deciphering the role of miRNAs in development and disease. Here, we present the DIANA-microT web server as the user interface to the DIANA-microT 3.0 miRNA target prediction algorithm. The web server provides extensive information on predicted miRNA:target gene interactions through a user-friendly interface with broad connectivity to online biological resources. Target gene and miRNA functions may be elucidated through automated bibliographic searches, and functional information is accessible through Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The web server offers links to nomenclature, sequence and protein databases, and users can search for targeted genes using different nomenclatures or functional features, such as the genes' possible involvement in biological pathways. The target prediction algorithm supports parameters calculated individually for each miRNA:target gene interaction and provides a signal-to-noise ratio and a precision score that help in evaluating the significance of the predicted results. Using a set of miRNA targets recently identified through the pSILAC method, the performance of several computational target prediction programs was assessed. DIANA-microT 3.0 achieved the highest ratio of correctly predicted targets to all predicted targets, at 66%. The DIANA-microT web server is freely available at www.microrna.gr/microT
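The 66% figure is a precision: correctly predicted targets divided by all predicted targets. A minimal sketch of that ratio follows. It assumes that predictions and pSILAC-supported targets are available as sets of gene identifiers. The function name `target_precision` and the gene identifiers are hypothetical.

```python
from typing import Set

def target_precision(predicted: Set[str], validated: Set[str]) -> float:
    """Fraction of predicted targets that also appear in the validated set."""
    if not predicted:
        raise ValueError("no predicted targets")
    # Correct predictions are those supported by the experimental set.
    correct = predicted & validated
    return len(correct) / len(predicted)

# Hypothetical gene identifiers, for illustration only.
predicted = {"GENE_A", "GENE_B", "GENE_C"}
validated = {"GENE_A", "GENE_C", "GENE_D"}
print(round(target_precision(predicted, validated), 3))  # 0.667
```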

    A method to improve protein subcellular localization prediction by integrating various biological data sources

    Background: Protein subcellular localization is crucial information for elucidating protein functions. Because large-scale genome analysis is needed, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons. The number of subcellular locations in practice is large. The distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly. Many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance. Results: In this paper we propose a new prediction method that combines two key ideas: 1) information on neighbour proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed. Conclusion: Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and complement, other available prediction methods.
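The abstract names fuzzy k-NN but gives neither its membership function nor its decision rule. The sketch below assumes the classic inverse-distance formulation of fuzzy k-NN with crisp training memberships, so that each training protein belongs fully to each of its annotated locations. It also assumes a hypothetical membership threshold for calling multiple locations. The gene-network neighbour features described above are not modelled, and the toy feature vectors are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Each training example: (feature vector, set of annotated locations).
TrainingSet = List[Tuple[Sequence[float], Set[str]]]

def fuzzy_knn_memberships(
    train: TrainingSet, query: Sequence[float], k: int = 5, m: float = 2.0
) -> Dict[str, float]:
    """Location memberships for `query` from its k nearest neighbours.

    Neighbours are weighted by 1 / d^(2 / (m - 1)), as in classic fuzzy k-NN;
    a neighbour annotated with several locations contributes to each of them.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    if not train:
        raise ValueError("empty training set")
    # Rank training examples by Euclidean distance to the query.
    ranked = sorted(
        ((math.dist(query, x), labels) for x, labels in train),
        key=lambda pair: pair[0],
    )
    neighbours = ranked[:k]
    # Clamp zero distances so an exact match does not divide by zero.
    weights = [(1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0)), labels) for d, labels in neighbours]
    total = sum(w for w, _ in weights)
    memberships: Dict[str, float] = {}
    for w, labels in weights:
        for label in labels:
            memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

def predict_locations(memberships: Dict[str, float], threshold: float = 0.3) -> Set[str]:
    """Assign every location whose membership reaches the (hypothetical) threshold."""
    return {label for label, u in memberships.items() if u >= threshold}

# Toy two-dimensional features, invented for illustration.
train = [
    ((0.0, 0.0), {"nucleus"}),
    ((0.0, 1.0), {"nucleus", "cytoplasm"}),
    ((5.0, 5.0), {"mitochondrion"}),
]
print(sorted(predict_locations(fuzzy_knn_memberships(train, (0.0, 0.5), k=2))))
# ['cytoplasm', 'nucleus']
```

Thresholding memberships, rather than taking only the top class, is what lets a single protein be assigned to several locations.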

    Development and evaluation of machine learning in whole-body magnetic resonance imaging for detecting metastases in patients with lung or colon cancer: a diagnostic test accuracy study.

    OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and to reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from the multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using the Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced and 7 inexperienced in WB-MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analyses included the sensitivity, specificity, interobserver agreement and reading time of radiology readers detecting metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (245) or radiology testing (188 patients with primary colon [n = 117] or lung [n = 71] cancer, 50 of whom had metastases). Among a total of 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344). Among 161 reads by inexperienced readers, per-patient specificity in both groups was 76.3% (0% difference; 95% CI, -15.0%, 15.0%; P = 0.613), with sensitivity of 73.3% (ML) and 60.0% (non-ML) (13.3% difference; 95% CI, -7.9%, 34.5%; P = 0.313). Per-site specificity was high (>90%) for all metastatic sites and experience levels. Sensitivity for the detection of primary tumors was high: the lung cancer detection rate was 98.6% with and without ML (0.0% difference; 95% CI, -2.0%, 2.0%; P = 1.00), and the colon cancer detection rate was 89.0% with ML and 90.6% without ML (-1.7% difference; 95% CI, -5.6%, 2.2%; P = 0.65). When all reads from rounds 1 and 2 were combined, reading times fell by 6.2% (95% CI, -22.8%, 10.0%) when using ML. Round 2 read-times fell by 32% (95% CI, 20.8%, 42.8%) compared with round 1. Within round 2, there was a significant decrease in read-time when using ML support, estimated at 286 seconds (or 11%) quicker (P = 0.0281), using regression analysis to account for reader experience, read round and tumor type. Interobserver variance suggests moderate agreement: Cohen κ = 0.64 (95% CI, 0.47, 0.81) with ML and Cohen κ = 0.66 (95% CI, 0.47, 0.81) without ML. CONCLUSIONS: There was no evidence of a significant difference in per-patient sensitivity and specificity for detecting metastases or the primary tumor using concurrent ML compared with standard WB-MRI. Radiology read-times with or without ML support fell for round 2 reads compared with round 1, suggesting that readers familiarized themselves with the study reading method. During the second reading round, there was a significant reduction in reading time when using ML support
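Cohen's κ, quoted above for interobserver agreement, compares the observed agreement p_o with the agreement p_e expected by chance from each reader's label frequencies, using κ = (p_o - p_e) / (1 - p_e). A minimal sketch for two readers labelling the same scans follows. The per-patient labels are hypothetical, and the abstract does not describe how reads were paired for the study's κ.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two readers who labelled the same items."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(reader_a)
    # Observed agreement p_o: share of items given the same label by both readers.
    p_o = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement p_e from each reader's marginal label frequencies.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical per-patient calls ("met" = metastases reported).
a = ["met", "met", "none", "none"]
b = ["met", "none", "none", "none"]
print(cohens_kappa(a, b))  # 0.5
```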

    Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition

    Background: Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. The task is to predict which part of a cell a given protein is transported to, taking the protein's amino acid sequence as input. This problem is becoming more important because information on subcellular location is helpful for annotating proteins and genes, and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. Results: In this paper, we propose a novel and general prediction method that combines sequence alignment techniques with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. In fivefold cross-validation tests, the overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Conclusion: Although one existing predictor uses gene ontology information and yields higher accuracy than ours, our accuracies are higher than those of existing predictors that use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor should be useful for predicting the subcellular location of newly discovered proteins. Furthermore, the idea of combining alignment with amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plants is also implemented as a web system and is available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html
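Amino acid composition features of the kind mentioned above are commonly encoded as a 20-dimensional vector of residue frequencies. The sketch below builds such a vector. It assumes that non-standard letters are simply skipped, which may differ from the authors' handling. The alignment component and the SVM itself are not shown.

```python
from typing import List

# The 20 standard amino acids in a fixed order (defines feature positions).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence: str) -> List[float]:
    """Relative frequency of each standard amino acid in `sequence`."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence.upper():
        if residue in counts:  # non-standard letters (e.g. X, B, U) are skipped
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

features = aa_composition("AACX")
print(len(features), [round(x, 3) for x in features[:3]])  # 20 [0.667, 0.333, 0.0]
```

Fixing the residue order in `AMINO_ACIDS` keeps every protein's vector aligned, so the vectors can be fed directly to a standard SVM implementation.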

    A prospective observational cohort study for newly diagnosed osteosarcoma patients in the UK: ICONIC study initial results

    There has been little change to the standard treatment for osteosarcoma (OS) over the last 25 years and there is an unmet need to identify new biomarkers and novel therapeutic approaches if outcomes are to improve. Furthermore, there is limited evidence on the impact of OS treatment on patient-reported outcomes (PROs). ICONIC (Improving Outcomes through Collaboration in Osteosarcoma; NCT04132895) is a prospective observational cohort study recruiting newly diagnosed OS patients across the United Kingdom (UK) with matched longitudinal collection of clinical, biological, and PRO data. During Stage 1, which assessed the feasibility of recruitment and data collection, 102 patients were recruited at 22 sites with representation from patient groups frequently excluded in OS studies, including patients over 50 years and those with less common primary sites. The feasibility of collecting clinical and biological samples, in addition to PRO data, has been established and there is ongoing analysis of these data as part of Stage 2. ICONIC will provide a unique, prospective cohort of newly diagnosed OS patients representative of the UK patient population, with fully annotated clinical outcomes linked to molecularly characterised biospecimens, allowing for comprehensive analyses to better understand biology and develop new biomarkers and novel therapeutic approaches

    CyclinPred: A SVM-Based Method for Predicting Cyclin Protein Sequences

    Functional annotation of protein sequences with low similarity to well-characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family of proteins. It consists of sequences with low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships among cyclins a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM)-based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST-generated Position Specific Scoring Matrix (PSSM) profiles. Results from leave-one-out cross-validation (jackknife test), self-consistency and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results show that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods
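Dipeptide composition, one of the feature sets listed above, is commonly encoded as the relative frequency of each of the 400 ordered pairs of adjacent standard residues. The sketch below follows that common encoding. It assumes that pairs containing non-standard letters are skipped, and it is not necessarily CyclinPred's exact feature extraction.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence: str) -> List[float]:
    """Relative frequency of each of the 400 adjacent residue pairs."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    # Slide a window of width 2; pairs with non-standard letters are skipped.
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard dipeptides in sequence")
    return [counts[d] / total for d in DIPEPTIDES]

v = dipeptide_composition("ACAC")
print(len(v), round(v[DIPEPTIDES.index("AC")], 3), round(v[DIPEPTIDES.index("CA")], 3))
# 400 0.667 0.333
```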

    Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques

    Background: Protein kinases play crucial roles in cell growth, differentiation and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for the treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents with better efficacy and lower toxicity. Results: We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) with each combination's interaction dissociation constant (K_d). We compared six approaches for describing protein kinases and several linear and non-linear correlation methods. The best-performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged from P² = 0.67 to 0.73, and for new kinases from P²_kin = 0.65 to 0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curve ranging from AUC = 0.92 to 0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Even with only 10% of all data, a valid model was still obtained, with P² = 0.47, P²_kin = 0.42 and AUC = 0.83. Conclusions: Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed up the identification and optimization of protein kinase-targeted and multi-targeted inhibitors.
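The abstract reports P² without defining it. The sketch below assumes the usual external-prediction definition, P² = 1 - PRESS / SS. Here PRESS sums the squared prediction errors on held-out pairs, and SS sums the squared deviations of the observed values from their mean. Some authors take that mean from the training set instead. The pK_d values are hypothetical.

```python
from typing import Sequence

def predictive_r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """P^2 = 1 - PRESS / SS, with SS taken around the mean of the observed values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    mean_obs = sum(observed) / len(observed)
    # PRESS: sum of squared prediction errors on the held-out pairs.
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares of the observed values around their mean.
    ss = sum((o - mean_obs) ** 2 for o in observed)
    if ss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss

# Hypothetical observed and predicted pK_d values for four held-out pairs.
print(round(predictive_r2([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]), 3))  # 0.9
```

Unlike a plain squared Pearson correlation, this form penalizes systematic offsets between predicted and observed values, which is why it is often preferred for external validation.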