22 research outputs found

    Protein binding affinity prediction using support vector regression and interfacial features

    In understanding biology at the molecular level, analysis of protein interactions and protein binding affinity is a challenge. It is an important problem in computational and structural biology. Experimental measurement of binding affinity in the wet lab is expensive and time-consuming. Therefore, machine learning approaches are widely used to predict protein interactions and binding affinities by learning from specific properties of existing complexes. In this work, we propose an innovative computational model to predict binding affinities and interactions based on sequence, structural and interface features of the interacting proteins that are robust to binding-associated conformational changes. We modeled the prediction of binding affinity as both a classification and a regression problem, with least-squares and support vector regression models using structure and sequence features of proteins. Specifically, we used the number and composition of interacting residues at the protein complex interface, together with sequence features. We evaluated the performance of our prediction models using the Affinity Benchmark Dataset version 2.0, which contains a diverse set of both bound and unbound protein complex structures with known binding affinities. We evaluated regression performance with root mean square error (RMSE) as well as Spearman and Pearson correlation coefficients using a leave-one-out cross-validation protocol, and classification performance with AUC-ROC and AUC-PR. Our results show that support vector regression performs significantly better than the other models, with a Spearman correlation coefficient of 0.58, a Pearson correlation coefficient of 0.55 and an RMSE of 2.41 using 3-mer and sequence features. It is interesting to note that simple features based on 3-mers and the properties of the interface of a protein complex are predictive of its binding affinity. 
These features, together with support vector regression, achieve higher accuracy than existing sequence-based methods
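
The 3-mer features and the three regression metrics named in this abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names are hypothetical, the regression model itself is omitted, and the tie handling in the rank correlation (average ranks) is an assumption.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers (3-mers by default) in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

In a leave-one-out protocol, each complex would be predicted by a model trained on all the others, and the metrics above would then be computed once over the pooled predictions.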

    Improving mitotic cell counting accuracy and efficiency using phosphohistone‐H3 (PHH3) antibody counterstained with haematoxylin and eosin as part of breast cancer grading

    Background: Mitotic count in breast cancer is an important prognostic marker. Unfortunately, substantial inter‐ and intraobserver variation exists when pathologists manually count mitotic figures. To alleviate this problem, we developed a new technique incorporating both haematoxylin and eosin (H&E) and phosphorylated histone H3 (PHH3), a marker highly specific to mitotic figures, and compared it to visual scoring of mitotic figures using H&E only. Methods: Two full‐face sections from 97 cases were cut, one stained with H&E only, and the other stained with PHH3 and counterstained with H&E (PHH3–H&E). Counting mitoses using PHH3–H&E was compared to traditional mitosis scoring using H&E in terms of reproducibility, scoring time, and the ability to detect mitosis hotspots. We assessed the agreement between manual and image analysis‐assisted scoring of mitotic figures using H&E and PHH3–H&E‐stained cells. The diagnostic performance of PHH3 in detecting mitotic figures in terms of sensitivity and specificity was measured. Finally, PHH3 replaced the mitosis score in a multivariate analysis to assess its significance. Results: Pathologists detected significantly more mitotic figures using PHH3–H&E (median ± SD, 20 ± 33) compared with H&E alone (median ± SD, 16 ± 25), P < 0.001. The concordance between pathologists in identifying mitotic figures was highest when using the dual PHH3–H&E technique; in addition, it highlighted mitotic figures at low power, allowing better agreement on choosing the hotspot area (k = 0.842) in comparison with standard H&E (k = 0.625). A better agreement between image analysis‐assisted software and the human eye was observed for PHH3‐stained mitotic figures. 
When the mitosis score was replaced with PHH3 in a Cox regression model with other grade components, PHH3 was an independent predictor of survival (hazard ratio [HR] 5.66, 95% confidence interval [CI] 1.92–16.69; P = 0.002), and even showed a more significant association with breast cancer‐specific survival (BCSS) than mitosis (HR 3.63, 95% CI 1.49–8.86; P = 0.005) and Ki67 (P = 0.27). Conclusion: PHH3–H&E‐stained slides can reliably be used in routine scoring of mitotic figures, and integrating both techniques will compensate for each other's limitations and improve diagnostic accuracy, quality, and precision
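
The agreement values quoted above (k = 0.842 vs k = 0.625) are chance-corrected agreement statistics. Assuming they are Cohen's kappa for two raters, which the abstract does not state explicitly, the calculation can be sketched as follows; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each rater's label frequencies.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical hotspot choices (region identifiers) from two pathologists.
pathologist_1 = ["A", "A", "B", "B"]
pathologist_2 = ["A", "A", "B", "A"]
kappa = cohens_kappa(pathologist_1, pathologist_2)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why a rise from 0.625 to 0.842 is reported as better agreement on the hotspot area.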

    Machine learning predicts new anti-CRISPR proteins

    The increasing use of CRISPR–Cas9 in medicine, agriculture, and synthetic biology has accelerated the drive to discover new CRISPR–Cas inhibitors as potential mechanisms of control for gene editing applications. Many anti-CRISPRs have been found that inhibit the CRISPR–Cas adaptive immune system. However, comparing all currently known anti-CRISPRs does not reveal a shared set of properties for facile bioinformatic identification of new anti-CRISPR families. Here, we describe AcRanker, a machine learning-based method to aid direct identification of new potential anti-CRISPRs using only protein sequence information. Using a training set of known anti-CRISPRs, we built a model based on XGBoost ranking. We then applied AcRanker to predict candidate anti-CRISPRs from predicted prophage regions within self-targeting bacterial genomes and discovered two previously unknown anti-CRISPRs: AcrIIA20 (ML1) and AcrIIA21 (ML8). We show that AcrIIA20 strongly inhibits Streptococcus iniae Cas9 (SinCas9) and weakly inhibits Streptococcus pyogenes Cas9 (SpyCas9). We also show that AcrIIA21 inhibits SpyCas9, Staphylococcus aureus Cas9 (SauCas9) and SinCas9 with low potency. The addition of AcRanker to the anti-CRISPR discovery toolkit allows researchers to directly rank potential anti-CRISPR candidate genes for increased speed in testing and validation of new anti-CRISPRs. A web server implementation for AcRanker is available online at http://acranker.pythonanywhere.com/

    Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study

    Background: Determining the status of molecular pathways and key mutations in colorectal cancer is crucial for optimal therapeutic decision making. We therefore aimed to develop a novel deep learning pipeline to predict the status of key molecular pathways and mutations from whole-slide images of haematoxylin and eosin-stained colorectal cancer slides as an alternative to current tests. Methods: In this retrospective study, we used 502 diagnostic slides of primary colorectal tumours from 499 patients in The Cancer Genome Atlas colon and rectal cancer (TCGA-CRC-DX) cohort and developed a weakly supervised deep learning framework involving three separate convolutional neural network models. Whole-slide images were divided into equally sized tiles and model 1 (ResNet18) extracted tumour tiles from non-tumour tiles. These tumour tiles were inputted into model 2 (adapted ResNet34), trained by iterative draw and rank sampling to calculate a prediction score for each tile that represented the likelihood of a tile belonging to the molecular labels of high mutation density (vs low mutation density), microsatellite instability (vs microsatellite stability), chromosomal instability (vs genomic stability), CpG island methylator phenotype (CIMP)-high (vs CIMP-low), BRAFmut (vs BRAFWT), TP53mut (vs TP53WT), and KRASWT (vs KRASmut). These scores were used to identify the top-ranked tiles from each slide, and model 3 (HoVer-Net) segmented and classified the different types of cell nuclei in these tiles. We calculated the area under the convex hull of the receiver operating characteristic curve (AUROC) as a model performance measure and compared our results with those of previously published methods. 
Findings: Our iterative draw and rank sampling method yielded mean AUROCs for the prediction of hypermutation (0·81 [SD 0·03] vs 0·71), microsatellite instability (0·86 [0·04] vs 0·74), chromosomal instability (0·83 [0·02] vs 0·73), BRAFmut (0·79 [0·01] vs 0·66), and TP53mut (0·73 [0·02] vs 0·64) in the TCGA-CRC-DX cohort that were higher than those from previously published methods, and an AUROC for KRASmut that was similar to previously reported methods (0·60 [SD 0·04] vs 0·60). Mean AUROC for predicting CIMP-high status was 0·79 (SD 0·05). We found high proportions of tumour-infiltrating lymphocytes and necrotic tumour cells to be associated with microsatellite instability, and high proportions of tumour-infiltrating lymphocytes and a low proportion of necrotic tumour cells to be associated with hypermutation. Interpretation: After large-scale validation, our proposed algorithm for predicting clinically important mutations and molecular pathways, such as microsatellite instability, in colorectal cancer could be used to stratify patients for targeted therapies with potentially lower costs and quicker turnaround times than sequencing-based or immunohistochemistry-based approaches. Funding: The UK Medical Research Council

    Visual histological assessment of morphological features reflects the underlying molecular profile in invasive breast cancer: a morpho‐molecular study

    Background: Tumour genotype and phenotype are related and can predict outcome. In this study, we hypothesised that the visual assessment of breast cancer (BC) morphological features can provide valuable insight into underlying molecular profiles. Methods: The Cancer Genome Atlas (TCGA) BC cohort was used (n=743) and morphological features including Nottingham grade and its components and nucleolar prominence were assessed utilising whole slide images (WSIs). Two independent scores were assigned, and discordant cases were utilised to represent cases with intermediate morphological features. Differentially expressed genes (DEGs) were identified for each feature, compared among concordant/discordant cases and tested for specific pathways. Results: Concordant grading was observed in 467/743 (63%) of cases. Among concordant case groups, 8 common DEGs (UGT8, DDC, RGR, RLBP1, SPRR1B, CXorf49B, PSAPL1, and SPRR2G) were associated with overall tumour grade and its components. These genes are related mainly to cellular proliferation, differentiation and metabolism. The number of DEGs in cases with discordant grading was larger than those identified in concordant cases. The largest number of DEGs was observed in discordant grade 1:3 cases (n=1185). DEGs were identified for each discordant component. Some DEGs were uniquely associated with well‐defined specific morphological features, whereas expression/co‐expression of other genes was identified across multiple features and underlined intermediate morphological features. Conclusion: Morphological features are likely related to distinct underlying molecular profiles that drive both morphology and behaviour. This study provides further evidence to support the use of image‐based analysis of WSIs, including artificial intelligence algorithms, to predict tumour molecular profiles and outcome

    Visual histological assessment of morphological features reflects the underlying molecular profile in invasive breast cancer: a morphomolecular study

    © 2020 The Authors. Histopathology published by John Wiley & Sons Ltd Aims: Tumour genotype and phenotype are related and can predict outcome. In this study, we hypothesised that the visual assessment of breast cancer (BC) morphological features can provide valuable insight into underlying molecular profiles. Methods and results: The Cancer Genome Atlas (TCGA) BC cohort was used (n=743) and morphological features, including Nottingham grade and its components and nucleolar prominence, were assessed utilising whole-slide images (WSIs). Two independent scores were assigned, and discordant cases were utilised to represent cases with intermediate morphological features. Differentially expressed genes (DEGs) were identified for each feature, compared among concordant/discordant cases and tested for specific pathways. Concordant grading was observed in 467 of 743 (63%) of cases. Among concordant case groups, eight common DEGs (UGT8, DDC, RGR, RLBP1, SPRR1B, CXorf49B, PSAPL1 and SPRR2G) were associated with overall tumour grade and its components. These genes are related mainly to cellular proliferation, differentiation and metabolism. The number of DEGs in cases with discordant grading was larger than those identified in concordant cases. The largest number of DEGs was observed in discordant grade 1:3 cases (n=1185). DEGs were identified for each discordant component. Some DEGs were uniquely associated with well-defined specific morphological features, whereas expression/co-expression of other genes was identified across multiple features and underlined intermediate morphological features. Conclusion: Morphological features are probably related to distinct underlying molecular profiles that drive both morphology and behaviour. This study provides further evidence to support the use of image-based analysis of WSIs, including artificial intelligence algorithms, to predict tumour molecular profiles and outcome

    TIAToolbox as an end-to-end library for advanced tissue image analytics

    Background: Computational pathology has seen rapid growth in recent years, driven by advanced deep-learning algorithms. Due to the sheer size and complexity of multi-gigapixel whole-slide images, to the best of our knowledge, there is no open-source software library providing a generic end-to-end API for pathology image analysis using best practices. Most researchers have designed custom pipelines from the bottom up, restricting the development of advanced algorithms to specialist users. To help overcome this bottleneck, we present TIAToolbox, a Python toolbox designed to make computational pathology accessible to computational, biomedical, and clinical researchers. Methods: By creating modular and configurable components, we enable the implementation of computational pathology algorithms in a way that is easy to use, flexible and extensible. We consider common sub-tasks including reading whole slide image data, patch extraction, stain normalization and augmentation, model inference, and visualization. For each of these steps, we provide a user-friendly application programming interface for commonly used methods and models. Results: We demonstrate the use of the interface to construct a full computational pathology deep-learning pipeline. We show, with the help of examples, how state-of-the-art deep-learning algorithms can be reimplemented in a streamlined manner using our library with minimal effort. Conclusions: We provide a usable and adaptable library with efficient, cutting-edge, and unit-tested tools for data loading, pre-processing, model inference, post-processing, and visualization. This enables a range of users to easily build upon recent deep-learning developments in the computational pathology literature

    Pitfalls in machine learning‐based assessment of tumor‐infiltrating lymphocytes in breast cancer: a report of the international immuno‐oncology biomarker working group

    The clinical significance of the tumor-immune interaction in breast cancer (BC) has been well established, and tumor-infiltrating lymphocytes (TILs) have emerged as a predictive and prognostic biomarker for patients with triple-negative (estrogen receptor, progesterone receptor, and HER2 negative) breast cancer (TNBC) and HER2-positive breast cancer. How computational assessment of TILs can complement manual TIL assessment in trial and daily practice is currently debated and still unclear. Recent efforts to use machine learning (ML) for the automated evaluation of TILs show promising results. We review state-of-the-art approaches and identify pitfalls and challenges by studying the root cause of ML discordances in comparison to manual TILs quantification. We categorize our findings into four main topics: (i) technical slide issues, (ii) ML and image analysis aspects, (iii) data challenges, and (iv) validation issues. The main reason for discordant assessments is the inclusion of false-positive areas or cells identified by performance on certain tissue patterns, or design choices in the computational implementation. To aid the adoption of ML in TILs assessment, we provide an in-depth discussion of ML and image analysis including validation issues that need to be considered before reliable computational reporting of TILs can be incorporated into the trial and routine clinical management of patients with TNBC

    Elementary effects analysis of factors controlling COVID-19 infections in computational simulation reveals the importance of social distancing and mask usage

    COVID-19 was declared a pandemic by the World Health Organisation (WHO) on March 11th, 2020. With half of the world's countries in lockdown as of April due to this pandemic, monitoring and understanding the spread of the virus and infection rates, and how these factors relate to behavioural and societal parameters, is crucial for developing control strategies. This paper aims to investigate the effectiveness of masks, social distancing, lockdown and self-isolation for reducing the spread of SARS-CoV-2 infections. Our findings from agent-based simulation modelling showed that whilst requiring a lockdown is widely believed to be the most efficient method to quickly reduce infection numbers, the practice of social distancing and the usage of surgical masks can potentially be more effective than requiring a lockdown. Our multivariate analysis of simulation results using the Morris Elementary Effects Method suggests that if a sufficient proportion of the population uses surgical masks and follows social distancing regulations, then SARS-CoV-2 infections can be controlled without requiring a lockdown
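
The idea behind the elementary effects screening mentioned above can be sketched in a few lines. This is a simplified one-at-a-time variant that perturbs each factor from random base points, not the full Morris trajectory design and not the authors' code; `toy_model` and its coefficients are arbitrary placeholders standing in for a simulation, and all names are hypothetical.

```python
import random


def elementary_effects(model, n_factors, n_base_points=20, delta=0.1, seed=0):
    """Mean absolute elementary effect (mu*) for each input factor.

    Inputs are assumed to be scaled to [0, 1]. For each random base point,
    every factor is increased by `delta` in turn while the others are held
    fixed, and the resulting change in output is divided by `delta`.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_factors
    for _ in range(n_base_points):
        # Keep base points low enough that base + delta stays within [0, 1].
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_factors)]
        baseline_output = model(base)
        for i in range(n_factors):
            stepped = list(base)
            stepped[i] += delta
            totals[i] += abs((model(stepped) - baseline_output) / delta)
    return [total / n_base_points for total in totals]


def toy_model(x):
    """Placeholder response with arbitrary coefficients (not from the paper)."""
    return 3.0 * x[0] + 0.5 * x[1] + x[0] * x[2]


# Factors with a larger mu* have a stronger influence on the model output.
mu_star = elementary_effects(toy_model, n_factors=3)
ranking = sorted(range(3), key=lambda i: mu_star[i], reverse=True)
```

Ranking factors by mu* is what allows a screening study to say which controllable parameters matter most for the simulated outcome.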