    Three machine learning models for the 2019 Solubility Challenge

    We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and almost equally to one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.
    Publisher PDF. Peer reviewed.
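The consensus idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the three models' outputs are combined, so the sketch assumes each model returns one numeric log S prediction per molecule and that the consensus is a plain unweighted mean. The function name `consensus_predict` and the numbers are hypothetical, not taken from the paper.

```python
from statistics import mean


def consensus_predict(model_predictions):
    """Average per-molecule predictions across several models.

    model_predictions: list of lists, one inner list per model,
    each holding that model's prediction for every molecule.
    ASSUMPTION: an unweighted mean; the paper may combine models differently.
    """
    n_molecules = len(model_predictions[0])
    if any(len(p) != n_molecules for p in model_predictions):
        raise ValueError("every model must predict the same molecules")
    # zip(*...) regroups the values so each tuple holds one molecule's
    # predictions from all models.
    return [mean(values) for values in zip(*model_predictions)]


# Hypothetical log S predictions for two molecules from three models.
random_forest = [-3.0, -1.0]
extra_trees = [-2.0, -1.5]
bagging = [-4.0, -0.5]
consensus = consensus_predict([random_forest, extra_trees, bagging])
```

Averaging tends to cancel errors that the individual models do not share, which is one common reading of the "Wisdom of Crowds" effect the abstract mentions.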

    We are probably not Sims

    In this article, I discuss the current state of the debate around the simulation hypothesis, the idea that the world we inhabit is a computer simulation in or within another universe. Considering recent work from a range of authors, I suggest that statistical arguments in favour of a simulated world are naive and fail to account either for Ockham’s Razor or for alternative existential possibilities besides base reality and a simulation. Most significantly, I observe that it would be computationally impossible in our own universe to simulate a similar cosmos at fine granularity. This implies substantial differences in size and information content between simulating and simulated universes. I argue that this makes serious analysis of the simulation argument extremely difficult. I suggest that Christian theology has no reason to reinvent itself to accommodate simulism; the two should be viewed as mutually exclusive world-views. Further, I note that the existence of a human soul or spirit, or indeed any non-reductionist explanation of human consciousness, could undermine the assumption of substrate independence that simulism requires.
    Preprint. Postprint. Peer reviewed.

    Chemistry in Bioinformatics

    A preprint of an invited submission to BioMedCentral Bioinformatics. This short manuscript is an overview of the current problems and opportunities in publishing chemical information. Full details of technology are given in the sibling manuscript (http://www.dspace.cam.ac.uk/handle/1810/34579). The manuscript is the authors' preprint although it has been automatically transformed into this archived PDF by the submission system. The authors are not responsible for the formatting.
    Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is Openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) Free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols.

    Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more

    Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support.
    Background: In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
    Results: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
    Conclusions: The classification results obtained in this study concur with the mathematical based guidelines given in MMG’s paper: that is, ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale, on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.
    Publisher PDF. Peer reviewed.

    Enzyme function and its evolution

    With rapid increases over recent years in the determination of protein sequence and structure, alongside knowledge of thousands of enzyme functions and hundreds of chemical mechanisms, it is now possible to combine breadth and depth in our understanding of enzyme evolution. Phylogenetics continues to move forward, though determining correct evolutionary family trees is not trivial. Protein function prediction has spawned a variety of promising methods that offer the prospect of identifying enzymes across the whole range of chemical functions and over numerous species. This knowledge is essential to understand antibiotic resistance, as well as in protein re-engineering and de novo enzyme design.
    Postprint. Peer reviewed.

    Predicting melting points of organic molecules : applications to aqueous solubility prediction using the General Solubility Equation

    In this work we make predictions of several important molecular properties of academic and industrial importance to seek answers to two questions: 1) Can we apply efficient machine learning techniques, using inexpensive descriptors, to predict melting points to a reasonable level of accuracy? 2) Can values of this level of accuracy be usefully applied to predicting aqueous solubility? We present predictions of melting points made by several novel machine learning models, previously applied to solubility prediction. Additionally, we make predictions of solubility via the General Solubility Equation (GSE) and monitor the impact of varying the logP prediction model (AlogP and XlogP) on the GSE. We note that the machine learning models presented, using a modest number of 2D descriptors, can make melting point predictions in line with the current state of the art prediction methods (RMSE ≥ 40 °C). We also find that predicted melting points, with an RMSE of tens of degrees Celsius, can be usefully applied to the GSE to yield accurate solubility predictions (log10S RMSE < 1) over a small dataset of druglike molecules.
    Postprint. Peer reviewed.
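The abstract does not write out the General Solubility Equation, so the sketch below assumes its widely used form, log S = 0.5 - 0.01 (MP - 25) - log P, with MP in °C, S in mol/L, and the melting term set to zero for compounds that are liquid at 25 °C. The function name and the input values are illustrative only and are not data from the paper.

```python
def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """Estimate log10 of aqueous solubility (mol/L) with the GSE.

    ASSUMPTION: the common form log S = 0.5 - 0.01 * (MP - 25) - log P,
    where the melting term is dropped for compounds melting at or below 25 C.
    """
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical compound: log P = 2.0, true melting point 125 C.
reference = gse_log_s(log_p=2.0, melting_point_c=125.0)

# The same compound with a melting point over-predicted by 40 C.
shifted = gse_log_s(log_p=2.0, melting_point_c=165.0)

# Under this form, each degree of melting point error moves log S by 0.01,
# so a 40 C error shifts the estimate by 0.4 log units.
error_in_log_s = shifted - reference
```

Because the melting point enters with a coefficient of only 0.01 per degree, an error of tens of degrees changes log S by a few tenths of a log unit, which is consistent with the abstract's finding that such predictions remain usable in the GSE.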

    Allosteric activation unveils protein-mass modulation of ATP phosphoribosyltransferase product release

    Funding: This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) [Grant BB/M010996/1] via an EASTBIO Doctoral Training Partnership studentship to B.J.R.
    Heavy-isotope substitution into enzymes slows down bond vibrations and may alter transition-state barrier crossing probability if this is coupled to fast protein motions. ATP phosphoribosyltransferase from Acinetobacter baumannii is a multi-protein complex where the regulatory protein HisZ allosterically enhances catalysis by the catalytic protein HisGS. This is accompanied by a shift in rate-limiting step from chemistry to product release. Here we report that isotope-labelling of HisGS has no effect on the nonactivated reaction, which involves negative activation heat capacity, while HisZ-activated HisGS catalytic rate decreases in a strictly mass-dependent fashion across five different HisGS masses, at low temperatures. Surprisingly, the effect is not linked to the chemical step, but to fast motions governing product release in the activated enzyme. Disruption of a specific enzyme-product interaction abolishes the isotope effects. Results highlight how altered protein mass perturbs allosterically modulated thermal motions relevant to the catalytic cycle beyond the chemical step.
    Peer reviewed.

    Computational insights into the catalytic mechanism of Is-PETase : an enzyme capable of degrading poly(ethylene) terephthalate

    This work was supported through a studentship from BBSRC in the EastBio doctoral training programme for E. S.-P.
    Is-PETase has become an enzyme of significant interest due to its ability to catalyse the degradation of polyethylene terephthalate (PET) at mesophilic temperatures. We performed hybrid quantum mechanics and molecular mechanics (QM/MM) calculations at the DSD-PBEP86-D3/ma-def2-TZVP/CHARMM27//rev-PBE-D3/def2-SVP/CHARMM level to calculate the energy profile for the degradation of a suitable PET model by this enzyme. Very low overall barriers are computed for serine protease-type hydrolysis steps (as low as 34.1 kJ mol⁻¹). Spontaneous deprotonation of the final product, terephthalic acid, with a high computed driving force indicates that product release could be rate limiting.
    Publisher PDF. Peer reviewed.

    Bony pelvis dimensions in women with and without stress urinary incontinence

    Aims: To test the null hypothesis that bony pelvis dimensions are similar in women with and without stress urinary incontinence (SUI), both in the postpartum and midlife periods. Methods: Secondary analyses were performed of two case–control studies comparing women with SUI to asymptomatic controls. One study examined primiparas in the first 9–12 months postpartum; the other study involved middle‐aged women. SUI was confirmed by full‐bladder stress test. All subjects underwent pelvic magnetic resonance imaging. The interspinous and intertuberous diameters, subpubic angle, and sacrococcygeal joint‐to‐the inferior pubic point distance were measured from the images independently by two authors. Results: In the young cohorts, we compared primiparas with de novo postpartum SUI to both continent primiparas and nulliparas. Postpartum SUI is associated with a wider subpubic angle. There is also a trend towards wider interspinous and intertuberous diameters in the stress‐incontinent primiparas as compared to the continent cohorts, although this did not reach statistical significance with our sample sizes. By contrast, no significant differences in bony pelvis dimensions were identified when comparing middle‐aged women with SUI and their continent controls. Conclusions: Bony pelvis dimensions are different in women with SUI than in matched continent controls. However, these differences are only identified in young primiparas in the postpartum period, not in middle‐aged women. Neurourol. Urodynam. 32: 37–42, 2013. © 2012 Wiley Periodicals, Inc.
    Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/95230/1/22275_ftp.pd

    Robust identification of interactions between heat-stress responsive genes in the chicken brain using Bayesian networks and augmented expression data

    Funding: This work was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 812777.
    Bayesian networks represent a useful tool to explore interactions within biological systems. The aims of this study were to identify a reduced number of genes associated with a stress condition in chickens (Gallus gallus) and to unravel their interactions by implementing a Bayesian network approach. Initially, one publicly available dataset (3 control vs 3 heat-stressed chickens) was used to identify the stress signal, represented by 25 differentially expressed genes (DEGs). The dataset was augmented by looking for the 25 DEGs in four other publicly available databases. Bayesian network algorithms were used to discover the informative relationships between the DEGs. Only ten out of the 25 DEGs displayed interactions. Four of them were Heat Shock Proteins that could be playing a key role, especially under stress conditions, where maintaining the correct functioning of the cell machinery might be crucial. One of the DEGs is an open reading frame whose function is as yet unknown, highlighting the power of Bayesian networks in knowledge discovery. Identifying an initial stress signal, augmenting it by combining other databases, and finally learning the structure of Bayesian networks allowed us to find genes closely related to stress, with the possibility of further exploring the system in future studies.
    Peer reviewed.