83 research outputs found

    Module detection in complex networks using integer optimisation

    Background: The detection of modules or community structure is widely used to reveal the underlying properties of complex networks in biology, as well as physical and social sciences. Since the adoption of modularity as a measure of network topological properties, several methodologies for the discovery of community structure based on modularity maximisation have been developed. However, satisfactory partitions of large graphs with modest computational resources are particularly challenging due to the NP-hard nature of the related optimisation problem. Furthermore, it has been suggested that optimising the modularity metric can reach a resolution limit whereby the algorithm fails to detect communities smaller than a specific size in large networks. Results: We present a novel solution approach to identify community structure in large complex networks and address resolution limitations in module detection. The proposed algorithm employs modularity to express network community structure and is based on mixed integer optimisation models. The solution procedure is extended through an iterative procedure to diminish effects that tend to agglomerate smaller modules (resolution limitations). Conclusions: A comprehensive comparative analysis of methodologies for module detection based on modularity maximisation shows that our approach outperforms previously reported methods. Furthermore, in contrast to previous reports, we propose a strategy to handle resolution limitations in modularity maximisation. Overall, we illustrate ways to improve existing methodologies for community structure identification so as to increase their efficiency and applicability.
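
    As a rough illustration of the modularity measure this abstract refers to, the sketch below evaluates the standard Newman-Girvan modularity Q for a given partition of an undirected, unweighted graph. It only scores a partition; it is not the mixed integer optimisation model of the paper, and the function name and input layout are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label (hypothetical layout).
    """
    m = 0
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    internal = defaultdict(int)    # edges with both endpoints in the community
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        return 0.0
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random rewiring with fixed degrees)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "a" for n in range(6)}
print(modularity(edges, two_modules), modularity(edges, one_module))
```

    Splitting the graph at the bridge scores higher than lumping all nodes together, which is the behaviour modularity maximisation exploits.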

    Protein phosphorylation prediction: limitations, merits and pitfalls

    Protein phosphorylation is a major protein post-translational modification process that plays a pivotal role in numerous cellular processes, such as recognition, signaling or degradation. It can be studied experimentally by various methodologies, including western blot analysis, site-directed mutagenesis, 2D gel electrophoresis and mass spectrometry. A number of in silico tools have also been developed to predict plausible phosphorylation sites in a given protein. In this review, we conducted a benchmark study of the leading protein phosphorylation prediction software, in an effort to determine which performs best. First place was taken by GPS 2.2, which predicted all phosphorylation sites with 83% fidelity, while NetPhos 2.0 came second with 69%.

    Optimal Piecewise Linear Regression Algorithm for QSAR Modelling

    Quantitative Structure-Activity Relationship (QSAR) models have been successfully applied to lead optimisation, virtual screening and other areas of drug discovery over the years. Recent studies, however, have focused on the development of models that are predictive but often not interpretable. In this article, we propose the application of a piecewise linear regression algorithm, OPLRAreg, to develop both predictive and interpretable QSAR models. The algorithm determines a feature to best separate the data into regions and identifies linear equations to predict the outcome variable in each region. A regularisation term is introduced to prevent overfitting problems and implicitly selects the most informative features. As OPLRAreg is based on mathematical programming, a flexible and transparent representation for optimisation problems, the algorithm also permits customised constraints to be easily added to the model. The proposed algorithm is presented as a more interpretable alternative to other commonly used machine learning algorithms and has shown comparable predictive accuracy to Random Forest, Support Vector Machine and Random Generalised Linear Model on tests with five QSAR data sets compiled from the ChEMBL database.

    Optimisation Models for Pathway Activity Inference in Cancer

    BACKGROUND: With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity pose difficulty in accurate classification of disease phenotypes. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. METHODOLOGY: A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) mathematical optimisation principles and infers pathway activity as the linear combination of pathway member gene expression, multiplying expression values with model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. RESULTS: The model is evaluated on the transcriptome of breast and colorectal cancer, and exhibits solution results of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noise expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction
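
    The abstract above describes pathway activity as a linear combination of member gene expression values and gene weights. A minimal sketch of that scoring step follows; the gene names and weight values are hypothetical, and in the paper the weights are determined by a MILP model, which is not reproduced here.

```python
def pathway_activity(expression, weights):
    """Weighted sum of member-gene expression for one sample.

    expression: dict gene -> expression value for the sample.
    weights: dict gene -> weight, covering only the pathway's member genes.
    """
    return sum(w * expression[gene] for gene, w in weights.items())


# Hypothetical pathway with two member genes; GENE_C is not a member
# and therefore does not contribute to the score.
sample = {"GENE_A": 4.0, "GENE_B": 1.0, "GENE_C": 9.0}
weights = {"GENE_A": 0.5, "GENE_B": -1.0}
print(pathway_activity(sample, weights))
```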

    Three nontrivial solutions for Neumann problems resonant at any positive eigenvalue

    We consider a semilinear Neumann problem with a parametric reaction which has a concave term and a perturbation which at ±∞ can be resonant with respect to any positive eigenvalue. Using variational methods based on the critical point theory and Morse theory, we show that there exists a critical parameter value λ* > 0 such that if λ ∈ (0, λ*), then the problem has at least three nontrivial smooth solutions.

    A series of Notch3 mutations in CADASIL; insights from 3D molecular modelling and evolutionary analyses

    CADASIL belongs to the group of rare diseases. It is well established that the Notch3 protein is primarily responsible for the development of CADASIL syndrome. Herein, we attempt to shed light on the actual molecular mechanism underlying CADASIL, using insights from preliminary in silico and proteomics studies on the Notch3 protein. At the moment, we are aware of a series of Notch3 point mutations that promote CADASIL. In this direction, we investigate the nature, extent, and physicochemical and structural significance of the mutant species in an effort to identify the underlying mechanism of the role of Notch3 and its implications in cell signal transduction. Overall, our in silico study has revealed a rather complex molecular mechanism of Notch3 on the structural level; depending on the nature and position of each mutation, a consensus significant loss of beta-sheet structure is observed throughout all in silico modeled mutant/wild type biological systems.

    Molecular dynamics simulations through GPU video games technologies

    Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the newer techniques used to understand the physical movements of atoms and molecules is molecular dynamics (MD). MD is an in silico method to simulate the physical motions of atoms and molecules under certain conditions. It has become a strategic technique and now plays a key role in many areas of the exact sciences, such as chemistry, biology, physics and medicine. Due to their complexity, MD calculations can require enormous amounts of computer memory and time, and their execution has therefore been a major problem. Despite the huge computational cost, molecular dynamics has been implemented on traditional computers with a central processing unit (CPU). Graphics processing unit (GPU) computing technology was first designed to improve video games, by rapidly creating images in a frame buffer for display on a screen. The hybrid GPU-CPU implementation, combined with parallel computing, is a novel technology for performing a wide range of calculations. GPUs have been proposed and used to accelerate many scientific computations, including MD simulations. Herein, we describe the new methodologies developed initially for video games and how they are now applied in MD simulations.

    Dogs are more permissive than cats or guinea pigs to experimental infection with a human isolate of Bartonella rochalimae

    Bartonella rochalimae was first isolated from the blood of a human who traveled to Peru and was exposed to multiple insect bites. Foxes and dogs are likely natural reservoirs for this bacterium. We report the results of experimental inoculation of two dogs, five cats and six guinea pigs with the only human isolate of this new Bartonella species. Both dogs became bacteremic for 5–7 weeks, with a peak of 10^3–10^4 colony forming units (CFU)/mL blood. Three cats had low bacteremia levels (< 200 CFU/mL) of 6–8 weeks' duration. One cat that remained seronegative had two bacterial colonies isolated at a single culture time point. A fifth cat never became bacteremic, but seroconverted. None of the guinea pigs became bacteremic, but five seroconverted. These results suggest that dogs could be a reservoir of this strain of B. rochalimae, in contrast to cats and guinea pigs.

    A Regression Tree Approach using Mathematical Programming

    Regression analysis is a machine learning approach that aims to accurately predict the value of continuous output variables from certain independent input variables, via automatic estimation of their latent relationship from data. Tree-based regression models are popular in literature due to their flexibility to model higher order non-linearity and great interpretability. Conventionally, regression tree models are trained in a two-stage procedure, i.e. recursive binary partitioning is employed to produce a tree structure, followed by a pruning process of removing insignificant leaves, with the possibility of assigning multivariate functions to terminal leaves to improve generalisation. This work introduces a novel methodology of node partitioning which, in a single optimisation model, simultaneously performs the two tasks of identifying the break-point of a binary split and assignment of multivariate functions to either leaf, thus leading to an efficient regression tree model. Using six real world benchmark problems, we demonstrate that the proposed method consistently outperforms a number of state-of-the-art regression tree models and methods based on other techniques, with an average improvement of 7–60% on the mean absolute errors (MAE) of the predictions.
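
    To make the idea of choosing a break-point and fitting a function in each leaf concrete, here is a small sketch for a single input feature. It swaps in an exhaustive search with ordinary least squares per leaf, not the single mathematical programming model proposed in the abstract above, and it minimises squared error, not MAE. All names are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_split(xs, ys, min_points=2):
    """Try every break-point between sorted samples; keep the lowest total SSE.

    Returns (total_sse, threshold, (a_left, b_left), (a_right, b_right)),
    or None when no valid split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        la, lb, lsse = fit_line(*zip(*pairs[:i]))
        ra, rb, rsse = fit_line(*zip(*pairs[i:]))
        total = lsse + rsse
        if best is None or total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold, (la, lb), (ra, rb))
    return best


# Data that follows y = x up to x = 4 and y = 15 - x afterwards.
xs = list(range(10))
ys = [0, 1, 2, 3, 4, 10, 9, 8, 7, 6]
print(best_split(xs, ys))
```

    On this toy data the search places the break-point midway between x = 4 and x = 5 and recovers one line per leaf.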