10,436 research outputs found

    Data-Driven Theory Refinement Algorithms for Bioinformatics

    Get PDF
    Bioinformatics and related applications call for efficient algorithms for knowledge intensive learning and data driven knowledge refinement. Knowledge based artificial neural networks offer an attractive approach to extending or modifying incomplete knowledge bases or domain theories. We present results of experiments with several such algorithms for data driven knowledge discovery and theory refinement in some simple bioinformatics applications. Results of experiments on the ribosome binding site and promoter site identification problems indicate that the performance of KBDistAl and Tiling Pyramid algorithms compares quite favorably with those of substantially more computationally demanding techniques

    Recovering complete and draft population genomes from metagenome datasets.

    Get PDF
    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    Full text link
    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    An empirical comparison of supervised machine learning techniques in bioinformatics

    Get PDF
    Research in bioinformatics is driven by the experimental data. Current biological databases are populated by vast amounts of experimental data. Machine learning has been widely applied to bioinformatics and has gained a lot of success in this research area. At present, with various learning algorithms available in the literature, researchers are facing difficulties in choosing the best method that can apply to their data. We performed an empirical study on 7 individual learning systems and 9 different combined methods on 4 different biological data sets, and provide some suggested issues to be considered when answering the following questions: (i) How does one choose which algorithm is best suitable for their data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm to the others

    Language-based Abstractions for Dynamical Systems

    Get PDF
    Ordinary differential equations (ODEs) are the primary means to modelling dynamical systems in many natural and engineering sciences. The number of equations required to describe a system with high heterogeneity limits our capability of effectively performing analyses. This has motivated a large body of research, across many disciplines, into abstraction techniques that provide smaller ODE systems while preserving the original dynamics in some appropriate sense. In this paper we give an overview of a recently proposed computer-science perspective to this problem, where ODE reduction is recast to finding an appropriate equivalence relation over ODE variables, akin to classical models of computation based on labelled transition systems.Comment: In Proceedings QAPL 2017, arXiv:1707.0366

    Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes

    Get PDF
    We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.Published versio

    Computational structure‐based drug design: Predicting target flexibility

    Get PDF
    The role of molecular modeling in drug design has experienced a significant revamp in the last decade. The increase in computational resources and molecular models, along with software developments, is finally introducing a competitive advantage in early phases of drug discovery. Medium and small companies with strong focus on computational chemistry are being created, some of them having introduced important leads in drug design pipelines. An important source for this success is the extraordinary development of faster and more efficient techniques for describing flexibility in three‐dimensional structural molecular modeling. At different levels, from docking techniques to atomistic molecular dynamics, conformational sampling between receptor and drug results in improved predictions, such as screening enrichment, discovery of transient cavities, etc. In this review article we perform an extensive analysis of these modeling techniques, dividing them into high and low throughput, and emphasizing in their application to drug design studies. We finalize the review with a section describing our Monte Carlo method, PELE, recently highlighted as an outstanding advance in an international blind competition and industrial benchmarks.We acknowledge the BSC-CRG-IRB Joint Research Program in Computational Biology. This work was supported by a grant from the Spanish Government CTQ2016-79138-R.J.I. acknowledges support from SVP-2014-068797, awarded by the Spanish Government.Peer ReviewedPostprint (author's final draft

    LightDock: a new multi-scale approach to protein–protein docking

    Get PDF
    Computational prediction of protein–protein complex structure by docking can provide structural and mechanistic insights for protein interactions of biomedical interest. However, current methods struggle with difficult cases, such as those involving flexible proteins, low-affinity complexes or transient interactions. A major challenge is how to efficiently sample the structural and energetic landscape of the association at different resolution levels, given that each scoring function is often highly coupled to a specific type of search method. Thus, new methodologies capable of accommodating multi-scale conformational flexibility and scoring are strongly needed. We describe here a new multi-scale protein–protein docking methodology, LightDock, capable of accommodating conformational flexibility and a variety of scoring functions at different resolution levels. Implicit use of normal modes during the search and atomic/coarse-grained combined scoring functions yielded improved predictive results with respect to state-of-the-art rigid-body docking, especially in flexible cases.B.J-G was supported by a FPI fellowship from the Spanish Ministry of Economy and Competitiveness. This work was supported by I+D+I Research Project grants BIO2013-48213-R and BIO2016-79930-R from the Spanish Ministry of Economy and Competitiveness. This work is partially supported by the European Union H2020 program through HiPEAC (GA 687698), by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programaciói Entorns d’Execució Paral·lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft
    corecore