18,053 research outputs found

    Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach

    Full text link
    Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theory-guided systems face. First, a representation language appropriate for the initial theory may be inappropriate for an improved theory. While the original representation may concisely express the initial theory, a more accurate theory forced to use that same representation may be bulky, cumbersome, and difficult to reach. Second, a theory structure suitable for a coarse domain theory may be insufficient for a fine-tuned theory. Systems that produce only small, local changes to a theory have limited value for accomplishing complex structural alterations that may be required. Consequently, advanced theory-guided learning systems require flexible representation and flexible structure. An analysis of various theory revision systems and theory-guided learning systems reveals specific strengths and weaknesses in terms of these two desired properties. Designed to capture the underlying qualities of each system, a new system uses theory-guided constructive induction. Experiments in three domains show improvement over previous theory-guided systems. This leads to a study of the behavior, limitations, and potential of theory-guided constructive induction.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

    On the Informativeness of the DNA Promoter Sequences Domain Theory

    Full text link
    The DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on the 106 items of the database. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates that the expected accuracy of a randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2% is achieved in 12 cases. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory.Comment: See http://www.jair.org/ for any accompanying file

    Protein folding using contact maps

    Full text link
    We present the development of the idea to use dynamics in the space of contact maps as a computational approach to the protein folding problem. We first introduce two important technical ingredients, the reconstruction of a three dimensional conformation from a contact map and the Monte Carlo dynamics in contact map space. We then discuss two approximations to the free energy of the contact maps and a method to derive energy parameters based on perceptron learning. Finally we present results, first for predictions based on threading and then for energy minimization of crambin and of a set of 6 immunoglobulins. The main result is that we proved that the two simple approximations we studied for the free energy are not suitable for protein folding. Perspectives are discussed in the last section.Comment: 29 pages, 10 figure

    Exploration of Reaction Pathways and Chemical Transformation Networks

    Full text link
    For the investigation of chemical reaction networks, the identification of all relevant intermediates and elementary reactions is mandatory. Many algorithmic approaches exist that perform explorations efficiently and automatedly. These approaches differ in their application range, the level of completeness of the exploration, as well as the amount of heuristics and human intervention required. Here, we describe and compare the different approaches based on these criteria. Future directions leveraging the strengths of chemical heuristics, human interaction, and physical rigor are discussed.Comment: 48 pages, 4 figure

    Bayesian models and algorithms for protein beta-sheet prediction

    Get PDF
    Prediction of the three-dimensional structure greatly benefits from the information related to secondary structure, solvent accessibility, and non-local contacts that stabilize a protein's structure. Prediction of such components is vital to our understanding of the structure and function of a protein. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or less beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations by efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or less beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro. We performed a 10-fold cross validation experiment on the BetaSheet916 set and we obtained significant improvements in the prediction accuracy

    Thermodynamic graph-rewriting

    Get PDF
    We develop a new thermodynamic approach to stochastic graph-rewriting. The ingredients are a finite set of reversible graph-rewriting rules called generating rules, a finite set of connected graphs P called energy patterns and an energy cost function. The idea is that the generators define the qualitative dynamics, by showing which transformations are possible, while the energy patterns and cost function specify the long-term probability π\pi of any reachable graph. Given the generators and energy patterns, we construct a finite set of rules which (i) has the same qualitative transition system as the generators; and (ii) when equipped with suitable rates, defines a continuous-time Markov chain of which π\pi is the unique fixed point. The construction relies on the use of site graphs and a technique of `growth policy' for quantitative rule refinement which is of independent interest. This division of labour between the qualitative and long-term quantitative aspects of the dynamics leads to intuitive and concise descriptions for realistic models (see the examples in S4 and S5). It also guarantees thermodynamical consistency (AKA detailed balance), otherwise known to be undecidable, which is important for some applications. Finally, it leads to parsimonious parameterizations of models, again an important point in some applications
    corecore