Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach
Theory revision integrates inductive learning and background knowledge by
combining training examples with a coarse domain theory to produce a more
accurate theory. There are two challenges that theory revision and other
theory-guided systems face. First, a representation language appropriate for
the initial theory may be inappropriate for an improved theory. While the
original representation may concisely express the initial theory, a more
accurate theory forced to use that same representation may be bulky,
cumbersome, and difficult to reach. Second, a theory structure suitable for a
coarse domain theory may be insufficient for a fine-tuned theory. Systems that
produce only small, local changes to a theory have limited value for
accomplishing complex structural alterations that may be required.
Consequently, advanced theory-guided learning systems require flexible
representation and flexible structure. An analysis of various theory revision
systems and theory-guided learning systems reveals specific strengths and
weaknesses in terms of these two desired properties. A new system, designed to
capture the underlying qualities of each, uses theory-guided constructive
induction. Experiments in three domains show improvement over
previous theory-guided systems. This leads to a study of the behavior,
limitations, and potential of theory-guided constructive induction.
Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
On the Informativeness of the DNA Promoter Sequences Domain Theory
The DNA promoter sequences domain theory and database have become popular for
testing systems that integrate empirical and analytical learning. This note
reports a simple change and reinterpretation of the domain theory in terms of
M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on
the 106 items of the database. Moreover, an exhaustive search of the space of
M-of-N domain theory interpretations indicates that the expected accuracy of a
randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2%
is achieved in 12 cases. This demonstrates the informativeness of the domain
theory, without the complications of understanding the interactions between
various learning algorithms and the theory. In addition, our results help
characterize the difficulty of learning using the DNA promoters theory.
Comment: See http://www.jair.org/ for any accompanying files.
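As a concrete illustration of the representation this note uses: an M-of-N concept labels an example positive when at least M of its N conditions hold. A minimal sketch, with an invented 2-of-3 rule rather than the actual promoter theory:

```python
# Sketch of an M-of-N concept check: an example is positive when at
# least m of the n (position, base) conditions hold. The rule below
# is illustrative only, not the real promoter domain theory.

def m_of_n(sequence, conditions, m):
    """Return True if at least m of the (position, base) conditions hold."""
    hits = sum(1 for pos, base in conditions if sequence[pos] == base)
    return hits >= m

# Hypothetical 2-of-3 rule over consensus bases at three positions.
rule = [(0, "t"), (2, "g"), (5, "a")]
print(m_of_n("ttgaca", rule, 2))  # -> True: all three conditions match
print(m_of_n("ccgaca", rule, 3))  # -> False: only two conditions match
```

Enumerating all threshold/condition interpretations of such rules, as the note does, then amounts to scoring each candidate (conditions, m) pair against the 106 labelled sequences.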
Protein folding using contact maps
We present the development of the idea to use dynamics in the space of
contact maps as a computational approach to the protein folding problem. We
first introduce two important technical ingredients, the reconstruction of a
three dimensional conformation from a contact map and the Monte Carlo dynamics
in contact map space. We then discuss two approximations to the free energy of
the contact maps and a method to derive energy parameters based on perceptron
learning. Finally we present results, first for predictions based on threading
and then for energy minimization of crambin and of a set of 6 immunoglobulins.
The main result is that the two simple approximations we studied for the free
energy are not suitable for protein folding. Perspectives are
discussed in the last section.
Comment: 29 pages, 10 figures
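For readers unfamiliar with the representation: a contact map marks which residue pairs lie within a distance cutoff. A minimal sketch of computing one from C-alpha coordinates (the cutoff and toy coordinates below are assumptions, not the paper's values):

```python
# Minimal sketch: a protein contact map from C-alpha coordinates.
# Two residues are "in contact" when their distance falls below a
# cutoff; 8.0 Angstroms is a common choice, used here as an assumption.
import math

def contact_map(coords, cutoff=8.0):
    """Return an NxN 0/1 matrix of residue-residue contacts."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy chain of three residues: only the first two are close enough.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_map(toy))  # -> [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Monte Carlo dynamics in contact map space then proposes flips of such 0/1 entries, subject to physical constraints, rather than moving atoms directly.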
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform such explorations efficiently and in
an automated fashion. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed.
Comment: 48 pages, 4 figures
Bayesian models and algorithms for protein beta-sheet prediction
Prediction of a protein's three-dimensional structure greatly benefits from information about secondary structure, solvent accessibility, and the non-local contacts that stabilize the structure. Predicting such components is vital to our understanding of a protein's structure and function. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations using efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or fewer beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro in a 10-fold cross-validation experiment on the BetaSheet916 set, and we obtained significant improvements in prediction accuracy.
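The gapped pairwise alignment step can be sketched as a standard Needleman-Wunsch-style dynamic program. The pairing score and gap penalty below are placeholders; the paper derives its scores from a probabilistic model of residue pairing:

```python
# Sketch of gapped pairwise alignment between two beta-strands by
# dynamic programming. pair_score and the gap penalty are toy
# placeholders, not the paper's probabilistic scores.

def align_score(a, b, pair_score, gap=-1.0):
    """Best global alignment score between strand residue sequences a and b."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # gap in b (this is what admits beta-bulges)
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[n][m]

# Toy score: +1 for identical residues, -1 otherwise.
score = align_score("VIV", "VLV", lambda x, y: 1.0 if x == y else -1.0)
print(score)  # -> 1.0 (two matches, one mismatch, no gaps)
```

Because a gap can open at any position, a bulged residue on one strand simply pairs against a gap rather than forcing a misalignment of the rest of the strand.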
Thermodynamic graph-rewriting
We develop a new thermodynamic approach to stochastic graph-rewriting. The
ingredients are a finite set of reversible graph-rewriting rules called
generating rules, a finite set of connected graphs P called energy patterns and
an energy cost function. The idea is that the generators define the qualitative
dynamics, by showing which transformations are possible, while the energy
patterns and cost function specify the long-term probability of any
reachable graph. Given the generators and energy patterns, we construct a
finite set of rules which (i) has the same qualitative transition system as the
generators; and (ii) when equipped with suitable rates, defines a
continuous-time Markov chain for which this long-term probability distribution is the unique fixed point. The
construction relies on the use of site graphs and a technique of `growth
policy' for quantitative rule refinement which is of independent interest. This
division of labour between the qualitative and long-term quantitative aspects
of the dynamics leads to intuitive and concise descriptions for realistic
models (see the examples in S4 and S5). It also guarantees thermodynamic
consistency (also known as detailed balance), otherwise known to be undecidable, which is
important for some applications. Finally, it leads to parsimonious
parameterizations of models, again an important point in some applications.
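The detailed-balance guarantee can be illustrated on a toy state space: choosing rates from energy differences makes every reversible transition pair balance under the Gibbs distribution. The states and energies below are stand-ins for reachable graphs and energy-pattern costs, not part of the paper:

```python
# Toy illustration of detailed balance: with rates chosen from energy
# differences, k(x->y) = exp(-(E(y) - E(x)) / 2), every reversible
# transition pair satisfies pi(x) * k(x->y) == pi(y) * k(y->x),
# where pi is the Gibbs distribution pi(x) ~ exp(-E(x)).
import math

# Hypothetical states standing in for reachable graphs.
energy = {"A": 0.0, "B": 1.0, "C": 2.5}

def rate(x, y):
    """Transition rate derived from the energy difference."""
    return math.exp(-(energy[y] - energy[x]) / 2)

def gibbs(x):
    """Gibbs probability of state x."""
    z = sum(math.exp(-e) for e in energy.values())
    return math.exp(-energy[x]) / z

for x in energy:
    for y in energy:
        if x != y:
            assert math.isclose(gibbs(x) * rate(x, y), gibbs(y) * rate(y, x))
print("detailed balance holds for all reversible pairs")
```

The identity holds because both sides reduce to exp(-(E(x) + E(y)) / 2) / Z; the construction in the paper engineers the refined rules' rates so that the same cancellation occurs over the full transition system.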