Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach
Theory revision integrates inductive learning and background knowledge by
combining training examples with a coarse domain theory to produce a more
accurate theory. There are two challenges that theory revision and other
theory-guided systems face. First, a representation language appropriate for
the initial theory may be inappropriate for an improved theory. While the
original representation may concisely express the initial theory, a more
accurate theory forced to use that same representation may be bulky,
cumbersome, and difficult to reach. Second, a theory structure suitable for a
coarse domain theory may be insufficient for a fine-tuned theory. Systems that
produce only small, local changes to a theory have limited value for
accomplishing complex structural alterations that may be required.
Consequently, advanced theory-guided learning systems require flexible
representation and flexible structure. An analysis of various theory revision
systems and theory-guided learning systems reveals specific strengths and
weaknesses in terms of these two desired properties. A new system, designed to
capture the underlying qualities of each, uses theory-guided constructive
induction. Experiments in three domains show improvement over
previous theory-guided systems. This leads to a study of the behavior,
limitations, and potential of theory-guided constructive induction.
Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
On the Informativeness of the DNA Promoter Sequences Domain Theory
The DNA promoter sequences domain theory and database have become popular for
testing systems that integrate empirical and analytical learning. This note
reports a simple change and reinterpretation of the domain theory in terms of
M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on
the 106 items of the database. Moreover, an exhaustive search of the space of
M-of-N domain theory interpretations indicates that the expected accuracy of a
randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2%
is achieved in 12 cases. This demonstrates the informativeness of the domain
theory, without the complications of understanding the interactions between
various learning algorithms and the theory. In addition, our results help
characterize the difficulty of learning using the DNA promoters theory.
Comment: See http://www.jair.org/ for any accompanying files.
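As a concrete illustration of the representation this note uses: an M-of-N concept labels an example positive when at least M of its N conditions hold. A minimal sketch, with an invented 2-of-3 rule rather than the actual promoter theory:

```python
# Sketch of an M-of-N concept check: an example is positive when at
# least m of the n (position, base) conditions hold. The rule below
# is illustrative only, not the real promoter domain theory.

def m_of_n(sequence, conditions, m):
    """Return True if at least m of the (position, base) conditions hold."""
    hits = sum(1 for pos, base in conditions if sequence[pos] == base)
    return hits >= m

# Hypothetical 2-of-3 rule over consensus bases at three positions.
rule = [(0, "t"), (2, "g"), (5, "a")]
print(m_of_n("ttgaca", rule, 2))  # -> True: all three conditions match
print(m_of_n("ccgaca", rule, 3))  # -> False: only two conditions match
```

Enumerating all threshold/condition interpretations of such rules, as the note does, then amounts to scoring each candidate (conditions, m) pair against the 106 labelled sequences.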
Protein folding using contact maps
We present the development of the idea to use dynamics in the space of
contact maps as a computational approach to the protein folding problem. We
first introduce two important technical ingredients, the reconstruction of a
three dimensional conformation from a contact map and the Monte Carlo dynamics
in contact map space. We then discuss two approximations to the free energy of
the contact maps and a method to derive energy parameters based on perceptron
learning. Finally we present results, first for predictions based on threading
and then for energy minimization of crambin and of a set of 6 immunoglobulins.
The main result is that the two simple approximations we studied for the free
energy are not suitable for protein folding. Perspectives are
discussed in the last section.
Comment: 29 pages, 10 figures
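For readers unfamiliar with the representation: a contact map marks which residue pairs lie within a distance cutoff. A minimal sketch of computing one from C-alpha coordinates (the cutoff and toy coordinates below are assumptions, not the paper's values):

```python
# Minimal sketch: a protein contact map from C-alpha coordinates.
# Two residues are "in contact" when their distance falls below a
# cutoff; 8.0 Angstroms is a common choice, used here as an assumption.
import math

def contact_map(coords, cutoff=8.0):
    """Return an NxN 0/1 matrix of residue-residue contacts."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy chain of three residues: only the first two are close enough.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(contact_map(toy))  # -> [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Monte Carlo dynamics in contact map space then proposes flips of such 0/1 entries, subject to physical constraints, rather than moving atoms directly.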
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform such explorations efficiently and in
an automated fashion. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed.
Comment: 48 pages, 4 figures
Bayesian models and algorithms for protein beta-sheet prediction
Prediction of a protein's three-dimensional structure greatly benefits from information about secondary structure, solvent accessibility, and the non-local contacts that stabilize the structure. Predicting such components is vital to our understanding of a protein's structure and function. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations using efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or fewer beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro in a 10-fold cross-validation experiment on the BetaSheet916 set, and we obtained significant improvements in prediction accuracy.
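The gapped pairwise alignment step can be sketched as a standard Needleman-Wunsch-style dynamic program. The pairing score and gap penalty below are placeholders; the paper derives its scores from a probabilistic model of residue pairing:

```python
# Sketch of gapped pairwise alignment between two beta-strands by
# dynamic programming. pair_score and the gap penalty are toy
# placeholders, not the paper's probabilistic scores.

def align_score(a, b, pair_score, gap=-1.0):
    """Best global alignment score between strand residue sequences a and b."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # gap in b (this is what admits beta-bulges)
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[n][m]

# Toy score: +1 for identical residues, -1 otherwise.
score = align_score("VIV", "VLV", lambda x, y: 1.0 if x == y else -1.0)
print(score)  # -> 1.0 (two matches, one mismatch, no gaps)
```

Because a gap can open at any position, a bulged residue on one strand simply pairs against a gap rather than forcing a misalignment of the rest of the strand.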
Thermodynamic graph-rewriting
We develop a new thermodynamic approach to stochastic graph-rewriting. The
ingredients are a finite set of reversible graph-rewriting rules called
generating rules, a finite set of connected graphs P called energy patterns and
an energy cost function. The idea is that the generators define the qualitative
dynamics, by showing which transformations are possible, while the energy
patterns and cost function specify the long-term probability of any
reachable graph. Given the generators and energy patterns, we construct a
finite set of rules which (i) has the same qualitative transition system as the
generators; and (ii) when equipped with suitable rates, defines a
continuous-time Markov chain for which this long-term probability distribution is the unique fixed point. The
construction relies on the use of site graphs and a technique of `growth
policy' for quantitative rule refinement which is of independent interest. This
division of labour between the qualitative and long-term quantitative aspects
of the dynamics leads to intuitive and concise descriptions for realistic
models (see the examples in S4 and S5). It also guarantees thermodynamic
consistency (also known as detailed balance), otherwise known to be undecidable, which is
important for some applications. Finally, it leads to parsimonious
parameterizations of models, again an important point in some applications.
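The detailed-balance guarantee can be illustrated on a toy state space: choosing rates from energy differences makes every reversible transition pair balance under the Gibbs distribution. The states and energies below are stand-ins for reachable graphs and energy-pattern costs, not part of the paper:

```python
# Toy illustration of detailed balance: with rates chosen from energy
# differences, k(x->y) = exp(-(E(y) - E(x)) / 2), every reversible
# transition pair satisfies pi(x) * k(x->y) == pi(y) * k(y->x),
# where pi is the Gibbs distribution pi(x) ~ exp(-E(x)).
import math

# Hypothetical states standing in for reachable graphs.
energy = {"A": 0.0, "B": 1.0, "C": 2.5}

def rate(x, y):
    """Transition rate derived from the energy difference."""
    return math.exp(-(energy[y] - energy[x]) / 2)

def gibbs(x):
    """Gibbs probability of state x."""
    z = sum(math.exp(-e) for e in energy.values())
    return math.exp(-energy[x]) / z

for x in energy:
    for y in energy:
        if x != y:
            assert math.isclose(gibbs(x) * rate(x, y), gibbs(y) * rate(y, x))
print("detailed balance holds for all reversible pairs")
```

The identity holds because both sides reduce to exp(-(E(x) + E(y)) / 2) / Z; the construction in the paper engineers the refined rules' rates so that the same cancellation occurs over the full transition system.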