    Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

    We contribute a novel ball-histogram approach to predicting the DNA-binding propensity of proteins. Unlike state-of-the-art methods based on constructing an ad hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic Monte Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA-binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method provides interpretable features involving spatial distributions of selected amino acids.

    Studying the Functional Genomics of Stress Responses in Loblolly Pine With the Expresso Microarray Experiment Management System

    Conception, design, and implementation of cDNA microarray experiments present a variety of bioinformatics challenges for biologists and computational scientists. The multiple stages of data acquisition and analysis have motivated the design of Expresso, a system for microarray experiment management. Salient aspects of Expresso include support for clone replication and randomized placement; automatic gridding, extraction of expression data from each spot, and quality monitoring; flexible methods of combining data from individual spots into information about clones and functional categories; and the use of inductive logic programming for higher-level data analysis and mining. Expresso is being developed in parallel with several generations of microarray experiments aimed at elucidating genomic responses to drought stress in loblolly pine seedlings. The current experimental design incorporates 384 pine cDNAs replicated and randomly placed in two specific microarray layouts. We describe the design of Expresso as well as results of analysis with Expresso that suggest the importance of molecular chaperones and membrane transport proteins in mechanisms conferring successful adaptation to long-term drought stress.

    Compositional Mining of Multi-Relational Biological Datasets

    High-throughput biological screens are yielding ever-growing streams of information about multiple aspects of cellular activity. As more and more categories of datasets come online, there is a corresponding multitude of ways in which inferences can be chained across them, motivating the need for compositional data mining algorithms. In this paper, we argue that such compositional data mining can be effectively realized by functionally cascading redescription mining and biclustering algorithms as primitives. Both of these primitives mirror shifts of vocabulary that can be composed in arbitrary ways to create rich chains of inferences. Given a relational database and its schema, we show how the schema can be automatically compiled into a compositional data mining program, and how different domains in the schema can be related through logical sequences of biclustering and redescription invocations. This feature allows us to rapidly prototype new data mining applications, yielding greater understanding of scientific datasets. We describe two applications of compositional data mining: (i) matching terms across categories of the Gene Ontology and (ii) understanding the molecular mechanisms underlying stress response in human cells.

    Naive Bayesian Classification of Structured Data

    An Extended Transformation Approach to Inductive Logic Programming

    In this paper we show how this limitation can be overcome by systematic first-order feature construction using a particular individual-centered feature bias. The approach can be applied in any domain where there is a clear notion of individual. We also show how to improve upon exhaustive first-order feature construction by using a relevancy filter. The proposed approach is illustrated on the "trains" and "mutagenesis" ILP domains.