9 research outputs found

    Predicting zinc binding at the proteome level

    BACKGROUND: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for the regulation of their activities, or for structural purposes. Metal-binding properties remain difficult to predict and to investigate experimentally at the whole-proteome level. Consequently, current knowledge about metalloproteins is only partial.
    RESULTS: The present work reports on the development of a machine learning method, based on support vector machines, for predicting the zinc-binding state of pairs of nearby amino acids. The predictor was trained on chains containing zinc-binding sites (positive examples) and on non-metalloproteins (negative examples). Results based on strong non-redundancy tests show that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding states of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The results were in good agreement with the outcomes of previous, highly manually curated efforts to identify human zinc-binding proteins. Some previously unreported zinc-binding sites were identified and further validated through structural modelling. The software implementing the predictor is freely available at:
    CONCLUSION: The proposed approach constitutes a highly automated tool for the identification of metalloproteins that provides results of quality comparable to highly manually refined predictions. Modelling the correlation between the binding states of residue pairs yields a significant improvement over standard 1D approaches. In addition, the method identifies previously unreported metal sites, providing important hints for experimentalists.
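The pairwise classification setup described above can be illustrated with a minimal sketch. This is not the authors' predictor or data: the features (whether each candidate residue is Cys or His, plus the sequence gap between them) and the toy training pairs are hypothetical stand-ins, with the SVM provided by scikit-learn.

```python
# Minimal sketch of pairwise zinc-binding classification (hypothetical
# features, toy data; not the paper's predictor or data set).
from sklearn.svm import SVC

CHELATORS = {"C", "H"}  # Cys and His are the most common zinc ligands

def pair_features(res1, res2, gap):
    """Encode a pair of candidate residues and the sequence gap between them."""
    return [int(res1 in CHELATORS), int(res2 in CHELATORS), gap]

# Hypothetical labeled pairs: (residue 1, residue 2, gap, jointly binding?)
train = [("C", "C", 2, 1), ("C", "H", 3, 1), ("H", "H", 1, 1),
         ("A", "L", 2, 0), ("S", "T", 4, 0), ("C", "A", 3, 0)]
X = [pair_features(r1, r2, g) for r1, r2, g, _ in train]
y = [label for *_, label in train]

# Train an RBF-kernel SVM on the pair examples
clf = SVC(kernel="rbf").fit(X, y)
```

A new Cys/His pair can then be scored with `clf.predict([pair_features("C", "H", 2)])`; the real predictor would use far richer sequence-derived features.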

    Weighted decomposition kernels

    We introduce a family of kernels on discrete data structures within the general class of decomposition kernels. A weighted decomposition kernel (WDK) is computed by dividing objects into substructures indexed by a selector. Two substructures match if their selectors satisfy an equality predicate, and the importance of the match is determined by a probability kernel on local distributions fitted to the substructures. Under reasonable assumptions, a WDK can be computed efficiently and avoids combinatorial explosion of the feature space. We report experimental evidence that the proposed kernel is highly competitive with more complex state-of-the-art methods on a set of problems in bioinformatics.
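The WDK computation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: substructures are represented as lists of discrete labels, the selector predicate is plain equality, and the probability kernel is a Bhattacharyya-style kernel on empirical label distributions (all of these choices are assumptions).

```python
from collections import Counter
import math

def local_distribution(substructure):
    """Empirical label distribution over a substructure (a list of labels)."""
    counts = Counter(substructure)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

def probability_kernel(p, q):
    """Bhattacharyya-style kernel between two discrete distributions."""
    return sum(math.sqrt(p[l] * q[l]) for l in set(p) & set(q))

def weighted_decomposition_kernel(x, y):
    """WDK sketch: x and y are lists of (selector, substructure) pairs.

    Substructures match when their selectors are equal; each match is
    weighted by a probability kernel on the local label distributions
    fitted on the two substructures.
    """
    k = 0.0
    for sel_x, sub_x in x:
        for sel_y, sub_y in y:
            if sel_x == sel_y:  # equality predicate on selectors
                k += probability_kernel(local_distribution(sub_x),
                                        local_distribution(sub_y))
    return k
```

Because the sum runs only over selector-matched substructure pairs, the cost is quadratic in the number of substructures rather than exponential in the feature space.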

    Wide coverage natural language processing using kernel methods and neural networks for structured data

    Convolution kernels and recursive neural networks are both suitable approaches to supervised learning when the input is a discrete structure such as a labeled tree or graph. We compare these techniques on two natural language problems. In both, the learning task consists of choosing the best alternative tree from a set of candidates. We report on an empirical comparison of the two methods on a large corpus of parsed sentences and speculate on the roles played by the representation and the loss function.

    Comparing Convolution Kernels and Recursive Neural Networks for Learning Preferences on Structured Data

    Convolution kernels and recursive neural networks (RNNs) are both suitable approaches to supervised learning when the input portion of an instance is a discrete structure such as a tree or a graph. We report on an empirical comparison of the two architectures in a large-scale preference learning problem in natural language processing, where instances are candidate incremental parse trees. We found that kernels never outperform RNNs, even when only a limited number of examples is available for learning. We argue that convolution kernels may lead to feature-space representations that are too sparse and too general because they are not focused on the specific learning task. The adaptive encoding mechanism of RNNs, in this case, yields better prediction accuracy at smaller computational cost.

    Improving prediction of zinc binding sites by modeling the linkage between residues close in sequence

    We describe and empirically evaluate machine learning methods for the prediction of zinc-binding sites from protein sequences. We start by observing that a data set consisting of single residues as examples is affected by autocorrelation, and we propose an ad-hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable-length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non-redundant protein chains shows that explicitly modeling the correlation between residues close in sequence yields a significant improvement in prediction performance.
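A toy illustration of handling variable-length gaps, under stated assumptions and not the paper's actual kernel: each candidate pair is represented by the sequence contexts of its two residues plus the gap length between them; contexts are compared position-wise, and a gap-length mismatch is downweighted exponentially so that differing gaps degrade similarity smoothly rather than blocking a match.

```python
import math

def pair_kernel(a, b, lam=0.5):
    """Toy kernel on candidate coordinating-residue pairs (illustrative only).

    Each instance is (context1, gap, context2): two short sequence windows
    around the candidate residues and the number of residues between them.
    Contexts are compared position-wise; differing gap lengths are
    downweighted exponentially, so variable-length gaps are handled
    smoothly instead of requiring a hard alignment.
    """
    c1a, gap_a, c2a = a
    c1b, gap_b, c2b = b
    matches = sum(x == y for x, y in zip(c1a, c1b))
    matches += sum(x == y for x, y in zip(c2a, c2b))
    return matches * math.exp(-lam * abs(gap_a - gap_b))
```

For example, two pairs with identical contexts but gaps of 2 and 5 residues still match, just with a smaller weight than two pairs with equal gaps.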

    Decomposition Kernels for Natural Language Processing

    We propose a simple solution to the sequence labeling problem based on an extension of weighted decomposition kernels. We additionally introduce a multi-instance kernel approach for representing lexical word-sense information. These ideas have been preliminarily tested on named entity recognition and PP-attachment disambiguation. We finally suggest how these techniques could be merged using a declarative formalism that may provide a basis for integrating multiple sources of information in kernel-based learning for NLP.