Probabilistic grammatical model of protein language and its application to helix-helix contact site classification
BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed their ability to represent structural features of helix-helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context-free framework for the analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. Notably, this is achieved without necessarily requiring the modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable, so they could provide biologically meaningful information for molecular biologists.
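The core mechanism described above, a probabilistic context-free grammar whose derivation probabilities score sequences, can be sketched in a few lines. The toy grammar and residue classes below are illustrative assumptions only, not the grammar inferred in the paper:

```python
# A minimal PCFG sketch: rule probabilities for each left-hand side sum to 1,
# and the probability of a derivation is the product of the rules it uses.
# The grammar and residue classes ("h" hydrophobic, "p" polar) are toy
# assumptions for illustration, not the paper's inferred grammar.

PCFG = {
    "S": [(("H", "S"), 0.7), (("H",), 0.3)],  # S -> H S | H
    "H": [(("h",), 0.6), (("p",), 0.4)],      # H -> h | p
}

def derivation_probability(tree):
    """tree = (lhs, children); children are subtrees or terminal strings."""
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = dict(PCFG[lhs])[rhs]
    for c in children:
        if not isinstance(c, str):
            prob *= derivation_probability(c)
    return prob

# P(S -> H S, H -> h, S -> H, H -> p) = 0.7 * 0.6 * 0.3 * 0.4
tree = ("S", [("H", ["h"]), ("S", [("H", ["p"])])])
print(derivation_probability(tree))  # 0.0504
```

A genetic algorithm, as in the paper, would search over the rule set and probabilities; scoring a candidate grammar reduces to summing such derivation probabilities over the training sequences.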
Calibrating Generative Models: The Probabilistic Chomsky–Schützenberger Hierarchy
A probabilistic Chomsky–Schützenberger hierarchy of grammars is introduced and studied, with the aim of understanding the expressive power of generative models. We offer characterizations of the distributions definable at each level of the hierarchy, including probabilistic regular, context-free, (linear) indexed, context-sensitive, and unrestricted grammars, each corresponding to familiar probabilistic machine classes. Special attention is given to distributions on (unary notations for) positive integers. Unlike in the classical case, where the "semi-linear" languages all collapse into the regular languages, we show, using analytic tools adapted from the classical setting, that there is no collapse in the probabilistic hierarchy: more distributions become definable at each level. We also address related issues such as closure under probabilistic conditioning.
On the learning of vague languages for syntactic pattern recognition
A method for learning vague languages, which represent distorted or ambiguous patterns, is proposed in this paper. The goal of the method is to infer a quasi-context-sensitive string grammar, which serves in our model as the generator of patterns. The method is an important component of our multi-derivational model for parsing vague languages in syntactic pattern recognition.
Learning Interactions of Local and Non-Local Phonotactic Constraints from Positive Input
This paper proposes a grammatical inference algorithm to learn input-sensitive tier-based strictly local languages across multiple tiers from positive data only, when the locality of the tier constraints and the tier-projection function is set to 2 (MITSL; De Santo and Graf, 2019). We conduct simulations showing that the algorithm succeeds in learning MITSL patterns over a set of artificial languages.
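The tier-based strictly 2-local core of such learners can be sketched simply: project each word onto a tier, record the attested tier-adjacent bigrams (with word boundaries), and forbid everything unattested. The sketch below assumes a single, fixed tier and a toy sibilant-harmony pattern; the paper's MITSL algorithm additionally learns the input-sensitive tier projections themselves, across multiple tiers:

```python
# Simplified TSL-2 learning from positive data, with the tier given.
# Toy data: sibilants "s" and "S" must harmonize within a word.

def project(word, tier):
    return [s for s in word if s in tier]

def learn_tsl2(positive_data, tier):
    # collect attested bigrams over the tier projection, "#" marks boundaries
    attested = set()
    for word in positive_data:
        t = ["#"] + project(word, tier) + ["#"]
        attested.update(zip(t, t[1:]))
    tier_syms = sorted(tier | {"#"})
    # the grammar is the complement: every unattested tier bigram is forbidden
    return {(a, b) for a in tier_syms for b in tier_syms} - attested

def accepts(word, tier, forbidden):
    t = ["#"] + project(word, tier) + ["#"]
    return not any(bg in forbidden for bg in zip(t, t[1:]))

tier = {"s", "S"}
data = [list(w) for w in ["sasa", "SaSa", "sata", "Sata", "ata"]]
forbidden = learn_tsl2(data, tier)
print(accepts(list("saSa"), tier, forbidden))  # disharmonic -> False
print(accepts(list("sasa"), tier, forbidden))  # harmonic    -> True
```

Because the projection skips intervening non-tier material, the learned constraint is non-local on the surface string ("s...S" is blocked at any distance) while remaining strictly 2-local on the tier.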
Feasible Learnability of Formal Grammars and the Theory of Natural Language Acquisition
We propose to apply a complexity-theoretic notion of feasible learnability called polynomial learnability to the evaluation of grammatical formalisms for linguistic description. Polynomial learnability was originally defined by Valiant in the context of boolean concept learning and subsequently generalized by Blumer et al. to infinitary domains. We give a clear, intuitive exposition of this notion of learnability and of what characteristics of a collection of languages may or may not help feasible learnability under this paradigm. In particular, we present a novel, nontrivial constraint on the degree of locality of grammars which allows a rich class of mildly context-sensitive languages to be feasibly learnable. We discuss possible implications of this observation for the theory of natural language acquisition.
Segmentation of Document Using Discriminative Context-free Grammar Inference and Alignment Similarities
Text documents present a great challenge to the field of document recognition. Automatic segmentation and layout analysis of documents is used for the interpretation and machine translation of documents. Documents such as research papers, address books, and news articles are available in unstructured formats, and extracting relevant knowledge from them has been recognized as a promising task; extracting interesting rules from them is a complex and tedious process. Prior approaches have used conditional random fields (CRFs) exploiting contextual information, or hand-coded wrappers, to label the text (fields such as name, phone number, and address). In this paper we propose a novel approach that infers grammar rules using alignment similarities and a discriminative context-free grammar, which helps in extracting the desired information from the document.
DOI: 10.17762/ijritcc2321-8169.160410