
    Inducing Probabilistic Grammars by Bayesian Model Merging

    We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are "incorporated" by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are "merged" to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
    Comment: To appear in Grammatical Inference and Applications, Second International Colloquium on Grammatical Inference; Springer-Verlag, 1994. 13 pages.
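
    As a hedged illustration of the search loop this abstract describes (a minimal sketch, not the authors' implementation), the code below greedily merges pairs of model states as long as the Bayesian posterior log P(G) + log P(D|G) improves; log_prior, log_likelihood, and merge_states are hypothetical stand-ins for the grammar-specific machinery.

        import itertools

        def model_merging(grammar, data, log_prior, log_likelihood, merge_states):
            """Greedy state merging guided by the Bayesian posterior (Occam's Razor)."""
            best = log_prior(grammar) + log_likelihood(grammar, data)
            improved = True
            while improved:
                improved = False
                for s1, s2 in itertools.combinations(grammar.states, 2):
                    candidate = merge_states(grammar, s1, s2)
                    score = log_prior(candidate) + log_likelihood(candidate, data)
                    if score > best:        # merge only if the posterior rises
                        grammar, best = candidate, score
                        improved = True
                        break               # rescan pairs on the updated model
            return grammar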

    The Hush Cryptosystem

    In this paper we describe a new cryptosystem, which we call "The Hush Cryptosystem," for hiding encrypted data in innocent Arabic sentences. The main purpose of this cryptosystem is to fool observer-supporting software into thinking that the encrypted data is not encrypted at all. We employ a modified Word Substitution Method, known as the Grammatical Substitution Method, and we also make use of hidden Markov models. We test our cryptosystem using a program written in the Java programming language, and we evaluate its output using statistical tests.
    Comment: 7 pages, 5 figures. Appeared in the Proceedings of the 2nd International Conference on Security of Information and Networks (SIN 2009), North Cyprus, Turkey.
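
    The word-substitution idea can be sketched with a toy example (English words and a made-up substitution table purely for illustration; the paper itself targets Arabic sentences and uses HMMs to keep the cover text fluent): each grammatical slot offers two interchangeable words, and the choice between them encodes one bit.

        # Hypothetical substitution table: one bit per grammatical slot.
        SLOTS = [
            {"0": "quickly", "1": "slowly"},
            {"0": "walked",  "1": "ran"},
            {"0": "home",    "1": "away"},
        ]

        def embed(bits):
            """Hide a bit string in an innocent-looking sentence."""
            return " ".join(SLOTS[i][b] for i, b in enumerate(bits))

        def extract(sentence):
            """Recover the hidden bits from the cover sentence."""
            words = sentence.split()
            return "".join(b for i, w in enumerate(words)
                           for b, cand in SLOTS[i].items() if cand == w)

        stego = embed("101")            # -> "slowly walked away"
        assert extract(stego) == "101"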

    HMM with auxiliary memory: a new tool for modeling RNA structures

    For a long time, proteins were believed to perform most of the important functions in all cells. However, recent results in genomics have revealed that many RNAs that do not encode proteins play crucial roles in the cell machinery. These so-called ncRNA genes, which are transcribed into RNA but not translated into protein, frequently conserve their secondary structures more than they conserve their primary sequences. Therefore, in order to identify ncRNA genes, we have to take the secondary structure of RNAs into consideration. Traditional approaches, which are mainly based on base-composition statistics, cannot model and identify such structures, so models with more descriptive power are required. In this paper, we introduce the concept of context-sensitive HMMs, which are capable of describing pairwise interactions between distant symbols. We demonstrate that the proposed model can efficiently represent various RNA secondary structures that are frequently observed.
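
    A minimal sketch of the auxiliary-memory mechanism (an illustration of the stack idea only, not the paper's full context-sensitive HMM): pairwise-emission states push each emitted base onto a stack, and the matching context-sensitive states later pop it and emit the Watson-Crick partner, producing a base-paired stem.

        COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

        def emit_stem_loop(stem, loop):
            """Generate a hairpin: 5' stem, loop, then the base-paired 3' stem."""
            stack, seq = [], []
            for base in stem:           # pairwise-emission states: emit and push
                seq.append(base)
                stack.append(base)
            seq.extend(loop)            # ordinary single-emission states
            while stack:                # context-sensitive states: pop and pair
                seq.append(COMPLEMENT[stack.pop()])
            return "".join(seq)

        print(emit_stem_loop("GGAC", "UUCG"))   # -> GGACUUCGGUCC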

    Grammars and cellular automata for evolving neural networks architectures

    IEEE International Conference on Systems, Man, and Cybernetics, Nashville, TN, 8-11 October 2000.
    The class of feedforward neural networks trained with back-propagation admits a large variety of specific architectures applicable to approximation and pattern tasks. Unfortunately, architecture design is still a job for a human expert. In recent years, interest in automatic methods for determining the architecture of a feedforward neural network has increased; most of these methods are based on the evolutionary-computation paradigm. Within this approach, several perspectives can be considered. At one extreme, every connection and node of the architecture can be specified in the chromosome representation using binary bits; this kind of representation is called a direct encoding scheme. In order to reduce the length of the genotype and the search space, and to make the problem more scalable, indirect encoding schemes have been introduced. At the other extreme, an indirect scheme combined with a constructive algorithm starts from a minimal architecture, and new layers, neurons, and connections are added step by step via sets of rules; the rules and/or some initial conditions are codified into the chromosome of a genetic algorithm. In this work, two indirect constructive encoding schemes, based on grammars and on cellular automata respectively, are proposed to find the optimal architecture of a feedforward neural network.
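
    The grammar-based flavor of such an indirect, constructive encoding might look like the sketch below (rule names and growth operators are hypothetical, chosen only to show the decode step): the chromosome lists rule choices, and decoding expands a minimal architecture into a list of layer sizes.

        # Hypothetical production rules that grow a feedforward architecture.
        RULES = {
            "A": lambda arch: arch + [arch[-1] * 2],            # add a wider hidden layer
            "B": lambda arch: arch + [max(arch[-1] // 2, 1)],   # add a narrower one
            "C": lambda arch: arch,                             # stop / no-op
        }

        def decode(chromosome, n_inputs=4):
            """Expand a minimal architecture [n_inputs] by applying encoded rules."""
            arch = [n_inputs]
            for gene in chromosome:
                arch = RULES[gene](arch)
            return arch             # layer sizes for the decoded network

        print(decode("ABA"))        # -> [4, 8, 4, 8]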

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive to build. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART
    Comment: 34 pages, 3 figures, glossary of XRate model terminology.
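
    As background on the computation underlying such models (standard substitution-model theory, not XRate's actual code or grammar syntax), a lineage-specific model assigns each branch its own rate matrix Q, and substitution probabilities over a branch of length t are P(t) = expm(Qt); the toy two-state example below is purely illustrative.

        import numpy as np
        from scipy.linalg import expm

        # Toy 2-state (purine/pyrimidine) rate matrix; rows sum to zero.
        Q_default = np.array([[-1.0,  1.0],
                              [ 1.0, -1.0]])
        Q_fast = 3.0 * Q_default        # hypothetical lineage with a higher rate

        for name, Q in [("default", Q_default), ("fast", Q_fast)]:
            P = expm(Q * 0.5)           # branch length t = 0.5
            print(name, P[0, 0])        # probability of no net substitution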

    Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization

    We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by “Viterbi training.” We show that solving, and even approximating, Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
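
    In the standard formulation (notation assumed here, matching common usage rather than the paper's exact symbols), Viterbi training jointly optimizes the rule weights and one derivation per sentence, and the uniform-at-random initializer spreads each nonterminal's probability mass evenly over its rules:

        % Viterbi training objective: weights \theta and one derivation d_i
        % per sentence x_i, over the derivation set \mathcal{D}(x_i).
        \max_{\theta} \prod_{i=1}^{n} \max_{d_i \in \mathcal{D}(x_i)} p_{\theta}(x_i, d_i),
        \qquad
        % Uniform-at-random initialization for each nonterminal A:
        \theta^{(0)}(A \to \alpha) = \frac{1}{|\{A \to \beta \in R\}|}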

    Attribute Multiset Grammars for Global Explanations of Activities

