
    Inducing Probabilistic Grammars by Bayesian Model Merging

    We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are "incorporated" by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are "merged" to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
    Comment: To appear in Grammatical Inference and Applications, Second International Colloquium on Grammatical Inference; Springer-Verlag, 1994. 13 pages.
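
    As a hedged illustration of the search loop this abstract describes (a minimal sketch, not the authors' implementation), the code below greedily merges pairs of model states as long as the Bayesian posterior log P(G) + log P(D|G) improves; log_prior, log_likelihood, and merge_states are hypothetical stand-ins for the grammar-specific machinery.

        import itertools

        def model_merging(grammar, data, log_prior, log_likelihood, merge_states):
            """Greedy state merging guided by the Bayesian posterior (Occam's Razor)."""
            best = log_prior(grammar) + log_likelihood(grammar, data)
            improved = True
            while improved:
                improved = False
                for s1, s2 in itertools.combinations(grammar.states, 2):
                    candidate = merge_states(grammar, s1, s2)
                    score = log_prior(candidate) + log_likelihood(candidate, data)
                    if score > best:        # merge only if the posterior rises
                        grammar, best = candidate, score
                        improved = True
                        break               # rescan pairs on the updated model
            return grammar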

    The Hush Cryptosystem

    In this paper we describe a new cryptosystem, which we call "The Hush Cryptosystem," for hiding encrypted data in innocent Arabic sentences. The main purpose of this cryptosystem is to fool observer-supporting software into thinking that the encrypted data is not encrypted at all. We employ a modified Word Substitution Method, known as the Grammatical Substitution Method, and we also make use of hidden Markov models. We test our cryptosystem using a program written in the Java programming language, and we evaluate its output using statistical tests.
    Comment: 7 pages, 5 figures. Appeared in the Proceedings of the 2nd International Conference on Security of Information and Networks (SIN 2009), North Cyprus, Turkey.
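
    The word-substitution idea can be sketched with a toy example (English words and a made-up substitution table purely for illustration; the paper itself targets Arabic sentences and uses HMMs to keep the cover text fluent): each grammatical slot offers two interchangeable words, and the choice between them encodes one bit.

        # Hypothetical substitution table: one bit per grammatical slot.
        SLOTS = [
            {"0": "quickly", "1": "slowly"},
            {"0": "walked",  "1": "ran"},
            {"0": "home",    "1": "away"},
        ]

        def embed(bits):
            """Hide a bit string in an innocent-looking sentence."""
            return " ".join(SLOTS[i][b] for i, b in enumerate(bits))

        def extract(sentence):
            """Recover the hidden bits from the cover sentence."""
            words = sentence.split()
            return "".join(b for i, w in enumerate(words)
                           for b, cand in SLOTS[i].items() if cand == w)

        stego = embed("101")            # -> "slowly walked away"
        assert extract(stego) == "101"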

    HMM with auxiliary memory: a new tool for modeling RNA structures

    For a long time, proteins were believed to perform most of the important functions in all cells. However, recent results in genomics have revealed that many RNAs that do not encode proteins play crucial roles in the cell machinery. These so-called ncRNA genes, which are transcribed into RNA but not translated into protein, frequently conserve their secondary structures more than they conserve their primary sequences. Therefore, in order to identify ncRNA genes, we have to take the secondary structure of RNAs into consideration. Traditional approaches, which are mainly based on base-composition statistics, cannot model and identify such structures, so models with more descriptive power are required. In this paper, we introduce the concept of context-sensitive HMMs, which are capable of describing pairwise interactions between distant symbols. We demonstrate that the proposed model can efficiently represent various RNA secondary structures that are frequently observed.
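
    A minimal sketch of the auxiliary-memory mechanism (an illustration of the stack idea only, not the paper's full context-sensitive HMM): pairwise-emission states push each emitted base onto a stack, and the matching context-sensitive states later pop it and emit the Watson-Crick partner, producing a base-paired stem.

        COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

        def emit_stem_loop(stem, loop):
            """Generate a hairpin: 5' stem, loop, then the base-paired 3' stem."""
            stack, seq = [], []
            for base in stem:           # pairwise-emission states: emit and push
                seq.append(base)
                stack.append(base)
            seq.extend(loop)            # ordinary single-emission states
            while stack:                # context-sensitive states: pop and pair
                seq.append(COMPLEMENT[stack.pop()])
            return "".join(seq)

        print(emit_stem_loop("GGAC", "UUCG"))   # -> GGACUUCGGUCC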

    Grammars and cellular automata for evolving neural networks architectures

    IEEE International Conference on Systems, Man, and Cybernetics, Nashville, TN, 8-11 October 2000.
    The class of feedforward neural networks trained with back-propagation admits a large variety of specific architectures applicable to approximation and pattern tasks. Unfortunately, architecture design is still a job for a human expert. In recent years, interest in automatic methods for determining the architecture of a feedforward neural network has increased; most of these methods are based on the evolutionary-computation paradigm. Within this approach, several perspectives can be considered. At one extreme, every connection and node of the architecture can be specified in the chromosome representation using binary bits; this kind of representation is called a direct encoding scheme. In order to reduce the length of the genotype and the search space, and to make the problem more scalable, indirect encoding schemes have been introduced. At the other extreme, an indirect scheme combined with a constructive algorithm starts from a minimal architecture, and new layers, neurons, and connections are added step by step via sets of rules; the rules and/or some initial conditions are codified into the chromosome of a genetic algorithm. In this work, two indirect constructive encoding schemes, based on grammars and on cellular automata respectively, are proposed to find the optimal architecture of a feedforward neural network.
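
    The grammar-based flavor of such an indirect, constructive encoding might look like the sketch below (rule names and growth operators are hypothetical, chosen only to show the decode step): the chromosome lists rule choices, and decoding expands a minimal architecture into a list of layer sizes.

        # Hypothetical production rules that grow a feedforward architecture.
        RULES = {
            "A": lambda arch: arch + [arch[-1] * 2],            # add a wider hidden layer
            "B": lambda arch: arch + [max(arch[-1] // 2, 1)],   # add a narrower one
            "C": lambda arch: arch,                             # stop / no-op
        }

        def decode(chromosome, n_inputs=4):
            """Expand a minimal architecture [n_inputs] by applying encoded rules."""
            arch = [n_inputs]
            for gene in chromosome:
                arch = RULES[gene](arch)
            return arch             # layer sizes for the decoded network

        print(decode("ABA"))        # -> [4, 8, 4, 8]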

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive to build. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART
    Comment: 34 pages, 3 figures, glossary of XRate model terminology.
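
    As background on the computation underlying such models (standard substitution-model theory, not XRate's actual code or grammar syntax), a lineage-specific model assigns each branch its own rate matrix Q, and substitution probabilities over a branch of length t are P(t) = expm(Qt); the toy two-state example below is purely illustrative.

        import numpy as np
        from scipy.linalg import expm

        # Toy 2-state (purine/pyrimidine) rate matrix; rows sum to zero.
        Q_default = np.array([[-1.0,  1.0],
                              [ 1.0, -1.0]])
        Q_fast = 3.0 * Q_default        # hypothetical lineage with a higher rate

        for name, Q in [("default", Q_default), ("fast", Q_fast)]:
            P = expm(Q * 0.5)           # branch length t = 0.5
            print(name, P[0, 0])        # probability of no net substitution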

    Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization

    We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by “Viterbi training.” We show that solving, and even approximating, Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
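
    In the standard formulation (notation assumed here, matching common usage rather than the paper's exact symbols), Viterbi training jointly optimizes the rule weights and one derivation per sentence, and the uniform-at-random initializer spreads each nonterminal's probability mass evenly over its rules:

        % Viterbi training objective: weights \theta and one derivation d_i
        % per sentence x_i, over the derivation set \mathcal{D}(x_i).
        \max_{\theta} \prod_{i=1}^{n} \max_{d_i \in \mathcal{D}(x_i)} p_{\theta}(x_i, d_i),
        \qquad
        % Uniform-at-random initialization for each nonterminal A:
        \theta^{(0)}(A \to \alpha) = \frac{1}{|\{A \to \beta \in R\}|}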

    Attribute Multiset Grammars for Global Explanations of Activities

