4,472 research outputs found
Learning OT constraint rankings using a maximum entropy model
Abstract. A weakness of standard Optimality Theory is its inability to account for grammar
Effects of Temperature and Crowding on the Pathogenicity of Edwardsiella ictaluri in Channel Catfish (Ictalurus punctatus)
Channel catfish were injected with Edwardsiella ictaluri and stocked at increasing temperatures and densities. Bacteriological examination of kidney, liver and spleen revealed the greatest numbers of organisms in fish from the highest temperature and stocking density tested. Survival time was the shortest for fish held at the highest temperature and stocking density. Increased temperature and crowding were directly proportional to the number of organisms recovered from the organs and inversely proportional to fish survival time
Edge-Based Best-First Chart Parsing
Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used proba- bilistic context-free grammars (PCFGs) to sign probabilities to constituents, and to use these probabilities as the starting point for the FOM. This paper extends this approach to us- ing a probabilistic FOM to judge edges (incomplete constituents), thereby giving a much finergrained control over parsing effort. We show how this can be accomplished in a particularly simple way using the common idea of binarizing the PCFG. The results obtained are about a factor of twenty improvement over the best prior results -- that is, our parser achieves equivalent results using one twentieth the number of edges. Furthermore we show that this improvement is obtained with parsing precision and recall levels superior to those achieved by exhaustive parsing
Modeling Graph Languages with Grammars Extracted via Tree Decompositions
Work on probabilistic models of natural language tends to focus on strings and trees, but there is increasing interest in more general graph-shaped structures since they seem to be better suited for representing natural language semantics, ontologies, or other varieties of knowledge structures. However, while there are relatively simple approaches to defining generative models over strings and trees, it has proven more challenging for more general graphs. This paper describes a natural generalization of the n-gram to graphs, making use of Hyperedge Replacement Grammars to define generative models of graph languages.9 page(s
Producing power-law distributions and damping word frequencies with two-stage language models
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statisticalmodels that can generically produce power laws, breaking generativemodels into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes-the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process-that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.48 page(s
A Note on the Implementation of Hierarchical Dirichlet Processes
The implementation of collapsed Gibbs samplers for non-parametric Bayesian models is non-trivial, requiring considerable book-keeping. Goldwater et al. (2006a) presented an approximation which significantly reduces the storage and computation overhead, but we show here that their formulation was incorrect and, even after correction, is grossly inaccurate. We present an alternative formulation which is exact and can be computed easily. However this approach does not work for hierarchical models, for which case we present an efficient data structure which has a better space complexity than the naive approach.4 page(s
- …