1,448 research outputs found

    Unambiguous 1-Uniform Morphisms

    A morphism h is unambiguous with respect to a word w if there is no other morphism g that maps w to the same image as h. In the present paper we study the question of whether, for any given word, there exists an unambiguous 1-uniform morphism, i.e., a morphism that maps every letter in the word to an image of length 1. Comment: In Proceedings WORDS 2011, arXiv:1108.341
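The definition above can be checked by brute force on small words. The following is a minimal sketch, not the paper's method: it assumes that competing morphisms g may erase letters (map them to the empty word), and the names `morphisms_onto` and `is_unambiguous` are hypothetical helpers introduced only for illustration.

```python
def morphisms_onto(word, target, allow_erasing=True):
    """Yield every morphism g (dict: letter -> image) with g(word) == target."""
    def extend(pos, offset, g):
        if pos == len(word):
            # All letters consumed: g is valid only if the whole target is covered.
            if offset == len(target):
                yield dict(g)
            return
        letter = word[pos]
        if letter in g:
            # Image already fixed by an earlier occurrence; it must match here too.
            image = g[letter]
            if target.startswith(image, offset):
                yield from extend(pos + 1, offset + len(image), g)
            return
        # Assumption: erasing morphisms are allowed unless allow_erasing is False.
        min_len = 0 if allow_erasing else 1
        for end in range(offset + min_len, len(target) + 1):
            g[letter] = target[offset:end]
            yield from extend(pos + 1, end, g)
            del g[letter]

    yield from extend(0, 0, {})


def is_unambiguous(h, word, allow_erasing=True):
    """True if no morphism other than h maps word to h(word)."""
    h_on_word = {x: h[x] for x in set(word)}
    target = "".join(h_on_word[x] for x in word)
    return all(g == h_on_word
               for g in morphisms_onto(word, target, allow_erasing))


h = {"a": "x", "b": "y"}          # a 1-uniform morphism
print(is_unambiguous(h, "abba"))  # no competing morphism exists
print(is_unambiguous(h, "abab"))  # a -> "xy", b -> "" yields the same image
```

The search is exponential in the worst case, so it is only suitable for experimenting with short words.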

    Unsupervised learning of probabilistic grammars

    Probabilistic grammars define joint probability distributions over sentences and their grammatical structures. They have been used in many areas, such as natural language processing, bioinformatics and pattern recognition, mainly for the purpose of deriving grammatical structures from data (sentences). Unsupervised approaches to learning probabilistic grammars induce a grammar from unannotated sentences, which eliminates the need for manual annotation of grammatical structures that can be laborious and error-prone. In this thesis we study unsupervised learning of probabilistic context-free grammars and probabilistic dependency grammars, both of which are expressive enough for many real-world languages but remain tractable in inference. We investigate three different approaches. The first approach is a structure search approach for learning probabilistic context-free grammars. It acquires rules of an unknown probabilistic context-free grammar through iterative coherent biclustering of the bigrams in the training corpus. A greedy procedure is used in our approach to add rules from biclusters such that each set of rules being added into the grammar results in the largest increase in the posterior of the grammar given the training corpus. Our experiments on several benchmark datasets show that this approach is competitive with existing methods for unsupervised learning of context-free grammars. The second approach is a parameter learning approach for learning natural language grammars based on the idea of unambiguity regularization. We make the observation that natural language is remarkably unambiguous in the sense that each natural language sentence has a large number of possible parses but only a few of the parses are syntactically valid. We incorporate this prior information into parameter learning by means of posterior regularization. 
The resulting algorithm family contains classic EM and Viterbi EM, as well as a novel softmax-EM algorithm that can be implemented with a simple and efficient extension to classic EM. Our experiments show that unambiguity regularization improves natural language grammar learning, and when combined with other techniques our approach achieves the state-of-the-art grammar learning results. The third approach is grammar learning with a curriculum. A curriculum is a means of presenting training samples in a meaningful order. We introduce the incremental construction hypothesis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) a natural language corpus that demonstrate the utility of curricula in unsupervised learning of real-world probabilistic grammars

    Tracking Down the Origins of Ambiguity in Context-Free Grammars

    Context-free grammars are widely used but still hindered by ambiguity. This stresses the need for detailed detection methods that point out the sources of ambiguity in a grammar. In this paper we show how the approximative Noncanonical Unambiguity Test by Schmitz can be extended to conservatively identify production rules that do not contribute to the ambiguity of a grammar. We prove the correctness of our approach and consider its practical applicability

    The Unambiguity of Aristotelian Being

    In this paper, I shall try to enhance our understanding of Aristotle's thought by relating it to certain contemporary problems and insights of philosophical logicians. One of the most central current issues in philosophical logic is a challenge to a hundred-year-old dogma. Almost all twentieth-century philosophers in English-speaking countries have followed Frege and Russell and claimed that the words for being in natural languages (is, ist, ἔστι, etc.) are ambiguous between the is of predication, the is of existence, the is of identity, and the generic is. The significance of this ambiguity thesis has not been limited to topical discussions but has extended to historical studies, including studies of ancient Greek philosophy. A generation or two of scholars working in this area used the Frege-Russell ambiguity thesis as an important ingredient of their interpretational framework. Many of us have by this time come to suspect that the Frege-Russell ambiguity claim is completely anachronistic when applied to Aristotle. The sources of this dark professional secret are various, ranging from G. E. L. Owen's brilliant studies of Aristotle on being to Charles Kahn's patient examination of the Greek verb τὸ εἶναι. Most of us good Aristotelians have nevertheless remained in the closet. As was illustrated by the fate that befell the first major study in which Plato's failure to draw the Frege-Russell distinction was noted, most of the unliberated Aristotelians seem to have thought that to note Aristotle's failure to draw the distinction is to accuse him of an abject logical mistake. Accordingly, we have shied away from such impiety. It is time for some consciousness-raising, however. It is not convincing enough merely to register the inapplicability of the modern distinction to Aristotle. We need a deeper understanding of the whole situation.
In an earlier paper, I have shown that there need not be anything logically or semantically wrong with a theory which treats the verbs of being as not exhibiting the Frege-Russell ambiguity. (See Hintikka 1979.) More than that: not only can we now say that Aristotle's procedure is free from any taint of fallacy; he may have been a better semanticist of natural language than Frege and Russell in this particular respect

    Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking

    The inclusion of Regular Expressions (REs) is the kernel of any type-checking algorithm for XML manipulation languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In Colazzo et al. (2009) [1] we introduced a notion of "conflict-free REs", which are extended REs with excellent complexity behaviour, including a polynomial inclusion algorithm [1] and linear membership (Ghelli et al., 2008 [2]). Conflict-free REs have interleaving and counting, but the complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, a type-checking algorithm needs to compare machine-generated subtypes against human-defined supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the subtype. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [1]). The result is extremely surprising, since we had previously found that symmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here

    Unambiguous morphic images of strings

    We study a fundamental combinatorial problem on morphisms in free semigroups: With regard to any string α over some alphabet we ask for the existence of a morphism σ such that σ(α) is unambiguous, i.e. there is no morphism T with T(i) ≠ σ(i) for some symbol i in α and, nevertheless, T(α) = σ(α). As a consequence of its elementary nature, this question shows a variety of connections to those topics in discrete mathematics which are based on finite strings and morphisms such as pattern languages, equality sets and, thus, the Post Correspondence Problem. Our studies demonstrate that the existence of unambiguous morphic images essentially depends on the structure of α: We introduce a partition of the set of all finite strings into those that are decomposable (referred to as prolix) in a particular manner and those that are indecomposable (called succinct). This partition, which is also known to be of major importance for the research on pattern languages and on finite fixed points of morphisms, allows us to formulate our main result according to which a string α can be mapped by an injective morphism onto an unambiguous image if and only if α is succinct

    Tracking Down the Origins of Ambiguity in Context-Free Grammars

    Context-free grammars are widely used but still hindered by ambiguity. This stresses the need for detailed detection methods that point out the sources of ambiguity in a grammar. In this paper we show how the approximative Noncanonical Unambiguity Test by Schmitz can be extended to conservatively identify production rules that do not contribute to the ambiguity of a grammar. Furthermore we can identify tree patterns that will never occur in derivations of ambiguous strings. We prove the correctness of our approach and consider its practical applicability