
    Complexity of grammars by group theoretic methods

    Let G be a context-free (phrase structure) grammar generating the context-free language L. The set P = P_G of all “generation histories” of words in L can be coded as words in some augmented alphabet. It is proved here that P = R ∩ G, where R is a regular (finite-automaton-definable) set and G is a “free group kernel” or Dyck set, a result first proved by Chomsky and Schützenberger [3]. We can construct the lower central series of the free group kernel, G_1 ⊇ G_2 ⊇ … ⊇ G_n ⊇ …, so that ∩G_n = G. Let P_n = R ∩ G_n, so that ∩P_n = P; P_n is the n-th order approximation of P. P_n need not be a context-free language, but it can be computed by n cascaded or sequential banks of counters (integers). We give two equivalent characterizations of P_n, one “grammatical” and one “statistical”, which follow from the theorems of Magnus, Witt, M. Hall, etc. for free groups. The main new theoretical tool used here for the study of grammars is the Magnus transform on the free group, a → 1 + a, a⁻¹ → 1 − a + a² − a³ + a⁴ − …, which acts like a non-commutative Fourier transform.
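    The Magnus transform admits a direct finite computation once the expansion is truncated at a fixed degree. The sketch below (all names are ours, not the paper's) represents noncommutative polynomials as dictionaries from monomials to integer coefficients; by Magnus's theorem, a word lies in the n-th term of the lower central series exactly when its expansion is 1 plus terms of degree at least n, which is what makes the P_n approximations computable.

        # A minimal sketch, with illustrative names, of the truncated Magnus
        # transform: noncommutative polynomials are dicts mapping monomials
        # (tuples of generator names) to integer coefficients.

        DEGREE = 4  # truncation order

        def poly_mul(p, q, degree=DEGREE):
            """Multiply truncated noncommutative polynomials."""
            out = {}
            for m1, c1 in p.items():
                for m2, c2 in q.items():
                    m = m1 + m2  # concatenation = noncommutative product
                    if len(m) <= degree:
                        out[m] = out.get(m, 0) + c1 * c2
            return {m: c for m, c in out.items() if c != 0}

        def magnus_generator(gen, inverse, degree=DEGREE):
            """a -> 1 + a;  a^-1 -> 1 - a + a^2 - a^3 + ... (truncated)."""
            if not inverse:
                return {(): 1, (gen,): 1}
            return {(gen,) * k: (-1) ** k for k in range(degree + 1)}

        def magnus(word, degree=DEGREE):
            """Expand a word given as a list of (generator, is_inverse) pairs."""
            result = {(): 1}
            for gen, inv in word:
                result = poly_mul(result, magnus_generator(gen, inv, degree), degree)
            return result

        # The commutator a b a^-1 b^-1 expands to 1 + (ab - ba) + higher-degree
        # terms, so it is trivial to first order: it lies in G_2 but not G_3.
        print(magnus([('a', False), ('b', False), ('a', True), ('b', True)]))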

    An example-based approach to translating sign language

    Users of sign languages are often forced to use a language in which they have reduced competence simply because documentation in their preferred format is not available. While some research exists on translating between natural and sign languages, we present here what we believe to be the first attempt to tackle this problem using an example-based machine translation (EBMT) approach. Having obtained a set of English–Dutch Sign Language examples, we employ an approach to EBMT using the ‘Marker Hypothesis’ (Green, 1979), analogous to the successful systems of Way & Gough (2003) and Gough & Way (2004a, 2004b). In a set of experiments, we show that encouragingly good translation quality may be obtained using such an approach.
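    For readers unfamiliar with the Marker Hypothesis, the idea is that every language signals constituent boundaries with a small closed class of words. A minimal sketch of marker-based chunking follows; the marker inventory and the example sentence are ours, purely for illustration, and real systems also exploit the marker tag attached to each chunk during alignment.

        # A minimal sketch of marker-based chunking: a new chunk opens at
        # every closed-class marker word. The marker lists are illustrative,
        # not the inventory used in the paper.

        MARKERS = {
            'DET': {'the', 'a', 'an', 'this', 'that'},
            'PREP': {'in', 'on', 'at', 'with', 'from', 'to', 'of', 'into'},
            'CONJ': {'and', 'or', 'but'},
            'PRON': {'he', 'she', 'it', 'they', 'we', 'you', 'i'},
        }

        def marker_chunks(sentence):
            """Split a sentence into (marker class, words) chunks."""
            chunks, current, label = [], [], None
            for token in sentence.lower().split():
                tag = next((t for t, ws in MARKERS.items() if token in ws), None)
                if tag is not None and current:
                    chunks.append((label, current))  # close the open chunk
                    current, label = [], tag
                elif tag is not None:
                    label = tag
                current.append(token)
            if current:
                chunks.append((label, current))
            return chunks

        print(marker_chunks("the interpreter translated the speech into sign language"))
        # [('DET', ['the', 'interpreter', 'translated']),
        #  ('DET', ['the', 'speech']),
        #  ('PREP', ['into', 'sign', 'language'])]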

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms, aiding the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I describe two recent examples that drew on ideas from both areas: one concerns selecting good columns or features from a data matrix of DNA single-nucleotide polymorphisms, and the other concerns selecting good clusters or communities from a data graph representing a social or information network. Together they may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.
    Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, “Combinatorial Scientific Computing,” Chapman and Hall/CRC Press, 201
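    One standard way to make the column-selection idea concrete, though not necessarily the chapter's exact algorithm, is to sample columns with probability proportional to their statistical leverage, computed from the top-k right singular vectors. A minimal sketch:

        # A minimal sketch of leverage-score column selection (our assumption
        # about the flavor of method discussed, not taken from the chapter).
        import numpy as np

        def leverage_scores(A, k):
            """Per-column leverage: squared column norms of the top-k rows
            of V^T, normalized to sum to 1 so they form a distribution."""
            _, _, Vt = np.linalg.svd(A, full_matrices=False)
            lev = np.sum(Vt[:k, :] ** 2, axis=0)
            return lev / lev.sum()

        def sample_columns(A, k, c, seed=0):
            """Sample c distinct column indices proportionally to leverage."""
            rng = np.random.default_rng(seed)
            return rng.choice(A.shape[1], size=c, replace=False,
                              p=leverage_scores(A, k))

        A = np.random.default_rng(1).normal(size=(100, 20))
        print(sample_columns(A, k=5, c=4))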

    A memory-based classification approach to marker-based EBMT

    We describe a novel approach to example-based machine translation that makes use of marker-based chunks, in which the decoder is a memory-based classifier. The classifier is trained to map trigrams of source-language chunks onto trigrams of target-language chunks; in a second decoding step, the predicted trigrams are rearranged according to their overlap. We present the first results of this method on a Dutch-to-English translation system using Europarl data. Sparseness of the class space causes the results to lag behind a baseline phrase-based SMT system. In a further comparison, we also apply the method to a word-aligned version of the same data and report a smaller difference with a word-based SMT system. We explore the scaling abilities of the memory-based approach, and observe linear scaling behavior in training and classification speed and memory costs, and log-linear BLEU improvements with increasing numbers of training examples.
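    The decoding idea can be pictured with a toy memory: every training trigram of source chunks is stored with its aligned target trigram, and classification returns the target side of the closest stored example. The sketch below uses invented Dutch–English data and a crude positional-overlap distance; memory-based learners such as TiMBL use richer features and metrics.

        # A minimal sketch of memory-based classification over chunk trigrams.
        # All example data is invented for illustration.

        def overlap(t1, t2):
            """Positional overlap between two chunk trigrams."""
            return sum(a == b for a, b in zip(t1, t2))

        class MemoryBasedDecoder:
            def __init__(self):
                self.memory = []  # (source_trigram, target_trigram) pairs

            def train(self, pairs):
                self.memory.extend(pairs)

            def classify(self, source_trigram):
                # Return the target trigram of the nearest stored example.
                best = max(self.memory, key=lambda ex: overlap(ex[0], source_trigram))
                return best[1]

        decoder = MemoryBasedDecoder()
        decoder.train([
            (('<s>', 'de kat', 'zat op'), ('<s>', 'the cat', 'sat on')),
            (('de kat', 'zat op', 'de mat'), ('the cat', 'sat on', 'the mat')),
        ])
        print(decoder.classify(('<s>', 'de kat', 'zat op')))
        # ('<s>', 'the cat', 'sat on')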

    Learning Semantic Correspondences in Technical Documentation

    We consider the problem of translating high-level textual descriptions to formal representations in technical documentation, as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representations of functions or code templates. Our approach exploits the parallel nature of such documentation, that is, the tight coupling between high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals.
    Comment: accepted to ACL-201
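    The data-collection step can be illustrated on Python's own standard library, where descriptions and formal representations already sit side by side. The sketch below is our illustration of mining parallel text-representation pairs, not the paper's pipeline:

        # A minimal sketch: pair each function's one-line description with
        # its signature, yielding (text, formal representation) examples.
        import inspect
        import json  # any documented module works; json is just an example

        def mine_pairs(module):
            """Yield (description, representation) pairs from a module."""
            for name, fn in inspect.getmembers(module, inspect.isfunction):
                doc = inspect.getdoc(fn)
                if not doc:
                    continue
                try:
                    sig = str(inspect.signature(fn))
                except (TypeError, ValueError):
                    continue  # some callables expose no signature
                yield doc.splitlines()[0], name + sig

        for text, representation in list(mine_pairs(json))[:3]:
            print(representation, '<-', text)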

    Grammatical relations, agreement, and genetic stability

    Languages vary in whether or not primary grammatical relations (PGRs) are sensitive to information from clause-level case or phrase structures. This variation correlates with a difference between verb agreement systems based on feature unification and systems based on feature composition. The choice between different PGR and agreement principles is found to be highly stable genetically, and to characterize Indo-European as systematically different from Sino-Tibetan. Although the choice is partially similar to the Configurationality Parameter, it is shown that Indo-European languages of South Asia are nonconfigurational due to areal pressure but follow their European relatives in PGR and agreement principles.

    ‘Je sais et tout mais...’: might the general extenders in European French be changing?

    This paper addresses contemporary trends in the use of general extenders in two recent corpora of spontaneous French stratified by age. In these corpora, certain variants (e.g. et tout) are highly prevalent in the speech of young people compared to older speakers, while others are not. Other studies have shown that both the form and the frequency of general extenders tend to vary with speakers’ age, and that some extenders may also undergo grammaticalisation. The present study includes a comparison with a late 20th-century corpus of spoken French, and finds that not only age grading but also generational change might be occurring. This conclusion is supported by qualitative and quantitative analysis of the contemporary data, showing that the forms most frequent among young people appear to have acquired new pragmatic functions.
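    The apparent-time comparison behind such studies reduces to normalized frequencies per subcorpus. A minimal sketch with invented counts follows; the real study's corpora and figures are not reproduced here.

        # Normalized frequency of general-extender variants per 1,000 words,
        # stratified by speaker age group. All counts below are invented.
        from collections import defaultdict

        observations = [  # (age group, variant, raw count, words in subcorpus)
            ('18-25', 'et tout', 52, 40_000),
            ('18-25', 'et cetera', 6, 40_000),
            ('60+', 'et tout', 9, 55_000),
            ('60+', 'et cetera', 21, 55_000),
        ]

        rates = defaultdict(dict)
        for group, variant, count, total in observations:
            rates[group][variant] = 1000 * count / total

        for group in sorted(rates):
            for variant, rate in sorted(rates[group].items()):
                print(f'{group:6} {variant:10} {rate:5.2f} per 1,000 words')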