
    Frequency Value Grammar and Information Theory

    I previously laid the groundwork for Frequency Value Grammar (FVG) in papers I submitted to the proceedings of the 4th International Conference on Cognitive Science (2003, Sydney, Australia) and the Corpus Linguistics Conference (2003, Lancaster, UK). FVG is a formal syntax theoretically based, in large part, on Information Theory principles. FVG relies on dynamic physical principles external to the corpus which shape and mould the corpus, whereas generative grammar and other formal syntactic theories are based exclusively on patterns (fractals) found within the well-formed portion of the corpus. However, FVG should not be confused with Probability Syntax (PS), as described by Manning (2003). PS is a corpus-based approach that yields the probability distribution of possible syntactic constructions over a fixed corpus. PS makes no distinction between well-formed and ill-formed sentence constructions and assumes everything found in the corpus is well formed. In contrast, FVG's primary objective is to distinguish between well-formed and ill-formed sentence constructions and, in so doing, relies on corpus-based parameters which determine sentence competency. In PS, a syntax of high probability will not necessarily yield a well-formed sentence. However, in FVG, a syntax or sentence construction of high 'frequency value' will yield a well-formed sentence at least 95% of the time, satisfying most empirical standards. Moreover, in FVG, a sentence construction of high frequency value could very well be represented by an underlying syntactic construction of low probability as determined by PS. The characteristic frequency values calculated in FVG are not measures of probability but fundamentally determined values derived from exogenous principles which impact and determine corpus-based parameters serving as an index of sentence competency. The theoretical framework of FVG has broad applications beyond formal syntax and NLP. In this paper, I demonstrate how FVG can be used as a model for improving the upper-bound calculation of the entropy of written English. Generally speaking, when a function word precedes an open-class word, the backward n-gram analysis will be homomorphic with the information source and will result in frequency values more representative of co-occurrences in the information source.
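    The abstract's final claims concern n-gram statistics and the upper bound on the entropy of written English. As a point of reference only (this is not FVG itself), the sketch below shows the standard mechanics such an upper-bound calculation rests on: the cross-entropy of text under an n-gram model bounds the source's entropy rate, and the same counts can be conditioned on the following rather than the preceding word (a "backward" n-gram). The toy corpus and the add-one smoothing are assumptions for illustration.

```python
# Illustrative sketch only: a generic bigram cross-entropy estimate, the kind of quantity
# used as an upper bound on the entropy rate of text. Not part of FVG itself.
import math
from collections import Counter

def bigram_cross_entropy(tokens, reverse=False):
    """Cross-entropy (bits/word) of the text under an add-one-smoothed bigram model.
    With reverse=True the model conditions on the following word (a 'backward' n-gram)."""
    seq = list(reversed(tokens)) if reverse else list(tokens)
    unigrams = Counter(seq)
    bigrams = Counter(zip(seq, seq[1:]))
    vocab = len(unigrams)
    total = 0.0
    for (prev, cur), count in bigrams.items():
        p = (count + 1) / (unigrams[prev] + vocab)   # add-one smoothing (assumed choice)
        total += count * -math.log2(p)
    return total / sum(bigrams.values())

corpus = "the cat sat on the mat and the dog sat on the rug".split()
print("forward :", round(bigram_cross_entropy(corpus), 3), "bits/word")
print("backward:", round(bigram_cross_entropy(corpus, reverse=True), 3), "bits/word")
```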

    A Machine learning approach to POS tagging

    We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. In this direction, we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
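    As a rough illustration of the first component described above (not the authors' implementation), the sketch below trains a decision tree on simple contextual features for POS disambiguation using scikit-learn. The toy training sentences and the particular feature set are assumptions; the abstract's further step of translating such trees into rules for a relaxation-labelling tagger is not shown here.

```python
# Minimal sketch (assumed data and features): a decision tree over contextual features
# for POS disambiguation, in the spirit of tree-based statistical taggers.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def features(words, i):
    """Contextual features for position i: the word itself plus its neighbours."""
    return {
        "word": words[i],
        "prev": words[i - 1] if i > 0 else "<s>",
        "next": words[i + 1] if i < len(words) - 1 else "</s>",
        "suffix2": words[i][-2:],
    }

train_sents = [
    (["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["a", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
]
X_dicts, y = [], []
for words, tags in train_sents:
    for i, tag in enumerate(tags):
        X_dicts.append(features(words, i))
        y.append(tag)

vec = DictVectorizer()
clf = DecisionTreeClassifier().fit(vec.fit_transform(X_dicts), y)

test = ["the", "dog", "sleeps"]
X_test = vec.transform([features(test, i) for i in range(len(test))])
print(list(zip(test, clf.predict(X_test))))
```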

    Improving Semantic Composition with Offset Inference

    Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models such as Anchored Packed Trees (APTs), which take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition. Comment: to appear at ACL 2017 (short papers).
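    To make the sparsity problem and the general idea of distributional inference concrete, here is a hedged sketch of typed (dependency-labelled) co-occurrence vectors in which unobserved counts are filled in from a nearest neighbour. This is a generic stand-in for distributional inference, not the APT offset mechanism introduced in the paper; the toy triples, the cosine neighbour selection, and the 0.5 damping factor are all assumptions.

```python
# Hedged sketch: typed co-occurrence counting plus naive neighbour-based inference of
# unobserved counts. Generic illustration only, not the paper's offset inference.
from collections import Counter, defaultdict

# Toy typed co-occurrences: (head, dependency_relation, dependent) triples (assumed data).
triples = [
    ("drink", "dobj", "coffee"), ("drink", "dobj", "tea"),
    ("sip", "dobj", "coffee"), ("pour", "dobj", "tea"),
]

vectors = defaultdict(Counter)
for head, rel, dep in triples:
    vectors[head][(rel, dep)] += 1          # typed context feature

def cosine(u, v):
    num = sum(u[f] * v[f] for f in set(u) & set(v))
    den = (sum(x * x for x in u.values()) ** 0.5) * (sum(x * x for x in v.values()) ** 0.5)
    return num / den if den else 0.0

def infer(word, k=1):
    """Add damped counts from the k nearest neighbours for features the word lacks."""
    neighbours = sorted((w for w in vectors if w != word),
                        key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)[:k]
    inferred = Counter(vectors[word])
    for n in neighbours:
        for feat, c in vectors[n].items():
            if feat not in inferred:
                inferred[feat] += 0.5 * c    # assumed damping weight
    return inferred

print(infer("sip"))   # picks up ('dobj', 'tea') from its neighbour 'drink'
```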

    A Hybrid Environment for Syntax-Semantic Tagging

    The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. The constraints enable the use of a real value stating "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models. Comment: PhD thesis, 120 pages.
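    For readers unfamiliar with relaxation labelling, the following is a minimal sketch of the core update: each word holds a probability distribution over candidate tags, and real-valued compatibilities between neighbouring readings iteratively raise or lower the support for each tag. The toy tags, probabilities, and compatibility values are assumptions, not taken from the thesis.

```python
# Minimal relaxation-labelling sketch with assumed toy numbers.
def relax(label_probs, compat, iterations=10):
    """label_probs: list of {tag: prob} per position; compat[(tag_i, tag_j)] in [-1, 1]."""
    for _ in range(iterations):
        new = []
        for i, dist in enumerate(label_probs):
            support = {}
            for tag, p in dist.items():
                s = 0.0
                for j, other in enumerate(label_probs):
                    if i == j:
                        continue
                    s += sum(q * compat.get((tag, other_tag), 0.0)
                             for other_tag, q in other.items())
                support[tag] = max(p * (1.0 + s), 0.0)   # clamp to keep values non-negative
            z = sum(support.values()) or 1.0
            new.append({t: v / z for t, v in support.items()})
        label_probs = new
    return label_probs

# "that" before a noun: the determiner reading should win over the complementizer reading.
probs = [{"DT": 0.5, "IN": 0.5}, {"NN": 0.9, "VB": 0.1}]
compat = {("DT", "NN"): 1.0, ("IN", "NN"): -0.5, ("DT", "VB"): -1.0, ("IN", "VB"): 0.5}
print(relax(probs, compat))
```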

    Statistical Models for Co-occurrence Data

    Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities of pairs which were rarely observed, or even unobserved, in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explains the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery, and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence, and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
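    Below is a minimal sketch of an aspect mixture model for co-occurrence counts fit with plain EM, in a simplified PLSA-style form; the improved EM variants mentioned above are not shown, and the toy noun/adjective count matrix and the number of aspects are assumptions.

```python
# Sketch: aspect mixture model for co-occurrence counts, fit with standard EM.
import numpy as np

def fit_aspect_model(counts, n_aspects=2, iters=100, seed=0):
    """counts: (n_x, n_y) co-occurrence matrix. Returns P(z), P(x|z), P(y|z)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = counts.shape
    pz = np.full(n_aspects, 1.0 / n_aspects)
    px_z = rng.random((n_aspects, n_x)); px_z /= px_z.sum(axis=1, keepdims=True)
    py_z = rng.random((n_aspects, n_y)); py_z /= py_z.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z | x, y), shape (n_aspects, n_x, n_y)
        joint = pz[:, None, None] * px_z[:, :, None] * py_z[:, None, :]
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step: re-estimate parameters from expected counts
        weighted = resp * counts[None, :, :]
        pz = weighted.sum(axis=(1, 2)); pz /= pz.sum()
        px_z = weighted.sum(axis=2); px_z /= px_z.sum(axis=1, keepdims=True)
        py_z = weighted.sum(axis=1); py_z /= py_z.sum(axis=1, keepdims=True)
    return pz, px_z, py_z

# Toy noun/adjective counts: rows = nouns, columns = adjectives (assumed data).
counts = np.array([[8, 1, 0], [7, 2, 0], [0, 1, 9], [1, 0, 8]], dtype=float)
pz, px_z, py_z = fit_aspect_model(counts)
print(np.round(py_z, 2))   # each aspect concentrates on a different group of adjectives
```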

    The Mechanism of Additive Composition

    Additive composition (Foltz et al., 1998; Landauer and Dumais, 1997; Mitchell and Lapata, 2010) is a widely used method for computing the meanings of phrases: it takes the average of the vector representations of the constituent words. In this article, we prove an upper bound on the bias of additive composition, which is the first theoretical analysis of compositional frameworks from a machine learning point of view. The bound is written in terms of collocation strength; we prove that the more exclusively two successive words tend to occur together, the more accurately their additive composition can be guaranteed to approximate the natural phrase vector. Our proof relies on properties of natural language data that are empirically verified and can be theoretically derived from the assumption that the data is generated by a Hierarchical Pitman-Yor Process. The theory endorses additive composition as a reasonable operation for calculating meanings of phrases, and suggests ways to improve additive compositionality, including: transforming the entries of distributional word vectors by a function that meets a specific condition, constructing a novel type of vector representation to make additive composition sensitive to word order, and utilizing singular value decomposition to train word vectors. Comment: more explanations on the theory and additional experiments added; accepted by the Machine Learning Journal.
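    Since additive composition is defined as the average of the constituent word vectors, a minimal sketch is enough to show the operation; the tiny hand-made vectors below are assumptions for illustration only.

```python
# Minimal sketch of additive composition: a phrase vector as the average of its words'
# distributional vectors (toy vectors, assumed for illustration).
import numpy as np

word_vectors = {
    "strong": np.array([0.9, 0.1, 0.3]),
    "coffee": np.array([0.2, 0.8, 0.4]),
}

def additive_compose(words, vectors):
    """Average the constituent word vectors to approximate the phrase vector."""
    return np.mean([vectors[w] for w in words], axis=0)

print(additive_compose(["strong", "coffee"], word_vectors))   # [0.55 0.45 0.35]
```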