
    Unsupervised learning of probabilistic grammars

    Probabilistic grammars define joint probability distributions over sentences and their grammatical structures. They have been used in many areas, such as natural language processing, bioinformatics and pattern recognition, mainly for the purpose of deriving grammatical structures from data (sentences). Unsupervised approaches to learning probabilistic grammars induce a grammar from unannotated sentences, which eliminates the need for manual annotation of grammatical structures, a process that can be laborious and error-prone. In this thesis we study unsupervised learning of probabilistic context-free grammars and probabilistic dependency grammars, both of which are expressive enough for many real-world languages yet remain tractable for inference. We investigate three different approaches. The first is a structure search approach for learning probabilistic context-free grammars. It acquires the rules of an unknown probabilistic context-free grammar through iterative coherent biclustering of the bigrams in the training corpus. A greedy procedure adds rules from biclusters such that each set of rules added to the grammar yields the largest increase in the posterior of the grammar given the training corpus. Our experiments on several benchmark datasets show that this approach is competitive with existing methods for unsupervised learning of context-free grammars. The second is a parameter learning approach for natural language grammars based on the idea of unambiguity regularization. We observe that natural language is remarkably unambiguous in the sense that, although each sentence has a large number of possible parses, only a few of them are syntactically valid. We incorporate this prior information into parameter learning by means of posterior regularization. The resulting algorithm family contains classic EM and Viterbi EM, as well as a novel softmax-EM algorithm that can be implemented as a simple and efficient extension of classic EM. Our experiments show that unambiguity regularization improves natural language grammar learning, and when combined with other techniques our approach achieves state-of-the-art grammar learning results. The third approach is grammar learning with a curriculum, a curriculum being a means of presenting training samples in a meaningful order. We introduce the incremental construction hypothesis, which explains the benefits of a curriculum in learning grammars and offers useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) a natural language corpus that demonstrates the utility of curricula in unsupervised learning of real-world probabilistic grammars.
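    The unambiguity-regularization idea in this abstract lends itself to a compact illustration. The sketch below is a toy reading of it, not the thesis implementation: it shows how a softmax-EM style E-step can be obtained from the classic EM posterior by exponentiating and renormalizing it with a regularization strength sigma, so that sigma = 0 recovers classic EM and sigma close to 1 approaches Viterbi EM. The function name, the use of NumPy and the toy parse scores are illustrative assumptions.

        import numpy as np

        def annealed_posterior(parse_scores, sigma):
            """Toy softmax-EM E-step: sharpen the EM posterior over candidate parses.

            parse_scores -- unnormalized scores p(x, z) for each candidate parse z
                            of one sentence x (illustrative values, not real parses)
            sigma        -- assumed regularization strength in [0, 1):
                            sigma = 0 gives the classic EM posterior,
                            sigma -> 1 concentrates mass on the best parse (Viterbi EM)
            """
            posterior = parse_scores / parse_scores.sum()      # classic EM E-step
            sharpened = posterior ** (1.0 / (1.0 - sigma))     # unambiguity regularization
            return sharpened / sharpened.sum()

        scores = np.array([0.5, 0.3, 0.2])        # three candidate parses of one sentence
        print(annealed_posterior(scores, 0.0))    # [0.5 0.3 0.2] -- classic EM
        print(annealed_posterior(scores, 0.9))    # nearly one-hot -- close to Viterbi EM

    In this family, increasing sigma expresses the prior that a sentence's probability mass should concentrate on few parses, which is the unambiguity assumption the abstract describes.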

    Learning Grammars for Architecture-Specific Facade Parsing

    Parsing facade images requires an optimal handcrafted grammar for a given class of buildings. Such a handcrafted grammar is often designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained by running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules so that rules corresponding to the same complex pattern are grouped together, leading to a rich yet compact grammar. Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar leads to much faster convergence while producing parsing results that are as accurate as or more accurate than those of handcrafted grammars as well as grammars learned by other methods. In addition, we release a new dataset of facade images from Paris in the Art Deco style and demonstrate the general applicability and broad potential of the proposed framework.
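    As a rough illustration of the subtree-merging step described above, the toy sketch below counts repeated subtrees across a few hand-made parse trees and assigns each frequent subtree a fresh nonterminal so its derivation can be shared. This is only a schematic reading of the idea, not the paper's algorithm: the tuple tree encoding, the min_count threshold and the facade labels are assumptions, and a real system would also rewrite the trees, cluster the resulting rules and refit rule probabilities.

        from collections import Counter

        def collect_subtrees(tree, counter):
            """Count every internal subtree; a tree is encoded as (label, child, child, ...)."""
            if isinstance(tree, tuple):
                counter[tree] += 1
                for child in tree[1:]:
                    collect_subtrees(child, counter)

        def shared_rules(trees, min_count=2):
            """Give each subtree repeated at least min_count times a fresh nonterminal name."""
            counts = Counter()
            for tree in trees:
                collect_subtrees(tree, counts)
            return {f"X{i}": subtree
                    for i, (subtree, c) in enumerate(counts.most_common())
                    if c >= min_count}

        # Two tiny facade parse trees that repeat the same window-row derivation
        row = ("Row", ("Window",), ("Window",))
        trees = [("Facade", row, ("Door",)), ("Facade", row, row)]
        print(shared_rules(trees))   # the repeated row (and its window leaf) become shared rules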

    A Syntactic Neural Model for General-Purpose Code Generation

    We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python. Existing data-driven methods treat this problem as a language generation task without considering the underlying syntax of the target programming language. Informed by previous work in semantic parsing, in this paper we propose a novel neural architecture powered by a grammar model to explicitly capture the target syntax as prior knowledge. Experiments show that this is an effective way to scale up to the generation of complex programs from natural language descriptions, achieving state-of-the-art results that substantially outperform previous code generation and semantic parsing approaches.
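    To make the central point of this abstract concrete, the sketch below shows why emitting a derivation under the target language's grammar, rather than a free-form token sequence, yields syntactically well-formed output: any sequence of valid rule choices expands into a valid string. The toy grammar, the hard-coded rule choices standing in for a neural network's predictions, and the expand helper are illustrative assumptions, not the model or the grammar used in the paper.

        # Toy context-free grammar for a tiny Python-like expression language.
        GRAMMAR = {
            "expr":   [["expr", "+", "expr"], ["name", "(", "expr", ")"], ["number"]],
            "name":   [["sorted"], ["len"]],
            "number": [["1"], ["2"]],
        }

        def expand(symbol, choices):
            """Depth-first expansion of `symbol`, consuming one rule index per nonterminal."""
            if symbol not in GRAMMAR:                 # terminal: emit as-is
                return [symbol]
            rule = GRAMMAR[symbol][choices.pop(0)]    # a neural model would predict this index
            tokens = []
            for child in rule:
                tokens.extend(expand(child, choices))
            return tokens

        # Rule choices a trained model might predict; every valid sequence yields valid syntax.
        print(" ".join(expand("expr", [0, 1, 1, 2, 0, 2, 1])))   # -> len ( 1 ) + 2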