54 research outputs found

    Unsupervised grammar induction with Combinatory Categorial Grammars

    Language is a highly structured medium for communication. An idea starts in the speaker's mind (semantics) and is transformed into a well-formed, intelligible sentence via the specific syntactic rules of a language. We aim to discover the fingerprints of this process in the choice and location of words used in the final utterance. What is unclear is how much of this latent process can be discovered from the linguistic signal alone and how much requires shared non-linguistic context, knowledge, or cues. Unsupervised grammar induction is the task of analyzing strings in a language to discover the latent syntactic structure of the language without access to labeled training data. Successes in unsupervised grammar induction shed light on the amount of syntactic structure that is discoverable from raw or part-of-speech tagged text. In this thesis, we present a state-of-the-art grammar induction system based on Combinatory Categorial Grammars. Our choice of syntactic formalism enables the first labeled evaluation of an unsupervised system. This allows us to perform an in-depth analysis of the system’s linguistic strengths and weaknesses. In order to completely eliminate reliance on any supervised systems, we also examine how performance is affected when we use induced word clusters instead of gold-standard POS tags. Finally, we perform a semantic evaluation of induced grammars, providing unique insights into future directions for unsupervised grammar induction systems.

    Cross-lingual Semantic Parsing with Categorial Grammars

    Humans communicate using natural language. We need to make sure that computers can understand us so that they can act on our spoken commands or independently gain new insights from knowledge that is written down as text. A “semantic parser” is a program that translates natural-language sentences into computer commands or logical formulas, that is, something a computer can work with. Despite much recent progress on semantic parsing, most research focuses on English, and semantic parsers for other languages cannot keep up with the developments. My thesis aims to help close this gap. It investigates “cross-lingual learning” methods by which a computer can automatically adapt a semantic parser to another language, such as Dutch. The computer learns by looking at example sentences and their translations, e.g., “She likes to read books”/“Ze leest graag boeken”. Even with many such examples, learning which word means what and how word meanings combine into sentence meanings is a challenge, because translations are rarely word-for-word. They exhibit grammatical differences and non-literalities. My thesis presents a method for tackling these challenges based on the grammar formalism Combinatory Categorial Grammar. It shows that this is a suitable formalism for this purpose, that many structural differences between sentences and their translations can be dealt with in this framework, and that a (rudimentary) semantic parser for Dutch can be learned cross-lingually based on one for English. I also investigate methods for building large corpora of texts annotated with logical formulas to further study and improve semantic parsers.

    Manipulation Action Understanding for Observation and Execution

    Modern intelligent agents will need to learn the actions that humans perform. They will need to recognize these actions when they see them, and they will need to perform these actions themselves. We propose a cognitive system that interprets human manipulation actions from perceptual information (image and depth data) and consists of perceptual modules and reasoning modules that interact with each other. The contributions of this work address two core problems at the heart of action understanding: (a) the grounding of relevant information about actions in perception (the perception-action integration problem), and (b) the organization of perceptual and high-level symbolic information for interpreting the actions (the sequencing problem). At the high level, actions are represented with the Manipulation Action Context-free Grammar (MACFG), a syntactic grammar with associated parsing algorithms, which organizes actions as a sequence of sub-events. Each sub-event is described by the hand (as well as grasp type), the movements (actions), and the objects and tools involved, and the relevant information about these quantities is obtained from biologically inspired perception modules. These modules track the hands and objects and recognize the hand grasp, actions, segmentation, and action consequences. Furthermore, a probabilistic semantic parsing framework based on CCG (Combinatory Categorial Grammar) theory is adopted to model the semantic meaning of human manipulation actions. Additionally, the lesson from the findings on mirror neurons is that the two processes of interpreting visually observed actions and generating actions should share the same underlying cognitive process. Recent studies have shown that grammatical structures underlie the representation of manipulation actions, which are used both to understand and to execute these actions. By analogy, understanding manipulation actions is like understanding language, while executing them is like generating language. Experiments on two tasks, (1) a robot observing people performing manipulation actions, and (2) a robot then executing manipulation actions accordingly, are presented to validate the formalism. The technical parts of this thesis are devoted to the experimental setting of task (1), while task (2) is given as a live demonstration.

    Neural Combinatory Constituency Parsing

    Tokyo Metropolitan University, Doctorate (Information Science), doctoral thesis

    Learning natural language syntax

    Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing

    This thesis focuses on unsupervised dependency parsing, that is, parsing sentences of a language into dependency trees without accessing the training data of that language. Different from most prior work that uses unsupervised learning to estimate the parsing parameters, we estimate the parameters by supervised training on synthetic languages. Our parsing framework has three major components: synthetic language generation gives a rich set of training languages by mix-and-match over the real languages; surface-form feature extraction maps an unparsed corpus of a language into a fixed-length vector as the syntactic signature of that language; and, finally, language-agnostic parsing incorporates the syntactic signature during parsing so that the decision on each word token is reliant upon the general syntax of the target language. The fundamental question we are trying to answer is whether some useful information about the syntax of a language could be inferred from its surface-form evidence (unparsed corpus). This is the same question that has been implicitly asked by previous papers on unsupervised parsing, which assume that only an unparsed corpus is available for the target language. We show that, indeed, useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. This thesis contains several large-scale experiments requiring hundreds of thousands of CPU-hours. To our knowledge, this is the largest study of unsupervised parsing yet attempted. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous works’ interpretable typological features that require parsed corpora or expert categorization of the language.

    Statistical Models for Discovering the Syntactic Structure Hidden in Sentences (文に隠れた構文構造を発見する統計モデル)

    Abstract available · The Present State of Statistical Language Research · 研究詳

    Character-based Neural Semantic Parsing

    Humans and computers do not speak the same language. A lot of day-to-day tasks would be vastly more efficient if we could communicate with computers using natural language instead of relying on an interface. It is necessary, then, that the computer does not see a sentence as a collection of individual words, but instead can understand the deeper, compositional meaning of the sentence. A way to tackle this problem is to automatically assign to each sentence a formal, structured meaning representation, which is easy for computers to interpret. There have been quite a few attempts at this before, but these approaches were usually heavily reliant on predefined rules, word lists or representations of the syntax of the text. This made the general usage of these methods quite complicated. In this thesis we employ an algorithm that can learn to automatically assign meaning representations to texts, without using any such external resource. Specifically, we use a type of artificial neural network called a sequence-to-sequence model, in a process that is often referred to as deep learning. The devil is in the details, but we find that this type of algorithm can produce high-quality meaning representations, with better performance than the more traditional methods. Moreover, a main finding of the thesis is that, counterintuitively, it is often better to represent the text as a sequence of individual characters, and not words. This is likely the case because it helps the model in dealing with spelling errors, unknown words and inflections.