
    Treebank-based acquisition of LFG resources for Chinese

    This paper presents a method to automatically acquire wide-coverage, robust, probabilistic Lexical-Functional Grammar (LFG) resources for Chinese from the Penn Chinese Treebank (CTB). Our starting point is the earlier, proof-of-concept work of Burke et al. (2004) on automatic f-structure annotation, LFG grammar acquisition and parsing for Chinese using CTB version 2 (CTB2). We substantially extend and improve on this earlier research as regards coverage, robustness, quality and fine-grainedness of the resulting LFG resources. We achieve this through (i) improved LFG analyses for a number of core Chinese phenomena; (ii) a new automatic f-structure annotation architecture which involves an intermediate dependency representation; (iii) scaling the approach from 4.1K trees in CTB2 to 18.8K trees in CTB version 5.1 (CTB5.1); and (iv) developing a novel treebank-based approach to recovering non-local dependencies (NLDs) for Chinese parser output. Against a new 200-sentence gold standard of manually constructed f-structures, the method achieves 96.00% f-score for f-structures automatically generated for the original CTB trees and 80.01% for NLD-recovered f-structures generated for the trees output by Bikel’s parser.

    Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models

    The gap between our ability to collect interesting data and our ability to analyze these data is growing at an unprecedented rate. Recent algorithmic attempts to fill this gap have employed unsupervised tools to discover structure in data. Some of the most successful approaches have used probabilistic models to uncover latent thematic structure in discrete data. Despite the success of these models on textual data, they have not generalized as well to image data, in part because of the spatial and temporal structure that may exist in an image stream. We introduce a novel unsupervised machine learning framework that incorporates the ability of convolutional autoencoders to discover features from images that directly encode spatial information, within a Bayesian nonparametric topic model that discovers meaningful latent patterns within discrete data. By using this hybrid framework, we overcome the fundamental dependency of traditional topic models on rigidly hand-coded data representations, while simultaneously encoding spatial dependency in our topics without adding model complexity. We apply this model to the motivating application of high-level scene understanding and mission summarization for exploratory marine robots. Our experiments on a seafloor dataset collected by a marine robot show that the proposed hybrid framework outperforms current state-of-the-art approaches on the task of unsupervised seafloor terrain characterization. Comment: 8 page

    Hierarchical Quantized Representations for Script Generation

    Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide what portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks and achieving substantially lower perplexity scores than that method. Comment: EMNLP 201

    Coupling of quantum angular momenta: an insight into analogic/discrete and local/global models of computation

    In the past few years there has been a flurry of activity aimed at introducing novel conceptual schemes for quantum computing. The approach proposed in (Marzuoli A and Rasetti M 2002, 2005a) relies on the (re)coupling theory of SU(2) angular momenta and can be viewed as a generalization to arbitrary values of the spin variables of the usual quantum-circuit model based on 'qubits' and Boolean gates. Computational states belong to finite-dimensional Hilbert spaces labelled by both discrete and continuous parameters, and unitary gates may depend on quantum numbers ranging over finite sets of values as well as continuous (angular) variables. Such a framework is an ideal playground to discuss discrete (digital) and analogic computational processes, together with the relationships between them that arise when a consistent semiclassical limit is taken on discrete quantum gates. When working with purely discrete unitary gates, the simulator is naturally modelled as families of quantum finite-state machines, which in turn represent discrete versions of topological quantum computation models. We argue that our model embodies a sort of unifying paradigm for computing inspired by Nature and, even more ambitiously, a universal setting in which suitably encoded quantum symbolic manipulations of combinatorial, topological and algebraic problems might find their 'natural' computational reference model. Comment: 17 pages, 1 figure; Workshop 'Natural processes and models of computation' Bologna (Italy) June 16-18 2005; to appear in Natural Computin