Treebank-based acquisition of LFG resources for Chinese
This paper presents a method to automatically acquire wide-coverage, robust, probabilistic Lexical-Functional Grammar resources for Chinese from the Penn Chinese Treebank (CTB). Our starting point is the earlier, proof-of-concept work of Burke et al. (2004) on automatic f-structure annotation, LFG grammar acquisition and parsing for Chinese using the CTB version 2 (CTB2). We substantially extend and improve on this earlier research as regards coverage, robustness, quality and fine-grainedness of the resulting LFG resources. We achieve this through (i) improved LFG analyses for a number of core Chinese phenomena; (ii) a new automatic f-structure annotation architecture which involves an intermediate dependency representation; (iii) scaling the approach from 4.1K trees in CTB2 to 18.8K trees in CTB version 5.1 (CTB5.1); and (iv) developing a novel treebank-based approach to recovering non-local dependencies (NLDs) for Chinese parser output. Against a new 200-sentence gold standard of manually constructed f-structures, the method achieves 96.00% f-score for f-structures automatically generated for the original CTB trees and 80.01% for NLD-recovered f-structures generated for the trees output by Bikel’s parser.
Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models
The gap between our ability to collect interesting data and our ability to
analyze these data is growing at an unprecedented rate. Recent algorithmic
attempts to fill this gap have employed unsupervised tools to discover
structure in data. Some of the most successful approaches have used
probabilistic models to uncover latent thematic structure in discrete data.
Despite the success of these models on textual data, they have not generalized
as well to image data, in part because of the spatial and temporal structure
that may exist in an image stream.
We introduce a novel unsupervised machine learning framework that
incorporates the ability of convolutional autoencoders to discover features
from images that directly encode spatial information, within a Bayesian
nonparametric topic model that discovers meaningful latent patterns within
discrete data. By using this hybrid framework, we overcome the fundamental
dependency of traditional topic models on rigidly hand-coded data
representations, while simultaneously encoding spatial dependency in our topics
without adding model complexity. We apply this model to the motivating
application of high-level scene understanding and mission summarization for
exploratory marine robots. Our experiments on a seafloor dataset collected by a
marine robot show that the proposed hybrid framework outperforms current
state-of-the-art approaches on the task of unsupervised seafloor terrain
characterization.
Comment: 8 pages
Hierarchical Quantized Representations for Script Generation
Scripts define knowledge about how everyday scenarios (such as going to a
restaurant) are expected to unfold. One of the challenges to learning scripts
is the hierarchical nature of the knowledge. For example, a suspect arrested
might plead innocent or guilty, and a very different track of events is then
expected to happen. To capture this type of information, we propose an
autoencoder model with a latent space defined by a hierarchy of categorical
variables. We utilize a recently proposed vector quantization based approach,
which allows continuous embeddings to be associated with each latent variable
value. This permits the decoder to softly decide what portions of the latent
hierarchy to condition on by attending over the value embeddings for a given
setting. Our model effectively encodes and generates scripts, outperforming a
recent language modeling-based method on several standard tasks, and allowing
the autoencoder model to achieve substantially lower perplexity scores compared
to the previous language modeling-based method.
Comment: EMNLP 201
Coupling of quantum angular momenta: an insight into analogic/discrete and local/global models of computation
In the past few years there has been tumultuous activity aimed at
introducing novel conceptual schemes for quantum computing. The approach
proposed in (Marzuoli A and Rasetti M 2002, 2005a) relies on the (re)coupling
theory of SU(2) angular momenta and can be viewed as a generalization to
arbitrary values of the spin variables of the usual quantum-circuit model based
on `qubits' and Boolean gates. Computational states belong to
finite-dimensional Hilbert spaces labelled by both discrete and continuous
parameters, and unitary gates may depend on quantum numbers ranging over finite
sets of values as well as continuous (angular) variables. Such a framework is
an ideal playground to discuss discrete (digital) and analogic computational
processes, together with their relationships occurring when a consistent
semiclassical limit takes place on discrete quantum gates. When working with
purely discrete unitary gates, the simulator is naturally modelled as families
of quantum finite-state machines, which in turn represent discrete versions of
topological quantum computation models. We argue that our model embodies a sort
of unifying paradigm for computing inspired by Nature and, even more
ambitiously, a universal setting in which suitably encoded quantum symbolic
manipulations of combinatorial, topological and algebraic problems might find
their `natural' computational reference model.
Comment: 17 pages, 1 figure; Workshop `Natural processes and models of
computation', Bologna (Italy), June 16-18 2005; to appear in Natural Computing