Recovering non-local dependencies for Chinese
To date, work on Non-Local Dependencies (NLDs) has focused almost exclusively on English, and it is an open research question how well these approaches migrate to other languages. This paper surveys non-local dependency constructions in Chinese as represented in the Penn Chinese Treebank (CTB) and provides an approach for generating proper predicate-argument-modifier structures, including NLDs, from surface context-free phrase structure trees. Our approach recovers non-local dependencies at the level of Lexical-Functional Grammar f-structures, using automatically acquired subcategorisation frames and f-structure paths linking antecedents and traces in NLDs. Currently our algorithm achieves 92.2% F-score for trace insertion and 84.3% for antecedent recovery when evaluated on gold-standard CTB trees, and 64.7% and 54.7%, respectively, on the output trees of a state-of-the-art parser trained on the CTB.
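The figures above are F-scores. As a rough illustration of how such a score is computed, the sketch below treats each inserted trace as an item and counts exact matches against gold items. The item encoding and the function name are hypothetical; the paper evaluates at the level of f-structures, so its matching criterion may differ.

```python
from typing import Set, Tuple

# Hypothetical item encoding (not the paper's evaluation format):
# (sentence id, insertion site, empty-category label such as "*pro*" or "*T*").
Item = Tuple[int, int, str]


def precision_recall_f(gold: Set[Item], predicted: Set[Item]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score of predicted items against gold items."""
    matched = len(gold & predicted)  # only exact matches count as correct
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(1, 3, "*pro*"), (1, 7, "*T*"), (2, 0, "*pro*"), (2, 5, "*RNR*")}
predicted = {(1, 3, "*pro*"), (1, 7, "*T*"), (2, 2, "*pro*")}
p, r, f = precision_recall_f(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.500 F=0.571
```

The balanced F-score is the harmonic mean of precision and recall, so a system cannot score well by over- or under-inserting traces alone.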
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall-and-skinny, which enables the algorithms to map
conveniently onto Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
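One reason tall-and-skinny matrices suit a data-parallel model can be sketched in plain Python, without Spark: for an n x d matrix with small d, the column sums and the d x d Gram matrix are sums over row blocks, so each partition contributes independently and only small results are combined. Here the `partitions` list stands in for a distributed dataset's partitions, and the function names are hypothetical. This is a generic Gram-matrix PCA sketch, not the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def partition_stats(rows: Matrix):
    """'Map' step (hypothetical name): column sums and Gram matrix A_p^T A_p of one row block."""
    d = len(rows[0])
    col_sum = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            col_sum[i] += r[i]
            for j in range(d):
                gram[i][j] += r[i] * r[j]
    return col_sum, gram


def top_principal_component(partitions: List[Matrix], iters: int = 100) -> List[float]:
    """Leading principal direction of a row-partitioned n x d matrix with small d."""
    d = len(partitions[0][0])
    n = 0
    col_sum = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    # 'Reduce' step: partition statistics simply add up, so only d- and
    # d x d-sized results are ever combined, never the n rows themselves.
    for part in partitions:
        s, g = partition_stats(part)
        n += len(part)
        col_sum = [a + b for a, b in zip(col_sum, s)]
        gram = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(gram, g)]
    mean = [s / n for s in col_sum]
    # Sample covariance from the Gram matrix: (A^T A - n * mean mean^T) / (n - 1).
    cov = [[(gram[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration on the small d x d covariance matrix, done locally.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Data spread along the (1, 1, 0) direction, split into four row blocks.
rows = [[k - 49.5, k - 49.5, 0.1 * (-1) ** k] for k in range(100)]
parts = [rows[i:i + 25] for i in range(0, 100, 25)]
print(top_principal_component(parts))
```

Forming the covariance from the raw Gram matrix is simple but can lose precision when column means are large relative to the spread. Centering per partition or using a more careful combination rule is safer for real data.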
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).
Active Coverage for PAC Reinforcement Learning
Collecting and leveraging data with good coverage properties plays a crucial
role in different aspects of reinforcement learning (RL), including reward-free
exploration and offline learning. However, the notion of "good coverage" really
depends on the application at hand, as data suitable for one context may not be
so for another. In this paper, we formalize the problem of active coverage in
episodic Markov decision processes (MDPs), where the goal is to interact with
the environment so as to fulfill given sampling requirements. This framework is
sufficiently flexible to specify any desired coverage property, making it
applicable to any problem that involves online exploration. Our main
contribution is an instance-dependent lower bound on the sample complexity of
active coverage and a simple game-theoretic algorithm, CovGame, that nearly
matches it. We then show that CovGame can be used as a building block to solve
different PAC RL tasks. In particular, we obtain a simple algorithm for PAC
reward-free exploration with an instance-dependent sample complexity that, in
certain MDPs which are "easy to explore", is lower than the minimax one. By
further coupling this exploration algorithm with a new technique to do implicit
eliminations in policy space, we obtain a computationally efficient algorithm
for best-policy identification whose instance-dependent sample complexity
scales with gaps between policy values.
Comment: Accepted at COLT 202
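To make "sampling requirements" concrete, the sketch below assumes they take the form of minimum visit counts for each (step, state, action) triple of a tabular episodic MDP. It then checks which requirements a batch of collected episodes still leaves unmet. The encoding and function names are hypothetical. The sketch only illustrates the coverage objective; it is not CovGame, which decides how to act so as to meet such requirements with few episodes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: a requirement maps (step h, state s, action a) to a
# minimum number of visits; an episode is the list of (s, a) pairs at h = 0..H-1.
Key = Tuple[int, int, int]


def visit_counts(episodes: List[List[Tuple[int, int]]]) -> Counter:
    """Count visits to each (h, s, a) triple across all collected episodes."""
    counts: Counter = Counter()
    for episode in episodes:
        for h, (s, a) in enumerate(episode):
            counts[(h, s, a)] += 1
    return counts


def remaining_requirements(requirements: Dict[Key, int], counts: Counter) -> Dict[Key, int]:
    """Visits still missing for each triple; an empty dict means coverage is fulfilled."""
    return {k: need - counts[k] for k, need in requirements.items() if counts[k] < need}


reqs = {(0, 0, 0): 2, (0, 0, 1): 1, (1, 1, 0): 1}
episodes = [[(0, 0), (1, 0)], [(0, 1), (2, 1)]]
print(remaining_requirements(reqs, visit_counts(episodes)))  # {(0, 0, 0): 1}
```

Under this encoding, reward-free exploration or offline data collection differ only in which requirement dictionary is supplied, which mirrors the flexibility the abstract describes.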
System Identification with Time-Aware Neural Sequence Models
Established recurrent neural networks are well-suited to solve a wide variety
of prediction tasks involving discrete sequences. However, they do not perform
as well in the task of dynamical system identification, when dealing with
observations from continuous variables that are unevenly sampled in time, for
example due to missing observations. We show how such neural sequence models
can be adapted to deal with variable step sizes in a natural way. In
particular, we introduce a time-aware and stationary extension of existing
models (including the Gated Recurrent Unit) that allows them to deal with
unevenly sampled system observations by adapting to the observation times,
while facilitating higher-order temporal behavior. We discuss the properties
and demonstrate the validity of the proposed approach, based on samples from
two industrial input/output processes.
Comment: 34th AAAI Conference on Artificial Intelligence (AAAI 2020)
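One generic way to let a recurrent update depend on the time elapsed between observations is to derive the update from a continuous-time model, so that the step size Δt enters it explicitly. The sketch below does this for a scalar first-order system dx/dt = (u - x)/τ with the input held constant over each interval. It is not the authors' time-aware GRU, and the names `step`, `run` and the time constant `tau` are hypothetical.

```python
import math
from typing import List


def step(x: float, u: float, dt: float, tau: float = 1.0) -> float:
    """Exact update of dx/dt = (u - x) / tau over an interval dt with u held constant."""
    decay = math.exp(-dt / tau)  # depends only on the elapsed time, not absolute time
    return decay * x + (1.0 - decay) * u


def run(times: List[float], inputs: List[float], x0: float = 0.0, tau: float = 1.0) -> List[float]:
    """Filter a sequence observed at uneven times; inputs[k] is held from times[k] to times[k+1]."""
    if len(inputs) != len(times) - 1:
        raise ValueError("need one input per interval between observation times")
    xs = [x0]
    for k in range(len(times) - 1):
        xs.append(step(xs[-1], inputs[k], times[k + 1] - times[k], tau))
    return xs


print(run([0.0, 0.1, 0.5, 2.0], [1.0, 1.0, 1.0]))
```

Because the update depends only on Δt, it is stationary in time. Two consecutive steps of lengths Δt₁ and Δt₂ under the same held input give exactly the same state as one step of length Δt₁ + Δt₂, so a missing observation does not distort the dynamics.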
Towards Instance-Optimality in Online PAC Reinforcement Learning
Several recent works have proposed instance-dependent upper bounds on the
number of episodes needed to identify, with probability 1 − δ, an
ε-optimal policy in finite-horizon tabular Markov Decision
Processes (MDPs). These upper bounds feature various complexity measures for
the MDP, which are defined based on different notions of sub-optimality gaps.
However, as of now, no lower bound has been established to assess the
optimality of any of these complexity measures, except for the special case of
MDPs with deterministic transitions. In this paper, we propose the first
instance-dependent lower bound on the sample complexity required for the PAC
identification of a near-optimal policy in any tabular episodic MDP.
Additionally, we demonstrate that the sample complexity of the PEDEL algorithm
of [Wagenmaker22linearMDP] closely approaches this lower bound.
Considering the intractability of PEDEL, we formulate an open question
regarding the possibility of achieving our lower bound using a
computationally efficient algorithm.
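For reference, the (ε, δ)-PAC identification criterion discussed above can be written as follows. The notation is assumed here and may differ from the paper's: a fixed initial state s_1, horizon H, stage rewards r_h, and a policy \hat\pi returned after the algorithm stops.

```latex
\[
V_1^{\pi}(s_1) \;=\; \mathbb{E}^{\pi}\!\left[\, \sum_{h=1}^{H} r_h(s_h, a_h) \;\middle|\; s_1 \right],
\qquad
V_1^{\star}(s_1) \;=\; \max_{\pi} V_1^{\pi}(s_1),
\]
\[
\mathbb{P}\!\left( V_1^{\star}(s_1) - V_1^{\hat\pi}(s_1) \le \varepsilon \right) \;\ge\; 1 - \delta .
\]
```

The sample complexity is the number of episodes collected before stopping. An instance-dependent lower bound bounds its expectation from below by a quantity that depends on the specific MDP, rather than only on the numbers of states and actions, H, ε and δ.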