Connectionist natural language parsing
The key developments of two decades of connectionist parsing are reviewed. Connectionist parsers are assessed according to their ability to learn to represent syntactic structures from examples automatically, without being presented with symbolic grammar rules. This review also considers the extent to which connectionist parsers offer computational models of human sentence processing and provide plausible accounts of psycholinguistic data. In considering these issues, special attention is paid to the level of realism, the nature of the modularity, and the type of processing that is to be found in a wide range of parsers.
Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text
We advance the state of the art in biomolecular interaction extraction with
three contributions: (i) We show that deep, Abstract Meaning Representations
(AMR) significantly improve the accuracy of a biomolecular interaction
extraction system when compared to a baseline that relies solely on surface-
and syntax-based features; (ii) In contrast with previous approaches that infer
relations on a sentence-by-sentence basis, we expand our framework to enable
consistent predictions over sets of sentences (documents); (iii) We further
modify and expand a graph kernel learning framework to enable concurrent
exploitation of automatically induced AMR (semantic) and dependency structure
(syntactic) representations. Our experiments show that our approach yields
interaction extraction systems that are more robust in environments where there
is a significant mismatch between training and test conditions.
Comment: Appearing in Proceedings of the Thirtieth AAAI Conference on
Artificial Intelligence (AAAI-16).
Learning Dynamic Feature Selection for Fast Sequential Prediction
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. Our approach is simpler and better suited to NLP
than other related cascade methods. We present experiments in left-to-right
part-of-speech tagging, named entity recognition, and transition-based
dependency parsing. On standard benchmark datasets we preserve POS
tagging accuracy above 97% and parsing LAS above 88.5%, both with more than a
five-fold reduction in run time, and NER F1 above 88 with more than a 2x increase
in speed.
Comment: Appears in The 53rd Annual Meeting of the Association for
Computational Linguistics, Beijing, China, July 201
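The inference side of this idea can be pictured as an early-stopping dot product: features are grouped into templates in a fixed order, each label's score is accumulated one template at a time, and evaluation stops once the best label leads the runner-up by a margin. The following is a minimal Python sketch that assumes hypothetical toy weights (`WEIGHTS`), a hand-picked template order, and a fixed `margin_threshold`. It does not reproduce the paper's learned ordering, its parameter estimation, or its exact stopping rule.

```python
from typing import Dict, List, Sequence, Tuple

# Toy, hypothetical weights: WEIGHTS[label][feature] for a linear classifier.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "NOUN": {"suffix=ion": 3.0, "prev=the": 2.0, "cap": 0.5},
    "VERB": {"suffix=ion": -1.0, "prev=to": 2.5},
}


def predict_early_stop(
    templates: Sequence[List[str]],
    weights: Dict[str, Dict[str, float]],
    margin_threshold: float,
) -> Tuple[str, int]:
    """Score labels template by template; stop once the top label's lead
    over the runner-up reaches margin_threshold.

    Returns (predicted label, number of templates actually evaluated)."""
    scores = {label: 0.0 for label in weights}
    used = 0
    for template in templates:
        used += 1
        # Add this template's share of every label's dot product.
        for label, w in weights.items():
            scores[label] += sum(w.get(f, 0.0) for f in template)
        ranked = sorted(scores.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else float("inf")
        if margin >= margin_threshold:
            break  # confident enough: skip the remaining templates
    best = max(scores, key=scores.__getitem__)
    return best, used
```

With these toy weights, `predict_early_stop([["suffix=ion"], ["prev=the"], ["cap"]], WEIGHTS, 2.0)` stops after the first template and returns `("NOUN", 1)`. The run-time saving comes entirely from the templates that are never scored.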
Towards Zero-Overhead Disambiguation of Deep Priority Conflicts
**Context** Context-free grammars are widely used for language prototyping
and implementation. They allow formalizing the syntax of domain-specific or
general-purpose programming languages concisely and declaratively. However, the
natural and concise way of writing a context-free grammar is often ambiguous.
Therefore, grammar formalisms support extensions in the form of *declarative
disambiguation rules* to specify operator precedence and associativity, solving
ambiguities that are caused by the subset of the grammar that corresponds to
expressions.
**Inquiry** Implementing support for declarative disambiguation within a
parser typically comes with one or more of the following limitations in
practice: a lack of parsing performance, or a lack of modularity (i.e.,
disallowing the composition of grammar fragments of potentially different
languages). The latter issue is generally addressed by scannerless
generalized parsers. We aim to equip scannerless generalized parsers with novel
disambiguation methods that are inherently performant, without compromising
modularity and language composition.
**Approach** In this paper, we present a novel low-overhead implementation
technique for disambiguating deep associativity and priority conflicts in
scannerless generalized parsers with lightweight data-dependency.
**Knowledge** Ambiguities with respect to operator precedence and
associativity arise from combining the various operators of a language. While
*shallow conflicts* can be resolved efficiently by one-level tree patterns,
*deep conflicts* require more elaborate techniques, because they can occur
arbitrarily nested in a tree. Current state-of-the-art approaches to solving
deep priority conflicts come with a severe performance overhead.
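The shallow/deep distinction can be shown on a hypothetical toy expression AST. It has integer literals, left-associative `+` and `*` (with `*` binding tighter), and a low-priority prefix `let x = e in body` whose body extends as far right as possible. In the Python sketch below, `shallow_ok` applies one-level patterns to each node and its direct children. `deep_ok` also walks the right spine of every left operand, because an unparenthesized `let` anywhere on that spine would have absorbed the operator that follows. This is only a post-parse tree filter for illustration. It is not the paper's low-overhead, data-dependent technique, which resolves conflicts during parsing.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Bin:
    op: str  # "+" or "*", both left-associative
    left: "Expr"
    right: "Expr"


@dataclass
class Let:
    name: str
    bound: "Expr"
    body: "Expr"  # extends as far to the right as possible


Expr = Union[Num, Bin, Let]
PRIORITY = {"*": 2, "+": 1}  # higher binds tighter


def shallow_ok(e: Expr) -> bool:
    """One-level patterns: inspect each node against its direct children only."""
    if isinstance(e, Bin):
        p = PRIORITY[e.op]
        if isinstance(e.left, Let):
            return False  # (let ... in x) + y: a let must not be a direct left operand
        for child, is_right in ((e.left, False), (e.right, True)):
            if isinstance(child, Bin):
                q = PRIORITY[child.op]
                # Lower priority below higher, or same priority on the right
                # (which would violate left associativity).
                if q < p or (q == p and is_right):
                    return False
        return shallow_ok(e.left) and shallow_ok(e.right)
    if isinstance(e, Let):
        return shallow_ok(e.bound) and shallow_ok(e.body)
    return True


def ends_in_let(e: Expr) -> bool:
    """Follow the right spine of e through binary nodes; True if it ends in a Let."""
    while isinstance(e, Bin):
        e = e.right
    return isinstance(e, Let)


def deep_ok(e: Expr) -> bool:
    """Deep check: no Let anywhere on the right spine of a left operand."""
    if isinstance(e, Bin):
        if ends_in_let(e.left):
            return False
        return deep_ok(e.left) and deep_ok(e.right)
    if isinstance(e, Let):
        return deep_ok(e.bound) and deep_ok(e.body)
    return True


def valid(e: Expr) -> bool:
    return shallow_ok(e) and deep_ok(e)
```

The tree for `(1 * let x = 2 in 3) + 4` passes `shallow_ok`, because every parent/child pair looks legal, but fails `valid`. The `let` sits two levels down on the left operand's right spine, which is exactly the arbitrarily nested case that one-level patterns cannot see.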
**Grounding** We evaluated our new approach against state-of-the-art
declarative disambiguation mechanisms. By parsing a corpus of popular
open-source repositories written in Java and OCaml, we found that our approach
yields speedups of up to 1.73x over a grammar rewriting technique when parsing
programs with deep priority conflicts, with a modest overhead of 1-2% when
parsing programs without deep conflicts.
**Importance** A recent empirical study shows that deep priority conflicts
are indeed widespread in real-world programs. The study shows that in a corpus
of popular OCaml projects on GitHub, up to 17% of the source files contain
deep priority conflicts. However, there is no solution in the literature that
addresses efficient disambiguation of deep priority conflicts, with support for
modular and composable syntax definitions.
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
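Input-driven (visibly pushdown) languages are a standard example of such structured CFLs. Their alphabet is partitioned so that the current input symbol alone decides whether the recognizer pushes, pops, or leaves its stack untouched. The Python sketch below assumes a hypothetical alphabet with call symbols `(` and `[`, matching return symbols `)` and `]`, and one internal symbol `a`. It recognizes the well-matched words over that alphabet.

```python
from typing import List

# Hypothetical partitioned alphabet: the symbol alone fixes the stack action.
CALLS = {"(": "P", "[": "B"}    # call symbol -> stack symbol it pushes
RETURNS = {")": "P", "]": "B"}  # return symbol -> stack symbol it must pop
INTERNALS = {"a"}               # internal symbols never touch the stack


def accepts(word: str) -> bool:
    """Deterministic input-driven recognizer: accept well-matched words."""
    stack: List[str] = []
    for ch in word:
        if ch in CALLS:
            stack.append(CALLS[ch])
        elif ch in RETURNS:
            # A return must pop the stack symbol pushed by its matching call.
            if not stack or stack.pop() != RETURNS[ch]:
                return False
        elif ch not in INTERNALS:
            return False  # symbol outside the alphabet
    return not stack  # every call must have been closed
```

Two such recognizers over the same partitioned alphabet always move their stacks in lockstep. They can therefore be run as a single product machine, which is the usual intuition for why these families stay closed under intersection.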
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.
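Floyd-style operator-precedence parsing can be sketched in Python as follows. The sketch assumes the toy operator grammar `E -> E + E | E * E | n` and a hand-written precedence matrix: `<` means yield (shift), `>` means take precedence (reduce), and `=` between the `#` delimiters means accept. The matrix makes `*` bind tighter than `+` and both operators left-associative. The parser only ever compares the topmost stack terminal with the lookahead. This sequential sketch does not attempt the parallel parsing discussed in the abstract.

```python
from typing import List, Union

Token = Union[int, str]  # ints are number tokens "n"; "+" and "*" are operators

# Hand-written precedence matrix (an assumption for this toy grammar):
# "<" = yield (shift), ">" = take precedence (reduce), "=" on "#" = accept.
PREC = {
    ("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<", ("#", "#"): "=",
    ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
    ("+", "n"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "#"): ">",
    ("*", "n"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "#"): ">",
}


def kind(tok: Token) -> str:
    return "n" if isinstance(tok, int) else tok


def top_terminal(stack: List[tuple]) -> tuple:
    # Nonterminals ("N", tree) are transparent to precedence comparisons.
    return next(e for e in reversed(stack) if e[0] == "T")


def build(handle: List[tuple]):
    if len(handle) == 1 and handle[0][:2] == ("T", "n"):
        return handle[0][2]  # E -> n
    if (len(handle) == 3 and handle[0][0] == "N"
            and handle[1][0] == "T" and handle[2][0] == "N"):
        return (handle[1][1], handle[0][1], handle[2][1])  # E -> E op E
    raise SyntaxError(f"no rule matches handle {handle}")


def parse(tokens: List[Token]):
    stack: List[tuple] = [("T", "#", None)]
    inp = list(tokens) + ["#"]
    i = 0
    while True:
        top, a = top_terminal(stack), kind(inp[i])
        rel = PREC.get((top[1], a))
        if rel is None:
            raise SyntaxError(f"no precedence relation between {top[1]!r} and {a!r}")
        if rel == "=":  # "#" meets "#": input consumed
            if len(stack) == 2 and stack[1][0] == "N":
                return stack[1][1]
            raise SyntaxError("empty or incomplete input")
        if rel == "<":  # shift the lookahead terminal
            stack.append(("T", a, inp[i]))
            i += 1
            continue
        # rel == ">": pop the handle down to the terminal that its left
        # neighbour yields to ("<"), plus one adjacent nonterminal operand.
        handle = []
        while True:
            e = stack.pop()
            handle.append(e)
            if e[0] == "T" and PREC.get((top_terminal(stack)[1], e[1])) == "<":
                if stack[-1][0] == "N":
                    handle.append(stack.pop())
                break
        handle.reverse()
        stack.append(("N", build(handle)))
```

For example, `parse([1, "+", 2, "*", 3])` returns `("+", 1, ("*", 2, 3))`. Every shift/reduce decision depends only on a pair of adjacent terminals, with no lookahead beyond one symbol and no parser states.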