Having Your Cake and Eating It Too: Autonomy and Interaction in a Model of Sentence Processing
Is the human language understander a collection of modular processes
operating with relative autonomy, or is it a single integrated process? This
ongoing debate has polarized the language processing community, with two
fundamentally different types of model posited, and with each camp concluding
that the other is wrong. One camp puts forth a model with separate processors
and distinct knowledge sources to explain one body of data, and the other
proposes a model with a single processor and a homogeneous, monolithic
knowledge source to explain the other body of data. In this paper we argue that
a hybrid approach which combines a unified processor with separate knowledge
sources provides an explanation of both bodies of data, and we demonstrate the
feasibility of this approach with the computational model called COMPERE. We
believe that this approach brings the language processing community
significantly closer to offering human-like language processing systems.
Comment: 7 pages, uses aaai.sty macr
Parameter Learning of Logic Programs for Symbolic-Statistical Modeling
We propose a logical/mathematical framework for statistical parameter
learning of parameterized logic programs, i.e. definite clause programs
containing probabilistic facts with a parameterized distribution. It extends
the traditional least Herbrand model semantics in logic programming to
distribution semantics, possible world semantics with a probability
distribution which is unconditionally applicable to arbitrary logic programs
including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM
algorithm, the graphical EM algorithm, that runs for a class of parameterized
logic programs representing sequential decision processes where each decision
is exclusive and independent. It runs on a new data structure called support
graphs describing the logical relationship between observations and their
explanations, and learns parameters by computing inside and outside probability
generalized for logic programs. The complexity analysis shows that when
combined with OLDT search for all explanations for observations, the graphical
EM algorithm, despite its generality, has the same time complexity as existing
EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside
algorithm for PCFGs, and the one for singly connected Bayesian networks that
have been developed independently in each research field. Learning experiments
with PCFGs using two corpora of moderate size indicate that the graphical EM
algorithm can significantly outperform the Inside-Outside algorithm.
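The inside probabilities mentioned in this abstract can be illustrated on an ordinary PCFG. What follows is a minimal sketch of the standard inside computation for a grammar in Chomsky normal form, not the graphical EM algorithm or its support graphs. The function name, the rule encoding, and the toy grammar are all assumptions made for illustration.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Return P(start =>* words) for a PCFG in Chomsky normal form.

    lexical: dict mapping (A, word) -> probability of rule A -> word
    binary:  dict mapping (A, B, C) -> probability of rule A -> B C
    (Both encodings are hypothetical choices for this sketch.)
    """
    n = len(words)
    # chart[(i, j)][A] = probability that A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(float))

    # Base case: spans of length 1 come from lexical rules.
    for i, word in enumerate(words):
        for (lhs, rhs_word), prob in lexical.items():
            if rhs_word == word:
                chart[(i, i + 1)][lhs] += prob

    # Recursive case: combine two adjacent sub-spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, left_sym, right_sym), prob in binary.items():
                    left = chart[(i, k)].get(left_sym, 0.0)
                    right = chart[(k, j)].get(right_sym, 0.0)
                    if left and right:
                        chart[(i, j)][lhs] += prob * left * right

    return chart[(0, n)].get(start, 0.0)


# Toy grammar (illustrative only, not taken from the paper).
toy_lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
toy_binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

sentence_probability = inside_probability(
    ["she", "eats", "fish"], toy_lexical, toy_binary
)
```

The chart is filled bottom-up by span length, so each cell is complete before any larger span reads it. The Inside-Outside algorithm pairs these values with outside probabilities to obtain expected rule counts for the EM update.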
Data-Oriented Parsing with Discontinuous Constituents and Function Tags
Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.
A rule-based approach to implicit emotion detection in text
Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotions. In this paper, we present a rule-based pipeline approach, based on the OCC Model, for detecting implicit emotions in written text that contains no emotion-bearing words. We have evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach consistently outperforms the lexicon matching method across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text which follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, even outperforming the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.
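The results in this abstract are reported as F-measure. As a reminder of how that number is derived, here is a minimal sketch assuming the usual definition as a weighted harmonic mean of precision and recall, with beta = 1 giving the common F1. The abstract does not state which variant was used, and the counts below are made up for illustration.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for one emotion category (not from the paper).
example_f1 = f_measure(true_pos=8, false_pos=2, false_neg=2)
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so the F1 score is also 0.8.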
Using semantic cues to learn syntax
We present a method for dependency grammar induction that utilizes sparse annotations of semantic relations. This induction set-up is attractive because such annotations provide useful clues about the underlying syntactic structure, and they are readily available in many domains (e.g., info-boxes and HTML markup). Our method is based on the intuition that syntactic realizations of the same semantic predicate exhibit some degree of consistency. We incorporate this intuition in a directed graphical model that tightly links the syntactic and semantic structures. This design enables us to exploit syntactic regularities while still allowing for variations. Another strength of the model lies in its ability to capture non-local dependency relations. Our results demonstrate that even a small amount of semantic annotations greatly improves the accuracy of learned dependencies when tested on both in-domain and out-of-domain texts.
United States. Defense Advanced Research Projects Agency (DARPA Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0172); U.S. Army Research Laboratory (contract no. W911NF-10-1-0533)
Subject/object processing asymmetries in Korean relative clauses: Evidence from ERP data
Subject relative (SR) clauses have a reliable processing advantage in VO languages like English in which relative clauses (RCs) follow the head noun. The question is whether this is also routinely true in OV languages like Japanese and Korean, in which RCs precede the head noun. We conducted an event-related brain potential (ERP) study of Korean RCs to test whether the SR advantage manifests in brain responses, and to tease apart the typological factors that might contribute to these responses. Our results suggest that brain responses to RCs are remarkably similar in VO and OV languages. Our results also suggest that the marking of the right edge of the RC in Chinese (Yang et al. 2010) and Korean and the absence of such marking in Japanese (Ueno & Garnsey 2008) affect the response to the following head noun. The consistent SR advantage found in ERP studies lends further support to a universal subject preference in the processing of relative clauses.
Linguistic
From Paraphrase Database to Compositional Paraphrase Model and Back
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive
semantic resource, consisting of a list of phrase pairs with (heuristic)
confidence estimates. However, it is still unclear how it can best be used, due
to the heuristic nature of the confidences and its necessarily incomplete
coverage. We propose models to leverage the phrase pairs from the PPDB to build
parametric paraphrase models that score paraphrase pairs more accurately than
the PPDB's internal scores while simultaneously improving its coverage. They
allow for learning phrase embeddings as well as improved word embeddings.
Moreover, we introduce two new, manually annotated datasets to evaluate
short-phrase paraphrasing models. Using our paraphrase model trained using
PPDB, we achieve state-of-the-art results on standard word and bigram
similarity tasks and beat strong baselines on our new short phrase paraphrase
tasks.
Comment: 2015 TACL paper updated with an appendix describing new 300
dimensional embeddings. Submitted 1/2015. Accepted 2/2015. Published 6/201
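To make the idea of scoring paraphrase pairs with embeddings concrete, here is a minimal sketch. It assumes one simple composition, averaging word vectors, followed by cosine similarity. The abstract does not say which composition functions the paper's models use, so this is an illustration and not the authors' method. The function names and the two-dimensional toy vectors are hypothetical.

```python
import math


def phrase_vector(phrase, embeddings):
    """Average the word vectors of a phrase (an assumed composition)."""
    vectors = [embeddings[w] for w in phrase.split() if w in embeddings]
    if not vectors:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def paraphrase_score(phrase_a, phrase_b, embeddings):
    """Score a phrase pair by the cosine of their averaged word vectors."""
    return cosine(
        phrase_vector(phrase_a, embeddings),
        phrase_vector(phrase_b, embeddings),
    )


# Toy 2-dimensional embeddings (made up; real models use hundreds of dimensions).
toy_embeddings = {
    "big": [1.0, 0.0],
    "large": [0.9, 0.1],
    "dog": [0.0, 1.0],
    "cat": [0.1, 0.9],
}

close_pair = paraphrase_score("big dog", "large dog", toy_embeddings)
distant_pair = paraphrase_score("big dog", "cat", toy_embeddings)
```

In this toy setup the near-paraphrase pair scores higher than the unrelated pair. A trained model would learn the word vectors, and possibly the composition function, from the PPDB phrase pairs.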