5,931 research outputs found
Deriving Verb Predicates By Clustering Verbs with Arguments
Hand-built verb clusters such as the widely used Levin classes (Levin, 1993)
have proved useful, but have limited coverage. Verb classes automatically
induced from corpus data such as those from VerbKB (Wijaya, 2016), on the other
hand, can give clusters with much larger coverage, and can be adapted to
specific corpora such as Twitter. We present a method for clustering the
outputs of VerbKB: verbs with their multiple argument types, e.g.
"marry(person, person)", "feel(person, emotion)." We make use of a novel
low-dimensional embedding of verbs and their arguments to produce high quality
clusters in which the same verb can be in different clusters depending on its
argument type. The resulting verb clusters do a better job than hand-built
clusters of predicting sarcasm, sentiment, and locus of control in tweets.
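As a rough illustration of how typed verb predicates could be clustered so that one verb lands in different clusters depending on its argument type, the sketch below runs plain k-means over toy vectors. It is not the embedding or clustering method of the paper: the 2-D vectors, the extra predicates (`experience`, `touch`, and the `object` argument type), and the starting centroids are all invented for the example.

```python
import math

# Hypothetical 2-D embeddings keyed by (verb, arg1_type, arg2_type).
# All numbers are made up for illustration; they do not come from VerbKB.
EMBEDDINGS = {
    ("feel", "person", "emotion"): (0.1, 0.9),
    ("experience", "person", "emotion"): (0.2, 0.8),
    ("feel", "person", "object"): (0.9, 0.2),
    ("touch", "person", "object"): (0.8, 0.1),
}


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for name, vec in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(vec, centroids[i]))
            clusters[nearest].append(name)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(points[n][d] for n in members) / len(members)
                  for d in range(len(old)))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return clusters


# Assumed starting centroids, chosen by hand so the run is deterministic.
clusters = kmeans(EMBEDDINGS, [(0.0, 1.0), (1.0, 0.0)])
for index, members in enumerate(clusters):
    print(index, members)
```

In this toy run the two `feel` predicates fall into different clusters because the clustering key is the whole typed predicate and not the bare verb.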
Can Subcategorisation Probabilities Help a Statistical Parser?
Research into the automatic acquisition of lexical information from corpora
is starting to produce large-scale computational lexicons containing data on
the relative frequencies of subcategorisation alternatives for individual
verbal predicates. However, the empirical question of whether this type of
frequency information can in practice improve the accuracy of a statistical
parser has not yet been answered. In this paper we describe an experiment with
a wide-coverage statistical grammar and parser for English and
subcategorisation frequencies acquired from ten million words of text which
shows that this information can significantly improve parse accuracy.
Comment: 9 pages, uses colacl.st
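To make "relative frequencies of subcategorisation alternatives" concrete, here is a minimal sketch that estimates the probability of each frame given a verb as a simple relative frequency over observed (verb, frame) pairs. The frame labels and the sample observations are assumptions for illustration; the abstract does not describe the acquisition system or how the parser uses the estimates.

```python
from collections import Counter, defaultdict


def subcat_probabilities(observations):
    """Estimate P(frame | verb) as count(verb, frame) / count(verb)."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }


# Hypothetical observations: each pair is a verb and the frame it occurred with.
observations = [
    ("give", "NP NP"),
    ("give", "NP PP"),
    ("give", "NP NP"),
    ("give", "NP NP"),
    ("sleep", "intransitive"),
]
probabilities = subcat_probabilities(observations)
print(probabilities["give"])
```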
Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.
Comment: 8 pages; to appear in the proceedings of EMNLP-200
On past participle agreement in transitive clauses in French
This paper provides a Minimalist analysis of past participle agreement in French in transitive
clauses. Our account posits that the head v of vP in such structures carries an (accusative-assigning) structural case feature which may apply (with or without concomitant agreement)
to case-mark a clause-mate object, the subject of a defective complement clause, or an
intermediate copy of a preposed subject in spec-CP. In structures where a goal is extracted
from vP (e.g. via wh-movement) v also carries an edge feature, and may also carry a
specificity feature and a set of (number and gender) agreement features. We show how these
assumptions account for agreement of a participle with a preposed specific clausemate object
or defective-clause subject, and for the absence of agreement with an embedded object, with
the complement of an impersonal verb, and with the subject of an embedded (finite or nonfinite) CP complement. We also argue that the absence of agreement marking (in expected
contexts) on the participles fait 'made' and laissé 'let' in infinitive structures is essentially viral in
nature. Finally, we claim that obligatory participle agreement with reflexive and reciprocal
objects arises because the derivation of reflexives involves A-movement and concomitant
agreement.
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally ranking these representations, using a supervised learning method, by
the associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve a 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.
Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201
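The three baselines named in the abstract (earliest, latest, and most frequent reporting of an event) can be sketched as below. This covers only the baselines, not the supervised ranking approach. The `Report` type, the `day` field, and the contents of the five-element tuples are hypothetical, since the abstract does not list the attributes of the quintuple.

```python
from collections import Counter
from typing import NamedTuple


class Report(NamedTuple):
    day: int       # hypothetical publication day, used only for ordering
    event: tuple   # hypothetical quintuple of event attributes


def earliest(reports):
    """Baseline: trust the first report of the event."""
    return min(reports, key=lambda r: r.day).event


def latest(reports):
    """Baseline: trust the most recent report of the event."""
    return max(reports, key=lambda r: r.day).event


def most_frequent(reports):
    """Baseline: trust the representation reported most often."""
    counts = Counter(r.event for r in reports)
    return counts.most_common(1)[0][0]


# Invented reports of one event whose amount varies across articles.
reports = [
    Report(1, ("BuyerCo", "TargetCo", 10, "USD", "acquisition")),
    Report(2, ("BuyerCo", "TargetCo", 12, "USD", "acquisition")),
    Report(3, ("BuyerCo", "TargetCo", 12, "USD", "acquisition")),
    Report(4, ("BuyerCo", "TargetCo", 15, "USD", "acquisition")),
]
print(earliest(reports), latest(reports), most_frequent(reports))
```

On this toy input the three baselines disagree about the amount, which is the situation a confidence-based ranking is meant to resolve.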
Why swimming is just as difficult as dying for Japanese learners of English
While both Japanese and English have a grammatical form denoting the progressive, the two forms (te-iru & be+ing) interact differently with the inherent semantics of the verb to which they attach (Kindaichi, 1950; McClure, 1995; Shirai, 2000). Japanese change of state verbs are incompatible with a progressive interpretation, allowing only a resultative interpretation of V+ te-iru, while a progressive interpretation is preferred for activity predicates. English be+ing denotes a progressive interpretation regardless of the lexical semantics of the verb. The question that arises is how we can account for the fact that change of state verbs like dying can denote a progressive interpretation in English, but not in Japanese. While researchers such as Kageyama (1996) and Ogihara (1998, 1999) propose that the difference lies in the lexical semantics of the verbs themselves, others such as McClure (1995) have argued that the difference lies in the semantics of the grammatical forms, be+ing and te-iru. We present results from an experimental study of Japanese learners' interpretation of the English progressive which provide support for McClure's proposal. Results indicate that independent of verb type, learners had significantly more difficulty with the past progressive. We argue that knowledge of L2 semantics-syntax correspondences proceeds not on the basis of L1 lexical semantic knowledge, but on the basis of grammatical forms.