
    Deriving Verb Predicates By Clustering Verbs with Arguments

    Hand-built verb clusters such as the widely used Levin classes (Levin, 1993) have proved useful but have limited coverage. Verb classes automatically induced from corpus data, such as those from VerbKB (Wijaya, 2016), can instead give clusters with much larger coverage and can be adapted to specific corpora such as Twitter. We present a method for clustering the outputs of VerbKB: verbs with their multiple argument types, e.g. "marry(person, person)" and "feel(person, emotion)". We use a novel low-dimensional embedding of verbs and their arguments to produce high-quality clusters in which the same verb can be in different clusters depending on its argument types. The resulting verb clusters predict sarcasm, sentiment, and locus of control in tweets better than hand-built clusters do.
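
    The central idea, that a verb is clustered together with its argument types so the same verb can land in more than one cluster, can be illustrated with a minimal sketch. It assumes plain k-means over invented 2-d vectors; `TOY_EMBEDDINGS`, `kmeans`, `farthest_first_init`, and the entries "feel(person, object)" and "touch(person, object)" are hypothetical, and neither the paper's embedding nor its clustering procedure is reproduced here.

```python
import math

# Invented 2-d vectors for VerbKB-style "verb(arg types)" entries (illustration only;
# these are NOT VerbKB outputs or the paper's learned embedding).
TOY_EMBEDDINGS = {
    "marry(person, person)": (1.0, 0.0),
    "date(person, person)": (0.9, 0.1),
    "feel(person, emotion)": (0.0, 1.0),
    "fear(person, emotion)": (0.1, 0.9),
    "feel(person, object)": (-1.0, 0.0),
    "touch(person, object)": (-0.9, -0.1),
}


def farthest_first_init(points, k):
    """Pick k initial centroids: the first point, then repeatedly the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(embeddings, k, iterations=20):
    """Plain k-means; returns a cluster label for every verb(argument-type) key."""
    keys = list(embeddings)
    points = [embeddings[key] for key in keys]
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each entry joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return dict(zip(keys, labels))
```

    Because the clustering key is the verb plus its argument types, `kmeans(TOY_EMBEDDINGS, k=3)` places "feel(person, emotion)" with "fear(person, emotion)" and "feel(person, object)" with "touch(person, object)", so "feel" appears in two clusters.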

    Can Subcategorisation Probabilities Help a Statistical Parser?

    Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text, which shows that this information can significantly improve parse accuracy. Comment: 9 pages, uses colacl.st
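
    The frequency information described above can be sketched as a relative-frequency estimate of P(frame | verb). In this minimal sketch the counts, the frame labels, and the `rescore` helper (which simply adds a log probability to a parse score) are hypothetical; the abstract does not say how the paper's parser actually incorporates the frequencies.

```python
import math
from collections import Counter, defaultdict


def subcat_probabilities(observations):
    """Relative-frequency estimate of P(frame | verb) from (verb, frame) pairs."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }


def rescore(parse_logprob, verb, frame, probs, floor=1e-6):
    """Hypothetical combination: add log P(frame | verb) to a parse's log score.

    Unseen (verb, frame) pairs get a small floor probability instead of zero.
    """
    return parse_logprob + math.log(probs.get(verb, {}).get(frame, floor))


# Invented (verb, frame) observations; frame labels are placeholders, not the paper's tagset.
OBSERVED = [
    ("believe", "NP"), ("believe", "S"), ("believe", "S"), ("believe", "S"),
    ("give", "NP_NP"), ("give", "NP_PP"),
]
```

    With these counts, `subcat_probabilities(OBSERVED)["believe"]` is `{"NP": 0.25, "S": 0.75}`, so a parse attaching a sentential complement to "believe" is preferred over an otherwise equally scored NP-complement parse.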

    Bootstrapping Lexical Choice via Multiple-Sequence Alignment

    An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora: datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and an overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and faithfulness to the semantic input rivaled that of a traditional generation system. Comment: 8 pages; to appear in the proceedings of EMNLP-200
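
    The alignment operation this approach builds on can be illustrated with pairwise global (Needleman-Wunsch) alignment over word tokens. This is a simplification under stated assumptions: the paper's algorithm is multiple-pass and aligns many sequences, whereas this sketch aligns only two (multiple alignments are commonly assembled from pairwise ones). The function name `align`, the scoring values, and the example verbalizations are hypothetical.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token lists.

    Returns (score, pairs), where each pair is (token_a, token_b) and None marks a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two tokens
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return score[n][m], pairs[::-1]
```

    For two invented verbalizations of the same proof step, `align("x equals y".split(), "x is equal to y".split())` anchors the shared tokens "x" and "y" and exposes the variable middle span as alternative realizations.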

    On past participle agreement in transitive clauses in French

    This paper provides a Minimalist analysis of past participle agreement in French in transitive clauses. Our account posits that the head v of vP in such structures carries an (accusative-assigning) structural case feature which may apply (with or without concomitant agreement) to case-mark a clause-mate object, the subject of a defective complement clause, or an intermediate copy of a preposed subject in spec-CP. In structures where a goal is extracted from vP (e.g. via wh-movement), v also carries an edge feature, and may also carry a specificity feature and a set of (number and gender) agreement features. We show how these assumptions account for agreement of a participle with a preposed specific clause-mate object or defective-clause subject, and for the absence of agreement with an embedded object, with the complement of an impersonal verb, and with the subject of an embedded (finite or nonfinite) CP complement. We also argue that the absence of agreement marking (in expected contexts) on the participles fait 'made' and laissé 'let' in infinitive structures is essentially viral in nature. Finally, we claim that obligatory participle agreement with reflexive and reciprocal objects arises because the derivation of reflexives involves A-movement and concomitant agreement.

    Towards Building a Knowledge Base of Monetary Transactions from a News Collection

    We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow semi-automatic population of a financial knowledge base, which in turn may be used to support a range of data mining and exploration tasks. The key challenge in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally using a supervised learning method to rank these representations by their associated confidence scores. A key innovation of our approach is that it jointly extracts and stores all attributes of the event as a single representation (a quintuple). On a purpose-built test set, we demonstrate that our supervised learning approach can achieve a 25% improvement in F1-score over baseline methods that take the earliest, the latest, or the most frequent report of the event. Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17), 201
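
    The three baselines named above can be made concrete with a short sketch, assuming each report is reduced to a publication date and one candidate event tuple. The `Report` class, the helper names, and the five fields in the example tuples are hypothetical placeholders (the abstract does not list the quintuple's attributes), and the paper's supervised ranker is not reproduced.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    published: date
    event: tuple  # hypothetical 5-field event representation extracted from one article


def earliest(reports):
    """Baseline: trust the first report of the event."""
    return min(reports, key=lambda r: r.published).event


def latest(reports):
    """Baseline: trust the most recent report of the event."""
    return max(reports, key=lambda r: r.published).event


def most_frequent(reports):
    """Baseline: trust the representation reported most often."""
    return Counter(r.event for r in reports).most_common(1)[0][0]


# Invented example: four articles disagree on the amount of the same transaction.
E1 = ("Acme", "Beta", 100, "USD", "2016")
E2 = ("Acme", "Beta", 120, "USD", "2016")
E3 = ("Acme", "Beta", 150, "USD", "2016")
REPORTS = [
    Report(date(2016, 1, 1), E1),
    Report(date(2016, 1, 2), E2),
    Report(date(2016, 1, 3), E2),
    Report(date(2016, 1, 4), E3),
]
```

    Here the three baselines return three different answers (E1, E3, and E2 respectively), which illustrates the disagreement that ranking candidate representations by confidence is meant to address.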

    Why swimming is just as difficult as dying for Japanese learners of English

    While both Japanese and English have a grammatical form denoting the progressive, the two forms (te-iru and be+ing) interact differently with the inherent semantics of the verb to which they attach (Kindaichi, 1950; McClure, 1995; Shirai, 2000). Japanese change-of-state verbs are incompatible with a progressive interpretation, allowing only a resultative interpretation of V+te-iru, while a progressive interpretation is preferred for activity predicates. English be+ing denotes a progressive interpretation regardless of the lexical semantics of the verb. The question that arises is how to account for the fact that change-of-state verbs like die can receive a progressive interpretation in English (dying) but not in Japanese. While researchers such as Kageyama (1996) and Ogihara (1998, 1999) propose that the difference lies in the lexical semantics of the verbs themselves, others such as McClure (1995) have argued that it lies in the semantics of the grammatical forms be+ing and te-iru. We present results from an experimental study of Japanese learners’ interpretation of the English progressive that support McClure’s proposal. The results indicate that, independent of verb type, learners had significantly more difficulty with the past progressive. We argue that knowledge of L2 semantics-syntax correspondences proceeds not on the basis of L1 lexical semantic knowledge but on the basis of grammatical forms.