288 research outputs found
Discourse relations and conjoined VPs: automated sense recognition
Sense classification of discourse relations is a sub-task of shallow discourse parsing. Discourse relations can occur both across sentences (inter-sentential) and within sentences (intra-sentential), and more than one discourse relation can hold between the same units. Using a newly available corpus of discourse-annotated intra-sentential conjoined verb phrases, we demonstrate a sequential classification system for their multi-label sense classification. We assess the importance of each feature used in the classification, the feature scope, and what is lost in moving from gold standard manual parses to the output of an off-the-shelf parser
CRPC-DB – A Discourse Bank for Portuguese
info:eu-repo/semantics/publishedVersio
An annotated corpus for the analysis of VP ellipsis
Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We annotated the raw files using a stand-off annotation scheme that codes the auxiliary verb triggering the elided verb phrase, the start and end of the antecedent, the syntactic type of antecedent (VP, TV, NP, PP or AP), and the type of syntactic pattern between the source and target clauses of the VPE and its antecedent. We found 487 instances of VPE (including predicative ellipsis, antecedent-contained deletion, comparative constructions, and pseudo-gapping) plus 67 cases of related phenomena such as do so anaphora. Inter-annotator agreement was high, with a 0.97 average F-score for three annotators for one section of the WSJ. Our annotation is theory neutral, and has better coverage than earlier efforts that relied on automatic methods, e.g. simply searching the parsed version of the Penn Treebank for empty VP's achieves a high precision (0.95) but low recall (0.58) when compared with our manual annotation. The distribution of VPE source-target patterns deviates highly from the standard examples found in the theoretical linguistics literature on VPE, once more underlining the value of corpus studies. The resulting corpus will be useful for studying VPE phenomena as well as for evaluating natural language processing systems equipped with ellipsis resolution algorithms, and we propose evaluation measures for VPE detection and VPE antecedent selection. The stand-off annotation is freely available for research purposes
Prosodic Cues "and" Syntactic Disambiguation
This work was supported in part by a Summer Graduate Research Fellowship in Cognitive Science provided by the Center for Cognitive Science at The Ohio State University
Learning Sentence-internal Temporal Relations
In this paper we propose a data intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects
The Bracketing Guidelines for the Penn Chinese Treebank (3.0)
This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public.
This document can be divided into six parts. Section I discusses six fundamental grammatical relations that are represented in the Treebank. Section II introduces the bracketing tagset, which includes 23 syntactic labels, 26 functional tags, and 7 tags for null elements. Section III, IV and V specify our annotation schemata for noun phrases, verbs phrases, and other minor categories, respectively. Section VI describes our treatment for empty categories, such as trace for syntactic movement, PRO for control, and pro for argument drop. Section VII and VIII cover the coordinated clauses and subordinating clauses. Section IX, X and XI specify the way we handle punctuation, ambiguity, and some problematic cases
- …