18 research outputs found

    A Corpus and Model Integrating Multiword Expressions and Supersenses

    No full text
    <p>This paper introduces a task of identifying and semantically classifying lexical expressions in running text. We investigate the online reviews genre, adding semantic supersense annotations to a 55,000 word English corpus that was previously annotated for multiword expressions. The noun and verb supersenses apply to full lexical expressions, whether single- or multiword. We then present a sequence tagging model that jointly infers lexical expressions and their supersenses. Results show that even with our relatively small training corpus in a noisy domain, the joint task can be performed to attain 70% class labeling F<sub>1</sub>.</p

    Leveraging Heterogeneous Data Sources for Relational Semantic Parsing

    No full text
    <p>A number of semantic annotation efforts have produced a variety of annotated corpora, capturing various aspects of semantic knowledge in different formalisms. Due to to the cost of these annotation efforts and the relatively small amount of semantically annotated corpora, we argue it is advantageous to be able to leverage as much annotated data as possible. This work presents a preliminary exploration of the opportunities and challenges of learning semantic parsers from heterogeneous semantic annotation sources. We primarily focus on two semantic resources, FrameNet and PropBank, with the goal of improving frame-semantic parsing. Our analysis of the two data sources highlights the benefits that can be reaped by combining information across them.</p

    Probabilistic Frame-Semantic Parsing

    No full text
    This paper contributes a formalization of frame-semantic parsing as a structure prediction problem and describes an implemented parser that transforms an English sentence into a frame-semantic representation. It finds words that evoke FrameNet frames, selects frames for them, and locates the arguments for each frame. The system uses two featurebased, discriminative probabilistic (log-linear) models, one with latent variables to permit disambiguation of new predicate words. The parser is demonstrated to significantly outperform previously published results.</p

    Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

    No full text
    “Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains. The resulting corpus has high coverage and was completed quickly with reasonable inter-annotator agreement.</p

    SEMAFOR: Frame Argument Resolution with Log-Linear Models

    No full text
    This paper describes the SEMAFOR system’s performance in the SemEval 2010 task on linking events and their participants in discourse. Our entry is based upon SEMAFOR 1.0 (Das et al., 2010a), a frame-semantic probabilistic parser built from log-linear models. The extended system models null instantiations, including non-local argument reference. Performance is evaluated on the task data with and without gold-standard overt arguments. In both settings, it fares the best of the submitted systems with respect to recall and F1.</p

    Simplified Dependency Annotations with GFL-Web

    No full text
    <p>We present GFL-Web, a web-based interface for syntactic dependency annotation with the lightweight FUDG/GFL formalism. Syntactic attachments are specified in GFL notation and visualized as a graph. A one-day pilot of this work- flow with 26 annotators established that even novices were, with a bit of training, able to rapidly annotate the syntax of English Twitter messages.</p> <p>The open-source tool is easily installed and configured; it is available at: https://github.com/ Mordeaux/gfl_web</p

    Supersense Tagging for Arabic: the MT-in-the-Middle Attack

    No full text
    We consider the task of tagging Arabic nouns with WordNet supersenses. Three approaches are evaluated. The first uses an expertcrafted but limited-coverage lexicon, Arabic WordNet, and heuristics. The second uses unsupervised sequence modeling. The third and most successful approach uses machine translation to translate the Arabic into English, which is automatically tagged with English supersenses, the results of which are then projected back into Arabic. Analysis shows gains and remaining obstacles in four Wikipedia topical domains.</p

    Recall-Oriented Learning of Named Entities in Arabic Wikipedia

    No full text
    We consider the problem of NER in Arabic Wikipedia, a semisupervised domain adaptation setting for which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity types in addition to standard categories. Standard supervised learning on newswire text leads to poor target-domain recall. We train a sequence model and show that a simple modification to the online learner—a loss function encouraging it to “arrogantly” favor recall over precision— substantially improves recall and F1. We then adapt our model with self-training on unlabeled target-domain data; enforcing the same recall-oriented bias in the selftraining stage yields marginal gains.1</p

    Augmenting English Adjective Senses with Supersenses

    No full text
    <p>We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult to classify, even for trained human annotators. We release the manually annotated data, the classifier, and the induced supersense labeling of 12,304 WordNet adjective synsets</p

    Frame-Semantic Parsing

    No full text
    <p>© 2014 Association for Computational Linguistics</p> <p>The version of record is available online at http://dx.doi.org/10.1162/COLI_a_00163</p
    corecore