    Automatic Extraction of Subcategorization from Corpora

    We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount. Comment: 8 pages; requires aclap.sty. To appear in ANLP-9

    Semantic indeterminacy in object relative clauses

    This article examined whether semantic indeterminacy plays a role in comprehension of complex structures such as object relative clauses. Study 1 used a gated sentence completion task to assess which alternative interpretations are dominant as the relative clause unfolds; Study 2 compared reading times in object relative clauses containing different animacy configurations to unambiguous passive controls; and Study 3 related completion data and reading data. The results showed that comprehension difficulty was modulated by animacy configuration and voice (active vs. passive). These differences were well correlated with the availability of alternative interpretations as the relative clause unfolds, as revealed by the completion data. In contrast to approaches arguing that comprehension difficulty stems from syntactic complexity, these results suggest that semantic indeterminacy is a major source of comprehension difficulty in object relative clauses. Results are consistent with constraint-based approaches to ambiguity resolution and bring new insights into previously identified sources of difficulty. (C) 2007 Elsevier Inc. All rights reserved

    Rhetorical relations for information retrieval

    Typically, every part of a coherent text has some plausible reason for its presence, some function that it performs in the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. This work studies the use of rhetorical relations for Information Retrieval (IR): Is there a correlation between certain rhetorical relations and retrieval performance? Can knowledge about a document's rhetorical relations be useful to IR? We present a language model modification that considers rhetorical relations when estimating the relevance of a document to a query. Empirical evaluation of different versions of our model on TREC settings shows that certain rhetorical relations can benefit retrieval effectiveness notably (> 10% in mean average precision over a state-of-the-art baseline).

    Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution

    This paper presents the use of probabilistic class-based lexica for disambiguation in target-word selection. Our method employs minimal but precise contextual information for disambiguation. That is, only information provided by the target-verb, enriched by the condensed information of a probabilistic class-based lexicon, is used. Induction of classes and fine-tuning to verbal arguments is done in an unsupervised manner by EM-based clustering techniques. The method shows promising results in an evaluation on real-world translations. Comment: 7 pages, uses colacl.st

    ASR error management for improving spoken language understanding

    This paper addresses the problem of detecting automatic speech recognition (ASR) errors and using the detected errors to improve spoken language understanding (SLU) systems. In this study, the SLU task consists of automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, for example, a tourist information system. An approach is proposed that enriches the set of semantic labels with error-specific labels and uses a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results show that the Concept/Value Error Rate can be decreased significantly with a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural encoder/decoder attention-based architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments that are useful for deciding appropriate error-handling actions by the dialogue manager strategy. Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201