Automatic Extraction of Subcategorization from Corpora
We describe a novel technique and implemented system for constructing a
subcategorization dictionary from textual corpora. Each dictionary entry
encodes the relative frequency of occurrence of a comprehensive set of
subcategorization classes for English. An initial experiment, on a sample of 14
verbs which exhibit multiple complementation patterns, demonstrates that the
technique achieves accuracy comparable to previous approaches, which are all
limited to a highly restricted set of subcategorization classes. We also
demonstrate that a subcategorization dictionary built with the system improves
the accuracy of a parser by an appreciable amount.
Comment: 8 pages; requires aclap.sty. To appear in ANLP-9
Semantic indeterminacy in object relative clauses
This article examined whether semantic indeterminacy plays a role in comprehension of complex structures such as object relative clauses. Study 1 used a gated sentence completion task to assess which alternative interpretations are dominant as the relative clause unfolds; Study 2 compared reading times in object relative clauses containing different animacy configurations to unambiguous passive controls; and Study 3 related completion data and reading data. The results showed that comprehension difficulty was modulated by animacy configuration and voice (active vs. passive). These differences were well correlated with the availability of alternative interpretations as the relative clause unfolds, as revealed by the completion data. In contrast to approaches arguing that comprehension difficulty stems from syntactic complexity, these results suggest that semantic indeterminacy is a major source of comprehension difficulty in object relative clauses. Results are consistent with constraint-based approaches to ambiguity resolution and bring new insights into previously identified sources of difficulty. (C) 2007 Elsevier Inc. All rights reserved
Rhetorical relations for information retrieval
In most coherent texts, every part typically has some plausible reason for its
presence, some function that it performs in the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this so-called discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline).
Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution
This paper presents the use of probabilistic class-based lexica for
disambiguation in target-word selection. Our method employs minimal but precise
contextual information for disambiguation. That is, only information provided
by the target-verb, enriched by the condensed information of a probabilistic
class-based lexicon, is used. Induction of classes and fine-tuning to verbal
arguments is done in an unsupervised manner by EM-based clustering techniques.
The method shows promising results in an evaluation on real-world translations.
Comment: 7 pages, uses colacl.st
ASR error management for improving spoken language understanding
This paper addresses the problem of detecting automatic speech recognition
(ASR) errors and using the detected errors to improve spoken language
understanding (SLU) systems. In this study, the SLU task consists of
automatically extracting semantic concepts and concept/value pairs from ASR
transcriptions in, for example, a tourist information system. An approach is
proposed that enriches the set of semantic labels with error-specific labels
and uses a recently proposed neural approach based on word embeddings to
compute well-calibrated ASR confidence measures. Experimental results show
that it is possible to significantly decrease the Concept/Value Error Rate with
a state-of-the-art system, outperforming previously published results on the
same experimental data. It is also shown that by combining an SLU approach
based on conditional random fields with a neural encoder/decoder
attention-based architecture, it is possible to effectively identify confidence
islands and uncertain semantic output segments, which are useful for deciding
appropriate error-handling actions by the dialogue manager strategy.
Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201