EPIE Dataset: A Corpus For Possible Idiomatic Expressions
Idiomatic expressions have always been a bottleneck for language
comprehension and natural language understanding, specifically for tasks like
Machine Translation (MT). MT systems predominantly produce literal translations
of idiomatic expressions because these expressions do not exhibit generic and linguistically
deterministic patterns which can be exploited for comprehension of the
non-compositional meaning of the expressions. These expressions occur in
parallel corpora used for training, but due to the comparatively high
occurrences of the constituent words of idiomatic expressions in literal
context, the idiomatic meaning gets overpowered by the compositional meaning of
the expression. State-of-the-art Metaphor Detection Systems are able to detect
non-compositional usage at word level but miss out on idiosyncratic phrasal
idiomatic expressions. This creates a dire need for a dataset with a wider
coverage and higher occurrence of commonly occurring idiomatic expressions, the
spans of which can be used for Metaphor Detection. With this in mind, we
present our English Possible Idiomatic Expressions (EPIE) corpus containing
25,206 sentences labelled with lexical instances of 717 idiomatic expressions.
These spans also cover literal usages for the given set of idiomatic
expressions. We also present the utility of our dataset by using it to train a
sequence labelling module and testing on three independent datasets with high
accuracy, precision and recall scores.
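
As a rough illustration of what span labels for a sequence labelling module can look like, the sketch below marks each occurrence of a candidate idiom in a tokenized sentence with BIO tags. The function name `bio_tags`, the tag names `B-IDIOM`/`I-IDIOM`, and the exact-match rule are assumptions made for this example; the abstract does not specify the corpus annotation format. As with the corpus spans, a tagged span only marks a possible idiomatic expression and may be a literal usage.

```python
from typing import List


def bio_tags(tokens: List[str], idiom: List[str]) -> List[str]:
    """Tag each exact, case-insensitive occurrence of `idiom` in `tokens`.

    Hypothetical scheme: B-IDIOM opens a span, I-IDIOM continues it,
    and O marks every token outside a span.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in idiom]
    n = len(target)
    if n == 0:
        return tags

    i = 0
    while i <= len(lowered) - n:
        if lowered[i:i + n] == target:
            tags[i] = "B-IDIOM"
            for j in range(i + 1, i + n):
                tags[j] = "I-IDIOM"
            i += n  # skip past the matched span
        else:
            i += 1
    return tags


sentence = "They will kick the bucket soon".split()
print(list(zip(sentence, bio_tags(sentence, ["kick", "the", "bucket"]))))
```

Exact matching misses inflected forms such as "kicked the bucket"; handling those would need lemmatization or a looser matching rule, which is outside this sketch.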
Learning Fault-tolerant Speech Parsing with SCREEN
This paper describes a new approach and a system SCREEN for fault-tolerant
speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for
Natural language. Speech parsing describes the syntactic and semantic analysis
of spontaneous spoken language. The general approach is based on incremental
immediate flat analysis, learning of syntactic and semantic speech parsing,
parallel integration of current hypotheses, and the consideration of various
forms of speech related errors. The goal for this approach is to explore the
parallel interactions between various knowledge sources for learning
incremental fault-tolerant speech parsing. This approach is examined in a
system SCREEN using various hybrid connectionist techniques. Hybrid
connectionist techniques are examined because of their promising properties of
inherent fault tolerance, learning, gradedness and parallel constraint
integration. The input for SCREEN is hypotheses about recognized words of a
spoken utterance potentially analyzed by a speech system; the output is
hypotheses about the flat syntactic and semantic analysis of the utterance. In
this paper we focus on the general approach, the overall architecture, and
examples for learning flat syntactic speech parsing. Unlike most other
speech language architectures, SCREEN emphasizes an interactive rather than an
autonomous position, learning rather than encoding, flat analysis rather than
in-depth analysis, and fault-tolerant processing of phonetic, syntactic and
semantic knowledge.
Comment: 6 pages, postscript, compressed, uuencoded; to appear in Proceedings
of AAAI 9