A Probabilistic Model of Syntactic and Semantic Acquisition from Child-Directed Utterances and their Meanings

Author: Goldwater Sharon
Kwiatkowski Tom
Steedman Mark
Zettlemoyer Luke
Publication date: 01/04/2012

Inducing a Semantically Annotated Lexicon via EM-Based Clustering

Author: Beil Franz
Carroll Glenn
Prescher Detlef
Riezler Stefan
Rooth Mats
Publication date: 01/01/1999

We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally to frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries. Comment: 8 pages, uses colacl.sty. Proceedings of the 37th Annual Meeting of the ACL, 199

arXiv.org e-Print Archive

CiteSeerX

A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

Author: AbdelRahman Samir
Elarnaoty Mohamed
Fahmy Aly
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 06/04/2012

Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion holder recognition is a task that has not yet been considered for the Arabic language. This task essentially requires a deep understanding of clause structures. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents leading research on opinion holder extraction in Arabic news that is independent of any lexical parsers. We investigate constructing a comprehensive feature set to compensate for the lack of structural parsing output. The proposed feature set is tuned from previous work on English, coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments, achieving an F-measure of 54.03. We publicly release our research outcome corpus and lexicon to the opinion mining community to encourage further research.

arXiv.org e-Print Archive

Crossref

A comparison of parsing technologies for the biomedical domain

Author: Grover Claire
Lapata Mirella
Lascarides Alex
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2005

This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms; and automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained 'preprocessing' techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to 'package up' complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.

CiteSeerX

Crossref

Edinburgh Research Explorer

Can Subcategorisation Probabilities Help a Statistical Parser?

Author: Briscoe Ted
Carroll John
Minnen Guido
Publication date: 01/01/1998

Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy. Comment: 9 pages, uses colacl.st

arXiv.org e-Print Archive

CiteSeerX

Sussex Research Online