Natural Language Processing at the School of Information Studies for Africa
The lack of persons trained in computational linguistic methods is a severe obstacle to making the Internet and computers accessible to people all over the world in their own languages.
The paper discusses the experiences of designing and teaching an introductory course in Natural Language Processing to graduate computer science students at Addis Ababa University, Ethiopia, in order to initiate the education of computational linguists in the Horn of Africa region.
Anaphora and Discourse Structure
We argue in this paper that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics, and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalised grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution and inference.
Comment: 45 pages, 17 figures. Revised resubmission to Computational Linguistics
Learning Dictionaries for Named Entity Recognition using Minimal Supervision
This paper describes an approach for automatically constructing dictionaries for Named Entity Recognition (NER) from large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower-dimensional embeddings (representations) for candidate phrases and classify these phrases using a small number of labeled examples. Our method achieves F-1 score improvements of 16.5% and 11.3% over co-training on disease and virus NER, respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
Comment: In the 14th Conference of the European Chapter of the Association for Computational Linguistics, 201
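The abstract does not describe how CCA is set up, so the following is only a minimal sketch under stated assumptions. Each candidate phrase is assumed to have two numeric feature views X and Y (hypothetically, features of the phrase itself and of its context). The code estimates just the top canonical direction with ridge-regularized alternating power iteration in plain Python and projects phrases onto it to get a one-dimensional embedding. All function names (`top_cca`, `embed`, etc.) are hypothetical, and the downstream classifier trained on the few labeled examples is not shown.

```python
def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(A, B, ridge=0.0):
    """Sample cross-covariance A^T B / (n - 1); `ridge` is added to the diagonal."""
    n = len(A)
    C = [[sum(A[k][i] * B[k][j] for k in range(n)) / (n - 1)
          for j in range(len(B[0]))]
         for i in range(len(A[0]))]
    for i in range(min(len(C), len(C[0]))):
        C[i][i] += ridge
    return C


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def unit_variance(w, C):
    """Rescale w so that w^T C w = 1."""
    s = sum(p * q for p, q in zip(w, matvec(C, w))) ** 0.5
    return [x / s for x in w]


def top_cca(X, Y, ridge=1e-6, iters=200):
    """Top canonical pair (a, b) and its correlation rho via alternating power iteration."""
    Xc, Yc = mean_center(X), mean_center(Y)
    Cxx, Cyy = covariance(Xc, Xc, ridge), covariance(Yc, Yc, ridge)
    Cxy = covariance(Xc, Yc)
    Cyx = transpose(Cxy)
    b = unit_variance([1.0] * len(Y[0]), Cyy)
    for _ in range(iters):
        # a is proportional to Cxx^-1 Cxy b, then b to Cyy^-1 Cyx a (each rescaled to unit variance)
        a = unit_variance(solve(Cxx, matvec(Cxy, b)), Cxx)
        b = unit_variance(solve(Cyy, matvec(Cyx, a)), Cyy)
    rho = sum(x * y for x, y in zip(a, matvec(Cxy, b)))
    return a, b, rho


def embed(X, a):
    """1-D embedding of each phrase: its centered X-view projected onto a."""
    return [sum(w * x for w, x in zip(a, row)) for row in mean_center(X)]


if __name__ == "__main__":
    # Toy data: the first Y feature is exactly x1 + x2, so the top correlation is close to 1.
    X = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 5]]
    Y = [[1, 1], [3, 0], [4, 1], [7, 0], [7, 1], [11, 0]]
    a, b, rho = top_cca(X, Y)
    print(rho, embed(X, a))
```

Only one canonical direction is computed here to keep the sketch short; a k-dimensional embedding would need further directions that are uncorrelated with the earlier ones, which a linear-algebra library handles more robustly.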
Detecting Sockpuppets in Deceptive Opinion Spam
This paper explores the problem of sockpuppet detection in deceptive opinion spam using authorship attribution and verification approaches. We propose two methods. The first is a feature subsampling scheme that uses the KL-divergence on stylistic language models of an author to find discriminative features. The second, spy induction, is a transductive scheme that leverages the diversity of authors in the unlabeled test set by sending a set of spies (positive samples) from the training set to retrieve hidden samples in the unlabeled test set using nearest and farthest neighbors. Experiments on ground-truth sockpuppet data show the effectiveness of the proposed schemes.
Comment: 18 pages, accepted at CICLing 2017, 18th International Conference on Intelligent Text Processing and Computational Linguistics
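The abstract does not say which stylistic features are used or how the KL-divergence is turned into a feature subset. As a minimal sketch under those gaps, assume character bigrams as stand-in stylistic features and add-alpha smoothing. Each feature is scored by its own term p(f) log(p(f)/q(f)) in KL(P_author || Q_others), and the top k are kept. The names `kl_feature_scores` and `top_features` are hypothetical; this is one plausible reading, not the paper's exact algorithm.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams, used here as hypothetical stylistic features."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def smoothed_dist(counts, vocab, alpha=1.0):
    """Add-alpha smoothed distribution over a shared vocabulary."""
    total = sum(counts[f] for f in vocab) + alpha * len(vocab)
    return {f: (counts[f] + alpha) / total for f in vocab}


def kl_feature_scores(author_docs, other_docs, n=2, alpha=1.0):
    """Per-feature terms of KL(P_author || Q_others); the terms sum to the full KL."""
    a = Counter(g for d in author_docs for g in char_ngrams(d, n))
    b = Counter(g for d in other_docs for g in char_ngrams(d, n))
    vocab = set(a) | set(b)
    p = smoothed_dist(a, vocab, alpha)
    q = smoothed_dist(b, vocab, alpha)
    return {f: p[f] * math.log(p[f] / q[f]) for f in vocab}


def top_features(scores, k):
    """Keep the k features that contribute most to the divergence."""
    return sorted(scores, key=lambda f: scores[f], reverse=True)[:k]


if __name__ == "__main__":
    scores = kl_feature_scores(["wow!!! great!!!"], ["it was fine.", "ok, decent."])
    print(top_features(scores, 3))
```

Features that an author uses far more often than the background authors get large positive terms, so they rise to the top of the ranking; features the author avoids get negative terms and are dropped.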