9,105 research outputs found
A finite-state approach to arabic broken noun morphology
In this paper, a finite-state computational approach to Arabic broken plural noun morphology is introduced. The paper considers the derivational aspect of the approach, and how generalizations about dependencies in the broken plural noun derivational system of Arabic are captured and handled computationally in this finite-state approach. The approach will be implemented using Xerox finite-state tool
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version
LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual
The Linguistic Data Consortium (LDC) has developed hundreds of data corpora
for natural language processing (NLP) research. Among these are a number of
annotated treebank corpora for Arabic. Typically, these corpora consist of a
single collection of annotated documents. NLP research, however, usually
requires multiple data sets for the purposes of training models, developing
techniques, and final evaluation. Therefore it becomes necessary to divide the
corpora used into the required data sets (divisions). This document details a
set of rules that have been defined to enable consistent divisions for old and
new Arabic treebanks (ATB) and related corpora.Comment: 14 pages; one cove
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion mining holder recognition is a task that has not been
considered yet in Arabic Language. This task essentially requires deep
understanding of clauses structures. Unfortunately, the lack of a robust,
publicly available, Arabic parser further complicates the research. This paper
presents a leading research for the opinion holder extraction in Arabic news
independent from any lexical parsers. We investigate constructing a
comprehensive feature set to compensate the lack of parsing structural
outcomes. The proposed feature set is tuned from English previous works coupled
with our proposed semantic field and named entities features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments achieving 54.03 F-measure. We publicly release our
own research outcome corpus and lexicon for opinion mining community to
encourage further research
- …