9,105 research outputs found

    A finite-state approach to arabic broken noun morphology

    Get PDF
    In this paper, a finite-state computational approach to Arabic broken plural noun morphology is introduced. The paper considers the derivational aspect of the approach, and how generalizations about dependencies in the broken plural noun derivational system of Arabic are captured and handled computationally in this finite-state approach. The approach will be implemented using Xerox finite-state tool

    Introduction to the special issue on cross-language algorithms and applications

    Get PDF
    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version

    LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual

    Get PDF
    The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research. Among these are a number of annotated treebank corpora for Arabic. Typically, these corpora consist of a single collection of annotated documents. NLP research, however, usually requires multiple data sets for the purposes of training models, developing techniques, and final evaluation. Therefore it becomes necessary to divide the corpora used into the required data sets (divisions). This document details a set of rules that have been defined to enable consistent divisions for old and new Arabic treebanks (ATB) and related corpora.Comment: 14 pages; one cove

    A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

    Full text link
    Opinion mining aims at extracting useful subjective information from reliable amounts of text. Opinion mining holder recognition is a task that has not been considered yet in Arabic Language. This task essentially requires deep understanding of clauses structures. Unfortunately, the lack of a robust, publicly available, Arabic parser further complicates the research. This paper presents a leading research for the opinion holder extraction in Arabic news independent from any lexical parsers. We investigate constructing a comprehensive feature set to compensate the lack of parsing structural outcomes. The proposed feature set is tuned from English previous works coupled with our proposed semantic field and named entities features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments achieving 54.03 F-measure. We publicly release our own research outcome corpus and lexicon for opinion mining community to encourage further research
    corecore