    A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

    Opinion mining aims to extract useful subjective information from considerable amounts of text. Opinion holder recognition is a task that has not yet been addressed for the Arabic language. The task essentially requires a deep understanding of clause structure. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents pioneering research on opinion holder extraction from Arabic news that does not depend on any lexical parser. We investigate constructing a comprehensive feature set to compensate for the missing structural output of a parser. The proposed feature set is adapted from previous work on English, coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different models are evaluated in cross-validation experiments, achieving an F-measure of 54.03. We publicly release our corpus and lexicon to the opinion mining community to encourage further research.
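
    To make the idea of a parser-free feature set concrete, the sketch below turns an Arabic sentence into one feature dict per token, the input form that CRF toolkits such as sklearn-crfsuite accept, with opinion holders labelled in BIO style. It is a minimal illustration under stated assumptions, not the paper's feature set: the semantic-field lexicon, the named-entity tags, the prefix/suffix features and the +/-1 window are all hypothetical choices, and the semi-supervised pattern component is not modelled.

```python
# Hypothetical semantic-field lexicon (word -> field label); the paper's own
# semantic-field resource is not reproduced here.
SEMANTIC_FIELDS = {
    "قال": "SPEECH",     # "said"
    "أكد": "SPEECH",     # "confirmed"
    "خاطئ": "OPINION",   # "wrong"
}


def token_features(tokens, ne_tags, i):
    """Feature dict for token i, including a +/-1 context window."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word": word,
        "prefix2": word[:2],        # crude proxy for proclitics such as the article
        "suffix2": word[-2:],       # crude proxy for enclitics and inflection
        "has_al": word.startswith("ال"),  # Arabic definite article
        "ne": ne_tags[i],           # named-entity tag from an external tagger
        "field": SEMANTIC_FIELDS.get(word, "NONE"),
    }
    for offset in (-1, 1):
        j = i + offset
        key = f"{offset:+d}"        # "-1" or "+1"
        if 0 <= j < len(tokens):
            feats[key + ":word"] = tokens[j]
            feats[key + ":ne"] = ne_tags[j]
            feats[key + ":field"] = SEMANTIC_FIELDS.get(tokens[j], "NONE")
        else:
            feats[key + ":boundary"] = True  # sentence start or end
    return feats


def sentence_features(tokens, ne_tags):
    return [token_features(tokens, ne_tags, i) for i in range(len(tokens))]


# "Ahmed said that the decision is wrong", with BIO labels for the holder.
EXAMPLE_TOKENS = ["قال", "أحمد", "إن", "القرار", "خاطئ"]
EXAMPLE_NE = ["O", "B-PER", "O", "O", "O"]
EXAMPLE_LABELS = ["O", "B-HOLDER", "O", "O", "O"]
```

    Lists of such dicts (one list per sentence) paired with the label sequences would form the training data for a linear-chain CRF. The speech-verb field on the preceding token and the person tag on the current token are exactly the kind of cues that can stand in for a parser's subject relation.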

    Developing conceptual glossaries for the Latin Vulgate Bible

    A conceptual glossary is a textual reference work that combines the features of a thesaurus and an index verborum. In it, the word occurrences within a given text are classified, disambiguated, and indexed according to their membership of a set of conceptual (i.e. semantic) fields. Since 1994, we have been working towards building a set of conceptual glossaries for the Latin Vulgate Bible. So far, we have published a conceptual glossary to the Gospel according to John and are currently completing the analysis of the Gospel according to Mark and the minor epistles. This paper describes the background to our project and outlines the steps by which the glossaries are developed within a relational database framework.
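
    As a rough illustration of how occurrences can be classified and indexed by conceptual field in a relational database, the sketch below uses Python's built-in sqlite3 module. The schema, table names and field labels are hypothetical and are not the project's actual design; the occurrences are a few words from John 1:4-5, where each inflected form is linked to a disambiguated sense and each sense to one field.

```python
import sqlite3

# Hypothetical schema: each occurrence points to a sense, and each sense
# belongs to exactly one conceptual (semantic) field.
SCHEMA = """
CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE lemma (id INTEGER PRIMARY KEY, form TEXT NOT NULL);
CREATE TABLE sense (
    id INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    field_id INTEGER NOT NULL REFERENCES field(id),
    gloss TEXT NOT NULL
);
CREATE TABLE occurrence (
    id INTEGER PRIMARY KEY,
    book TEXT NOT NULL,
    chapter INTEGER NOT NULL,
    verse INTEGER NOT NULL,
    word TEXT NOT NULL,  -- inflected form as it appears in the text
    sense_id INTEGER NOT NULL REFERENCES sense(id)
);
"""


def build_demo():
    """Create an in-memory database with a few illustrative entries."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO field VALUES (?, ?)",
                    [(1, "Light and darkness"), (2, "Life and death")])
    con.executemany("INSERT INTO lemma VALUES (?, ?)",
                    [(1, "lux"), (2, "tenebrae"), (3, "vita")])
    con.executemany("INSERT INTO sense VALUES (?, ?, ?, ?)",
                    [(1, 1, 1, "light"), (2, 2, 1, "darkness"), (3, 3, 2, "life")])
    con.executemany(
        "INSERT INTO occurrence (book, chapter, verse, word, sense_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [("John", 1, 4, "vita", 3), ("John", 1, 4, "lux", 1),
         ("John", 1, 5, "lux", 1), ("John", 1, 5, "tenebris", 2)])
    return con


def glossary(con):
    """Index entries grouped by field, then lemma, then reference."""
    sql = """
    SELECT f.name, l.form, o.book || ' ' || o.chapter || ':' || o.verse
    FROM occurrence o
    JOIN sense s ON s.id = o.sense_id
    JOIN lemma l ON l.id = s.lemma_id
    JOIN field f ON f.id = s.field_id
    ORDER BY f.name, l.form, o.chapter, o.verse
    """
    return con.execute(sql).fetchall()
```

    Keeping senses in their own table is what lets one lemma appear under several fields when its occurrences are disambiguated differently, while the inflected form (tenebris) is still indexed under its lemma (tenebrae).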

    Gujarati Word Sense Disambiguation using Genetic Algorithm

    Genetic algorithms (GAs) have been widely investigated for solving hard optimization problems, including word sense disambiguation (WSD): determining which sense of a polysemous word is used in a given context. Several approaches to WSD have been investigated for English, French and German, and for Indian languages such as Hindi, Marathi and Malayalam; however, research on WSD for the Gujarati language is relatively limited. This paper proposes a knowledge-based approach to Gujarati WSD using a genetic algorithm, with the Indo-Aryan WordNet for Gujarati serving as the lexical database.
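
    The general shape of GA-based WSD can be sketched as follows, assuming a chromosome holds one sense index per ambiguous word and fitness is a Lesk-style gloss-overlap score. This is a minimal sketch, not the paper's algorithm: the fitness function, truncation selection with elitism, one-point crossover, mutation rate and population size are all assumptions, and short English glosses stand in for Gujarati WordNet glosses for readability.

```python
import random
from itertools import combinations

# Toy example (hypothetical): English glosses stand in for WordNet glosses.
WORDS = ["bank", "deposit"]
SENSES = {
    "bank": ["financial institution that accepts money deposits",
             "sloping land beside a river"],
    "deposit": ["money placed in a bank account",
                "layer of sediment left by a river"],
}
SENTENCE = "He went to the bank to deposit money"


def tokens(text):
    """Lower-cased bag of words; a stand-in for real tokenization."""
    return set(text.lower().split())


def fitness(assign, words, senses, context):
    """Lesk-style score: gloss/context overlap plus pairwise gloss overlap."""
    glosses = [tokens(senses[w][s]) for w, s in zip(words, assign)]
    score = sum(len(g & context) for g in glosses)
    score += sum(len(a & b) for a, b in combinations(glosses, 2))
    return score


def ga_wsd(words, senses, sentence, pop_size=20, generations=50,
           mutation_rate=0.2, seed=0):
    """Return one sense index per word; chromosome = list of sense indices."""
    rng = random.Random(seed)
    context = tokens(sentence)

    def score(ind):
        return fitness(ind, words, senses, context)

    population = [[rng.randrange(len(senses[w])) for w in words]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection with elitism: keep the better half unchanged.
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # One-point crossover (cut=0 copies p2 when there is a single word).
            cut = rng.randrange(1, len(words)) if len(words) > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: resample a gene's sense index with small probability.
            for i, w in enumerate(words):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(senses[w]))
            children.append(child)
        population = elite + children
    return max(population, key=score)


if __name__ == "__main__":
    print(ga_wsd(WORDS, SENSES, SENTENCE))
```

    On this toy input the two financial senses, [0, 0], have the highest fitness (4), so that is the assignment the search should settle on. With only two words an exhaustive search would be cheaper; the GA pays off when a sentence has many polysemous words and the joint sense space grows multiplicatively.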

    A survey on sentiment analysis in Urdu: A resource-poor language

    © 2020 Background/introduction: The dawn of the internet opened the door to the easy and widespread sharing of information on subjects such as products, services, events and political opinions. While the volume of sentiment analysis research is expanding rapidly, most of these studies address the English language. The primary goal of this study is to present a state-of-the-art survey that identifies the progress and shortcomings of Urdu sentiment analysis and proposes remedies. Methods: We describe the advances made thus far by categorising the studies along three dimensions: text pre-processing, lexical resources and sentiment classification. The pre-processing operations covered include word segmentation, text cleaning, spell checking and part-of-speech tagging. We evaluate lexical resources, including corpora and lexicons, and examine sentiment analysis constructs such as opinion words, modifiers and negations. Results and conclusions: Performance is reported for each reviewed study. Based on these experimental results and the proposals put forward, this paper lays the groundwork for further studies on Urdu sentiment analysis.
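
    Several of the surveyed constructs (text cleaning, tokenization, opinion words, modifiers and negations) combine in a lexicon-based scorer like the sketch below. It is a toy illustration under stated assumptions, not a method from any reviewed study: the four lexicon entries, the intensifier weight of 2.0 and the negation window are hypothetical, and whitespace splitting is a deliberate simplification of Urdu word segmentation, which is itself one of the open pre-processing problems.

```python
import re

# Hypothetical toy lexicon entries in Urdu script.
GOOD, BAD = "اچھا", "برا"      # "good", "bad"
NOT, VERY = "نہیں", "بہت"      # negation particle, intensifier "very"

# Text cleaning: map Arabic code points for yeh/kaf to the Urdu ones,
# and treat Urdu and ASCII punctuation as token boundaries.
_NORMALIZE = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})
_PUNCT = re.compile("[\u06d4\u060c\u061f!?.,]")


def normalize(text):
    return text.translate(_NORMALIZE)


# Lexicon keys are normalized the same way as input text.
POLARITY = {normalize(GOOD): 1.0, normalize(BAD): -1.0}
NEGATIONS = {normalize(NOT)}
INTENSIFIERS = {normalize(VERY): 2.0}


def preprocess(text):
    """Clean and tokenize (whitespace splitting simplifies real segmentation)."""
    return _PUNCT.sub(" ", normalize(text)).split()


def score(text):
    """Sum word polarities, applying intensifiers and negations."""
    tokens = preprocess(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i - 1]]
        # In Urdu the negation particle typically follows a predicate adjective
        # ("acha nahin hai"), so look one token back and two tokens ahead.
        window = tokens[max(0, i - 1):i] + tokens[i + 1:i + 3]
        if any(t in NEGATIONS for t in window):
            value = -value
        total += value
    return total
```

    Under these rules یہ فون بہت اچھا ہے ("this phone is very good") scores 2.0, and یہ فون اچھا نہیں ہے۔ ("this phone is not good") scores -1.0. Looking ahead for the negation, rather than only behind as an English-style rule would, reflects Urdu word order.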

    Peculiarities of Automatic Processing of the Arabic Language

    The complexity of the Arabic language poses major challenges for natural language processing (NLP) methods and calls for detailed research. These challenges push NLP to the extreme, motivating creativity and the exhaustive exploitation of every available technique and linguistic resource. This article is a first step towards understanding these problems and an attempt to stimulate the search for solutions in automatic Arabic language processing.

    Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity

    Involving linguistic professionals in resolving the ambiguity of a word within a particular context produces concise meanings for the words found in a lexical knowledge-based collection. Motivated by this, we employed a lexical knowledge and machine learning approach that integrates data and/or information from the lexical knowledge base, namely Malay collections linked to the ambiguous words. We retained the open-class words and removed stop words from the target sentences. Experiments were conducted on 50 ambiguous words, with and without lexical knowledge. The word sense disambiguation (WSD) method is based on machine learning and corpus-based approaches, using a Malay-Malay corpus and an English-Malay corpus. The results show that the proposed method improves precision in resolving ambiguity. Keywords: ambiguity; lexical knowledge; machine learning; Malay wor
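
    The corpus-based step (remove stop words, keep the remaining context words, train a classifier per ambiguous word) can be sketched as below. The abstract does not name its classifier, so multinomial naive Bayes with add-one smoothing is a swapped-in, hypothetical choice; the stop list, the sense labels and the four training sentences for the ambiguous word kepala ("head" as body part vs. "head" as leader) are illustrative, and the lexical-knowledge integration is not modelled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical Malay stop list (function words and pronouns).
STOP_WORDS = {"yang", "di", "itu", "ini", "dan", "ke", "dengan", "dia", "saya"}


def context_words(sentence, target):
    """Drop stop words and the ambiguous word itself."""
    return [t for t in sentence.lower().split()
            if t not in STOP_WORDS and t != target]


class NaiveBayesWSD:
    """Multinomial naive Bayes over bag-of-words contexts for one target word."""

    def __init__(self, target):
        self.target = target
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for sentence, sense in examples:
            words = context_words(sentence, self.target)
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        return self

    def predict(self, sentence):
        # Context words never seen in training are ignored.
        words = [w for w in context_words(sentence, self.target)
                 if w in self.vocab]
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                # Add-one (Laplace) smoothing.
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


TRAIN = [
    ("kepala saya sakit selepas jatuh", "body_part"),
    ("doktor memeriksa kepala pesakit itu", "body_part"),
    ("dia kepala jabatan di pejabat itu", "leader"),
    ("kepala kampung memanggil mesyuarat", "leader"),
]
```

    With this toy data, "pesakit itu sakit kepala" is labelled body_part and "kepala jabatan itu datang ke pejabat" is labelled leader. Stop-word removal matters here because words such as itu occur with both senses and would otherwise add noise to every context.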