
37 research outputs found

Connecting Dream Networks Across Cultures

Author: Menczer Filippo
Varol Onur
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

Many species dream, yet there remain many open research questions in the study of dreams. The symbolism of dreams and their interpretation is present in cultures throughout history. Analysis of online data sources for dream interpretation using network science leads to understanding symbolism in dreams and their associated meaning. In this study, we introduce dream interpretation networks for English, Chinese and Arabic that represent different cultures from various parts of the world. We analyze communities in these networks, finding that symbols within a community are semantically related. The central nodes in communities give insight about cultures and symbols in dreams. The community structure of different networks highlights cultural similarities and differences. Interconnections between different networks are also identified by translating symbols from different languages into English. Structural correlations across networks point out relationships between cultures. Similarities between network communities are also investigated by analysis of sentiment in symbol interpretations. We find that interpretations within a community tend to have similar sentiment. Furthermore, we cluster communities based on their sentiment, yielding three main categories of positive, negative, and neutral dream symbols. Comment: 6 pages, 3 figures

arXiv.org e-Print Archive

Crossref

A Method to Convert Sana’ani Accent to Modern Standard Arabic

Author: Al-Gaphari G. H.
Al-Yadoumi M.
Publication venue: Regional Information Center for Science & Technology
Publication date: 16/07/2012

This paper presents an efficient mechanism to convert Sana’ani dialect to Modern Standard Arabic (MSA). The mechanism is based on morphological rules related to Sana’ani dialect as well as Modern Standard Arabic. Such rules facilitate the conversion of the dialect to its corresponding MSA. The mechanism tokenizes the input dialect text and divides each token into a stem and its affixes; such affixes can be categorized into two categories: dialect affixes and/or MSA affixes. At the same time, the stem could be a dialect stem or an MSA stem. Therefore, our mechanism, implemented by using a simple MSA stemmer, must pay attention to such situations. Then our dialect stemmer is applied to strip the resulting token and extract dialect affixes. At this point, the rules are applied to decide when to carry out the extraction of an affix. The experiment shows that Sana’ani dialect has three classes of distortions, which are prefix, suffix, and stem distortions. The algorithm normalizes such distortions based on the morphological rules. For each morphological rule, the mechanism checks the possibility of applying the rule. That is, if the rule's conditions are met, then the dialect affix is replaced by its corresponding MSA affix. If there is no restriction on applying the rule related to the distorted stem, then the rule can be considered as a parallel corpus of the dialect and MSA. Finally, the experiment computes the distortion ratio of MSA in Sana’ani dialect. For a Sana’ani dialect sample of 9386 words, 16.29% have distorted suffixes, 0.70% have distorted prefixes and 2.17% contain distorted stems. These percentages relate only to the processed words

International Journal of Information Science and Management (IJISM)

Text Classification for Arabic Words Using Rep-Tree

Author: Ashour Wesam M.
Naji Hamza A.
Publication venue: 103-110
Publication date: 01/01/2016

The amount of text data in the world and in our lives seems ever increasing, and there is no end to it. Text data mining is defined as the process of deriving high-quality information from text. It has been applied in different fields, including pattern mining, opinion mining, and web mining. In this work, text data mining is based on stemming the different forms of Arabic words. Stemming is defined as the method of reducing inflected (or sometimes derived) words to their word stem, base, or root form. We use the REP-Tree to improve text representation. In addition, we test new combinations of weighting schemes to be applied to Arabic text data for classification purposes. For processing, the WEKA workbench is used. The results in the paper on a data set from the BBC-Arabic website also show the efficiency and accuracy of REP-Tree in Arabic text classification

Institutional Repository of the Islamic University of Gaza

Arabic Book Retrieval using Class and Book Index Based Term Weighting

Author: Arifin Agus Zainal
Fauzi M. Ali
Yuniarti Anny
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2017

One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents in order of relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. In this study, we are concerned with the automatic retrieval of an Islamic Fiqh (Law) book collection. This collection contains many books, each of which has tens to hundreds of pages. Each page of a book is treated as a document that will be ranked based on the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then incorporated with the previous method so that it becomes TF.IDF.ICF.IBF. The term weighting method was also used for feature selection due to the high dimensionality of the feature space. This novel method was tested using a dataset from 13 Arabic Fiqh e-books. The experimental results showed that the proposed method achieved higher precision, recall, and F-Measure than the other three methods across variations of feature selection. The best performance of this method was obtained when using the best 1000 features, with a precision of 76%, a recall of 74%, and an F-Measure of 75%
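The TF.IDF.ICF.IBF weighting described in the abstract above can be sketched as follows. This is only an illustration: the abstract does not give the exact formulas, so the smoothed form `1 + log(total / count)` used for IDF, ICF and IBF, the function name `tf_idf_icf_ibf`, and the page dictionary layout are all assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf_icf_ibf(pages):
    """Weight terms per page; each page is a dict with 'tokens', 'book', 'cls'.

    Assumed (hypothetical) formulas: each inverse factor is
    1 + log(total / count), computed over pages (IDF),
    classes (ICF) and books (IBF).
    """
    n_pages = len(pages)
    n_books = len({p["book"] for p in pages})
    n_classes = len({p["cls"] for p in pages})

    # For each term, record how many pages and which books/classes contain it.
    page_df = Counter()
    term_books = {}
    term_classes = {}
    for p in pages:
        for t in set(p["tokens"]):
            page_df[t] += 1
            term_books.setdefault(t, set()).add(p["book"])
            term_classes.setdefault(t, set()).add(p["cls"])

    def inv(total, count):
        return 1.0 + math.log(total / count)

    weights = []
    for p in pages:
        tf = Counter(p["tokens"])
        weights.append({
            t: f
            * inv(n_pages, page_df[t])
            * inv(n_classes, len(term_classes[t]))
            * inv(n_books, len(term_books[t]))
            for t, f in tf.items()
        })
    return weights


# Toy collection: two books, each in its own class, two pages per book.
pages = [
    {"tokens": ["salat", "wudu", "salat"], "book": "b1", "cls": "c1"},
    {"tokens": ["salat"], "book": "b1", "cls": "c1"},
    {"tokens": ["salat", "zakat"], "book": "b2", "cls": "c2"},
    {"tokens": ["salat"], "book": "b2", "cls": "c2"},
]
weights = tf_idf_icf_ibf(pages)
```

Under these assumed formulas, a term that occurs on every page, in every book and in every class keeps only its raw term frequency, while a term confined to one page of one book in one class is boosted by all three inverse factors.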

Crossref

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Arabic morphological tools for text mining

Author: Ashour Wesam M.
Saad Motaz K
Publication venue
Publication date: 01/01/2010

The Arabic language has complex morphology, which has led to the unavailability of standard Arabic morphological analysis tools until now. In this paper, we present and evaluate existing common Arabic stemming/light stemming algorithms. We also implement and integrate Arabic morphological analysis tools into the leading open source machine learning and data mining tools, Weka and RapidMiner

Institutional Repository of the Islamic University of Gaza

PAAD: POLITICAL ARABIC ARTICLES DATASET FOR AUTOMATIC TEXT CATEGORIZATION

Author: Hamed Abd Dhafar
R. Abbas Ayad
T. Sadiq Ahmed
Publication venue: University of Information and Technology Communications
Publication date: 30/06/2020

Nowadays, text classification and sentiment analysis are considered among the most popular Natural Language Processing (NLP) tasks. These techniques play a significant role in human activities and have an impact on daily behaviour. Articles in different fields such as politics and business represent different opinions according to the writer's tendency, and a huge amount of data can be acquired through that differentiation. This motivates the capability to identify the political orientation of an online article automatically. However, no corpus for political categorization has been directed towards this task in Arabic, due to the lack of rich representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD), textual data collected from newspapers, social networks, general forums and ideology websites. The dataset comprises 206 articles distributed into three categories (Reform, Conservative and Revolutionary), which we offer to the research community on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic, particularly political text classification. We present the data in raw form and as Excel files. The Excel files come in four versions: V1 raw data, V2 preprocessing, V3 root stemming and V4 light stemming

Iraqi Journal for Computers and Informatics

Improving Arabic Stemmer: ISRI Stemmer

Author: Darmalaksana Wahyudin
Huda Arief Fatchul
Kurahman Opik Taupik
Syarief Mochamad Gilang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/02/2020

Stemmers are used in several types of applications, such as Text Mining, Information Retrieval (IR), and Natural Language Processing (NLP). Stemming is a step used to process text data. The main task of a stemmer is to reduce an inflected word to its base form (root or stem). The ISRI Stemmer is one of the Arabic stemmers contained in the NLTK package. This study addresses the weakness of the ISRI stemmer in processing words consisting of two letters. In the experiment, these improvements increased the stemmer yield by 7.3%

Crossref

Digital Library UIN (Universitas Islam Negeri) Sunan Gunung Djati Bandung

An Evaluation of Existing Light Stemming Algorithms for Arabic Keyword Searches

Author: Rogerson Brittany E.
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2008

The field of Information Retrieval recognizes the importance of stemming in improving retrieval effectiveness. This same tool, when applied to searches conducted in the Arabic language, increases the relevancy of documents returned and expands searches to encompass the general meaning of a word instead of the word itself. Since the Arabic language relies mainly on triconsonantal roots for verb forms and derives nouns by adding affixes, words with similar consonants are closely related in meaning. Stemming allows a search term to focus more on the meaning of a term and closely related terms and less on specific character matches. This paper discusses the strength of light stemming, the best techniques, and components for algorithmic affix-based stemmers used in keyword searching in the Arabic language
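As a rough illustration of the affix-based light stemming discussed in the abstract above, the sketch below strips at most one prefix and one suffix from a word. The affix lists are a small illustrative subset chosen for this example, and the function name `light_stem` and the `min_len` threshold are assumptions; this is not one of the algorithms evaluated in the paper.

```python
# Illustrative affix lists (assumed subset); longer prefixes come first
# so that "وال" is tried before "ال".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# Two surface forms of "teachers" reduce to the same stem, so a keyword
# search for one form can also match the other.
stem_a = light_stem("والمعلمون")
stem_b = light_stem("معلمين")
```

Because both forms conflate to the same stem, a query term and a document term that differ only in these affixes are treated as a match, which is the effect on keyword search that the abstract describes.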

Carolina Digital Repository

The Role of Linguistic Feature Categories in Authorship Verification

Author: Ahmed H.I.A.A.
Publication venue: Elsevier BV
Publication date: 15/11/2018

Authorship verification is a type of authorship analysis that addresses the following problem: given a set of documents known to be written by an author, and a document of doubtful attribution to that author, the task is to decide whether that document is truly written by that author. A combination of a similarity-based method and relevant linguistic features is used to achieve high-accuracy authorship verification. The method is an author-profiling approach that dispenses with negative-evidence training data, and a number of lexical, morphological, and syntactic features and feature ensembles are used to determine optimal feature use. The method-feature combination is applied to a test corpus of 31 Classical Arabic books and substantially outperforms the best available baselines (with 87.1% accuracy). The varying performance of different features and feature ensembles indicates that Classical Arabic authors are less free to individualize their style lexically or morphologically than when involving syntactic structures. Middle Eastern Studies

Leiden University Scholarly Publications