37 research outputs found

    Connecting Dream Networks Across Cultures

    Many species dream, yet there remain many open research questions in the study of dreams. The symbolism of dreams and their interpretation is present in cultures throughout history. Analysis of online data sources for dream interpretation using network science leads to understanding symbolism in dreams and their associated meaning. In this study, we introduce dream interpretation networks for English, Chinese and Arabic that represent different cultures from various parts of the world. We analyze communities in these networks, finding that symbols within a community are semantically related. The central nodes in communities give insight about cultures and symbols in dreams. The community structure of different networks highlights cultural similarities and differences. Interconnections between different networks are also identified by translating symbols from different languages into English. Structural correlations across networks point out relationships between cultures. Similarities between network communities are also investigated by analysis of sentiment in symbol interpretations. We find that interpretations within a community tend to have similar sentiment. Furthermore, we cluster communities based on their sentiment, yielding three main categories of positive, negative, and neutral dream symbols. Comment: 6 pages, 3 figures

    A Method to Convert Sana’ani Accent to Modern Standard Arabic

    This paper presents an efficient mechanism to convert Sana’ani dialect to Modern Standard Arabic (MSA). The mechanism is based on morphological rules related to Sana’ani dialect as well as MSA. Such rules facilitate the conversion of the dialect to its corresponding MSA. The mechanism tokenizes the input dialect text and divides each token into a stem and its affixes; the affixes fall into two categories: dialect affixes and/or MSA affixes. At the same time, the stem could be a dialect stem or an MSA stem. Therefore, our mechanism, implemented by using a simple MSA stemmer, must pay attention to such situations. Then our dialect stemmer is applied to strip the resulting token and extract dialect affixes. At this point, the rules are applied to decide when to carry out the extraction of an affix. The experiment shows that Sana’ani dialect has three classes of distortions: prefix, suffix, and stem distortions. The algorithm normalizes such distortions based on the morphological rules. For each morphological rule, the mechanism checks whether the rule can be applied. That is, if the rule’s conditions are met, then the dialect affix is replaced by its corresponding MSA form. If there is no restriction on applying the rule related to the distorted stem, then the rule can be considered as a parallel corpus of the dialect and MSA. Finally, the experiment computes the distortion ratio of MSA in Sana’ani dialect. For a Sana’ani dialect sample of 9386 words, 16.29% have distorted suffixes, 0.70% have distorted prefixes, and 2.17% contain distorted stems. These percentages relate only to the processed words
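
    The rule-driven conversion described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Rule` class, the `min_rest` condition, and the Latin-letter rule entries are hypothetical placeholders (they are not real Sana’ani morphology), and the two-stemmer pipeline of the paper is collapsed into a single pass over the rules. The sketch only shows the general shape: check a rule's condition, replace the dialect form, and count which distortion class fired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str          # "prefix", "suffix", or "stem"
    dialect: str       # dialect form to look for
    msa: str           # replacement form
    min_rest: int = 2  # assumed condition for affix rules: characters left after stripping


# Placeholder rules in Latin letters; NOT real Sana'ani morphology.
RULES = [
    Rule("prefix", "xa", "sa"),
    Rule("suffix", "sh", ""),
    Rule("stem", "qwl", "kwl"),  # min_rest is ignored for stem rules
]


def convert_token(token, rules=RULES):
    """Apply each rule whose condition holds; return (new_token, fired_classes)."""
    fired = set()
    for r in rules:
        if r.kind == "prefix" and token.startswith(r.dialect):
            rest = token[len(r.dialect):]
            if len(rest) >= r.min_rest:
                token = r.msa + rest
                fired.add("prefix")
        elif r.kind == "suffix" and token.endswith(r.dialect):
            rest = token[:len(token) - len(r.dialect)]
            if len(rest) >= r.min_rest:
                token = rest + r.msa
                fired.add("suffix")
        elif r.kind == "stem" and r.dialect in token:
            token = token.replace(r.dialect, r.msa)
            fired.add("stem")
    return token, fired


def convert_text(text, rules=RULES):
    """Convert whitespace-separated tokens and report distortion ratios in percent."""
    tokens = text.split()
    counts = {"prefix": 0, "suffix": 0, "stem": 0}
    converted = []
    for tok in tokens:
        new_tok, fired = convert_token(tok, rules)
        converted.append(new_tok)
        for kind in fired:
            counts[kind] += 1
    ratios = {k: (100.0 * v / len(tokens) if tokens else 0.0) for k, v in counts.items()}
    return " ".join(converted), ratios
```

    Calling `convert_text` on a sample returns the normalized text together with the share of words in each distortion class, which mirrors the kind of prefix, suffix, and stem percentages the abstract reports.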

    Text Classification for Arabic Words Using Rep-Tree

    The amount of text data mining in the world and in our lives seems to be ever increasing, with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied in different fields, including pattern mining, opinion mining, and web mining. The concept of text data mining is based around the global stemming of different forms of Arabic words. Stemming is defined as the method of reducing inflected (or typically derived) words to their word stem, base, or root form, typically a word form. We use the REP-Tree to improve text representation. In addition, we test new combinations of weighting schemes to be applied to Arabic text data for classification purposes. For processing, the WEKA workbench is used. The results on a data set from the BBC-Arabic website also show the efficiency and accuracy of REP-Tree in Arabic text classification

    Arabic Book Retrieval using Class and Book Index Based Term Weighting

    One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents ordered by relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. In this study, we are concerned with automatic retrieval from an Islamic Fiqh (Law) book collection. This collection contains many books, each of which has tens to hundreds of pages. Each page of a book is treated as a document that is ranked based on the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then incorporated with the previous method, yielding TF.IDF.ICF.IBF. The term weighting method was also used for feature selection because of the high dimensionality of the feature space. This novel method was tested using a dataset from 13 Arabic Fiqh e-books. The experimental results showed that the proposed method had higher precision, recall, and F-Measure than the other three methods across variations of feature selection. The best performance was obtained when using the best 1000 features, with a precision of 76%, a recall of 74%, and an F-Measure of 75%
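
    To make the combined weight concrete, here is a minimal sketch of a TF.IDF.ICF.IBF computation. The abstract does not give the exact formulas, so the definitions below are assumptions made by analogy with IDF: each inverse frequency is taken as `1 + log(total / count)`, where the count is the number of pages, classes, or books containing the term. The function name and the page dictionary layout (`tokens`, `book`, `cls`) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_icf_ibf(pages):
    """Weight terms per page.

    pages: list of dicts with keys 'tokens' (list of str), 'book', and 'cls'.
    Returns one {term: weight} dict per page.
    Assumed formula: tf * idf * icf * ibf, each inverse factor = 1 + log(N / n).
    """
    n_pages = len(pages)
    n_books = len({p["book"] for p in pages})
    n_classes = len({p["cls"] for p in pages})

    page_freq = Counter()   # number of pages containing the term
    term_books = {}         # books containing the term
    term_classes = {}       # classes containing the term
    for p in pages:
        for term in set(p["tokens"]):
            page_freq[term] += 1
            term_books.setdefault(term, set()).add(p["book"])
            term_classes.setdefault(term, set()).add(p["cls"])

    weights = []
    for p in pages:
        tf = Counter(p["tokens"])
        page_weights = {}
        for term, freq in tf.items():
            idf = 1 + math.log(n_pages / page_freq[term])
            icf = 1 + math.log(n_classes / len(term_classes[term]))
            ibf = 1 + math.log(n_books / len(term_books[term]))
            page_weights[term] = freq * idf * icf * ibf
        weights.append(page_weights)
    return weights
```

    Under these assumed definitions, a term that occurs in every page, class, and book keeps only its raw term frequency, while a term confined to a single class and book is boosted by all three inverse factors. Ranking the terms by weight and keeping the top k would be one way to use the same weights for feature selection.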

    Arabic morphological tools for text mining

    Arabic has a complex morphology, which has led to a lack of standard Arabic morphological analysis tools until now. In this paper, we present and evaluate existing common Arabic stemming/light stemming algorithms. We also implement and integrate Arabic morphological analysis tools into the leading open source machine learning and data mining tools, Weka and RapidMiner

    PAAD: POLITICAL ARABIC ARTICLES DATASET FOR AUTOMATIC TEXT CATEGORIZATION

    Nowadays, text classification and sentiment analysis are considered among the popular Natural Language Processing (NLP) tasks. Such techniques play a significant role in human activities and have an impact on daily behaviours. Articles in different fields, such as politics and business, represent different opinions according to the writer’s tendency, and a huge amount of data can be acquired through that differentiation. One desirable capability is to determine the political orientation of an online article automatically. However, no corpus for political categorization has been directed towards this task in Arabic, due to the lack of rich representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD), textual data collected from newspapers, social networks, general forums, and ideology websites. The dataset contains 206 articles distributed into three categories (Reform, Conservative, and Revolutionary), which we offer to the research community on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic and for political text classification purposes. We present the data in raw form and as an Excel file. The Excel file comes in four versions: V1 raw data, V2 preprocessing, V3 root stemming, and V4 light stemming

    Improving Arabic Stemmer: ISRI Stemmer

    A stemmer is used in several types of applications, such as Text Mining, Information Retrieval (IR), and Natural Language Processing (NLP). Stemming is a step in processing text data. The main task of a stemmer is to return a word form to its basic word (root or stem). The ISRI Stemmer is one of the Arabic stemmers included in the NLTK package. This study addresses a weakness of the ISRI stemmer in processing words consisting of two letters. In the experiments, these improvements increased the stemmer yield by 7.3%

    An Evaluation of Existing Light Stemming Algorithms for Arabic Keyword Searches

    The field of Information Retrieval recognizes the importance of stemming in improving retrieval effectiveness. This same tool, when applied to searches conducted in the Arabic language, increases the relevancy of documents returned and expands searches to encompass the general meaning of a word instead of the word itself. Since the Arabic language relies mainly on triconsonantal roots for verb forms and derives nouns by adding affixes, words with similar consonants are closely related in meaning. Stemming allows a search term to focus more on the meaning of a term and closely related terms and less on specific character matches. This paper discusses the strength of light stemming, the best techniques, and components for algorithmic affix-based stemmers used in keyword searching in the Arabic language
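
    As a rough illustration of affix-based light stemming for keyword search, the sketch below strips at most one prefix and one suffix from a word without attempting to find its root. The affix lists and the `min_len` guard are illustrative assumptions; they are not the lists or components evaluated in the paper.

```python
# Illustrative affix lists (assumed for this sketch), longest prefixes first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip one prefix and one suffix, keeping at least min_len characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:len(word) - len(suffix)]
            break
    return word


def matches(query, document_words):
    """Return the document words whose light stem equals the query's light stem."""
    target = light_stem(query)
    return [w for w in document_words if light_stem(w) == target]
```

    With these lists, a query for "كتاب" also matches "الكتاب" and "والكتاب", because all three reduce to the same stem. This is the sense in which stemming widens a search from exact character matches to closely related word forms.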

    The Role of Linguistic Feature Categories in Authorship Verification

    Authorship verification is a type of authorship analysis that addresses the following problem: given a set of documents known to be written by an author, and a document of doubtful attribution to that author, the task is to decide whether that document is truly written by that author. A combination of a similarity-based method and relevant linguistic features is used to achieve high accuracy authorship verification. The method is an author-profiling approach that dispenses with negative-evidence training data, and a number of lexical, morphological, and syntactic features and feature ensembles are used to determine optimal feature use. The method-feature combination is applied to a test corpus of 31 Classical Arabic books and substantially outperforms best available baselines (with 87.1% accuracy). The varying performance of different features and feature ensembles indicates that Classical Arabic authors are less free to individualize their style lexically or morphologically than when involving syntactic structures. Middle Eastern Studies
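
    A similarity-based verifier that needs no negative training data can be sketched as follows. This is only an illustration under stated assumptions: character trigram counts stand in for the lexical, morphological, and syntactic features used in the paper, cosine similarity stands in for its similarity measure, and the acceptance `threshold` is an arbitrary placeholder that would need tuning on real data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verify(known_docs, questioned, threshold=0.5, n=3):
    """Build one author profile from known_docs and compare the questioned text to it.

    Returns (accepted, score). Only the author's own documents are used,
    so no texts by other authors are required.
    """
    profile = Counter()
    for doc in known_docs:
        profile.update(char_ngrams(doc, n))
    score = cosine(profile, char_ngrams(questioned, n))
    return score >= threshold, score
```

    The design choice worth noting is that the decision depends on a single profile and a threshold, so swapping in a different feature extractor in place of `char_ngrams` is the natural way to compare feature categories.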