6 research outputs found

    PAAD: POLITICAL ARABIC ARTICLES DATASET FOR AUTOMATIC TEXT CATEGORIZATION

    Text classification and sentiment analysis are now among the most popular Natural Language Processing (NLP) tasks. These techniques play a significant role in human activities and affect daily behaviour. Articles in fields such as politics and business express different opinions depending on the writer's leanings, and that variation yields a large amount of data. It also motivates the ability to determine the political orientation of an online article automatically. However, no Arabic corpus has been directed towards political categorization, owing to the lack of rich, representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD), textual data collected from newspapers, social networks, general forums and ideology websites. The dataset comprises 206 articles distributed across three categories (Reform, Conservative and Revolutionary), which we offer to the research community working on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic, particularly political text classification. We present the data in raw form and as Excel files in four versions: V1 raw data, V2 preprocessed, V3 root stemming and V4 light stemming.

    The effect of gamma value on support vector machine performance with different kernels

    The support vector machine (SVM) is regarded as one of the supervised machine learning algorithms that analyse data for classification and regression. It is applied in many fields, such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other areas. SVM performance is affected by parameters used in the training phase, and their settings can have a profound impact on the resulting model. This paper investigates how the value of the gamma parameter affects SVM classification performance with different kernels on datasets of varying characteristics. The SVM classifier was implemented in Python. The kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. All datasets were taken from the UC Irvine Machine Learning Repository. In general, the results show an uneven effect on the classification accuracy of the three kernels across the datasets. Changing the gamma value influences the polynomial and sigmoid kernels, depending on the dataset used, whereas the RBF kernel is more stable: its accuracy changes only slightly across gamma values.
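To make the role of gamma concrete, here is a minimal sketch of the three kernel functions named in the abstract. It assumes the kernel definitions used by libsvm and scikit-learn (gamma scales the inner product in the polynomial and sigmoid kernels, and the squared distance in the RBF kernel); the abstract does not state which formulation the paper used. The sample points and the `coef0` and `degree` defaults are illustrative assumptions, and the snippet shows raw kernel values only, not the classification accuracy the paper reports.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)


if __name__ == "__main__":
    # Two arbitrary sample points (assumed, not from the paper's datasets).
    x, y = [1.0, 2.0], [2.0, 0.5]
    for gamma in (0.01, 0.1, 1.0, 10.0):
        print(
            f"gamma={gamma:<5} "
            f"poly={poly_kernel(x, y, gamma):.6g} "
            f"rbf={rbf_kernel(x, y, gamma):.6g} "
            f"sigmoid={sigmoid_kernel(x, y, gamma):.6g}"
        )
```

The RBF value always lies in (0, 1] and the sigmoid value in (-1, 1), while the polynomial value grows without bound as gamma increases. That difference in range is one plausible reason the kernels could react differently to gamma, though the abstract itself does not offer this explanation.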

    A Robust Approach for Mixed Technique of Data Encryption Between DES and RC4 Algorithm

    This research reviews and evaluates two well-known encryption algorithms, DES and RC4, together with the advantages and disadvantages of each. The two algorithms are combined to produce a new algorithm whose added complexity makes it highly resistant to several attacks: breaking the code is much more complicated than it would be if either algorithm were used on its own. The new algorithm was implemented to show its efficiency in terms of time, and the added protection comes at a very small, almost negligible, increase in encryption time. When the algorithm was applied and tested in practice, the following results were obtained. Encrypting one block with DES takes about 0.000034 milliseconds, whereas encrypting the same block with the new algorithm takes about 0.000042 milliseconds. Encrypting 1024 blocks takes 0.0406 with DES and 0.051 with the new algorithm. This is a very small increase in time compared with the gain in complexity. Because the new algorithm combines the two earlier ones, each block is encrypted with a different key that depends on the preceding block, which makes the code very difficult to break and increases security and the protection of information against decoding.
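The abstract says only that each block is encrypted with a different key that depends on the preceding block; it does not give the construction. The sketch below is one hypothetical way to realise that idea, not the authors' algorithm. It assumes the per-block key is an RC4 keystream seeded with the master key plus the previous ciphertext block, and it uses a plain XOR as a stand-in for the DES block transform, because DES is not in the Python standard library. The names `encrypt`, `decrypt` and `BLOCK` are illustrative.

```python
BLOCK = 8  # DES operates on 8-byte blocks


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of the RC4 keystream for the given key."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA).
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def encrypt(master_key: bytes, plaintext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    """Encrypt block by block, deriving each block key from the previous ciphertext block."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext length must be a multiple of the block size")
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        block_key = rc4_keystream(master_key + prev, BLOCK)
        # Stand-in for DES: a real implementation would run DES with block_key here.
        cipher_block = xor_bytes(plaintext[i:i + BLOCK], block_key)
        out += cipher_block
        prev = cipher_block
    return bytes(out)


def decrypt(master_key: bytes, ciphertext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    """Invert encrypt() by rebuilding the same chain of block keys."""
    if len(ciphertext) % BLOCK:
        raise ValueError("ciphertext length must be a multiple of the block size")
    prev, out = iv, bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        block_key = rc4_keystream(master_key + prev, BLOCK)
        cipher_block = ciphertext[i:i + BLOCK]
        out += xor_bytes(cipher_block, block_key)
        prev = cipher_block
    return bytes(out)


if __name__ == "__main__":
    message = b"ABCDEFGH" * 3  # three identical plaintext blocks
    ct = encrypt(b"secret", message)
    print(ct.hex())
    print(decrypt(b"secret", ct) == message)
```

Because each block key depends on the previous ciphertext block, identical plaintext blocks encrypt differently, which matches the property the abstract describes. The XOR stand-in is for illustration only and offers none of the protection of a real block cipher.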

    Categorization of Arabic posts using Artificial Neural Network and hash features

    Sentiment analysis is an important research topic with diverse application domains, including social network monitoring and automatic analysis of natural language communication. Existing research on sentiment analysis has already drawn on the substantial domain knowledge available online, comprising users' opinions in areas such as business, education, and social media. There is, however, limited literature on Arabic-language sentiment analysis. Furthermore, the datasets used in the majority of these studies are poorly classified. In the present study, we used a primary dataset of 2122 sentences and 15,331 words compiled from 206 publicly available online posts to perform sentiment classification with an advanced machine learning technique based on Artificial Neural Networks. Unlike lexicon-based techniques, which suffer from low accuracy due to their computational nature and parameter configuration, Artificial Neural Networks were used to classify opinion posts into three categories (conservative, reform and revolutionary), with multiple hash vector sizes used to benchmark the performance of the proposed model. Extensive simulation results indicated an accuracy of 93.33%, 100%, and 100% for the conservative, reform, and revolutionary classes, respectively.
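The "hash features" and "hasher vector size" in this abstract refer to feature hashing, where each token is mapped by a hash function to one of a fixed number of buckets. The following is a minimal sketch of that idea under stated assumptions: it uses MD5 from the standard library as the hash and unsigned counts, whereas the paper's actual hasher, tokenisation and vector sizes are not given in the abstract. The function name `hash_features` and the sample tokens are hypothetical.

```python
import hashlib


def hash_features(tokens, n_features):
    """Map a list of tokens to a fixed-length count vector via the hashing trick."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    vec = [0.0] * n_features
    for tok in tokens:
        # A stable hash (unlike built-in hash(), which is salted per process).
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_features
        vec[idx] += 1.0  # colliding tokens share a bucket
    return vec


if __name__ == "__main__":
    # Placeholder tokens (assumed); a real pipeline would tokenise Arabic posts.
    tokens = "reform conservative revolutionary reform policy vote".split()
    for size in (4, 16, 64):
        vec = hash_features(tokens, size)
        used = sum(1 for v in vec if v)
        print(f"size={size:<3} non-empty buckets={used}")
```

A smaller vector means more tokens collide in the same bucket, and a larger one costs more memory and input units in the network, which is presumably why the study benchmarks several sizes. The resulting vectors would serve as the input layer of the neural network classifier.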

    Image dataset of important grape varieties in the commercial and consumer market

    This work presents a primary dataset collected from various geographic locations in Iraq for the seedlings of eight grape varieties used for local consumption and export. The grape types included in the dataset are: deas al-annz, kamali, halawani, thompson seedless, aswud balad, riasi, frinsi, shdah. Leaves of each type of the seasoned fruit were photographed with a high-resolution device. A total of 8000 images (i.e., 1000 images per category) were captured using a random sampling approach while maintaining balance and diversity within the grape image data. The dataset has significant potential impact and usefulness: among other features, it covers 8 varieties with different tastes and can support various industries in agriculture and food manufacturing.