
    Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

    With the explosive growth in the amount of information, tools and methods are needed to search, filter, and manage resources. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, a key step in text classification is reducing the dimensionality of the feature space. There are many feature selection methods. However, only a few of them are used for very large text classification problems. In this paper, we propose a new wrapper method based on the Particle Swarm Optimization (PSO) algorithm and the Support Vector Machine (SVM). We combine it with Learning Automata to make it more efficient. This helps to select better features using the reward and penalty system of the automata. To evaluate the efficiency of the proposed method, we compare it with a method that selects features using a Genetic Algorithm on the Reuters-21578 dataset. The simulation results show that our proposed algorithm works more efficiently.

    Sentiment Analysis in Arabic Social Media Using Association Rule Mining

    The fast-paced growth of the World Wide Web has driven the development of sentiment analysis, which involves the analysis of comments or web reviews. The sentiment classification of Arabic social media is an exciting and fascinating area of study. Hence, this study presents a new method that combines association rules with three Feature Selection (FS) methods for the Sentiment Analysis (SA) of web reviews in the Arabic language. The feature selection methods used are Chi-square (χ2), Gini Index (GI), and Information Gain (IG). This study reveals that the use of feature selection methods improves the classifier results; that is, the proposed model performs better than the baseline. Finally, the experimental results show that Chi-square feature selection yields the best classification results, with an F-measure of 86.811.
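
As a rough illustration of the chi-square feature selection mentioned in the abstract above, the following minimal sketch scores each term by the chi-square statistic of its 2x2 term/class contingency table. It is not the authors' implementation: the function name `chi_square_scores`, the toy English documents, and the binary positive/negative labelling are all assumptions made for the example.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    docs     -- list of token lists (hypothetical, pre-tokenized input)
    labels   -- one class label per document
    positive -- the label treated as the target class
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target class.
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos_df if y == positive else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # target-class docs containing the term
        b = neg_df[term]   # other docs containing the term
        c = n_pos - a      # target-class docs without the term
        d = n_neg - b      # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


# Toy data (assumed for illustration only).
docs = [["good", "film"], ["good", "plot"], ["bad", "film"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels, positive="pos")
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy data, terms that appear in only one class ("good", "bad") receive the highest scores, while terms spread evenly across classes ("film", "plot") score zero, so keeping the top-scoring terms discards features that carry no class information.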

    Stock Market Random Forest-Text Mining (SMRF-TM) Approach to Analyse Critical Indicators of Stock Market Movements

    The stock market is a significant sector of a country’s economy and has a crucial role in the growth of commerce and industry. Hence, discovering efficient ways to analyse and visualise stock market data is considered a significant issue in modern finance. The use of data mining techniques to predict stock market movements has been extensively studied using historical market prices, but such approaches are constrained to make assessments within the scope of existing information, and thus they are not able to model any random behaviour of the stock market or identify the causes behind events. One area of limited success in stock market prediction comes from textual data, which is a rich source of information. Analysing textual data related to the stock market may provide a better understanding of random behaviours of the market. Text Mining combined with the Random Forest algorithm offers a novel approach to the study of critical indicators, which contribute to the prediction of abnormal stock market movements. In this thesis, a Stock Market Random Forest-Text Mining system (SMRF-TM) is developed and used to mine the critical indicators related to the 2009 Dubai stock market debt standstill. Random Forest and expectation maximisation are applied to classify the extracted features into a set of meaningful and semantic classes, thus extending current approaches from three to eight classes: critical down, down, neutral, up, critical up, economic, social and political. The study demonstrates that Random Forest outperformed other classifiers and achieved the best accuracy in classifying the bigram features extracted from the corpus.

    Unified processing framework of high-dimensional and overly imbalanced chemical datasets for virtual screening

    Virtual screening in drug discovery involves processing large datasets containing unknown molecules in order to find the ones that are likely to have the desired effects on a biological target, typically a protein receptor or an enzyme. Molecules are thereby classified as active or inactive in relation to the target. Misclassification of molecules in cases such as drug discovery and medical diagnosis is costly, both in time and finances. In the process of discovering a drug, it is mainly the inactive molecules classified as active towards the biological target (i.e., false positives) that cause delays in progress and high late-stage attrition. However, despite the pool of techniques available, selecting a suitable approach for each situation is still a major challenge. This PhD thesis aims to develop a pioneering framework that enables the analysis of virtual screening of chemical compound datasets in a wide range of settings in a unified fashion. The proposed method provides a better understanding of the dynamics of innovatively combining data processing and classification methods in order to screen massive, potentially high-dimensional and overly imbalanced datasets more efficiently.