198 research outputs found

    Arabic sentence-level sentiment analysis

    Get PDF
    Sentiment analysis has recently become one of the growing areas of research related to text mining and natural language processing. The increasing availability of online resources and popularity of rich and fast resources for opinion sharing like news, online review sites and personal blogs, caused several parties such as customers, companies, and governments to start analyzing and exploring these opinions. The main task of sentiment classification is to classify a sentence (i.e. review, blog, comment, news, etc.) as holding an overall positive, negative or neutral sentiment. Most of the current studies related to this topic focus mainly on English texts with very limited resources available for other languages like Arabic, especially for the Egyptian dialect. In this research work, we would like to improve the performance measures of Egyptian dialect sentence-level sentiment analysis by proposing a hybrid approach which combines both the machine learning approach using support vector machines and the semantic orientation approach. Two methodologies were proposed, one for each approach, which were then joined, creating the hybrid proposed approach. The corpus used contains more than 20,000 Egyptian dialect tweets collected from Twitter, from which 4800 manually annotated tweets will be used (1600 positive tweets, 1600 negative tweets and 1600 neutral tweets). We performed several experiments to: 1) compare the results of each approach individually with regards to our case which is dealing with the Egyptian dialect before and after preprocessing; 2) compare the performance of merging both approaches together generating the hybrid approach against the performance of each approach separately; and 3) evaluate the effectiveness of considering negation on the performance of the hybrid approach. The results obtained show significant improvements in terms of the accuracy, precision, recall and F-measure, indicating that our proposed hybrid approach is effective in sentence-level sentiment classification. Also, the results are very promising which encourages continuing in this line of research

    Classification of colloquial Arabic tweets in real-time to detect high-risk floods

    Get PDF
    Twitter has eased real-time information flow for decision makers, it is also one of the key enablers for Open-source Intelligence (OSINT). Tweets mining has recently been used in the context of incident response to estimate the location and damage caused by hurricanes and earthquakes. We aim to research the detection of a specific type of high-risk natural disasters frequently occurring and causing casualties in the Arabian Peninsula, namely `floods'. Researching how we could achieve accurate classification suitable for short informal (colloquial) Arabic text (usually used on Twitter), which is highly inconsistent and received very little attention in this field. First, we provide a thorough technical demonstration consisting of the following stages: data collection (Twitter REST API), labelling, text pre-processing, data division and representation, and training models. This has been deployed using `R' in our experiment. We then evaluate classifiers' performance via four experiments conducted to measure the impact of different stemming techniques on the following classifiers SVM, J48, C5.0, NNET, NB and k-NN. The dataset used consisted of 1434 tweets in total. Our findings show that Support Vector Machine (SVM) was prominent in terms of accuracy (F1=0.933). Furthermore, applying McNemar's test shows that using SVM without stemming on Colloquial Arabic is significantly better than using stemming techniques

    Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

    Get PDF
    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of doctor of Philosophy.In recent years, large amounts of data have been made available on microblog platforms such as Twitter, however, it is difficult to filter and extract information and knowledge from such data because of the high volume, including noisy data. On Twitter, the general public are able to report real-world events such as floods in real time, and act as social sensors. Consequently, it is beneficial to have a method that can detect flood events automatically in real time to help governmental authorities, such as crisis management authorities, to detect the event and make decisions during the early stages of the event. This thesis proposes a real time flood detection system by mining Arabic Tweets using machine learning and data mining techniques. The proposed system comprises five main components: data collection, pre-processing, flooding event extract, location inferring, location named entity link, and flooding event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated by using supervised learning techniques. Furthermore, this work presents a location named entity inferring method based on the Learning to Search method, the results show that the proposed method outperformed the existing systems with significantly higher accuracy in tasks of inferring flood locations from tweets which are written in colloquial Arabic. For the location named entity link, a method has been designed by utilising Google API services as a knowledge base to extract accurate geocode coordinates that are associated with location named entities mentioned in tweets. The results show that the proposed location link method locate 56.8% of tweets with a distance range of 0 – 10 km from the actual location. Further analysis has shown that the accuracy in locating tweets in an actual city and region are 78.9% and 84.2% respectively

    Sensing real-world events using Arabic Twitter posts

    Get PDF
    In recent years, there has been increased interest in event detection using data posted to social media sites. Automatically transforming user-generated content into information relating to events is a challenging task due to the short informal language used within the content and the variety oftopics discussed on social media. Recent advances in detecting real-world events in English and other languages havebeen published. However, the detection of events in the Arabic language has been limited to date. To address this task, wepresent an end-to-end event detection framework which comprises six main components: data collection, pre-processing, classification, feature selection, topic clustering and summarization. Large-scale experiments over millions of Arabic Twitter messages show the effectiveness of our approach for detecting real-world event content from Twitter posts

    New techniques and framework for sentiment analysis and tuning of CRM structure in the context of Arabic language

    Get PDF
    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of PhilosophyKnowing customers’ opinions regarding services received has always been important for businesses. It has been acknowledged that both Customer Experience Management (CEM) and Customer Relationship Management (CRM) can help companies take informed decisions to improve their performance in the decision-making process. However, real-word applications are not so straightforward. A company may face hard decisions over the differences between the opinions predicted by CRM and actual opinions collected in CEM via social media platforms. Until recently, how to integrate the unstructured feedback from CEM directly into CRM, especially for the Arabic language, was still an open question. Furthermore, an accurate labelling of unstructured feedback is essential for the quality of CEM. Finally, CRM needs to be tuned and revised based on the feedback from social media to realise its full potential. However, the tuning mechanism for CEM of different levels has not yet been clarified. Facing these challenges, in this thesis, key techniques and a framework are presented to integrate Arabic sentiment analysis into CRM. First, as text pre-processing and classification are considered crucial to sentiment classification, an investigation is carried out to find the optimal techniques for the pre-processing and classification of Arabic sentiment analysis. Recommendations for using sentiment analysis classification in MSA as well as Saudi dialects are proposed. Second, to deal with the complexities of the Arabic language and to help operators identify possible conflicts in their original labelling, this study proposes techniques to improve the labelling process of Arabic sentiment analysis with the introduction of neural classes and relabelling. Finally, a framework for adjusting CRM via CEM for both the structure of the CRM system (on the sentence level) and the inaccuracy of the criteria or weights employed in the CRM system (on the aspect level) are proposed. To ensure the robustness and the repeatability of the proposed techniques and framework, the results of the study are further validated with real-word applications from different domains

    Can we predict a riot? Disruptive event detection using Twitter

    Get PDF
    In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases
    corecore