
    Sentiment Analysis of Microblogs Using Multilayer Feed-Forward Artificial Neural Networks

    Sentiment analysis aims to extract public opinion on a particular topic, and microblogs, especially Twitter as the most influential platform, represent a significant source of information. Applying it to microblogs means coping with difficulties such as informal language with abbreviations, internet jargon, emoticons, and hashtags that do not appear in conventional text documents. This paper proposes a sentiment analysis technique for microblogs based on a feed-forward artificial neural network (ANN) with a sigmoid activation function and compares it to other machine learning approaches, namely Multinomial Naive Bayes, Support Vector Machines and Maximum Entropy. Experiments were performed on the Stanford Twitter Sentiment corpus, a balanced dataset with noisy training labels weakly annotated using emoticons as sentiment indicators, and on the SemEval-2014 Task 9 corpus, an unbalanced dataset with manually annotated training examples. The results show that the ANN produces superior or at least comparable results to state-of-the-art machine learning techniques.
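
    As a rough illustration of the kind of model this abstract describes, the sketch below runs the forward pass of a feed-forward network with sigmoid activations over bag-of-words features. It is not the paper's implementation: the vocabulary, the layer sizes, and the hand-picked weights are all hypothetical, chosen only so the example is small enough to follow by hand. In practice the weights would be learned from a labelled corpus.

```python
import math

# Hypothetical four-term vocabulary; note that emoticons are kept as tokens.
VOCAB = ["good", "bad", ":)", ":("]

# Hand-picked illustrative weights (assumptions, not trained values).
HIDDEN_WEIGHTS = [
    [2.0, -2.0, 2.0, -2.0],   # hidden unit that responds to positive cues
    [-2.0, 2.0, -2.0, 2.0],   # hidden unit that responds to negative cues
]
HIDDEN_BIASES = [0.0, 0.0]
OUTPUT_WEIGHTS = [3.0, -3.0]
OUTPUT_BIAS = 0.0


def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def featurize(text: str) -> list[float]:
    """Count how often each vocabulary term occurs in a whitespace-split tweet."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


def forward(features: list[float]) -> float:
    """One hidden layer and one output unit, both with sigmoid activation."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


def positive_score(text: str) -> float:
    """Score in (0, 1); values above 0.5 lean positive under these toy weights."""
    return forward(featurize(text))


if __name__ == "__main__":
    for tweet in ["good :)", "bad :(", "meeting at noon"]:
        print(f"{tweet!r}: {positive_score(tweet):.3f}")
```

    With these toy weights a tweet containing only positive cues scores above 0.5, one containing only negative cues scores below 0.5, and a tweet with no vocabulary terms lands exactly on 0.5.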

    Machine learning-based sentiment analysis of Twitter data

    The paper analyzes the views of Twitter users on the COVID-19 coronavirus pandemic using machine learning algorithms. The role of sentiment analysis has grown with the advent of the social network era and the rapid spread of microblogging applications and forums. Social networks are the main sources for gathering information about users’ thoughts on various themes, and people spend more time on social media to share their thoughts with others. One of the themes discussed on the social networking platform Twitter is the COVID-19 coronavirus pandemic. In the paper, machine learning methods such as Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and Neural Network (NN) are used to analyze the emotional “color” (positive, negative, or neutral) of tweets related to the pandemic. The experiments are conducted in Python using the scikit-learn library, on a database of pandemic-related tweets from the Kaggle website. The RF classifier shows the highest performance in the experiments.

    Integration of feature subset selection methods for sentiment analysis

    Feature selection, that is, finding an optimal feature subset from a real-world domain, is one of the main challenges in sentiment analysis. The complexity of optimal feature subset selection grows exponentially with the number of features, which leads to high-dimensional problems when analysing and organizing data in high-dimensional spaces. To overcome the problem, this study attempted to enhance feature subset selection in high-dimensional data by removing irrelevant and redundant features using filter and wrapper approaches. Initially, a filter method based on the dispersion of samples in feature space, known as the mutual standard deviation method, was developed to minimize intra-class and maximize inter-class distances. Filter-based methods have the advantages that they scale easily to high-dimensional datasets and are computationally simple and fast. However, they depend only on the feature selection space and ignore the hypothesis model space. Hence, the next step of this study developed a new feature ranking approach by integrating various filter methods, using both ordinal-based and frequency-based integration. Finally, a search strategy based on hybrid harmony search was developed and used to enhance feature subset selection and to overcome the problem of ignoring the dependency of feature selection on the classifier. A search strategy on feature space that integrates filter and wrapper approaches was thus introduced to find a semantic relationship among the model selections and subsets of the search features. Comparative experiments were performed on five sentiment datasets, namely movie, music, book, electronics, and kitchen reviews. A sizeable performance improvement was noted: the proposed integration-based feature subset selection method yielded 98.32% accuracy in sentiment classification using POS-based features on movie reviews. Finally, a statistical test based on accuracy showed significant differences between the proposed methods and the baseline methods in almost all comparisons under k-fold cross-validation. The findings show that the mutual standard deviation and integration-based feature subset selection methods are effective and outperform the baseline methods in terms of accuracy.
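
    The abstract does not define its ordinal-based and frequency-based integration schemes, so the sketch below shows only one plausible reading, stated as an assumption: ordinal integration sums each feature's rank position across filter methods, and frequency integration counts how often a feature lands in a filter's top k. The three score tables and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical relevance scores from three filter methods (higher is better).
FILTER_SCORES = [
    {"great": 0.9, "boring": 0.8, "plot": 0.3, "the": 0.1},
    {"boring": 0.7, "great": 0.6, "the": 0.2, "plot": 0.1},
    {"great": 0.5, "plot": 0.4, "boring": 0.3, "the": 0.0},
]


def ranking(scores: dict[str, float]) -> list[str]:
    """Features ordered from highest to lowest score; ties broken by name."""
    return sorted(scores, key=lambda feature: (-scores[feature], feature))


def ordinal_integration(score_tables: list[dict[str, float]]) -> list[str]:
    """Assumed scheme: sum rank positions across filters; lower total ranks first."""
    total_rank: Counter[str] = Counter()
    for scores in score_tables:
        for position, feature in enumerate(ranking(scores), start=1):
            total_rank[feature] += position
    return sorted(total_rank, key=lambda feature: (total_rank[feature], feature))


def frequency_integration(score_tables: list[dict[str, float]], k: int) -> list[str]:
    """Assumed scheme: count appearances in each filter's top k; more votes first."""
    votes: Counter[str] = Counter()
    for scores in score_tables:
        votes.update(ranking(scores)[:k])
    return sorted(votes, key=lambda feature: (-votes[feature], feature))


if __name__ == "__main__":
    print(ordinal_integration(FILTER_SCORES))
    print(frequency_integration(FILTER_SCORES, k=2))
```

    Both schemes use only rank information, so filters whose raw scores live on different scales can be combined without normalization. Features that never reach any top k simply drop out of the frequency-based list.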

    Conversational AI Assistant Using Artificial Neural Networks: Implementation of a contextual chatbot framework in a Point-of-Sale system

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. Artificial intelligence is changing how businesses confront their day-to-day difficulties. Chatbots are a clear demonstration of how simple tasks and queries, such as customer support or sales metrics and reporting, can be handled without human intervention. This project introduced a task-oriented chatbot framework for the Spanish language in a Point-of-Sale webpage. We applied Natural Language Processing (NLP) techniques such as NER and evaluated two supervised learning methods, (i) an Artificial Neural Network (ANN) and (ii) a Support Vector Machine (SVM) model, to create a contextualized chatbot that classifies the user’s intention in a text conversation, allowing bidirectional human-to-machine communication. These intents range from simple chitchat to detailed reports, always providing a natural flow in conversation. The results using an augmented and balanced corpus suggested that the ANN model performed statistically better than the SVM. Additionally, a real-world scenario with a small-talk survey of five users yielded positive feedback about the quality of predictions. Finally, a software architecture using a PaaS computing service and an API framework was proposed for implementing this dialog system in future work.

    Cross-lingual sentiment classification using semi-supervised learning

    Cross-lingual sentiment classification aims to use annotated sentiment resources in one language for text sentiment classification in another language. Automatic machine translation services are the most commonly used tools to directly project information from one language into another. However, differing term distributions between translated and original documents, translation errors, and the different intrinsic structure of documents in various languages lead to low performance in sentiment classification. Furthermore, because different languages contain different linguistic terms, translated documents cannot cover all the vocabulary that exists in the original documents. The aim of this thesis is to propose an enhanced framework for cross-lingual sentiment classification that overcomes the aforementioned problems and improves classification performance. A combination of active learning and semi-supervised learning, in both single-view and bi-view frameworks, is proposed to incorporate unlabelled data from the target language and so reduce term distribution divergence. Using bi-view documents can partially alleviate the negative effects of translation errors. Multi-view semi-supervised learning is also used to overcome the problem of low term coverage by employing multiple source languages. Features extracted from multiple source languages can cover more of the vocabulary in the test data, and consequently more sentiment terms can be used in the classification process. Content similarities between labelled and unlabelled documents are used in a graph-based semi-supervised learning approach to incorporate the structure of documents in the target language into the learning process. Performance evaluation on sentiment data sets in four different languages confirms the effectiveness of the proposed approaches in comparison to well-known baseline classification methods. The experiments show that incorporating unlabelled data from the target language can effectively improve classification performance. Experimental results also show that using multiple source languages in the multi-view learning model outperforms the other methods. The proposed framework is flexible enough to be applied to any new language, and it can therefore be used to develop multilingual sentiment analysis systems.