28 research outputs found

    Exploiting Emotions via Composite Pretrained Embedding and Ensemble Language Model

    Decisions in the modern era are based on more than just the available data; they also incorporate feedback from online sources. Processing such reviews is known as sentiment analysis (SA) or emotion analysis. Understanding users' perspectives and routines is now crucial for multiple reasons, and both businesses and governments use it to make strategic decisions. Various architectural and vector embedding strategies have been developed for SA processing. Accurate representation of text is crucial for automatic SA. Because of the large number of languages spoken and written, polysemy and syntactic or semantic issues are common. To get around these problems, we developed effective composite embedding (ECE), a method that combines the advantages of vector embedding techniques that are either context-independent (like GloVe and fastText) or context-aware (like XLNet) to effectively represent the features needed for processing. To improve performance on emotion or sentiment classification, we propose a stacked ensemble of deep language models. ECE with the ensemble model is evaluated on a balanced dataset to show that it is a reliable embedding technique and a generalised model for SA. To evaluate ECE, cutting-edge ML and deep-network language models are deployed and compared. The model is evaluated on benchmark datasets such as MR and Kindle, along with a real-time tweet dataset of user complaints. LIME is used to verify the model's predictions and to provide statistical results for each sentence. The model with ECE embedding provides state-of-the-art results on the real-time dataset as well.
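
The abstract does not say how ECE merges its component embeddings. As a minimal sketch, assuming simple per-token concatenation (an assumption, not the authors' confirmed method), the idea of combining context-independent lookups with a context-aware vector could look like this. The function name, the toy tables, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def composite_embedding(
    token: str,
    static_tables: Sequence[Dict[str, Vector]],
    static_dims: Sequence[int],
    contextual_vector: Vector,
) -> Vector:
    """Concatenate static lookups with a contextual vector for one token.

    Assumption: concatenation is used to combine the embeddings. Tokens that
    are missing from a static table fall back to a zero vector of that
    table's dimension.
    """
    combined: Vector = []
    for table, dim in zip(static_tables, static_dims):
        combined.extend(table.get(token, [0.0] * dim))
    # The contextual vector would come from a context-aware model; here it is
    # passed in directly so the sketch stays self-contained.
    combined.extend(contextual_vector)
    return combined


# Toy stand-ins for context-independent tables (hypothetical values).
glove_like = {"good": [0.1, 0.2]}
fasttext_like = {"good": [0.3, 0.4, 0.5]}

known = composite_embedding("good", [glove_like, fasttext_like], [2, 3], [0.9, 0.8])
unknown = composite_embedding("zzz", [glove_like, fasttext_like], [2, 3], [0.0, 0.0])
```

With concatenation, the composite dimension is the sum of the component dimensions, so a downstream model sees every representation side by side.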

    An Automated Text Mining Approach for Classifying Mental-Ill Health Incidents from Police Incident Logs for Data-Driven Intelligence

    Data-driven intelligence can play a pivotal role in enhancing the effectiveness and efficiency of police service provision. Despite police organizations being a rich source of qualitative data (present in less formally structured formats, such as text logs), little work has been done on automating the steps that allow this data to feed into intelligence-led policing tasks, such as demand analysis and prediction. This paper examines the use of police incident logs to better estimate the demand on officers across all incidents, with particular respect to cases where mental ill-health played a primary part. Persons suffering from mental ill-health are significantly more likely to come into contact with the police, but statistics on how much police time is actually spent dealing with this type of incident are highly variable and often subjective. We present a novel deep-learning-based text mining approach that allows accurate extraction of mental ill-health related incidents from police incident logs. The data gained from these automated analyses can enable both strategic and operational planning within police forces, allowing policy makers to develop long-term strategies to tackle this issue and to better plan for day-to-day demand on services. The proposed model achieved a cross-validated classification accuracy of 89.5% on the real dataset.

    Hate Speech Classification in Indonesian Language Tweets by Using Convolutional Neural Network

    The rapid development of social media, combined with the freedom of social media users to express their opinions, has influenced the spread of hate speech aimed at certain groups. Online hate speech can be identified by the use of derogatory words in social media posts. Various studies on hate speech classification have been done; however, very little research has been conducted on hate speech classification in the Indonesian language. This paper proposes a convolutional neural network method for classifying hate speech in Indonesian-language tweets. Datasets for both the training and testing stages were collected from Twitter. The collected tweets were categorized into hate speech and non-hate speech. We used TF-IDF as the term weighting method for feature extraction. The best training and validation accuracies were 90.85% and 88.34%, respectively, at 45 epochs. For the testing stage, experiments were conducted with different amounts of testing data. The highest testing accuracy was 82.5%, achieved by the dataset with 50 tweets in each category.
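
The abstract names TF-IDF as the term weighting method but not the exact variant. The sketch below assumes one common form (term frequency normalised by document length, multiplied by the natural log of inverse document frequency, with no smoothing). The function name and the toy tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenised document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Toy tokenised tweets (hypothetical).
toy_tweets = [["you", "are", "awful"], ["you", "are", "kind"]]
toy_weights = tf_idf(toy_tweets)
```

Terms that occur in every tweet receive a weight of zero under this variant, so only the distinguishing words contribute to the feature vector passed to the classifier.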

    Sentiment Analysis on Work from Home Policy Using Naïve Bayes Method and Particle Swarm Optimization

    At the beginning of 2020, the world was shocked by the coronavirus, which spread rapidly across many countries, including Indonesia. In response, the government implemented the Work from Home policy to suppress the spread of Covid-19. As a result, many people wrote their opinions on the Twitter social media platform, and the policy drew both support and opposition from all parts of the community. The data source used in this study came from tweets with keywords related to work from home. Several previous studies in this field did not implement feature selection for sentiment analysis, even though the method used was not optimal. The contribution of this study is therefore to classify public opinion into positive and negative using sentiment analysis, implementing PSO for feature selection and Naïve Bayes as the classifier. The results showed that the best accuracy was 81% using Naïve Bayes alone and 86% using Naïve Bayes with PSO, with a split of 90% training data and 10% test data. Given this gain of 5 percentage points in accuracy, it can be concluded that using the Particle Swarm Optimization algorithm for feature selection helps the classification process, so that the results obtained are more effective than before.
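
The abstract does not describe its PSO configuration. As a rough sketch, a standard binary PSO (sigmoid transfer function over velocities) searching for a feature mask could look like the following. All parameter values are assumptions, and `toy_fitness` is a hypothetical stand-in: in the study's setting the fitness would be the accuracy of a Naïve Bayes classifier trained on the selected features.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(
    fitness: Callable[[List[int]], float],
    n_features: int,
    n_particles: int = 8,
    n_iters: int = 20,
    inertia: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Maximise `fitness` over 0/1 feature masks with a binary PSO."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    best_idx = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best_idx][:], pbest_fit[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                    + c2 * r2 * (gbest[d] - positions[i][d])
                )
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                velocities[i][d] = v
                # Sigmoid of the velocity gives the probability of selecting the feature.
                positions[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask: List[int]) -> float:
    """Hypothetical fitness: features 0 and 2 are useful, each selected feature costs 0.1."""
    return mask[0] + mask[2] - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=5)
```

The returned mask marks which features to keep before training the classifier; a fixed seed keeps the search reproducible.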

    Considerations about learning Word2Vec

    Despite the wide diffusion and use of embeddings generated through Word2Vec, there are still many open questions about the reasons for its results and about its real capabilities. In particular, to our knowledge, no author seems to have analysed in detail how learning may be affected by the various choices of hyperparameters. In this work, we try to shed some light on various issues, focusing on a typical dataset. It is shown that the learning rate prevents the exact mapping of the co-occurrence matrix, that Word2Vec is unable to learn syntactic relationships, and that it does not suffer from the problem of overfitting. Furthermore, through the creation of an ad-hoc network, it is also shown how it is possible to improve Word2Vec directly on the analogies, obtaining very high accuracy without damaging the pre-existing embedding. This analogy-enhanced Word2Vec may be convenient in various NLP scenarios, but it is used here as an optimal starting point to evaluate the limits of Word2Vec.
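
For readers unfamiliar with analogy evaluation, word analogies are commonly scored by vector arithmetic followed by a nearest-neighbour search under cosine similarity. The sketch below assumes that common scheme (the abstract does not state the exact protocol used) and runs on hypothetical two-dimensional toy vectors rather than trained Word2Vec embeddings.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb: Dict[str, List[float]], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' via the vector b - a + c.

    The three query words are excluded from the candidates, as is usual
    in analogy benchmarks.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical toy embedding: first axis ~ "royalty", second axis ~ "gender".
toy_embedding = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 1.0],
}
answer = solve_analogy(toy_embedding, "man", "woman", "king")
```

Analogy accuracy is then the fraction of test quadruples for which the predicted word matches the expected one.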

    A Proposed Sentiment Analysis Deep Learning Algorithm for Analyzing COVID-19 Tweets

    With the rise in cases of COVID-19, each country came under unusual pressure to make arrangements to control the population and use the available resources appropriately. The swift rise in positive cases globally created panic, anxiety and depression among people. The effect of this deadly disease was found to be directly proportional to the physical and mental health of the population. As of 28 October 2020, more than 40 million people had tested positive and more than 1 million deaths had been recorded. The most dominant tool that disturbed human life during this time was social media. Tweets regarding COVID-19, whether about the number of positive cases or deaths, induced a wave of fear and anxiety among people living in different parts of the world. Nobody can deny that social media is everywhere and that everybody is connected with it directly or indirectly. This offers an opportunity for researchers and data scientists to access the data for academic and research use. Social media data contains much that relates to real-life events like COVID-19. In this paper, Twitter data is analysed using the R programming language. We collected the Twitter data based on hashtag keywords, including COVID-19, coronavirus, deaths, new case, and recovered. In this study, we designed an algorithm called Hybrid Heterogeneous Support Vector Machine (H-SVM), performed sentiment classification, and classified the tweets into positive, negative and neutral sentiment scores. We also compared the performance of the proposed algorithm on parameters such as precision, recall, F1 score and accuracy with a Recurrent Neural Network (RNN) and a Support Vector Machine (SVM).
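
The comparison metrics named in the abstract have standard definitions. The study itself reports using R, so the Python sketch below is only an illustration of how per-class precision, recall and F1, plus overall accuracy, are computed for a three-class sentiment task. The function names and the toy labels are hypothetical.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str], label: str) -> Dict[str, float]:
    """Precision, recall and F1 for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical), not data from the study.
toy_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
toy_pred = ["positive", "negative", "negative", "neutral", "positive", "positive"]

positive_scores = per_class_metrics(toy_true, toy_pred, "positive")
overall_accuracy = accuracy(toy_true, toy_pred)
```

Reporting the per-class scores alongside accuracy matters when the positive, negative and neutral classes are imbalanced, since accuracy alone can hide poor performance on a minority class.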

    Sentiment Analysis for E-Commerce Products Using Natural Language Processing

    Sentiment analysis is one of the ways to evaluate the attitude of consumers towards products and services. E-commerce businesses have grown considerably in recent years. Customers' opinions and preferences are collected and analyzed further to boost online businesses. Collecting real-time structured and unstructured data and performing sentiment analysis on it is challenging and needs to be addressed. We have used PySpark and resilient distributed dataset (RDD) based sentiment analysis with Spark NLP to address scalability and availability issues in sentiment analysis on the e-commerce platform. We have also used Flask-based RESTful APIs and Scrapy for web scraping to collect useful data from an e-commerce site. Our findings indicate that the proposed method of Natural Language Processing (NLP) for e-commerce products in real time has enhanced efficiency in terms of scalability, availability, and faster data collection.