    A survey of data mining techniques for social media analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, which makes computational means of analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of social networks over the decades, from the historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed and the names of their authors.

    Segmentation based Twitter Opinion Mining using Ensemble Learning

    In recent years, social media has become the prime place for advertisements, activities, campaigns, protests, etc. It provides a platform for people to express their views and beliefs to the masses. Users' beliefs, practices and interests are of great importance to organizations and provide insight into the minds of users. Data mining is one tool that enables these organizations to extract relevant information from user data, which can be analyzed to create a knowledge set and determine user opinion, allowing companies to create products tailored to the user. Data mining of Twitter and other social platforms is of great importance because their large user bases are a goldmine of public opinions and views which, if analyzed properly, can potentially be used to predict campaign results, product assessments and likeability. This project proposes a classification scheme that aims to perform segmentation-based Twitter opinion mining using ensemble learning. The proposed scheme is able to detect and filter out bots, and uses text segmentation for effective text classification and part-of-speech tagging.
    Keywords: machine learning, supervised learning, text analysis, sentiment analysis, natural language processing

    An Ontology Artifact for Information Systems Sentiment Analysis

    As companies and organizations increasingly rely on online, user-supplied data to obtain valuable insights into their operations, sentiment analysis of textual data has proven to be a most valuable resource. To understand how sentiment analysis can be used effectively, it is important to identify what types of sentiment analysis could be employed during the analysis of a given situation. This research proposes an Information Systems Sentiment Ontology, the purpose of which is to provide a basis for mining and understanding sentiment, specifically from text provided by customers as online content. The Information Systems Sentiment Ontology is developed by analyzing the literature on emotion, sentiment analysis and ontology development, and from prior research on online forum analysis. The ontology development follows a traditional design science approach. Details on the creation and application of the ontology artifact are provided.

    Understanding user behavior aspects on emergency mobile applications during emergency communications using NLP and text mining techniques

    The use of mobile devices has been skyrocketing in our society. Users can access and share any type of information in a timely manner through these devices using different social media applications. This has enabled users to increase their awareness of ongoing events such as election campaigns, sports updates, movie releases, disaster occurrences, and studies. Their attractiveness, affordability, and two-way communication capabilities have made mobile devices that support various social media platforms central to emergency communication as well. This makes a mobile-based emergency application an attractive communication tool during emergencies. The emergence of mobile-based emergency communication prompted us to study user behavior related to the usage of these applications. Our study was mainly conducted on emergency apps in Nordic countries such as Finland, Sweden, and Norway. To understand the user objects regarding the usage of emergency mobile applications, we leveraged various Natural Language Processing and Text Mining techniques. The VADER sentiment tool was used to predict and track the polarity of users’ reviews of a particular application over time. Later, to identify factors that affect users’ sentiments, we employed topic modeling techniques such as the Latent Dirichlet Allocation (LDA) model. This model identifies various themes discussed in the user reviews and represents each theme as a weighted sum of words in the corpus. Even though LDA succeeds in highlighting the user-related factors, it fails to identify the aspects of the user, and the topic definition from the LDA model is vague. Hence we leveraged Aspect Based Sentiment Analysis (ABSA) methods to extract the user aspects from the user reviews. To perform this task we fine-tuned DeBERTa (a variant of the BERT model). BERT (Bidirectional Encoder Representations from Transformers) is a transformer architecture that allows the model to learn the context in the text. Following this, we performed a sentence-pair sentiment classification task using different variants of BERT. Later, we examined the different sentiments to highlight the factors and categories that impact user behavior most by leveraging the Empath categorization technique. Finally, we constructed a word association by considering different ontological vocabularies related to mobile applications and emergency response and management systems. The insights from the study can be used to identify the user aspect terms, predict the sentiment of the aspect term in a given review, and find how the aspect term impacts the user perspective on the usage of mobile emergency applications.

    A Fine Grain Sentiment Analysis with Semantics in Tweets

    Social networking is nowadays a major source of new information in the world. Microblogging sites like Twitter have millions of active users (320 million active users on Twitter on 30 September 2015) who share their opinions in real time, generating huge amounts of data. These data are, in most cases, available to any network user. The opinions of Twitter users have become something that companies and other organisations study to see whether or not their users like the products or services they offer. One way to assess opinions on Twitter is to classify the sentiment of tweets as positive or negative. This process is usually done at a coarse-grained level, with each tweet as a whole classified as positive or negative. However, a tweet can be partially positive and partially negative at the same time, referring to different entities. As a result, general approaches usually classify such tweets as “neutral”. In this paper, we propose a semantic analysis of tweets, using Natural Language Processing to classify the sentiment with regard to the entities mentioned in each tweet. We offer a combination of Big Data tools (under the Apache Hadoop framework) and sentiment analysis using RDF graphs supporting the study of the tweet’s lexicon. This work has been empirically validated using a sporting event, the 2014 Phillips 66 Big 12 Men’s Basketball Championship. The experimental results show a clear correlation between the predicted sentiments and specific events during the championship.
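
To make the coarse-grained versus entity-level distinction concrete, here is a minimal sketch. It is not the paper's method (which relies on Hadoop tooling and RDF graphs); it is a simplified clause-splitting stand-in. The lexicon, the entity list, the example tweet and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicon and entity list, for illustration only.
LEXICON = {"great": 1, "love": 1, "solid": 1, "awful": -1, "poor": -1, "sloppy": -1}
ENTITIES = {"kansas", "baylor"}


def tweet_sentiment(tweet):
    """Coarse-grained score: sum the polarity of every word in the tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def entity_sentiment(tweet):
    """Entity-level score: credit each entity with the polarity of its own clause."""
    scores = {}
    # Split on a contrastive "but" and on clause punctuation.
    clauses = re.split(r"\bbut\b|[,.;!?]", tweet.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause)
        polarity = sum(LEXICON.get(tok, 0) for tok in tokens)
        for tok in tokens:
            if tok in ENTITIES:
                scores[tok] = scores.get(tok, 0) + polarity
    return scores


# Made-up tweet, not taken from the study's dataset.
tweet = "Kansas played great defense but Baylor looked sloppy"
print(tweet_sentiment(tweet))   # whole-tweet score
print(entity_sentiment(tweet))  # per-entity scores
```

In this toy case the whole-tweet score cancels out to zero, which a coarse classifier would read as neutral, while the per-entity scores keep the positive and negative parts apart.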

    Comprehensive Review of Opinion Summarization

    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the trend for future research in this area. Unpublished; not peer reviewed.

    DomainSenticNet: An Ontology and a Methodology Enabling Domain-aware Sentic Computing

    In recent years, SenticNet and OntoSenticNet have represented important developments in the novel interdisciplinary field of research known as sentic computing, enabling the development of a variety of Sentic applications. In this paper, we propose an extension of the OntoSenticNet ontology, named DomainSenticNet, and contribute an unsupervised methodology to support the development of domain-aware Sentic applications. We developed an unsupervised methodology that, for each concept in OntoSenticNet, mines semantically related concepts from WordNet and Probase knowledge bases and computes domain distributional information from the entire collection of Kickstarter domain-specific crowdfunding campaigns. Subsequently, we applied DomainSenticNet to a prototype tool for Kickstarter campaign authoring and success prediction, demonstrating an improvement in the interpretability of sentiment intensities. DomainSenticNet is an extension of the OntoSenticNet ontology that integrates each of the 100,000 concepts included in OntoSenticNet with a set of semantically related concepts and domain distributional information. The defined unsupervised methodology is highly replicable and can be easily adapted to build similar domain-aware resources from different domain corpora and external knowledge bases. Used in combination with OntoSenticNet, DomainSenticNet may favor the development of novel hybrid aspect-based sentiment analysis systems and support further research on sentic computing in domain-aware applications.
    The work of Paolo Rosso was partially funded by the Spanish MICINN under the project PGC2018-096212-B-C31.
    Distante, D.; Faralli, S.; Rittinghaus, S.; Rosso, P.; Samsami, N. (2022). DomainSenticNet: An Ontology and a Methodology Enabling Domain-aware Sentic Computing. Cognitive Computation. 14(1):62-77. https://doi.org/10.1007/s12559-021-09825-w

    A Latent Dirichlet Allocation Approach using Mixed Graph of Terms for Sentiment Analysis

    The spread of generic (such as Twitter, Facebook or Google+) or specialized (such as LinkedIn or Viadeo) social networks allows millions of users to share opinions on different aspects of life every day. This information is therefore a rich source of data for opinion mining and sentiment analysis. This paper presents a novel approach to sentiment analysis based on Latent Dirichlet Allocation (LDA). The proposed methodology aims to identify a word-based graphical model (which we call a mixed graph of terms) for depicting a positive or negative attitude towards a topic. With this model it is possible to automatically mine positive and negative sentiments from documents. Experimental evaluation on standard and real datasets shows that the proposed approach is effective and furnishes good and reliable results.

    Sentiment analysis through twitter as a mechanism for assessing university satisfaction

    Currently, data on perceived satisfaction in the university environment are generated through surveys with categorical response questions defined on a Likert scale, with predefined factors to be evaluated, applied once per academic semester, which generates very biased information. This leads us to wonder why this survey is applied only once and why it only asks about some factors. The objective of the article is to demonstrate the feasibility of a proposal to determine the degree of perceived student satisfaction through the use of data science and natural language processing (NLP), supported by the social network Twitter as a means of data collection. By applying this proposal, it was possible to determine the level of student satisfaction, 57.27%, through sentiment analysis using the Python library "NLTK". It was also possible to extract texts linked to the factors of teaching performance relevant to student satisfaction through the term frequency-inverse document frequency (TF-IDF) approach; these were the texts linked to the use of simulation tools in the virtual learning process. Campus Lima Centr
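
As a rough illustration of the TF-IDF weighting named in this abstract, the sketch below computes one common variant (relative term frequency times the natural log of inverse document frequency) using only the standard library. The abstract does not say which variant or tooling the authors used, so the formula, the `tf_idf` helper and the example posts are assumptions for illustration, not the study's implementation or data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example posts, not data from the study.
posts = [
    "the simulation tools helped the class",
    "the professor explained the topic well",
    "the virtual class used simulation tools",
]
scores = tf_idf(posts)

# Terms that best distinguish the second post from the others.
top_terms = sorted(scores[1], key=scores[1].get, reverse=True)[:3]
print(top_terms)
```

A term that occurs in every post, such as "the" here, gets a weight of zero under this variant, which is why TF-IDF surfaces terms specific to a subset of the texts rather than merely frequent ones.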

    Feature-Based Opinion Classification Using the KPCA Technique: Concept and Performance Evaluation

    Over the last several years, a widespread trend on the internet has been the proliferation of online evaluations written by people to share their ideas, interests, experiences, and opinions. Opinion mining, also known as sentiment analysis, is the process of classifying pieces of natural-language text on a subject into positive, negative, or neutral categories according to the human emotions, views, and feelings communicated in that text. The field of sentiment analysis has progressed to the point that it can now analyse internet evaluations and provide significant information to people as well as corporations, which may assist these parties in the decision-making process. In the proposed model, feature extraction selects the set of features that are both semantically and statistically significant using the kernel principal component analysis (KPCA) method. According to the simulation results, the proposed model performs better than other existing models.