121,661 research outputs found

    Visualizing and Quantifying Impact and Effect in Twitter Narrative using Geometric Data Analysis

    We use geometric multivariate data analysis, which has been termed a methodology for both the visualization and verbalization of data. The general objectives are data mining and knowledge discovery. In the first case study, we use the narrative surrounding very highly profiled tweets, and thus a Twitter event of significance and importance. In the second case study, we use eight carefully planned Twitter campaigns relating to environmental issues. The aim of these campaigns was to increase environmental awareness and behaviour. Unlike current marketing, political and other communication campaigns using Twitter, we develop an innovative approach to measuring behavioural change. We also show how we can assess the statistical significance of social media behaviour. Comment: 34 pages, 11 figures

    Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries

    How many listens will an artist receive on an online radio? How about plays on a YouTube video? How many of these visits are from new or returning users? Modeling and mining the popularity dynamics of social activity has important implications for researchers, content creators and providers. We here investigate the effect of revisits (successive visits from a single user) on content popularity. Using four datasets of social activity, with up to tens of millions of media objects (e.g., YouTube videos, Twitter hashtags or LastFM artists), we show the effect of revisits on the popularity evolution of such objects. Secondly, we propose the Phoenix-R model, which captures the popularity dynamics of individual objects. Phoenix-R has the desired properties of being: (1) parsimonious, being based on the minimum description length principle, and achieving lower root mean squared error than state-of-the-art baselines; (2) applicable, the model is effective for predicting future popularity values of objects. Comment: To appear at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 201

    Relevance of Health-Related Hashtags on Twitter: A Text Mining Approach

    BACKGROUND Social media platforms facilitate user interaction and impact decision making. Users prefer to use hashtags while sharing posts. Knowing the sentiment towards diabetes, blood pressure, and obesity is fundamental to understanding the impact of this information on patients and their families. The study seeks to determine the relevance of health-related hashtags on Twitter and analyze sentiments about diabetes, obesity, and blood pressure. METHOD Tweets were retrieved using synonyms for “diabetes”, “hypertension” and “obesity”. The extended knowledge discovery in data mining (KDDM) model guided our research, with research objectives defined in the ‘research problem understanding’ phase. The ‘information seeking’ from Uses and Gratifications Theory (UGT) determined the success and text mining assessment criteria. Text pre-processing was done using tokenization, stop word removal, and stemming. The research objectives, text mining goals, and success criteria were answered using ‘Uses and Gratifications Theory’ (UGT). RESULTS A total of 6749 tweets were extracted using RStudio. Of these, 36.41% were about blood pressure, 0.25% about diabetes, 24.43% about obesity, and 6.99% about a combination of two or more terms. Additional topics such as cholesterol, chia seeds, postpartum, diet, and exercise were identified. Upcoming conferences like ‘#ipna’, ‘#review’, ‘#APCH2019’, ‘#cardiotwitter’ were identified. Increased user engagement about managing blood pressure, diabetes, and obesity across different age groups, as well as about the consequences of increased cardio exercise for obese and diabetic users, was encouraging. Tweets about advertisements specific to clothing for oversized individuals initiated conversation among users about monitoring self-health. CONCLUSIONS Sentiment analysis can thus increase our understanding of user engagement on such platforms and potentially help improve managing public health strategically.
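
The pre-processing steps named in the abstract above (tokenization, stop word removal, stemming) can be sketched as follows. This is a minimal illustration only, not the study's pipeline (the abstract reports using RStudio): the stop-word list is a tiny hypothetical one, and `crude_stem` is a rough suffix stripper standing in for a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, for illustration only).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "for", "my", "in", "on", "with"}

# Suffixes removed by the crude stand-in stemmer below (assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into word and hashtag tokens."""
    return re.findall(r"#?[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip one common suffix; hashtags are left untouched."""
    if token.startswith("#"):
        return token
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("Managing my blood pressure with daily walks #cardiotwitter")
```

Keeping hashtags as single tokens is a design choice made here because the study is about hashtag relevance; a different tokenizer could split them.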

    A method for ontology and knowledgebase assisted text mining for diabetes discussion forum

    Social media offers researchers vast amounts of unstructured text as a source to discover hidden knowledge and insights. However, social media text poses new challenges to text mining and knowledge discovery due to its short length, temporal nature and informal language. In order to identify the main requirements for analysing unstructured text in social media, this research takes a case study of a large discussion forum in the diabetes domain. It then reviews and evaluates existing text mining methods against the requirements for analysing such a domain. Using domain background knowledge to bridge the semantic gap in traditional text mining methods was identified as a key requirement for analysing text in discussion forums. Existing ontology engineering methodologies encounter difficulties in deriving suitable domain knowledge with the appropriate breadth and depth in domain-specific concepts with a rich relationship structure. These limitations usually originate from a reliance on human domain experts. This research developed a novel semantic text mining method. It can identify the concepts and topics being discussed and the strength of the relationships between them, and then display the emergent knowledge from a discussion forum. The derived method has a modular design that consists of three main components: the Ontology Building Process, Semantic Annotation and Topic Identification, and Visualisation Tools. The ontology building process generates a domain ontology quickly with little need for domain experts. The topic identification component utilises a hybrid system of domain ontology and a general knowledge base for text enrichment and annotation, while the visualisation methods of dynamic tag clouds and co-occurrence networks for pattern discovery enable a flexible visualisation of these results and can help uncover hidden knowledge.
Application of the derived text mining method within the case study helped identify trending topics in the forum and how they change over time. The derived method performed better in semantic annotation of the text compared to the other systems evaluated. The new text mining method appears to be “generalisable” to domains other than diabetes. Future study needs to confirm this ability and to evaluate its applicability to other types of social media text sources.
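
The co-occurrence network mentioned in the abstract above can be illustrated with a small sketch. It assumes that each forum post has already been annotated with a list of concepts (the annotation step itself is not shown), and it treats the number of posts in which two concepts appear together as the edge weight. The sample posts are invented for illustration and are not data from the thesis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how often each unordered pair of concepts appears in the same post."""
    counts = Counter()
    for concepts in posts:
        # De-duplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(concepts))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Invented example posts, each reduced to its annotated concepts.
posts = [
    ["insulin", "diet", "exercise"],
    ["insulin", "diet"],
    ["exercise", "diet"],
]
edges = cooccurrence_counts(posts)
```

Each key in `edges` is an edge of the network and each value its weight; a visualisation tool could draw heavier edges for pairs that co-occur more often.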

    A Sentiment Knowledge Discovery Model in Twitter’s TV Content Using Stochastic Gradient Descent Algorithm

    The explosive growth in social media use makes it a rich source for data mining. Meanwhile, television programs have grown in number and variety, which motivates people to comment on them via social media. Social networks contain abundant information that is unstructured, heterogeneous, high-dimensional and incremental in nature. Such abundant data can be a rich source of information, but it is difficult to analyse manually. The contributions of this research are to perform preprocessing that addresses unstructured, noisy and heterogeneous data, and to find patterns of information and knowledge about social media user activity in the form of positive and negative sentiment on Twitter TV content. Several techniques are used for preprocessing: eliminating punctuation and symbols, eliminating numbers, replacing numbers with letters, translating Alay words, eliminating stop words, and stemming with the Porter algorithm. The study uses Stochastic Gradient Descent (SGD) as its classification method. Text that has been through preprocessing is more structured, less noisy and less diverse, so preprocessing affects both the number of correctly classified instances and the processing time. The experimental results reveal that using SGD to discover positive and negative sentiment tends to be faster for large or streaming data, with at most 88% of instances correctly classified.
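
To make the role of SGD concrete, here is a minimal sketch of a sentiment classifier trained one example at a time. It assumes a logistic-regression model over bag-of-words features, which is one common choice and not necessarily the configuration used in the paper; the function names, hyperparameters and toy tweets are all hypothetical.

```python
import math
import random


def train_sgd(examples, epochs=200, lr=0.5, seed=0):
    """Fit per-token weights and a bias with stochastic gradient descent.

    `examples` is a list of (tokens, label) pairs with label 1 (positive) or 0 (negative).
    """
    rng = random.Random(seed)
    weights = {}
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for tokens, label in data:
            z = bias + sum(weights.get(t, 0.0) for t in tokens)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "positive"
            error = p - label  # gradient of the log loss with respect to z
            # One update per example: this is what makes the method "stochastic".
            bias -= lr * error
            for t in tokens:
                weights[t] = weights.get(t, 0.0) - lr * error
    return weights, bias


def predict(model, tokens):
    """Return 1 for positive sentiment, 0 for negative."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1 if z > 0 else 0


# Invented, already-preprocessed toy tweets (assumption, not data from the paper).
train = [
    (["great", "show"], 1),
    (["love", "episode"], 1),
    (["boring", "show"], 0),
    (["bad", "episode"], 0),
]
model = train_sgd(train)
```

Because each update touches only the tokens of a single example, the same loop can consume a stream of tweets without holding the whole dataset in memory, which is consistent with the abstract's remark about large or streaming data.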

    Pattern Discovery from Event Data

    Events are ubiquitous in real life. With the rapid rise in the popularity of social media channels, massive amounts of event data, such as information about festivals, concerts, or meetings, are increasingly created and shared by users on the Internet. Deriving insights or knowledge from such social media data provides a semantically rich basis for many applications, for instance, social media marketing, service recommendation, sales promotion, or enrichment of existing data sources. In spite of substantial research on discovering valuable knowledge from various types of social media data such as microblog data, check-in data, or GPS trajectories, there has been little work on mining event data for useful patterns. In this thesis, we focus on the discovery of interesting, useful patterns from datasets of events, where information about these events is shared by and spread across social media platforms. To deal with the existence of heterogeneous event data sources, we propose a comprehensive framework to model events for pattern mining purposes, where each event is described by three components: context, time, and location. This framework allows one to easily define how events are related in terms of conceptual, temporal, and spatial (geographic) relationships. Moreover, we also take into account hierarchies for contexts, time, and locations of events, which naturally exist as useful background knowledge to derive patterns at different levels of abstraction and granularity. Based on this framework, we focus on the following problems: (i) mining interval-based event sequence patterns, (ii) mining periodic event patterns, and (iii) extracting semantic annotations for locations of events. Generally, the first two problems consider correlations of events whereas the last one takes correlations of event components into account.
In particular, the first problem is a generalization of mining sequential patterns from traditional data, where patterns representing complex temporal relationships among events can be discovered at different levels of abstraction and granularity. The second problem is to find periodic event patterns, where a notion of relaxed periodicity is formulated for events as well as for groups of events that co-occur. The third problem is to extract semantic annotations for locations on the basis of exploiting correlations of contexts, time, and locations of events. For the three problems above, we respectively propose novel and efficient approaches. Our experiments clearly indicate that extracted patterns and knowledge can be well utilized in various useful tasks, such as event prediction, semantic search for locations, or topic-based clustering of locations.
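
The three-component event model (context, time, location) and the idea of temporal relationships between interval-based events can be sketched as below. This is an illustrative assumption, not the thesis's framework: the `Event` class, its field names and the handful of relations covered (a simplified subset of interval relations) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event described by context, a time interval, and a location."""
    context: str
    start: int  # e.g. day number; the unit is an assumption
    end: int
    location: str


def temporal_relation(a, b):
    """Classify how interval `a` relates to interval `b` (simplified subset)."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start <= b.start and b.end <= a.end:
        return "contains"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    return "other"  # inverse relations are not distinguished in this sketch


# Invented example events.
festival = Event("festival", 1, 5, "city park")
concert = Event("concert", 2, 3, "city park")
afterparty = Event("party", 3, 4, "club")
```

For example, `temporal_relation(festival, concert)` classifies the concert as falling within the festival; sequences of such relations are the kind of raw material an interval-based pattern miner would work from.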