12,442 research outputs found

    Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms

    Full text link
    In modeling social interaction online, it is important to understand when people are reacting to each other. Many systems have explicit indicators of replies, such as threading in discussion forums or replies and retweets in Twitter. However, it is likely these explicit indicators capture only part of people's reactions to each other, thus, computational social science approaches that use them to infer relationships or influence are likely to miss the mark. This paper explores the problem of detecting non-explicit responses, presenting a new approach that uses tf-idf similarity between a user's own tweets and recent tweets by people they follow. Based on a month's worth of posting data from 449 ego networks in Twitter, this method demonstrates that it is likely that at least 11% of reactions are not captured by the explicit reply and retweet mechanisms. Further, these uncaptured reactions are not evenly distributed between users: some users, who create replies and retweets without using the official interface mechanisms, are much more responsive to followees than they appear. This suggests that detecting non-explicit responses is an important consideration in mitigating biases and building more accurate models when using these markers to study social interaction and information diffusion.Comment: A final version of this work was published in the 2015 IEEE 11th International Conference on e-Science (e-Science

    Enhancing Decision Making Capacity in Tourism Domain Using Social Media Analytics

    Full text link
    Social media has gained an immense popularity over the last decade. People tend to express opinions about their daily encounters on social media freely. These daily encounters include the places they traveled, hotels or restaurants they have tried and aspects related to tourism in general. Since people usually express their true experiences on social media, the expressed opinions contain valuable information that can be used to generate business value and aid in decision-making processes. Due to the large volume of data, it is not a feasible task to manually go through each and every item and extract the information. Hence, we propose a social media analytics platform which has the capability to identify discussion pathways and aspects with their corresponding sentiment and deeper emotions using machine learning techniques and a visualization tool which shows the extracted insights in a comprehensible and concise manner. Identified topic pathways and aspects will give a decision maker some insight into what are the most discussed topics about the entity whereas associated sentiments and emotions will help to identify the feedback.Comment: To Appear in Proceedings of International Conference on Advances in ICT for Emerging Regions, Colombo, L

    Using Social Media to Predict the Future: A Systematic Literature Review

    Full text link
    Social media (SM) data provides a vast record of humanity's everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ability to capture socially relevant information, which may be difficult to gather from other data sources. Promising results exist across a wide variety of domains, but one will find little consensus regarding best practices in either methodology or evaluation. In this systematic review, we examine relevant literature over the past decade, tabulate mixed results across a number of scientific disciplines, and identify common pitfalls and best practices. We find that SM forecasting is limited by data biases, noisy data, lack of generalizable results, a lack of domain-specific theory, and underlying complexity in many prediction tasks. But despite these shortcomings, recurring findings and promising results continue to galvanize researchers and demand continued investigation. Based on the existing literature, we identify research practices which lead to success, citing specific examples in each case and making recommendations for best practices. These recommendations will help researchers take advantage of the exciting possibilities offered by SM platforms

    From Tweets to Events: Exploring a Scalable Solution for Twitter Streams

    Full text link
    The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. However, current solutions have focused on static Twitter data. The necessity to detect events in a streaming environment during fast paced events such as a crisis presents new opportunities and challenges. In this paper, we investigate event detection in the context of real-time Twitter streams as observed in real-world crises. We highlight the key challenges in this problem: the informal nature of text, and the high volume and high velocity characteristics of Twitter streams. We present a novel approach to address these challenges using single-pass clustering and the compression distance to efficiently detect events in Twitter streams. Through experiments on large Twitter datasets, we demonstrate that the proposed framework is able to detect events in near real-time and can scale to large and noisy Twitter streams

    A Survey of Location Prediction on Twitter

    Full text link
    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur

    Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research

    Full text link
    In recent work, we identified and studied a small cohort of Twitter users whose pregnancies with birth defect outcomes could be observed via their publicly available tweets. Exploiting social media's large-scale potential to complement the limited methods for studying birth defects, the leading cause of infant mortality, depends on the further development of automatic methods. The primary objective of this study was to take the first step towards scaling the use of social media for observing pregnancies with birth defect outcomes, namely, developing methods for automatically detecting tweets by users reporting their birth defect outcomes. We annotated and pre-processed approximately 23,000 tweets that mention birth defects in order to train and evaluate supervised machine learning algorithms, including feature-engineered and deep learning-based classifiers. We also experimented with various under-sampling and over-sampling approaches to address the class imbalance. A Support Vector Machine (SVM) classifier trained on the original, imbalanced data set, with n-grams, word clusters, and structural features, achieved the best baseline performance for the positive classes: an F1-score of 0.65 for the "defect" class and 0.51 for the "possible defect" class. Our contributions include (i) natural language processing (NLP) and supervised machine learning methods for automatically detecting tweets by users reporting their birth defect outcomes, (ii) a comparison of feature-engineered and deep learning-based classifiers trained on imbalanced, under-sampled, and over-sampled data, and (iii) an error analysis that could inform classification improvements using our publicly available corpus. Future work will focus on automating user-level analyses for cohort inclusion

    Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats

    Full text link
    Research shows that various social media platforms on Internet such as Twitter, Tumblr (micro-blogging websites), Facebook (a popular social networking website), YouTube (largest video sharing and hosting website), Blogs and discussion forums are being misused by extremist groups for spreading their beliefs and ideologies, promoting radicalization, recruiting members and creating online virtual communities sharing a common agenda. Popular microblogging websites such as Twitter are being used as a real-time platform for information sharing and communication during planning and mobilization if civil unrest related events. Applying social media intelligence for predicting and identifying online radicalization and civil unrest oriented threats is an area that has attracted several researchers' attention over past 10 years. There are several algorithms, techniques and tools that have been proposed in existing literature to counter and combat cyber-extremism and predicting protest related events in much advance. In this paper, we conduct a literature review of all these existing techniques and do a comprehensive analysis to understand state-of-the-art, trends and research gaps. We present a one class classification approach to collect scholarly articles targeting the topics and subtopics of our research scope. We perform characterization, classification and an in-depth meta analysis meta-anlaysis of about 100 conference and journal papers to gain a better understanding of existing literature.Comment: 18 pages, 16 figures, 4 tables. This paper is a comprehensive and detailed literature survey to understand current state-of-the-art of Online Social Media Intelligence to counter and combat ISI related threat

    On Identifying Disaster-Related Tweets: Matching-based or Learning-based?

    Full text link
    Social media such as tweets are emerging as platforms contributing to situational awareness during disasters. Information shared on Twitter by both affected population (e.g., requesting assistance, warning) and those outside the impact zone (e.g., providing assistance) would help first responders, decision makers, and the public to understand the situation first-hand. Effective use of such information requires timely selection and analysis of tweets that are relevant to a particular disaster. Even though abundant tweets are promising as a data source, it is challenging to automatically identify relevant messages since tweet are short and unstructured, resulting to unsatisfactory classification performance of conventional learning-based approaches. Thus, we propose a simple yet effective algorithm to identify relevant messages based on matching keywords and hashtags, and provide a comparison between matching-based and learning-based approaches. To evaluate the two approaches, we put them into a framework specifically proposed for analyzing disaster-related tweets. Analysis results on eleven datasets with various disaster types show that our technique provides relevant tweets of higher quality and more interpretable results of sentiment analysis tasks when compared to learning approach

    Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

    Full text link
    Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data, text documents. Researchers have published many articles in the field of topic modeling and applied in various fields such as software engineering, political science, medical and linguistic science, etc. There are various methods for topic modeling, which Latent Dirichlet allocation (LDA) is one of the most popular methods in this field. Researchers have proposed various models based on the LDA in topic modeling. According to previous work, this paper can be very useful and valuable for introducing LDA approaches in topic modeling. In this paper, we investigated scholarly articles highly (between 2003 to 2016) related to Topic Modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling. Also, we summarize challenges and introduce famous tools and datasets in topic modeling based on LDA.Comment: arXiv admin note: text overlap with arXiv:1505.07302 by other author

    Fusing Visual, Textual and Connectivity Clues for Studying Mental Health

    Full text link
    With ubiquity of social media platforms, millions of people are sharing their online persona by expressing their thoughts, moods, emotions, feelings, and even their daily struggles with mental health issues voluntarily and publicly on social media. Unlike the most existing efforts which study depression by analyzing textual content, we examine and exploit multimodal big data to discern depressive behavior using a wide variety of features including individual-level demographics. By developing a multimodal framework and employing statistical techniques for fusing heterogeneous sets of features obtained by processing visual, textual and user interaction data, we significantly enhance the current state-of-the-art approaches for identifying depressed individuals on Twitter (improving the average F1-Score by 5 percent) as well as facilitate demographic inference from social media for broader applications. Besides providing insights into the relationship between demographics and mental health, our research assists in the design of a new breed of demographic-aware health interventions
    • …
    corecore