7,601 research outputs found
How did the discussion go: Discourse act classification in social media conversations
We propose a novel attention based hierarchical LSTM model to classify
discourse act sequences in social media conversations, aimed at mining data
from online discussion using textual meanings beyond sentence level. The very
uniqueness of the task is the complete categorization of possible pragmatic
roles in informal textual discussions, contrary to extraction of
question-answers, stance detection or sarcasm identification which are very
much role specific tasks. Early attempt was made on a Reddit discussion
dataset. We train our model on the same data, and present test results on two
different datasets, one from Reddit and one from Facebook. Our proposed model
outperformed the previous one in terms of domain independence; without using
platform-dependent structural features, our hierarchical LSTM with word
relevance attention mechanism achieved F1-scores of 71\% and 66\% respectively
to predict discourse roles of comments in Reddit and Facebook discussions.
Efficiency of recurrent and convolutional architectures in order to learn
discursive representation on the same task has been presented and analyzed,
with different word and comment embedding schemes. Our attention mechanism
enables us to inquire into relevance ordering of text segments according to
their roles in discourse. We present a human annotator experiment to unveil
important observations about modeling and data annotation. Equipped with our
text-based discourse identification model, we inquire into how heterogeneous
non-textual features like location, time, leaning of information etc. play
their roles in charaterizing online discussions on Facebook
State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism
Overview
This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 report paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it.
The paper is structured as follows:
Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends of social media platforms; and a new section on Islamic State (IS).
Part 2 provides an introduction to the key approaches of social media intelligence (henceforth âSOCMINTâ) for counter-terrorism.
Part 3 sets out a series of SOCMINT techniques. For each technique a series of capabilities and insights are considered, the validity and reliability of the method is considered, and how they might be applied to counter-terrorism work explored.
Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work
Public Opinion Analysis Using Hadoop
Recent technological advances in devices, computing, and social networking have revolutionized the world but have also increased the amount of data produced by humans on a large scale. If you collect this data in the form of disks, it may fill an entire football field. According to studies, 2.5 billion gigabytes of new data is generated every day and 2.5 petabytes of data is collected every hour. This rate is still growing enormously. Though all this information produced is meaningful and can be useful when processed, it gets neglected. Social media has gained massive popularity nowadays. Twitter makes it easy to engage users in expressing, sharing and discussing hot latest topics but these public expressions and views are hard to analyze due to the bigger size of the data created by Twitter. In order to perform analysis and predictions over the hot topics in society, latest technologies are needed. The most popular solution for this is Hadoop. Hadoop acts as an open-source framework for developing and executing distributed applications that process very large amounts of data. It stores and process big data in a distributed fashion on large clusters of commodity hardware. The risk, of course, in running on commodity machines is how to handle failure. Hadoop is built with the assumption that hardware will fail and as such, it can easily handle most failures. Hadoop can be used for developing and executing distributed applications that process very large amounts of data. It provides a suitable environment needed for treating or processing huge data. Our job is to extract and store data into its file system and query the data according to the desired output. We propose to perform analysis on Public opinion expressed over Twitter regarding the trending topics of the society by using Apache Hadoop framework along with its services Apache Flume and Apache Hive
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has been emerging recently as one of the major natural
language processing (NLP) tasks in many applications. Especially, as social
media channels (e.g. social networks or forums) have become significant sources
for brands to observe user opinions about their products, this task is thus
increasingly crucial. However, when applied with real data obtained from social
media, we notice that there is a high volume of short and informal messages
posted by users on those channels. This kind of data makes the existing works
suffer from many difficulties to handle, especially ones using deep learning
approaches. In this paper, we propose an approach to handle this problem. This
work is extended from our previous work, in which we proposed to combine the
typical deep learning technique of Convolutional Neural Networks with domain
knowledge. The combination is used for acquiring additional training data
augmentation and a more reasonable loss function. In this work, we further
improve our architecture by various substantial enhancements, including
negation-based data augmentation, transfer learning for word embeddings, the
combination of word-level embeddings and character-level embeddings, and using
multitask learning technique for attaching domain knowledge rules in the
learning process. Those enhancements, specifically aiming to handle short and
informal messages, help us to enjoy significant improvement in performance once
experimenting on real datasets.Comment: A Preprint of an article accepted for publication by Inderscience in
IJCVR on September 201
Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions
In online communities, antisocial behavior such as trolling disrupts
constructive discussion. While prior work suggests that trolling behavior is
confined to a vocal and antisocial minority, we demonstrate that ordinary
people can engage in such behavior as well. We propose two primary trigger
mechanisms: the individual's mood, and the surrounding context of a discussion
(e.g., exposure to prior trolling behavior). Through an experiment simulating
an online discussion, we find that both negative mood and seeing troll posts by
others significantly increases the probability of a user trolling, and together
double this probability. To support and extend these results, we study how
these same mechanisms play out in the wild via a data-driven, longitudinal
analysis of a large online news discussion community. This analysis reveals
temporal mood effects, and explores long range patterns of repeated exposure to
trolling. A predictive model of trolling behavior shows that mood and
discussion context together can explain trolling behavior better than an
individual's history of trolling. These results combine to suggest that
ordinary people can, under the right circumstances, behave like trolls.Comment: Best Paper Award at CSCW 201
Assessing learnersâ satisfaction in collaborative online courses through a big data approach
none4noMonitoring learners' satisfaction (LS) is a vital action for collecting precious information and design valuable online collaborative learning (CL) experiences. Today's CL platforms allow students for performing many online activities, thus generating a huge mass of data that can be processed to provide insights about the level of satisfaction on contents, services, community interactions, and effort. Big Data is a suitable paradigm for real-time processing of large data sets concerning the LS, in the final aim to provide valuable information that may improve the CL experience. Besides, the adoption of Big Data offers the opportunity to implement a non-intrusive and in-process evaluation strategy of online courses that complements the traditional and time-consuming ways to collect feedback (e.g. questionnaires or surveys). Although the application of Big Data in the CL domain is a recent explored research area with limited applications, it may have an important role in the future of online education. By adopting the design science research methodology, this article describes a novel method and approach to analyse individual students' contributions in online learning activities and assess the level of their satisfaction towards the course. A software artefact is also presented, which leverages Learning Analytics in a Big Data context, with the goal to provide in real-time valuable insights that people and systems can use to intervene properly in the program. The contribution of this paper can be of value for both researchers and practitioners: the former can be interested in the approach and method used for LS assessment; the latter can find of interest the system implemented and how it has been tested in a real online course.openElia G.; Solazzo G.; Lorenzo G.; Passiante G.Elia, G.; Solazzo, G.; Lorenzo, G.; Passiante, G
Internet rumor audience response prediction algorithm based on machine learning in big data environment
Rumors are an important factor affecting social stability in some special times. Therefore, the dissemination and prevention and control mechanisms of rumors have always been issues of concern to the academic community and have long been highly valued and widely discussed by experts and scholars. However, in combination with the Internet as a new type of media, although people have begun to pay attention to online rumors, research on it is still relatively fragmented, especially in the cross-domain research specific to the social influence of online rumors, and there is no clear indication of online rumors. The specific definition also did not analyze in detail the internal connection between its influence and group behavior. Therefore, this article will combine actual cases to explore and analyze the spread and influence process of online rumors and show its social influence, hoping to enrich the research of online rumors. Nowadays, the Internet has become the most important carrier to reflect the public grievances. Internet users have expressed their opinions on hot issues such as enterprises, peopleâs livelihood, and government management, which has formed a powerful public opinion pressure, which has far exceeded the traditional media. The hidden dangers of security cannot be ignored. Therefore, how to monitor network public opinion from a large amount of network data is a difficult problem that needs to be solved urgently. Firstly, this consists of four modules: information collection, web page preprocessing, public opinion analysis, and public information report. Secondly, text clustering, the core technology of network public opinion, is optimized, and single-pass algorithm based on double threshold is proposed. Then the dual-threshold single-pass algorithm is optimized based on the MapReduce parallel computing model, and finally a network public opinion collection technology is formed under the background of big data. Simulation results can greatly improve the performance of text clustering and can effectively optimize the design using the parallel computing model based on MapReduce. The average miss rate after optimization is 0.7569 times, the average false alarm rate is 0.5556 times, and C det is 0.5714 times. It proves that the collection technology based on machine learning under the background of big data is effective and has good performance
- âŠ