1,088 research outputs found

Leadership in Action: How Top Hackers Behave A Big-Data Approach with Text-Mining and Sentiment Analysis

Author: Biswas Baidyanath
Gupta Gaurav
Mukhopadhyay Arunabha
Publication venue: AIS Electronic Library (AISeL)
Publication date: 03/01/2018

This paper examines hacker behavior in dark forums and identifies its significant predictors in the light of leadership theory for communities of practice. We combine online forum features with text mining and sentiment analysis of messages. We build a multinomial logistic regression model for role-based hacker classification and validate it with actual hacker forum data. We identify the total number of messages, number of threads, hacker keyword frequency, and sentiments as the most significant predictors of expert hacker behavior. We also demonstrate that, while disseminating technical knowledge, the hacker community follows the Pareto principle. As a recommendation for future research, we build a unique keyword lexicon of the most significant terms derived using the tf-idf measure. Such investigation of hacker behavior is particularly relevant for organizations in the proactive prevention of cyber-attacks. Foresight on online hacker behavior can help businesses save losses from breaches and the additional costs of attack-preventive measures.

AIS Electronic Library (AISeL)

Adoption of Social Media Search Systems: An IS Success Model Perspective

Author: Chen Hsinchun
Dang Mandy Yan
Zhang Gavin Yulei
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2018

The social media search system aims to provide organized and integrated access and search support for a massive amount of unstructured, multilingual, user-generated content in an effective and efficient manner. Previous research on social media analytics mainly focuses on developing and applying advanced analysis methods and/or tools to make sense of the large amount of user-generated data on the Internet. Relatively little effort has been devoted to specifically examining the social media search system. In this study, we apply the DeLone and McLean IS Success Model to examine this type of system. To do so, we conducted a lab experiment, and the results showed that all causal relationships specified in the DeLone and McLean IS Success Model, except for satisfaction to social benefit, hold in the context of the large-scale social media search system. Specifically, we found that information quality and system quality associated with the system could significantly influence both users’ intention to use it and their satisfaction with it, both of which, in turn, had significant impacts on users’ perceived individual benefit and social benefit. In addition, satisfaction could significantly influence intention to use the system. Available at: https://aisel.aisnet.org/pajais/vol10/iss2/4

The University of Arizona

AIS Electronic Library (AISeL)

A systematic survey of online data mining technology intended for law enforcement

Author: Edwards Matthew
Rashid Awais
Rayson Paul
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.

Lancaster E-Prints

BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis

Author
Publication venue: M. Jeusfeld
Publication date: 01/01/2013

Evaluating text visualization for authorship analysis

Author: A Abbasi
A Balahur
B Kjell
CD Shaw
D Zimbra
FB Viegas
FJ Tweedie
G Frantzeskou
H Chen
J Chuang
J Donath
J Li
J Myers
JA Wise
JF Burrows
KKR Choo
O De Vel
R Zheng
RH Baayen
RL Ribler
S Argamon
T Erickson
T Urvoy
TJ Fu
V Benjamin
W Chung
W Sack
X Liu
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions

Author: Berkowitz Leonard
Ceaparu Irina
Cheng Justin
Cho Daegon
Cialdini Robert B
Hutto Clayton J
Muchnik Lev
Pennebaker James W
Qing Li Tanya Beran
Raine Adrian
Salganik Matthew J
Zimbardo Philip G
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/02/2017

In online communities, antisocial behavior such as trolling disrupts constructive discussion. While prior work suggests that trolling behavior is confined to a vocal and antisocial minority, we demonstrate that ordinary people can engage in such behavior as well. We propose two primary trigger mechanisms: the individual's mood, and the surrounding context of a discussion (e.g., exposure to prior trolling behavior). Through an experiment simulating an online discussion, we find that both negative mood and seeing troll posts by others significantly increase the probability of a user trolling, and together double this probability. To support and extend these results, we study how these same mechanisms play out in the wild via a data-driven, longitudinal analysis of a large online news discussion community. This analysis reveals temporal mood effects, and explores long-range patterns of repeated exposure to trolling. A predictive model of trolling behavior shows that mood and discussion context together can explain trolling behavior better than an individual's history of trolling. These results combine to suggest that ordinary people can, under the right circumstances, behave like trolls. Comment: Best Paper Award at CSCW 201

arXiv.org e-Print Archive

Detection of Software Vulnerability Communication in Expert Social Media Channels: A Data-driven Approach

Author: Queiroz Andrei Lima
Publication venue: Dublin Institute of Technology
Publication date: 01/09/2020

Conceptually, a vulnerability is a flaw or weakness in a system’s design, implementation, or operation and management that could be exploited to violate the system’s security policy. Some of these flaws can go undetected and be exploited for long periods of time after software release. Although some software providers are making efforts to avoid this situation, users are inevitably still exposed to vulnerabilities that criminal hackers can take advantage of. These vulnerabilities are constantly discussed in specialised forums on social media. Therefore, from a cyber security standpoint, the information found in these places can be used for countermeasure actions against malicious exploitation of software. However, manual inspection of the vast quantity of shared content in social media is impractical. For this reason, in this thesis, we analyse the real applicability of supervised classification models to automatically detect software vulnerability communication in expert social media channels. We cover the following three principal aspects. Firstly, we investigate the applicability of classification models in a range of 5 different datasets collected from 3 Internet domains: Dark Web, Deep Web and Surface Web. Since supervised models require labelled data, we have provided a systematic labelling process using multiple annotators to guarantee accurate labels for carrying out experiments. Using these datasets, we have investigated the classification models with different combinations of learning-based algorithms and traditional feature representation. Also, by oversampling the positive instances, we have achieved an increase of 5% in Positive Recall (on average) in these models. On top of that, we have applied Feature Reduction, Feature Extraction and Feature Selection techniques, which reduced the dimensionality of these models without damaging the accuracy, thus providing computationally efficient models.
Furthermore, in addition to traditional feature representation, we have investigated the effect of robust language models, such as Word Embedding (WEMB) and Sentence Embedding (SEMB), on the accuracy of classification models. Regarding WEMB, our experiment has shown that this model trained with a small security-vocabulary dataset provides results comparable to WEMB trained on a very large general-vocabulary dataset. Regarding the SEMB model, our experiment has shown that it outperforms the WEMB model in detecting vulnerability communication, recording 8% of Avg. Class Accuracy and 74% of Positive Recall. In addition, we investigate two Deep Learning algorithms as classifiers, text CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network)-based algorithms, which have improved our model, resulting in the best overall performance for our task.

Arrow@TUDublin

Automated Crowdturfing Attacks and Defenses in Online Review Systems

Author: Arisoy Ebru
Fei Geli
Kakhki Arash Molavi
Kim Gyuwan
Lee Kyumin
Lee Kyumin
Li Fangtao
Maas Andrew L.
Maxwell Harper F.
Mukherjee Arjun
Sutskever Ilya
Publication venue
Publication date: 07/09/2017

Malicious crowdsourcing forums are gaining traction as sources of spreading misinformation online, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phase review generation and customization attack can produce reviews that are indistinguishable by state-of-the-art statistical detectors. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers.

arXiv.org e-Print Archive

Attribute Sentiment Scoring With Online Text Reviews: Accounting for Language Structure and Attribute Self-Selection

Author: Chakraborty Ishita
Kim Minkyung
Sudhir K.
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/05/2019

The authors address two novel and significant challenges in using online text reviews to obtain attribute-level ratings. First, they introduce the problem of inferring attribute-level sentiment from text data to the marketing literature and develop a deep learning model to address it. While extant bag-of-words-based topic models are fairly good at attribute discovery based on the frequency of word or phrase occurrences, associating sentiments with attributes requires exploiting the spatial and sequential structure of language. Second, they illustrate how to correct for attribute self-selection (reviewers choose the subset of attributes to write about) in metrics of attribute-level restaurant performance. Using Yelp.com reviews for empirical illustration, they find that a hybrid deep learning (CNN-LSTM) model, where CNN and LSTM exploit the spatial and sequential structure of language, respectively, provides the best performance in accuracy, training speed and training data size requirements. The model does particularly well on the “hard” sentiment classification problems. Further, accounting for attribute self-selection significantly impacts sentiment scores, especially on attributes that are frequently missing.

Yale University