1,088 research outputs found
Leadership in Action: How Top Hackers Behave A Big-Data Approach with Text-Mining and Sentiment Analysis
This paper examines hacker behavior in dark forums and identifies its significant predictors in the light of leadership theory for communities of practice. We combine online forum features with text-mining and sentiment analysis of messages. We create a multinomial logistic regression model to achieve role-based hacker classification and validate our model with actual hacker forum data. We identify the total number of messages, number of threads, hacker keyword frequency, and sentiments as the most significant predictors of expert hacker behavior. We also demonstrate that, while disseminating technical knowledge, the hacker community follows the Pareto principle. As a recommendation for future research, we build a unique keyword lexicon of the most significant terms derived by the tf-idf measure. Such investigation of hacker behavior is particularly relevant for organizations in the proactive prevention of cyber-attacks. Foresight on online hacker behavior can help businesses avoid losses from breaches and the additional costs of attack-preventive measures.
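The tf-idf ranking mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the whitespace tokenizer, the plain `tf * log(N / df)` weighting, the function names `tfidf_scores` and `top_terms`, and the sample messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf-idf score} dict per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(documents, k=3):
    """Rank terms by their highest tf-idf score in any single document."""
    best = {}
    for doc_scores in tfidf_scores(documents):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda term: (-best[term], term))[:k]


# Hypothetical forum messages, invented for this example.
messages = [
    "new exploit for the login form",
    "thanks for the help",
    "the exploit payload uses sql injection",
]
print(top_terms(messages, k=2))
```

A term that occurs in every message (here "the") receives a score of zero, which is why tf-idf is a common way to surface terms that are distinctive rather than merely frequent.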
Adoption of Social Media Search Systems: An IS Success Model Perspective
The social media search system aims to provide organized and integrated access and search support for a massive amount of unstructured, multilingual, user-generated content in an effective and efficient manner. Previous research on social media analytics mainly focuses on developing and applying advanced analysis methods and/or tools to make sense of the large amount of user-generated data on the Internet. Relatively little effort has been made to specifically examine the social media search system. In this study, we apply the DeLone and McLean IS Success Model to examine this type of system. To do so, a lab experiment was conducted, and the results showed that all causal relationships specified in the DeLone and McLean IS Success Model, except for satisfaction to social benefit, hold in the context of the large-scale social media search system. Specifically, we found that the information quality and system quality associated with the system could significantly influence both users' intention to use it and their satisfaction with it, both of which, in turn, had significant impacts on users' perceived individual benefit and social benefit. In addition, satisfaction could significantly influence intention to use the system.
Available at: https://aisel.aisnet.org/pajais/vol10/iss2/4
A systematic survey of online data mining technology intended for law enforcement
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article fills this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.
Anyone Can Become a Troll: Causes of Trolling Behavior in Online Discussions
In online communities, antisocial behavior such as trolling disrupts
constructive discussion. While prior work suggests that trolling behavior is
confined to a vocal and antisocial minority, we demonstrate that ordinary
people can engage in such behavior as well. We propose two primary trigger
mechanisms: the individual's mood, and the surrounding context of a discussion
(e.g., exposure to prior trolling behavior). Through an experiment simulating
an online discussion, we find that both negative mood and seeing troll posts by
others significantly increase the probability of a user trolling, and together
double this probability. To support and extend these results, we study how
these same mechanisms play out in the wild via a data-driven, longitudinal
analysis of a large online news discussion community. This analysis reveals
temporal mood effects, and explores long range patterns of repeated exposure to
trolling. A predictive model of trolling behavior shows that mood and
discussion context together can explain trolling behavior better than an
individual's history of trolling. These results combine to suggest that
ordinary people can, under the right circumstances, behave like trolls.
Comment: Best Paper Award at CSCW 201
Detection of Software Vulnerability Communication in Expert Social Media Channels: A Data-driven Approach
Conceptually, a vulnerability is a flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy. Some of these flaws can go undetected and be exploited for long periods of time after software release. Although some software providers are making efforts to avoid this situation, users are inevitably still exposed to vulnerabilities that criminal hackers can take advantage of. These vulnerabilities are constantly discussed in specialised forums on social media. Therefore, from a cyber security standpoint, the information found in these places can be used for countermeasures against malicious exploitation of software. However, manual inspection of the vast quantity of content shared on social media is impractical. For this reason, in this thesis, we analyse the real applicability of supervised classification models to automatically detect software vulnerability communication in expert social media channels. We cover the following three principal aspects: Firstly, we investigate the applicability of classification models on 5 different datasets collected from 3 Internet domains: Dark Web, Deep Web and Surface Web. Since supervised models require labelled data, we have provided a systematic labelling process using multiple annotators to guarantee accurate labels for the experiments. Using these datasets, we have investigated classification models with different combinations of learning-based algorithms and traditional feature representations. Also, by oversampling the positive instances, we have achieved an increase of 5% in Positive Recall (on average) in these models. On top of that, we have applied Feature Reduction, Feature Extraction and Feature Selection techniques, which reduced the dimensionality of these models without damaging accuracy, thus providing computationally efficient models.
Furthermore, in addition to traditional feature representations, we have investigated the effect of robust language models, such as Word Embedding (WEMB) and Sentence Embedding (SEMB), on the accuracy of classification models. Regarding WEMB, our experiment has shown that this model trained with a small security-vocabulary dataset provides results comparable to WEMB trained on a very large general-vocabulary dataset. Regarding the SEMB model, our experiment has shown that it outperforms the WEMB model in detecting vulnerability communication, recording 8% of Avg. Class Accuracy and 74% of Positive Recall. In addition, we investigate two Deep Learning algorithms as classifiers, text CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network)-based algorithms, which have improved our model, resulting in the best overall performance for our task.
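The abstract above reports a gain in Positive Recall from oversampling the positive instances but does not say which oversampling scheme was used. The sketch below assumes the simplest variant, random duplication of positive samples until the two classes are the same size; the function name `oversample_positives`, the 0/1 label encoding, and the toy data are hypothetical.

```python
import random


def oversample_positives(samples, labels, seed=0):
    """Duplicate randomly chosen positive samples until classes are balanced.

    Assumes binary labels where 1 marks the (minority) positive class.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    # Nothing to do if there are no positives or they are not the minority.
    if not positives or len(positives) >= len(negatives):
        return list(samples), list(labels)
    shortfall = len(negatives) - len(positives)
    extra = [rng.choice(positives) for _ in range(shortfall)]
    # Keep the original data in order and append the duplicated positives.
    indices = list(range(len(labels))) + extra
    return [samples[i] for i in indices], [labels[i] for i in indices]


# Toy data invented for this example: two positive posts among six.
posts = ["cve poc", "hello", "off topic", "buffer overflow", "lol", "bye"]
flags = [1, 0, 0, 1, 0, 0]
balanced_posts, balanced_flags = oversample_positives(posts, flags)
print(sum(balanced_flags), len(balanced_flags) - sum(balanced_flags))
```

Oversampling is normally applied to the training split only, so that duplicated positives do not leak into the data used to measure recall.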
Automated Crowdturfing Attacks and Defenses in Online Review Systems
Malicious crowdsourcing forums are gaining traction as sources of spreading
misinformation online, but are limited by the costs of hiring and managing
human workers. In this paper, we identify a new class of attacks that leverage
deep learning language models (Recurrent Neural Networks or RNNs) to automate
the generation of fake online reviews for products and services. Not only are
these attacks cheap and therefore more scalable, but they can control rate of
content output to eliminate the signature burstiness that makes crowdsourced
campaigns easy to detect.
Using Yelp reviews as an example platform, we show how a two phased review
generation and customization attack can produce reviews that are
indistinguishable by state-of-the-art statistical detectors. We conduct a
survey-based user study to show these reviews not only evade human detection,
but also score high on "usefulness" metrics by users. Finally, we develop novel
automated defenses against these attacks, by leveraging the lossy
transformation introduced by the RNN training and generation cycle. We consider
countermeasures against our mechanisms, show that they produce unattractive
cost-benefit tradeoffs for attackers, and that they can be further curtailed by
simple constraints imposed by online service providers.
Attribute Sentiment Scoring With Online Text Reviews : Accounting for Language Structure and Attribute Self-Selection
The authors address two novel and significant challenges in using online text reviews to obtain attribute level ratings. First, they introduce the problem of inferring attribute level sentiment from text data to the marketing literature and develop a deep learning model to address it. While extant bag of words based topic models are fairly good at attribute discovery based on frequency of word or phrase occurrences, associating sentiments to attributes requires exploiting the spatial and sequential structure of language. Second, they illustrate how to correct for attribute self-selection (reviewers choose the subset of attributes to write about) in metrics of attribute level restaurant performance. Using Yelp.com reviews for empirical illustration, they find that a hybrid deep learning (CNN-LSTM) model, where CNN and LSTM exploit the spatial and sequential structure of language respectively, provides the best performance in accuracy, training speed and training data size requirements. The model does particularly well on the "hard" sentiment classification problems. Further, accounting for attribute self-selection significantly impacts sentiment scores, especially on attributes that are frequently missing.