Crime sensing with big data: the affordances and limitations of using open-source communications to estimate crime patterns
This paper critically examines the affordances and limitations of big data for the study of crime and disorder. We hypothesise that disorder-related posts on Twitter are associated with actual police crime rates. Our results provide evidence that naturally occurring social media data may provide an alternative information source on the crime problem. This paper adds to the emerging field of computational criminology and big data in four ways: i) it estimates the utility of social media data to explain variance in offline crime patterns; ii) it provides the first evidence of the estimation of offline crime patterns using a measure of broken windows found in the textual content of social media communications; iii) it tests whether the bias present in offline perceptions of disorder is present in online communications; and iv) it takes the results of the experiments to critically engage with debates on big data and crime prediction
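The abstract does not reproduce the paper's models, but the kind of association it tests between tweet-derived disorder measures and recorded crime rates can be illustrated with a simple correlation. The monthly counts below are hypothetical, invented for illustration only:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly counts for one neighbourhood:
disorder_tweets = [12, 30, 45, 22, 60, 18]
recorded_crimes = [8, 25, 40, 20, 55, 15]

r = pearson_r(disorder_tweets, recorded_crimes)
# an r close to 1 would indicate the tweet signal tracks recorded crime
```

In practice a study like this would regress crime rates on tweet-derived measures across many areas and time windows; the single coefficient above only illustrates the basic association being estimated.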
Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation
New and emerging forms of data, including posts harvested from social media sites such as Twitter, have become part of the sociologist’s data diet. In particular, some researchers see an advantage in the perceived ‘public’ nature of Twitter posts, representing them in publications without seeking informed consent. While such practice may not be at odds with Twitter’s terms of service, we argue there is a need to interpret these through the lens of social science research methods, which imply a more reflexive ethical approach than that provided in ‘legal’ accounts of the permissible use of these data in research publications. To challenge some existing practice in Twitter-based research, this paper brings to the fore i) the views of Twitter users through analysis of online survey data, ii) the effect of context collapse and online disinhibition on the behaviours of users, and iii) the publication of identifiable sensitive classifications derived from algorithms
Us and them: identifying cyber hate on Twitter across multiple protected characteristics
Hateful and antagonistic content published and propagated via the World Wide Web has the potential to cause harm and suffering on an individual basis, and to lead to social tension and disorder beyond cyberspace. Despite new legislation aimed at prosecuting those who misuse new forms of communication to post threatening, harassing, or grossly offensive language - or cyber hate - and the fact that large social media companies have committed to protecting their users from harm, it goes largely unpunished due to difficulties in policing online public spaces. To support the automatic detection of cyber hate online, specifically on Twitter, we build multiple individual models to classify cyber hate for a range of protected characteristics including race, disability and sexual orientation. We use text parsing to extract typed dependencies, which represent syntactic and grammatical relationships between words, and are shown to capture ‘othering’ language, consistently improving machine classification for different types of cyber hate beyond the use of a Bag of Words and known hateful terms. Furthermore, we build a data-driven blended model of cyber hate to improve classification where more than one protected characteristic may be attacked (e.g. race and sexual orientation), contributing to the nascent study of intersectionality in hate crime
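The typed dependencies this abstract describes are produced by a syntactic parser. Assuming parses are already available as (relation, head, dependent) triples, one plausible way to encode them as classifier features is sketched below; the example sentence, triples and relation names are illustrative, not taken from the paper:

```python
def dependency_features(triples):
    """Turn typed-dependency triples (relation, head, dependent) into sparse
    feature strings, one way to encode 'othering' constructions such as
    pronoun-verb pairs ("send them back")."""
    return {f"{rel}({head},{dep})" for rel, head, dep in triples}

# Hypothetical parse of "send them back" (relation names follow the
# Stanford typed-dependency naming scheme):
triples = [("dobj", "send", "them"), ("advmod", "send", "back")]
features = dependency_features(triples)
# These feature strings would be combined with Bag-of-Words terms and
# known hateful terms as input to each per-characteristic classifier
```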
Multi-class machine classification of suicide-related communication on Twitter
The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcomes of the base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts where we further evaluate the classification approach - showing sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type
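The Rotation Forest ensemble itself is not reimplemented here, but the Maximum Probability voting decision the abstract mentions can be sketched over base-classifier outputs. The class names and probabilities below are invented for illustration:

```python
def max_probability_vote(prob_rows):
    """Given each base classifier's probability distribution over the same
    set of classes, pick the class that receives the single highest
    probability from any base classifier (maximum-probability voting)."""
    best_class, best_p = None, -1.0
    for probs in prob_rows:
        for cls, p in probs.items():
            if p > best_p:
                best_class, best_p = cls, p
    return best_class

# Hypothetical outputs from three base classifiers for one tweet:
votes = [
    {"suicidal_ideation": 0.55, "flippant": 0.45},
    {"suicidal_ideation": 0.40, "flippant": 0.60},
    {"suicidal_ideation": 0.90, "flippant": 0.10},
]
label = max_probability_vote(votes)  # the 0.90 vote decides the class
```

Unlike majority voting, this rule lets one highly confident base classifier decide the label even when the other classifiers narrowly disagree.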
Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample
In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is important for the detection of hate speech in web content. In particular, we propose to use text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by emotion analysis of three sources of data sets: suspended, active and neutral ones, i.e. the first two sources contain hateful tweets from suspended accounts and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers based on the semantic meaning of tweets from suspended and active accounts respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score
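The overall F-score difference this abstract reports can be made concrete with the standard F1 definition. The confusion counts below are hypothetical, not taken from the paper:

```python
def f_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts on unseen data for the two classifiers:
suspended_f = f_score(tp=80, fp=10, fn=10)  # trained on suspended-account tweets
active_f = f_score(tp=60, fp=25, fn=25)     # trained on active-account tweets
improvement = suspended_f - active_f
```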
Malware classification using self organising feature maps and machine activity data
In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and able to obfuscate much of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can achieve a high degree of accuracy in distinguishing malicious from trusted samples using Machine Learning with features derived from the inescapable footprint left behind on a computer system during execution. This includes CPU, RAM and Swap use, and network traffic counted at the level of bytes and packets. These features are continuous and allow us to be more flexible with the classification of samples than discrete features such as API calls (which can also be obfuscated) that form the main feature of the extant literature. We use these continuous data to develop a novel classification method using Self Organising Feature Maps, which reduce overfitting during training by creating unsupervised clusters of similar ‘behaviour’ that are subsequently used as features for classification, rather than the raw data. We compare our method to a set of machine classification methods that have been applied in previous research and, on an unseen dataset, demonstrate an increase of between 7.24% and 25.68% in classification accuracy over those methods
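The paper's exact Self Organising Feature Map configuration is not given in the abstract; the following is a minimal stdlib-only sketch of the general idea: a 1-D SOM clusters machine-activity vectors without supervision, and each sample's best-matching-unit index can then serve as a categorical feature for the downstream classifier. All activity values, unit counts and training parameters here are invented:

```python
import math
import random

def train_som(data, n_units=4, epochs=60, lr=0.5, seed=0):
    """Minimal 1-D self-organising map. Each unit holds a weight vector;
    for each sample, the best-matching unit and (early on) its neighbours
    are nudged towards the sample, forming unsupervised clusters."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        radius = max(0.5, (n_units / 2) * frac)  # neighbourhood shrinks
        step = lr * frac                          # learning rate decays
        for x in data:
            # best-matching unit by squared Euclidean distance
            bmu = min(range(n_units),
                      key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))
            for i in range(n_units):
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    units[i][d] += step * influence * (x[d] - units[i][d])
    return units

def bmu_index(units, x):
    """Map a sample to its cluster: the index of its best-matching unit."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))

# Hypothetical normalised (CPU, RAM, packet-count) activity samples:
activity = [[0.10, 0.10, 0.10], [0.12, 0.10, 0.08],
            [0.90, 0.80, 0.95], [0.88, 0.85, 0.90]]
som = train_som(activity)
# bmu_index(som, sample) then becomes a feature for the
# malicious-vs-trusted classifier, in place of the raw measurements
```

Using cluster indices rather than raw measurements is what gives the approach its regularising effect: many slightly different activity traces collapse onto the same unit.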
Methodology for Designing Decision Support Systems for Visualising and Mitigating Supply Chain Cyber Risk from IoT Technologies
This paper proposes a methodology for designing decision support systems for visualising and mitigating Internet of Things (IoT) cyber risks. Digital technologies present new cyber risks in the supply chain, which are often not visible to companies participating in supply chains. This study investigates how IoT cyber risks can be visualised and mitigated in the process of designing business and supply chain strategies. The emerging DSS methodology presents new findings on how digital technologies affect business and supply chain systems. Through epistemological analysis, the article derives a decision support system for visualising supply chain cyber risk arising from IoT digital technologies. Such methods do not exist at present, and this represents the first attempt to devise a decision support system that would enable practitioners to develop a step-by-step process for visualising, assessing and mitigating the emerging cyber risk from IoT technologies on shared infrastructure in legacy supply chain systems
