
    Crime sensing with big data: the affordances and limitations of using open-source communications to estimate crime patterns

    This paper critically examines the affordances and limitations of big data for the study of crime and disorder. We hypothesise that disorder-related posts on Twitter are associated with actual police crime rates. Our results provide evidence that naturally occurring social media data may provide an alternative information source on the crime problem. This paper adds to the emerging field of computational criminology and big data in four ways: i) it estimates the utility of social media data to explain variance in offline crime patterns; ii) it provides the first evidence of the estimation of offline crime patterns using a measure of broken windows found in the textual content of social media communications; iii) it tests if the bias present in offline perceptions of disorder is present in online communications; and iv) it takes the results of experiments to critically engage with debates on big data and crime prediction.
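The core test above relates area-level counts of disorder-related tweets to recorded crime. A minimal sketch of that kind of association test, with invented per-area counts standing in for the paper's real Twitter and police data:

```python
# Hypothetical sketch: correlate disorder-related tweet counts with
# recorded crime counts per area. All numbers below are invented for
# illustration; the paper fits proper regression models on real data.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# per-area counts: disorder-related tweets vs recorded offences (invented)
disorder_tweets = [12, 45, 7, 30, 22, 51, 9, 17]
crime_counts    = [15, 52, 10, 28, 25, 60, 8, 20]

r = pearson(disorder_tweets, crime_counts)
print(f"Pearson r between disorder tweets and crime: {r:.2f}")
```

A positive, strong r would be consistent with the paper's hypothesis; the actual analysis accounts for more than a raw correlation.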

    Towards an ethical framework for publishing Twitter data in social research: taking into account users’ views, online context and algorithmic estimation

    New and emerging forms of data, including posts harvested from social media sites such as Twitter, have become part of the sociologist’s data diet. In particular, some researchers see an advantage in the perceived ‘public’ nature of Twitter posts, representing them in publications without seeking informed consent. While such practice may not be at odds with Twitter’s terms of service, we argue there is a need to interpret these through the lens of social science research methods, which imply a more reflexive ethical approach than is provided in ‘legal’ accounts of the permissible use of these data in research publications. To challenge some existing practice in Twitter-based research, this paper brings to the fore i) views of Twitter users through analysis of online survey data, ii) the effect of context collapse and online disinhibition on the behaviors of users, and iii) the publication of identifiable sensitive classifications derived from algorithms.

    Us and them: identifying cyber hate on Twitter across multiple protected characteristics

    Hateful and antagonistic content published and propagated via the World Wide Web has the potential to cause harm and suffering on an individual basis, and lead to social tension and disorder beyond cyber space. Despite new legislation aimed at prosecuting those who misuse new forms of communication to post threatening, harassing, or grossly offensive language - or cyber hate - and the fact that large social media companies have committed to protecting their users from harm, it goes largely unpunished due to difficulties in policing online public spaces. To support the automatic detection of cyber hate online, specifically on Twitter, we build multiple individual models to classify cyber hate for a range of protected characteristics including race, disability and sexual orientation. We use text parsing to extract typed dependencies, which represent syntactic and grammatical relationships between words, and are shown to capture ‘othering’ language - consistently improving machine classification for different types of cyber hate beyond the use of a Bag of Words and known hateful terms. Furthermore, we build a data-driven blended model of cyber hate to improve classification where more than one protected characteristic may be attacked (e.g. race and sexual orientation), contributing to the nascent study of intersectionality in hate crime.
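The key feature-engineering step above turns a parser's typed dependencies into classifier features that capture who is doing what to whom. A minimal sketch of that encoding, with hand-written dependency triples standing in for real parser output (e.g. from Stanford or spaCy dependency parses):

```python
# Hypothetical sketch: encode (relation, governor, dependent) triples as
# feature strings. The triples below are hand-written stand-ins for a
# dependency parser's output; the paper extracts them automatically.

def dependency_features(triples):
    """Turn typed-dependency triples into feature strings, capturing
    'us vs them' constructions that a plain bag-of-words misses."""
    return {f"{rel}({gov},{dep})" for rel, gov, dep in triples}

# invented parse of the phrase "send them home"
triples = [("dobj", "send", "them"), ("advmod", "send", "home")]
feats = dependency_features(triples)
print(sorted(feats))
```

Features like `dobj(send,them)` keep the verb-object relationship that distinguishes directed, othering language from incidental use of the same words.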

    Multi-class machine classification of suicide-related communication on Twitter

    The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts where we further evaluate the classification approach - showing a sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
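The Maximum Probability voting rule combines base classifiers by returning the class to which any single base classifier assigns the highest probability. A minimal sketch of that decision rule, with invented probability tables standing in for real base-classifier outputs:

```python
# Hedged sketch of Maximum Probability voting over base classifiers.
# The class labels and probabilities below are invented for illustration.

def max_probability_vote(base_outputs):
    """base_outputs: list of {class_label: probability} dicts, one per
    base classifier. Returns the label with the single highest
    probability assigned by any base classifier."""
    best_label, best_p = None, -1.0
    for probs in base_outputs:
        for label, p in probs.items():
            if p > best_p:
                best_label, best_p = label, p
    return best_label

outputs = [
    {"suicidal_ideation": 0.55, "flippant": 0.30, "support": 0.15},
    {"suicidal_ideation": 0.40, "flippant": 0.45, "support": 0.15},
    {"suicidal_ideation": 0.70, "flippant": 0.10, "support": 0.20},
]
print(max_probability_vote(outputs))  # → suicidal_ideation
```

Unlike majority voting, one confident base classifier can decide the outcome, which suits classes where strong lexical signals appear in only some feature views.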

    Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample

    In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is important for the detection of hate speech in web content. In particular, we propose to use the text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated after implementing emotion analysis on three sources of data sets: suspended, active and neutral ones, i.e. the first two sources contain hateful tweets from suspended accounts and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers based on the semantic meaning of tweets from suspended and active accounts respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
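The motivating emotion analysis compares lexicon-based emotion rates between suspended-account and active-account tweets. A toy sketch of that comparison, with invented word lists and tweets in place of a full emotion lexicon and real data:

```python
# Minimal sketch of a lexicon-based emotion comparison. The word lists
# and tweets are invented; the paper applies a full emotion lexicon to
# real suspended-account and active-account tweet sets.

DISGUST = {"vile", "filth", "disgusting"}
ANGER   = {"hate", "furious", "destroy"}

def emotion_rate(tweets, lexicon):
    """Fraction of tweets containing at least one lexicon word."""
    hits = sum(any(w in t.lower().split() for w in lexicon) for t in tweets)
    return hits / len(tweets)

suspended = ["they are vile filth", "destroy them all", "furious at this"]
active    = ["this is bad news", "hate the traffic today", "so sad"]

print(emotion_rate(suspended, DISGUST | ANGER))
print(emotion_rate(active, DISGUST | ANGER))
```

A consistently higher rate for the suspended set is the kind of signal that motivates treating those accounts as a cheap, reliable annotation source.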

    Malware classification using self organising feature maps and machine activity data

    In this article we use machine activity metrics to automatically distinguish between malicious and trusted portable executable software samples. The motivation stems from the growth of cyber attacks using techniques that have been employed to surreptitiously deploy Advanced Persistent Threats (APTs). APTs are becoming more sophisticated and able to obfuscate much of their identifiable features through encryption, custom code bases and in-memory execution. Our hypothesis is that we can produce a high degree of accuracy in distinguishing malicious from trusted samples using Machine Learning with features derived from the inescapable footprint left behind on a computer system during execution. This includes CPU, RAM, Swap use and network traffic at a count level of bytes and packets. These features are continuous and allow us to be more flexible with the classification of samples than discrete features such as API calls (which can also be obfuscated) that form the main feature of the extant literature. We use these continuous data and develop a novel classification method using Self Organizing Feature Maps to reduce overfitting during training through the ability to create unsupervised clusters of similar ‘behaviour’ that are subsequently used as features for classification, rather than using the raw data. We compare our method to a set of machine classification methods that have been applied in previous research and demonstrate an increase of between 7.24% and 25.68% in classification accuracy using our method and an unseen dataset over the range of other machine classification methods that have been applied in previous research.
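The central idea above is to cluster machine-activity vectors with a self-organising map and feed each sample's cluster assignment to a classifier instead of the raw data. A toy sketch of that step, with an invented two-feature data set and a deliberately simplified SOM (winner-only updates, no neighbourhood function):

```python
# Toy sketch of the SOM-as-feature idea: train a tiny self-organising
# map on (cpu, network) activity vectors, then use each sample's
# best-matching unit index as a discrete 'behaviour cluster' feature.
# Data, map size and learning schedule are all invented; a full SOM
# also updates the winner's neighbours.
import random

def bmu(weights, x):
    """Index of the best-matching unit (closest node) for sample x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_nodes=4, epochs=50, lr=0.3):
    random.seed(0)
    weights = [[random.random(), random.random()] for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            b = bmu(weights, x)
            # move the winning node toward the sample
            weights[b] = [w + lr * (v - w) for w, v in zip(weights[b], x)]
    return weights

# (cpu_load, network_bytes) scaled to [0, 1]: low- vs high-activity samples
samples = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.8], [0.85, 0.9]]
som = train_som(samples)
clusters = [bmu(som, x) for x in samples]
print(clusters)
```

The two low-activity samples map to one unit and the two high-activity samples to another; in the paper's pipeline, these cluster indices (not the raw metrics) become the classifier's inputs.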

    Methodology for Designing Decision Support Systems for Visualising and Mitigating Supply Chain Cyber Risk from IoT Technologies

    This paper proposes a methodology for designing decision support systems for visualising and mitigating Internet of Things cyber risks. Digital technologies present new cyber risks in the supply chain that are often not visible to companies participating in those supply chains. This study investigates how Internet of Things cyber risks can be visualised and mitigated in the process of designing business and supply chain strategies. The emerging DSS methodology presents new findings on how digital technologies affect business and supply chain systems. Through epistemological analysis, the article derives a decision support system for visualising supply chain cyber risk arising from Internet of Things digital technologies. Such methods do not exist at present, and this represents the first attempt to devise a decision support system that would enable practitioners to develop a step-by-step process for visualising, assessing and mitigating the emerging cyber risk from IoT technologies on shared infrastructure in legacy supply chain systems.
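An illustrative sketch (not the paper's method) of the kind of risk register such a decision support system might visualise: each IoT-dependent supply chain node scored as likelihood × impact and ranked for mitigation. All node names and scores are invented:

```python
# Hypothetical risk register for IoT-dependent supply chain nodes.
# Names and 1-5 likelihood/impact scores are invented for illustration.

nodes = [
    {"node": "warehouse_sensor_gw", "likelihood": 4, "impact": 5},
    {"node": "fleet_telematics",    "likelihood": 3, "impact": 3},
    {"node": "legacy_erp_bridge",   "likelihood": 5, "impact": 5},
]

# score each node as likelihood x impact
for n in nodes:
    n["risk"] = n["likelihood"] * n["impact"]

# rank the shared-infrastructure risks for mitigation, highest first
for n in sorted(nodes, key=lambda n: n["risk"], reverse=True):
    print(f'{n["node"]}: {n["risk"]}')
```

Even a simple ranked view like this makes otherwise invisible supply chain cyber risks comparable across participating companies.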