Ethical issues in research using datasets of illicit origin
We evaluate the use of data obtained by illicit means against a broad set of ethical and legal issues. Our analysis covers both the direct collection and the secondary use of data obtained via illicit means, such as exploiting a vulnerability or unauthorized disclosure. We extract ethical principles from existing advice and guidance and analyse how they have been applied within more than 20 recent peer-reviewed papers that deal with illicitly obtained datasets. We find that existing advice and guidance does not address all of the problems that researchers have faced, and we explain how the papers tackle ethical issues inconsistently, and sometimes not at all. Our analysis reveals not only a lack of application of safeguards but also that legitimate ethical justifications for research are being overlooked. In many cases positive benefits, as well as potential harms, remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) approval for the activity that is described, and the justifications given for exemption suggest deficiencies in the REB process. Daniel R. Thomas is supported by a grant from ThreatSTOP Inc. All authors are supported by the EPSRC [grant number EP/M020320/1]. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of any of the funders.
Exploring the Impact of Password Dataset Distribution on Guessing
Leaks from password datasets are a regular occurrence. An organization may
respond to a leak with reassurances that only a small subset of passwords was
taken. In this paper we show that the leak of a relatively small number of
text-based passwords from an organization's stored dataset can lead to a
further large collection of users being compromised. Taking a sample of
passwords from a given dataset of passwords, we exploit the knowledge we gain of
the distribution to guess other samples from the same dataset. We show
theoretically and empirically that the distribution of passwords in the sample
follows the same distribution as the passwords in the whole dataset. We propose
a function that measures the ability of one distribution to estimate another.
Leveraging this, we show that a sample of passwords leaked from a given dataset
will compromise the remaining passwords in that dataset better than a sample
leaked from another source.
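The core idea of this abstract can be illustrated with a minimal sketch. It is not the authors' estimation function, which the abstract does not specify. It assumes a simple frequency-ranked guessing strategy, invented toy password lists (`site_a`, `site_b`), and a deterministic every-fifth-entry split standing in for a random leak. The helper name `guess_with_sample` is hypothetical.

```python
from collections import Counter


def guess_with_sample(sample, targets, max_guesses):
    """Return the fraction of `targets` matched by the `max_guesses`
    most frequent passwords observed in `sample`."""
    guesses = {pw for pw, _ in Counter(sample).most_common(max_guesses)}
    if not targets:
        return 0.0
    return sum(1 for pw in targets if pw in guesses) / len(targets)


# Toy datasets (invented for illustration, not real leak data).
site_a = ["123456"] * 50 + ["password"] * 30 + ["qwerty"] * 15 + ["dragon"] * 5
site_b = ["letmein"] * 50 + ["monkey"] * 30 + ["123456"] * 15 + ["football"] * 5

# Assumption: every fifth entry of site_a is "leaked"; the rest stay private.
leaked = site_a[::5]
remaining = [pw for i, pw in enumerate(site_a) if i % 5 != 0]

# Guess the remaining site_a passwords with three guesses per user.
own_rate = guess_with_sample(leaked, remaining, max_guesses=3)
other_rate = guess_with_sample(site_b, remaining, max_guesses=3)

print(own_rate)    # 0.95
print(other_rate)  # 0.5
```

In this toy setting the small sample from the same dataset recovers more of the remaining passwords than the larger list from a different source, which is the qualitative effect the abstract describes.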
A methodology for large-scale identification of related accounts in underground forums
Underground forums allow users to interact with communities focused on illicit activities. They serve as an entry point for actors interested in deviant and criminal topics. Due to the pseudo-anonymity provided, they have become improvised marketplaces for trading illegal products and services, including those used to conduct cyberattacks. Thus, these forums are an important data source for threat intelligence analysts and law enforcement. The use of multiple accounts is forbidden in most forums, since such accounts are mostly used for malicious purposes. Still, it is a common practice. Being able to identify an actor or gang behind multiple accounts allows for proper attribution in online investigations and helps in designing intervention mechanisms against illegal activities. Existing solutions for multi-account detection either require ground truth data to conduct supervised classification or use manual approaches. In this work, we propose a methodology for the large-scale identification of related accounts in underground forums. These accounts are similar according to the distinctive content posted, and thus are likely to belong to the same actor or group. The methodology applies to various domains and leverages distinctive artefacts and personal information left online by the users. We provide experimental results on a large dataset comprising more than 1.1M user accounts from 15 different forums. We show how this methodology, combined with existing approaches commonly used in social media forensics, can assist with and improve online investigations. This work was partially supported by CERN openlab, the CERN Doctoral Student Programme, the Spanish grants ODIO (PID2019-111429RB-C21 and PID2019-111429RB) and the Region of Madrid grant CYNAMON-CM (P2018/TCS-4566), co-financed by European Structural Funds ESF and FEDER, and Excellence Program EPUC3M1.
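One way to picture linking accounts through distinctive artefacts is the sketch below. It is an illustration under stated assumptions, not the methodology of the paper: the abstract does not say how artefacts are extracted or scored. Here each account is assumed to come with a pre-extracted set of artefact strings, and an artefact counts as "distinctive" only if at most `max_accounts_per_artefact` accounts share it. The function name, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_accounts(artefacts_by_account, max_accounts_per_artefact=2):
    """Map each pair of accounts to the distinctive artefacts they share."""
    # Inverted index: artefact -> accounts that posted it.
    accounts_by_artefact = defaultdict(set)
    for account, artefacts in artefacts_by_account.items():
        for artefact in artefacts:
            accounts_by_artefact[artefact].add(account)

    links = defaultdict(set)
    for artefact, accounts in accounts_by_artefact.items():
        # Skip artefacts seen once (no link) or too widely (not distinctive).
        if 2 <= len(accounts) <= max_accounts_per_artefact:
            for a, b in combinations(sorted(accounts), 2):
                links[(a, b)].add(artefact)
    return dict(links)


# Invented example: contact handles and URLs already extracted per account.
accounts = {
    "forumA:alice": {"x@example.org", "https://example.com/popular"},
    "forumB:al1ce": {"x@example.org", "https://example.com/popular"},
    "forumB:bob": {"https://example.com/popular"},
    "forumC:carol": {"carol@example.net"},
}

related = link_accounts(accounts)
print(related)  # {('forumA:alice', 'forumB:al1ce'): {'x@example.org'}}
```

The widely shared URL is ignored because three accounts posted it, so only the rare contact address produces a link. The inverted index also avoids comparing every pair of accounts directly, which matters at the scale of a million accounts.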
Countering distrust in illicit online networks: the dispute resolution strategies of cybercriminals
The core of this paper is a detailed investigation of the dispute resolution system contained
within Darkode, an elite cybercriminal forum. Extracting the dedicated disputes section from
within the marketplace, where users can report bad behaviour and register complaints, we carry
out content analysis on these threads. This involves both descriptive statistics across the dataset
and qualitative analysis of particular posts of interest, leading to a number of new insights. First,
the overall level of disputes is quite high, even though members are vetted for entry in the first
instance. Second, the lower ranked members of the marketplace are the most highly represented
category for both the plaintiffs and defendants. Third, vendors are accused of malfeasance far
more often than buyers, and their “crimes” are most commonly either not providing the
product/service or providing a poor one. Fourth, the monetary size of the disputes is surprisingly
small. Finally, only 23.1% of disputes reach a clear outcome.
Leveraging New Technologies and Interdisciplinarity to Study Political Behavior, Attitudes, and Beliefs
I make use of new technological and scholarly developments to study political sentiments and behavior in three independent papers. In my lead paper, I address an important consequence of political deepfakes (i.e., computer-manipulated video misinformation): does the provision of information about deepfakes cause people to disbelieve real political videos? Through a set of online survey experiments, I find that information that is typical of news coverage of deepfakes induces people to disbelieve real political information. My second paper uses new social media datasets to address pressing questions about how organized American far-right groups (e.g., neo-Nazis, white supremacists) recruit new members, and whether the rise of Trump was used as a catalyst in far-right recruitment efforts. I draw on prior sociological and anthropological research that found that far-right music scenes (featuring bands with such names as Aryan Terrorism) are a key part of the day-to-day functioning of the overwhelming majority of far-right hate groups in the United States. Accordingly, I use public databases of song listenership on the music social network Last.fm, before and after Trump events. I find that online friends of frequent listeners of hate music were more likely to increase their levels of hate music listenership after Trump-related events (e.g., xenophobic tweets, primary election victories).
Finally, in my third paper, I leverage new theoretical frameworks in the cognitive sciences and the growth of large-scale, data-driven voter mobilization programs among non-profit organizations to address the puzzle of “voting habits.” Namely, prior research provides strong empirical evidence that voting in one election makes the average individual more likely to vote in a subsequent election, but this kind of turnout persistence does not comport with habit as it is defined in the psychological sciences (elections happen too infrequently and voting is never an automatic behavior). I therefore apply Duckworth and Gross’s (2020) Process Model of Behavior Change to turnout persistence to bridge the gap between classic economic models of voter turnout and the large body of rigorous empirical evidence showing turnout persistence. I evaluate the concrete predictions made by this model in a novel dataset of ~1.8 million voters across 9 independent experiments.