15 research outputs found

    Ethical issues in research using datasets of illicit origin

    We evaluate the use of data obtained by illicit means against a broad set of ethical and legal issues. Our analysis covers both the direct collection of, and secondary uses of, data obtained via illicit means such as exploiting a vulnerability or unauthorized disclosure. We extract ethical principles from existing advice and guidance and analyse how they have been applied within more than 20 recent peer-reviewed papers that deal with illicitly obtained datasets. We find that existing advice and guidance does not address all of the problems that researchers have faced and explain how the papers tackle ethical issues inconsistently, and sometimes not at all. Our analysis reveals not only a lack of application of safeguards but also that legitimate ethical justifications for research are being overlooked. In many cases positive benefits, as well as potential harms, remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) approval for the activity that is described, and the justifications given for exemption suggest deficiencies in the REB process. Daniel R. Thomas is supported by a grant from ThreatSTOP Inc. All authors are supported by the EPSRC [grant number EP/M020320/1]. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of any of the funders.

    Exploring the Impact of Password Dataset Distribution on Guessing

    Leaks from password datasets are a regular occurrence. An organization may defend a leak with reassurances that just a small subset of passwords was taken. In this paper we show that the leak of a relatively small number of text-based passwords from an organization's stored dataset can lead to a further large collection of users being compromised. Taking a sample of passwords from a given dataset of passwords, we exploit the knowledge we gain of the distribution to guess other samples from the same dataset. We show theoretically and empirically that the distribution of passwords in the sample follows the same distribution as the passwords in the whole dataset. We propose a function that measures the ability of one distribution to estimate another. Leveraging this, we show that a sample of passwords leaked from a given dataset will compromise the remaining passwords in that dataset better than a sample leaked from another source.
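    The sampling idea in this abstract can be illustrated with a small sketch. It is not the function proposed in the paper, which the abstract does not specify; the helper names (`guess_order`, `fraction_compromised`), the guess budget, and the toy password lists are all hypothetical and chosen only for illustration. The sketch ranks guesses by their frequency in a leaked sample and compares how many of the remaining accounts are compromised when the sample comes from the same dataset versus a different one.

```python
from collections import Counter


def guess_order(sample):
    """Return the distinct passwords in `sample`, most frequent first."""
    return [pw for pw, _ in Counter(sample).most_common()]


def fraction_compromised(guesses, targets, budget):
    """Fraction of target accounts whose password is among the first `budget` guesses."""
    tried = set(guesses[:budget])
    return sum(pw in tried for pw in targets) / len(targets)


# Toy data (hypothetical): two sites with different password distributions.
site_a = ["123456"] * 50 + ["password"] * 30 + ["qwerty"] * 15 + ["dragon"] * 5
site_b = ["letmein"] * 50 + ["monkey"] * 30 + ["123456"] * 15 + ["sunshine"] * 5

# Deterministic "leak": every fifth account of site A is exposed.
leaked = site_a[::5]
remaining = [pw for i, pw in enumerate(site_a) if i % 5 != 0]

# Guess the remaining site A passwords with only two guesses per account.
same_source = fraction_compromised(guess_order(leaked), remaining, budget=2)
other_source = fraction_compromised(guess_order(site_b), remaining, budget=2)

print(f"sample from same dataset:  {same_source:.2f}")
print(f"sample from other dataset: {other_source:.2f}")
```

    In this toy setup the sample drawn from the same dataset reproduces its frequency ranking, so the same two guesses cover far more of the remaining accounts than guesses ranked by the other site's distribution.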

    A methodology for large-scale identification of related accounts in underground forums

    Underground forums allow users to interact with communities focused on illicit activities. They serve as an entry point for actors interested in deviant and criminal topics. Due to the pseudo-anonymity provided, they have become improvised marketplaces for trading illegal products and services, including those used to conduct cyberattacks. Thus, these forums are an important data source for threat intelligence analysts and law enforcement. The use of multiple accounts is forbidden in most forums since these are mostly used for malicious purposes. Still, this is a common practice. Being able to identify an actor or gang behind multiple accounts allows for proper attribution in online investigations and for the design of intervention mechanisms for illegal activities. Existing solutions for multi-account detection either require ground truth data to conduct supervised classification or use manual approaches. In this work, we propose a methodology for the large-scale identification of related accounts in underground forums. These accounts are similar according to the distinctive content posted, and thus are likely to belong to the same actor or group. The methodology applies to various domains and leverages distinctive artefacts and personal information left online by the users. We provide experimental results on a large dataset comprising more than 1.1M user accounts from 15 different forums. We show how this methodology, combined with existing approaches commonly used in social media forensics, can assist with and improve online investigations. This work was partially supported by CERN openlab, the CERN Doctoral Student Programme, the Spanish grants ODIO (PID2019-111429RB-C21 and PID2019-111429RB) and the Region of Madrid grant CYNAMON-CM (P2018/TCS-4566), co-financed by European Structural Funds ESF and FEDER, and Excellence Program EPUC3M1.

    Countering distrust in illicit online networks: the dispute resolution strategies of cybercriminals

    The core of this paper is a detailed investigation of the dispute resolution system contained within Darkode, an elite cybercriminal forum. Extracting the dedicated disputes section from within the marketplace, where users can report bad behaviour and register complaints, we carry out content analysis on these threads. This involves both descriptive statistics across the dataset and qualitative analysis of particular posts of interest, leading to a number of new insights. First, the overall level of disputes is quite high, even though members are vetted for entry in the first instance. Second, the lower ranked members of the marketplace are the most highly represented category for both the plaintiffs and defendants. Third, vendors are accused of malfeasance far more often than buyers, and their “crimes” are most commonly either not providing the product/service or providing a poor one. Fourth, the monetary size of the disputes is surprisingly small. Finally, only 23.1% of disputes reach a clear outcome.

    Leveraging New Technologies and Interdisciplinarity to Study Political Behavior, Attitudes, and Beliefs

    I make use of new technological and scholarly developments to study political sentiments and behavior in three independent papers. In my lead paper, I address an important consequence of political deepfakes (i.e., computer-manipulated video misinformation): does the provision of information about deepfakes cause people to disbelieve real political videos? Through a set of online survey experiments, I find that information that is typical of news coverage of deepfakes induces people to disbelieve real political information. My second paper uses new social media datasets to address pressing questions about how organized American far-right groups (e.g., neo-Nazis, white supremacists, etc.) recruit new members, and whether the rise of Trump was used as a catalyst in far-right recruitment efforts. I build on prior sociological and anthropological research that found that far-right music scenes (featuring bands with such names as Aryan Terrorism) are a key part of the day-to-day functioning of the overwhelming majority of far-right hate groups in the United States. I therefore use public databases of song listenership on the music social network Last.fm, before and after Trump events. I find that online friends of frequent listeners of hate music were more likely to increase their levels of hate music listenership after Trump-related events (e.g., xenophobic tweets, primary election victories, etc.).
    Finally, in my third paper, I leverage new theoretical frameworks in the cognitive sciences and the growth of large-scale, data-driven voter mobilization programs among non-profit organizations to address the puzzle of “voting habits.” Namely, prior research provides strong empirical evidence that voting in one election makes the average individual more likely to vote in a subsequent election, but this kind of turnout persistence does not comport with habit as it is defined in psychological sciences (elections happen too infrequently and voting is never an automatic behavior). So, in my third paper, I apply Duckworth and Gross’s (2020) Process Model of Behavior Change to turnout persistence to bridge the gap between classic economic models of voter turnout and the large body of rigorous empirical evidence showing turnout persistence. I evaluate the concrete predictions made by this model in a novel dataset of ~1.8 million voters across 9 different independent experiments.