4 research outputs found

    Reasoning about Cyber Threat Actors

    Reasoning about the activities of cyber threat actors is critical to defending against cyber attacks. However, this task is difficult for a variety of reasons. In simple terms, it is hard to determine who the attacker is, what the attacker’s desired goals are, and how they will carry out their attacks. These three questions essentially entail understanding the attacker’s use of deception, the capabilities available to them, and the intent behind launching the attack. These three issues are highly inter-related. If an adversary can hide their intent, they can better deceive a defender. If an adversary’s capabilities are not well understood, then determining their goals becomes difficult, since the defender cannot be sure the adversary has the tools necessary to accomplish them. However, understanding of these aspects is also mutually supportive: with a clear picture of capabilities, intent can be better deciphered, and if we understand both intent and capabilities, a defender may be able to see through deception schemes. In this dissertation, I present three pieces of work that tackle these questions to obtain a better understanding of cyber threats. First, we introduce a new reasoning framework to address deception. We evaluate the framework by building a dataset from a DEFCON capture-the-flag exercise and using it to identify the person or group responsible for a cyber attack. We demonstrate that the framework not only handles cases of deception but also provides transparent decision making in identifying the threat actor. The second piece of work uses a cognitive learning model to determine intent, that is, the goals of the threat actor on the target system. The third looks at understanding the capabilities of threat actors by identifying at-risk systems from hacker discussions on darkweb websites; to achieve this, we gather discussions relating to malicious hacking from more than 300 darkweb websites.
    Dissertation/Thesis: Doctoral Dissertation, Computer Engineering, 201
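    The flavour of the third task, mining darkweb discussions for at-risk systems, can be shown with a purely hypothetical sketch: counting mentions of monitored product names in hacker-forum posts. The product list, posts, and simple substring matching below are illustrative assumptions, not the dissertation’s actual pipeline.

```python
from collections import Counter

# Hypothetical watch-list of products; the real work draws on posts
# from 300+ darkweb sites, not this toy data.
MONITORED_PRODUCTS = {"apache struts", "openssl", "windows rdp"}

posts = [
    "selling working exploit for apache struts deserialization",
    "anyone have a scanner for windows rdp?",
    "new apache struts poc dropped today",
]

# Count how often each monitored product is discussed.
mentions = Counter()
for post in posts:
    for product in MONITORED_PRODUCTS:
        if product in post.lower():
            mentions[product] += 1

# Products mentioned most often are candidate at-risk systems.
at_risk = [product for product, _ in mentions.most_common()]
print(at_risk)
```

    A real system would need entity linking and noise filtering rather than substring matching, but the ranking-by-discussion-volume idea is the same.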

    Do Online Consumers Value Corporate Social Responsibility More in Times of Uncertainty?: Evidence from Online Auctions Conducted During the Onset of the COVID-19 Pandemic

    The relationships between Corporate Social Responsibility (CSR) and consumer behavior have been widely explored in the literature. From the consumer standpoint, it has been shown that individuals largely want to be socially responsible actors and that, more than ever, they consider the CSR aspects of products or services when contemplating purchasing decisions. We utilize data from 23,247 online auctions conducted before and during the COVID-19 pandemic to analyze how consumer preferences are influenced by the way the CSR characteristics of products are touted in their descriptions. We find that a greater CSR emphasis is positively associated with an increased prospect of an online auction item selling. Additionally, we find that CSR is valued more by consumers during a period of economic hardship and social uncertainty (COVID-19). Finally, we find that profit-seeking behaviors by intermediary auction-house brokers undermine the effect of CSR on consumer purchasing behavior.
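    As a purely illustrative sketch of the headline finding (not the paper’s actual model or dataset), one can compare sale rates between auctions with and without a strong CSR emphasis on synthetic data; the effect sizes below are invented.

```python
import random

# Synthetic auctions: a hypothetical binary "CSR emphasis" flag and a
# simulated sale outcome whose probability is higher when CSR is
# emphasised. All numbers are assumptions for illustration only.
random.seed(0)
auctions = []
for _ in range(1000):
    csr = random.random() > 0.5
    p_sell = 0.60 if csr else 0.45          # assumed effect size
    auctions.append({"csr_emphasis": csr, "sold": random.random() < p_sell})

def sale_rate(items, csr_flag):
    """Fraction of auctions with the given CSR flag that ended in a sale."""
    subset = [a for a in items if a["csr_emphasis"] == csr_flag]
    return sum(a["sold"] for a in subset) / len(subset)

high, low = sale_rate(auctions, True), sale_rate(auctions, False)
print(f"sale rate with CSR emphasis: {high:.2f}, without: {low:.2f}")
```

    The paper’s analysis would additionally control for item and auction characteristics; this two-proportion comparison only conveys the direction of the association.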

    An exploratory study on utilising the web of linked data for product data mining

    The Linked Open Data practice has led to a significant growth of structured data on the Web. While this has created an unprecedented opportunity for research in the field of Natural Language Processing, there is a lack of systematic studies on how such data can be used to support downstream NLP tasks. This work focuses on the e-commerce domain and explores how we can use such structured data to create language resources for product data mining tasks. To do so, we process billions of structured data points, in the form of RDF n-quads, to create multi-million-word product-related corpora that are then used in three different ways to create language resources: training word-embedding models, continued pre-training of BERT-like language models, and training machine translation models that are used as a proxy to generate product-related keywords. These language resources are then evaluated on three downstream tasks (product classification, linking, and fake review detection) using an extensive set of benchmarks. Our results show word embeddings to be the most reliable and consistent method for improving accuracy on all tasks (with gains of up to 6.9 percentage points in macro-average F1 on some datasets). Contrary to earlier studies that suggest rather simple but effective approaches, such as building domain-specific language models by pre-training on in-domain corpora, our work serves as a lesson that adapting these methods to new domains may not be as easy as it seems. We further analyse our datasets and reflect on how our findings can inform future research and practice.
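    The first resource type described, word embeddings trained on product text, can be sketched in miniature. The real work trains on corpora built from billions of RDF n-quads; here a toy corpus and a simple count-based embedding (co-occurrence counts plus truncated SVD) stand in for Word2Vec-style training, and all data and dimensions are illustrative assumptions.

```python
import numpy as np

# Toy product corpus; real corpora contain millions of words.
corpus = [
    "smartphone with 64gb storage and dual camera",
    "laptop with 512gb ssd storage",
    "smartphone case with camera protection",
    "gaming laptop with fast ssd",
]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Build a symmetric co-occurrence matrix with a +/-2 word window.
cooc = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                cooc[index[w], index[sent[j]]] += 1

# Truncated SVD yields a dense 5-dimensional vector per word.
u, s, _ = np.linalg.svd(cooc)
embeddings = u[:, :5] * s[:5]

def similarity(a, b):
    """Cosine similarity between two word vectors."""
    va, vb = embeddings[index[a]], embeddings[index[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9)
```

    Such vectors can then feed downstream classifiers, e.g. by averaging the vectors of the words in a product title.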

    Using Pre-trained Language Models for Toxic Comment Classification

    Toxic comment classification is a core natural language processing task for combating toxic comments online. It follows the supervised learning paradigm, which requires labelled data for training. A large amount of high-quality training data is empirically beneficial to model performance. Transferring a pre-trained language model (PLM) to a downstream model allows the downstream model to benefit from more data without creating new labelled data. Despite the increasing research on PLMs in NLP tasks, there remains a fundamental lack of understanding of how to apply PLMs to toxic comment classification. This work examines the area from three perspectives. First, we investigate different transfer strategies for toxic comment classification tasks, highlighting the importance of efficiency during transfer: an efficient transfer demands reasonable computational resources while achieving comparable model performance. To this end, we explore continued in-domain pre-training, which further pre-trains a PLM on an in-domain corpus, and we compare different PLMs and different settings for it. Second, we investigate the limitations of PLMs for toxic comment classification. Taking the most popular PLM, BERT, as the representative model for our study, we focus on identity term bias (i.e. prediction bias towards comments containing identity terms such as "Muslim" and "Black"). To investigate the bias, we conduct both quantitative and qualitative analyses and study the model’s explanations. We also propose a hypothesis built on the potential relationship between identity term bias and the subjectivity of comments. Third, building on this hypothesis, we propose a novel BERT-based model to mitigate identity term bias. Unlike previous methods, which try to suppress the model’s attention to identity terms, we feed subjectivity into the model along with a signal for the presence of identity terms. Our method shows consistent improvements on a range of different toxic comment classification tasks.
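    The feature idea hinted at above, supplying the model with a subjectivity signal alongside an identity-term flag so that neutral mentions of identity terms are less likely to be flagged as toxic, can be illustrated with a toy sketch. The real method modifies a BERT model; this version only builds the extra input features, and the lexicons and scoring are illustrative assumptions, not the thesis’s actual resources.

```python
# Hypothetical lexicons; the real work would use curated resources
# and a learned subjectivity model rather than word lists.
IDENTITY_TERMS = {"muslim", "black", "gay", "jewish"}
SUBJECTIVE_CUES = {"hate", "stupid", "love", "awful", "disgusting"}

def extra_features(comment: str) -> dict:
    """Build the two auxiliary signals: identity-term presence and a
    crude subjectivity proxy (fraction of opinion-bearing cue words)."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    has_identity = any(w in IDENTITY_TERMS for w in words)
    subjectivity = sum(w in SUBJECTIVE_CUES for w in words) / max(len(words), 1)
    return {"identity_term": has_identity, "subjectivity": subjectivity}

# A neutral identity mention gets the flag but zero subjectivity,
# giving a downstream classifier grounds not to call it toxic.
print(extra_features("I am a Muslim developer."))
print(extra_features("That take is stupid and I hate it."))
```

    In the thesis these signals are injected into the BERT-based classifier itself rather than appended as handcrafted features; the sketch only conveys the intuition.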