126 research outputs found

    Learning representations for information mining from text corpora with applications to cyber threat intelligence

    Get PDF
    Doctor of PhilosophyDepartment of Computer ScienceWilliam H HsuThis research develops learning representations and architectures for natural language understanding, within an information mining framework for analysis of open-source cyber threat intelligence (CTI). Both contextual (sequential) and topological (graph-based) encodings of short text documents are modeled. To accomplish this goal, a series of machine learning tasks are defined, and learning representations are developed to detect crucial information in these documents: cyber threat entities, types, and events. Using hybrid transformer-based implementations of these learning models, CTI-relevant key phrases are identified, and specific cyber threats are classified using classification models based upon graph neural networks (GNNs). The central scientific goal here is to learn features from corpora consisting of short texts for multiple document categorization and information extraction sub-tasks to improve the accuracy, precision, recall, and F1 score of a multimodal framework. To address a performance gap (e.g., classification accuracy) for text classification, a novel multi-dimensional Feature Attended Parametric Kernel Graph Neural Network (APKGNN) layer is introduced to construct a GNN model in this dissertation where the text classification task is transformed into a graph node classification task. To extract key phrases, contextual semantic tagging with text sequences as input to transformers is used which improves a transformer's learning representation. By deriving a set of characteristics ranging from low-level (lexical) natural language features to summative extracts, this research focuses on reducing human effort by adopting a combination of semi-supervised approaches for learning syntactic, semantic, and topological feature representation. The following central research questions are addressed: can CTI-relevant key phrases be identified effectively with reduced human effort; whether threats be classified into different types; and can threat events be detected and ranked from social media like Twitter data and other benchmark data sets. Developing an integrated system to answer these research questions showed that user-specific information in shared social media content, and connections (followers and followees) are effective and crucial for algorithmically tracing active CTI user accounts from open-source social network data. All these components, used in combination, facilitate the understanding of key analytical tasks and objectives of open-source cyber-threat intelligence

    NLP-Based Techniques for Cyber Threat Intelligence

    Full text link
    In the digital era, threat actors employ sophisticated techniques for which, often, digital traces in the form of textual data are available. Cyber Threat Intelligence~(CTI) is related to all the solutions inherent to data collection, processing, and analysis useful to understand a threat actor's targets and attack behavior. Currently, CTI is assuming an always more crucial role in identifying and mitigating threats and enabling proactive defense strategies. In this context, NLP, an artificial intelligence branch, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI. Finally, the challenges and limitations of NLP in threat intelligence are exhaustively examined, including data quality issues and ethical considerations. This survey draws a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand the state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity

    Rethinking Privacy and Security Mechanisms in Online Social Networks

    Get PDF
    With billions of users, Online Social Networks(OSNs) are amongst the largest scale communication applications on the Internet. OSNs enable users to easily access news from local and worldwide, as well as share information publicly and interact with friends. On the negative side, OSNs are also abused by spammers to distribute ads or malicious information, such as scams, fraud, and even manipulate public political opinions. Having achieved significant commercial success with large amount of user information, OSNs do treat the security and privacy of their users seriously and provide several mechanisms to reinforce their account security and information privacy. However, the efficacy of those measures is either not thoroughly validated or in need to be improved. In sight of cyber criminals and potential privacy threats on OSNs, we focus on the evaluations and improvements of OSN user privacy configurations, account security protection mechanisms, and trending topic security in this dissertation. We first examine the effectiveness of OSN privacy settings on protecting user privacy. Given each privacy configuration, we propose a corresponding scheme to reveal the target user\u27s basic profile and connection information starting from some leaked connections on the user\u27s homepage. Based on the dataset we collected on Facebook, we calculate the privacy exposure in each privacy setting type and measure the accuracy of our privacy inference schemes with different amount of public information. The evaluation results show that (1) a user\u27s private basic profile can be inferred with high accuracy and (2) connections can be revealed in a significant portion based on even a small number of directly leaked connections. Secondly, we propose a behavioral-profile-based method to detect OSN user account compromisation in a timely manner. Specifically, we propose eight behavioral features to portray a user\u27s social behavior. A user\u27s statistical distributions of those feature values comprise its behavioral profile. Based on the sample data we collected from Facebook, we observe that each user\u27s activities are highly likely to conform to its behavioral profile while two different user\u27s profile tend to diverge from each other, which can be employed for compromisation detection. The evaluation result shows that the more complete and accurate a user\u27s behavioral profile can be built the more accurately compromisation can be detected. Finally, we investigate the manipulation of OSN trending topics. Based on the dataset we collected from Twitter, we manifest the manipulation of trending and a suspect spamming infrastructure. We then measure how accurately the five factors (popularity, coverage, transmission, potential coverage, and reputation) can predict trending using an SVM classifier. We further study the interaction patterns between authenticated accounts and malicious accounts in trending. at last we demonstrate the threats of compromised accounts and sybil accounts to trending through simulation and discuss countermeasures against trending manipulation

    INTRUSION PREDICTION SYSTEM FOR CLOUD COMPUTING AND NETWORK BASED SYSTEMS

    Get PDF
    Cloud computing offers cost effective computational and storage services with on-demand scalable capacities according to the customers’ needs. These properties encourage organisations and individuals to migrate from classical computing to cloud computing from different disciplines. Although cloud computing is a trendy technology that opens the horizons for many businesses, it is a new paradigm that exploits already existing computing technologies in new framework rather than being a novel technology. This means that cloud computing inherited classical computing problems that are still challenging. Cloud computing security is considered one of the major problems, which require strong security systems to protect the system, and the valuable data stored and processed in it. Intrusion detection systems are one of the important security components and defence layer that detect cyber-attacks and malicious activities in cloud and non-cloud environments. However, there are some limitations such as attacks were detected at the time that the damage of the attack was already done. In recent years, cyber-attacks have increased rapidly in volume and diversity. In 2013, for example, over 552 million customers’ identities and crucial information were revealed through data breaches worldwide [3]. These growing threats are further demonstrated in the 50,000 daily attacks on the London Stock Exchange [4]. It has been predicted that the economic impact of cyber-attacks will cost the global economy $3 trillion on aggregate by 2020 [5]. This thesis focused on proposing an Intrusion Prediction System that is capable of sensing an attack before it happens in cloud or non-cloud environments. The proposed solution is based on assessing the host system vulnerabilities and monitoring the network traffic for attacks preparations. It has three main modules. The monitoring module observes the network for any intrusion preparations. This thesis proposes a new dynamic-selective statistical algorithm for detecting scan activities, which is part of reconnaissance that represents an essential step in network attack preparation. The proposed method performs a statistical selective analysis for network traffic searching for an attack or intrusion indications. This is achieved by exploring and applying different statistical and probabilistic methods that deal with scan detection. The second module of the prediction system is vulnerabilities assessment that evaluates the weaknesses and faults of the system and measures the probability of the system to fall victim to cyber-attack. Finally, the third module is the prediction module that combines the output of the two modules and performs risk assessments of the system security from intrusions prediction. The results of the conducted experiments showed that the suggested system outperforms the analogous methods in regards to performance of network scan detection, which means accordingly a significant improvement to the security of the targeted system. The scanning detection algorithm has achieved high detection accuracy with 0% false negative and 50% false positive. In term of performance, the detection algorithm consumed only 23% of the data needed for analysis compared to the best performed rival detection method

    Critical analysis of Big Data Challenges and analytical methods

    Get PDF
    Big Data (BD), with their potential to ascertain valued insights for enhanced decision-making process, have recently attracted substantial interest from both academics and practitioners. Big Data Analytics (BDA) is increasingly becoming a trending practice that many organizations are adopting with the purpose of constructing valuable information from BD. The analytics process, including the deployment and use of BDA tools, is seen by organizations as a tool to improve operational efficiency though it has strategic potential, drive new revenue streams and gain competitive advantages over business rivals. However, there are different types of analytic applications to consider. Therefore, prior to hasty use and buying costly BD tools, there is a need for organizations to first understand the BDA landscape. Given the significant nature of the BD and BDA, this paper presents a state-of-the-art review that presents a holistic view of the BD challenges and BDA methods theorized/proposed/employed by organizations to help others understand this landscape with the objective of making robust investment decisions. In doing so, systematically analysing and synthesizing the extant research published on BD and BDA area. More specifically, the authors seek to answer the following two principal questions: Q1 – What are the different types of BD challenges theorized/proposed/confronted by organizations? and Q2 – What are the different types of BDA methods theorized/proposed/employed to overcome BD challenges?. This systematic literature review (SLR) is carried out through observing and understanding the past trends and extant patterns/themes in the BDA research area, evaluating contributions, summarizing knowledge, thereby identifying limitations, implications and potential further research avenues to support the academic community in exploring research themes/patterns. Thus, to trace the implementation of BD strategies, a profiling method is employed to analyze articles (published in English-speaking peer-reviewed journals between 1996 and 2015) extracted from the Scopus database. The analysis presented in this paper has identified relevant BD research studies that have contributed both conceptually and empirically to the expansion and accrual of intellectual wealth to the BDA in technology and organizational resource management discipline

    Technology Assessment of Dual-Use ICTs - How to Assess Diffusion, Governance and Design

    Get PDF
    Technologies that can be used in military and civilian applications are referred to as dual-use. The dual-use nature of many information and communications technologies (ICTs) raises new questions for research and development for national, international, and human security. Measures to deal with the risks associated with the various dual-use technologies, including proliferation control, design approaches, and policy measures, vary widely. For example, Autonomous Weapon Systems (AWS) have not yet been regulated, while cryptographic products are subject to export and import controls. Innovations in artificial intelligence (AI), robotics, cybersecurity, and automated analysis of publicly available data raise new questions about their respective dual-use risks. Dual-use risks have been systematically discussed so far, especially in the life sciences, which have contributed to the development of methods for assessment and risk management. Dual-use risks arise, among other things, from the fact that safety-critical technologies can be easily disseminated or modified, as well as used as part of a weapon system. Therefore, the development and adaptation of robots and software requires an independent consideration that builds on the insights of related dual-use discourses. Therefore, this dissertation considers the management of such risks in terms of the proliferation, regulation, and design of individual dual-use information technologies. Technology Assessment (TA) is the epistemological framework for this work, bringing together the concepts and approaches of Critical Security Studies (CSS) and Human-Computer Interaction (HCI) to help evaluate and shape dual-use technologies. In order to identify the diffusion of dual-use at an early stage, the dissertation first examines the diffusion of dual-use innovations between civilian and military research in expert networks on LinkedIn, as well as on the basis of AI patents in a patent network. The results show low diffusion and tend to confirm existing studies on diffusion in patent networks. In the following section, the regulation of dual-use technologies is examined in the paper through two case studies. The first study uses a discourse analysis to show the value conflicts with regard to the regulation of autonomous weapons systems using the concept of Meaningful Human Control (MHC), while a second study, as a long-term comparative case study, analyzes the change and consequences of the regulation of strong cryptography in the U.S. as well as the programs of intelligence agencies for mass surveillance. Both cases point to the central role of private companies, both in the production of AWS and as intermediaries for the dissemination of encryption, as well as surveillance intermediaries. Subsequently, the dissertation examines the design of a dual-use technology using an Open Source Intelligence System (OSINT) for cybersecurity. For this purpose, conceptual, empirical, and technical studies are conducted as part of the Value-Sensitive Design (VSD) framework. During the studies, implications for research on and design of OSINT were identified. For example, the representative survey of the German population has shown that transparency of use while reducing mistrust is associated with higher acceptance of such systems. Additionally, it has been shown that data sparsity through the use of expert networks has many positive effects, not only improving the performance of the system, but is also preferable for legal and social reasons. Thus, the work contributes to the understanding of specific dual-use risks of AI, the regulation of AWS and cryptography, and the design of OSINT in cybersecurity. By combining concepts from CSS and participatory design methods in HCI, this work provides an interdisciplinary and multi-method contribution

    The Palgrave Handbook of Digital Russia Studies

    Get PDF
    This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today

    The Palgrave Handbook of Digital Russia Studies

    Get PDF
    This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today
    • …
    corecore