70 research outputs found

    Data-driven curation, learning and analysis for inferring evolving IoT botnets in the wild

    Get PDF
    The insecurity of the Internet-of-Things (IoT) paradigm continues to wreak havoc in consumer and critical infrastructure realms. Several challenges impede addressing IoT security at large, including, the lack of IoT-centric data that can be collected, analyzed and correlated, due to the highly heterogeneous nature of such devices and their widespread deployments in Internet-wide environments. To this end, this paper explores macroscopic, passive empirical data to shed light on this evolving threat phenomena. This not only aims at classifying and inferring Internet-scale compromised IoT devices by solely observing such one-way network traffic, but also endeavors to uncover, track and report on orchestrated "in the wild" IoT botnets. Initially, to prepare the effective utilization of such data, a novel probabilistic model is designed and developed to cleanse such traffic from noise samples (i.e., misconfiguration traffic). Subsequently, several shallow and deep learning models are evaluated to ultimately design and develop a multi-window convolution neural network trained on active and passive measurements to accurately identify compromised IoT devices. Consequently, to infer orchestrated and unsolicited activities that have been generated by well-coordinated IoT botnets, hierarchical agglomerative clustering is deployed by scrutinizing a set of innovative and efficient network feature sets. By analyzing 3.6 TB of recent darknet traffic, the proposed approach uncovers a momentous 440,000 compromised IoT devices and generates evidence-based artifacts related to 350 IoT botnets. While some of these detected botnets refer to previously documented campaigns such as the Hide and Seek, Hajime and Fbot, other events illustrate evolving threats such as those with cryptojacking capabilities and those that are targeting industrial control system communication and control services

    Data-Driven Approaches for Detecting Malware-Infected IoT Devices and Characterizing Their Unsolicited Behaviors by Leveraging Passive Internet Measurements

    Get PDF
    Despite the benefits of Internet of Things (IoT) devices, the insecurity of IoT and their deployment nature have turned them into attractive targets for adversaries, which contributed to the rise of IoT-tailored malware as a major threat to the Internet ecosystem. In this thesis, we address the threats associated with the emerging IoT malware, which utilize exploited devices to perform large-scale cyber attacks (e.g., DDoS). To mitigate such threat, there is a need to possess an Internet perspective of the deployed IoT devices while building a better understanding about the behavioral characteristic of malware-infected devices, which is challenging due to the lack of empirical data and knowledge about the deployed IoT devices and their behavioral characteristics. To address these challenges, in this thesis, we leverage passive Internet measurements and IoT device information to detect exploited IoT devices and investigate their generated traffic at the network telescope (darknet). We aim at proposing data-driven approaches for effective and near real-time IoT threat detection and characterization. Additionally, we leverage a specialized IoT Honeypot to analyze a large corpus of real IoT malware binary executable. We aim at building a better understanding about the current state of IoT malware while addressing the problems of IoT malware classification and family attribution. To this end, we perform the following to achieve our objectives: First, we address the lack of empirical data and knowledge about IoT devices and their activities. To this end, we leverage an online IoT search engine (e.g., Shodan.io) to obtain publicly available device information in the realms of consumer and cyber-physical system (CPS), while utilizing passive network measurements collected at a large-scale network telescope (CAIDA), to infer compromised devices and their unsolicited activities. Indeed, we were among the first to report experimental results on detecting compromised IoT devices and their behavioral characteristics in the wild, while demonstrating their active involvement in large-scale malware-generated malicious activities such as Internet scanning. Additionally, we leverage the IoT-generated backscatter traffic towards the network telescope to shed light on IoT devices that were victims of intensive Denial of Service (DoS) attacks. Second, given the highly orchestrated nature of IoT-driven cyber-attacks, we focus on the analysis of IoT-generated scanning activities to detect and characterize scanning campaigns generated by IoT botnets. To this end, we aggregate IoT-generated traffic and performing association rules mining to infer campaigns through common scanning objectives represented by targeted destination ports. Further, we leverage behavioural characteristics and aggregated flow features to correlate IoT devices using DBSCAN clustering algorithm. Indeed, our findings shed light on compromised IoT devices, which tend to operate within well coordinated IoT botnets. Third, considering the huge number of IoT devices and the magnitude of their malicious scanning traffic, we focus on addressing the operational challenges to automate large-scale campaign detection and analysis while generating threat intelligence in a timely manner. To this end, we leverage big data analytic frameworks such as Apache Spark to develop a scalable system for automated detection of infected IoT devices and characterization of their scanning activities using our proposed approach. Our evaluation results with over 4TB of IoT traffic demonstrated the effectiveness of the system to infer scanning campaigns generated by IoT botnets. Moreover, we demonstrate the feasibility of the implemented system/framework as a platform for implementing further supporting applications, which leverage passive Internet measurement for characterizing IoT traffic and generating IoT-related threat intelligence. Fourth, we take first steps towards mitigating threats associated with the rise of IoT malware by creating a better understanding about the characteristics and inter-relations of IoT malware. To this end, we analyze about 70,000 IoT malware binaries obtained by a specialized IoT honeypot in the past two years. We investigate the distribution of IoT malware across known families, while exploring their detection timeline and persistent. Moreover, while we shed light on the effectiveness of IoT honeypots in detecting new/unknown malware samples, we utilize static and dynamic malware analysis techniques to uncover adversarial infrastructure and investigate functional similarities. Indeed, our findings enable unknown malware labeling/attribution while identifying new IoT malware variants. Additionally, we collect malware-generated scanning traffic (whenever available) to explore behavioral characteristics and associated threats/vulnerabilities. We conclude this thesis by discussing research gaps that pave the way for future work

    Connecting the Dots: An Assessment of Cyber-risks in Networked Building and Municipal Infrastructure Systems

    Get PDF
    The buildings and city streets we walk down are changing. Driven by various data-driven use cases, there is increased interest in networking and integrating lighting and other building systems (e.g., heating, ventilation, and air conditioning (HVAC), security, scheduling) that were previously not internet-facing, and equipping them with sensors that collect information about their environment and the people that inhabit it. These data-enabled systems can potentially deliver improved occupant and resident experiences and help meet the U.S. Department of Energy (DOE) national energy and carbon reduction goals. Deploying connected devices new to being networked, however, is not without its challenges. This paper explores tools available to system designers and integrators that facilitate a cybersecurity landscape assessment – or more specifically the identification of threats, vulnerabilities, and adversarial behaviors that could be used against these networked systems. These assessments can help stakeholders shift security prioritization proactively toward the beginning of the development process

    Deep learning in phishing mitigation: a uniform resource locator-based predictive model

    Get PDF
    To mitigate the evolution of phish websites, various phishing prediction8 schemes are being optimized eventually. However, the optimized methods produce gratuitous performance overhead due to the limited exploration of advanced phishing cues. Thus, a phishing uniform resource locator-based predictive model is enhanced by this work to defeat this deficiency using deep learning algorithms. This model’s architecture encompasses pre-processing of the effective feature space that is made up of 60 mutual uniform resource locator (URL) phishing features, and a dual deep learning-based model of convolution neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phish URLs and 28,074 legitimate URLs. Experimentally, the performance outputs are remarked with a 0.01% false positive rate (FPR) and 99.27% testing accuracy

    Systematically Detecting Packet Validation Vulnerabilities in Embedded Network Stacks

    Full text link
    Embedded Network Stacks (ENS) enable low-resource devices to communicate with the outside world, facilitating the development of the Internet of Things and Cyber-Physical Systems. Some defects in ENS are thus high-severity cybersecurity vulnerabilities: they are remotely triggerable and can impact the physical world. While prior research has shed light on the characteristics of defects in many classes of software systems, no study has described the properties of ENS defects nor identified a systematic technique to expose them. The most common automated approach to detecting ENS defects is feedback-driven randomized dynamic analysis ("fuzzing"), a costly and unpredictable technique. This paper provides the first systematic characterization of cybersecurity vulnerabilities in ENS. We analyzed 61 vulnerabilities across 6 open-source ENS. Most of these ENS defects are concentrated in the transport and network layers of the network stack, require reaching different states in the network protocol, and can be triggered by only 1-2 modifications to a single packet. We therefore propose a novel systematic testing framework that focuses on the transport and network layers, uses seeds that cover a network protocol's states, and systematically modifies packet fields. We evaluated this framework on 4 ENS and replicated 12 of the 14 reported IP/TCP/UDP vulnerabilities. On recent versions of these ENSs, it discovered 7 novel defects (6 assigned CVES) during a bounded systematic test that covered all protocol states and made up to 3 modifications per packet. We found defects in 3 of the 4 ENS we tested that had not been found by prior fuzzing research. Our results suggest that fuzzing should be deferred until after systematic testing is employed.Comment: 12 pages, 3 figures, to be published in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023

    Graphs behind data: A network-based approach to model different scenarios

    Get PDF
    openAl giorno d’oggi, i contesti che possono beneficiare di tecniche di estrazione della conoscenza a partire dai dati grezzi sono aumentati drasticamente. Di conseguenza, la definizione di modelli capaci di rappresentare e gestire dati altamente eterogenei Ăš un argomento di ricerca molto dibattuto in letteratura. In questa tesi, proponiamo una soluzione per affrontare tale problema. In particolare, riteniamo che la teoria dei grafi, e piĂč nello specifico le reti complesse, insieme ai suoi concetti ed approcci, possano rappresentare una valida soluzione. Infatti, noi crediamo che le reti complesse possano costituire un modello unico ed unificante per rappresentare e gestire dati altamente eterogenei. Sulla base di questa premessa, mostriamo come gli stessi concetti ed approcci abbiano la potenzialitĂ  di affrontare con successo molti problemi aperti in diversi contesti. ​Nowadays, the amount and variety of scenarios that can benefit from techniques for extracting and managing knowledge from raw data have dramatically increased. As a result, the search for models capable of ensuring the representation and management of highly heterogeneous data is a hot topic in the data science literature. In this thesis, we aim to propose a solution to address this issue. In particular, we believe that graphs, and more specifically complex networks, as well as the concepts and approaches associated with them, can represent a solution to the problem mentioned above. In fact, we believe that they can be a unique and unifying model to uniformly represent and handle extremely heterogeneous data. Based on this premise, we show how the same concepts and/or approach has the potential to address different open issues in different contexts. ​INGEGNERIA DELL'INFORMAZIONEopenVirgili, Luc

    Knowledge management framework based on brain models and human physiology

    Get PDF
    The life of humans and most living beings depend on sensation and perception for the best assessment of the surrounding world. Sensorial organs acquire a variety of stimuli that are interpreted and integrated in our brain for immediate use or stored in memory for later recall. Among the reasoning aspects, a person has to decide what to do with available information. Emotions are classifiers of collected information, assigning a personal meaning to objects, events and individuals, making part of our own identity. Emotions play a decisive role in cognitive processes as reasoning, decision and memory by assigning relevance to collected information. The access to pervasive computing devices, empowered by the ability to sense and perceive the world, provides new forms of acquiring and integrating information. But prior to data assessment on its usefulness, systems must capture and ensure that data is properly managed for diverse possible goals. Portable and wearable devices are now able to gather and store information, from the environment and from our body, using cloud based services and Internet connections. Systems limitations in handling sensorial data, compared with our sensorial capabilities constitute an identified problem. Another problem is the lack of interoperability between humans and devices, as they do not properly understand human’s emotional states and human needs. Addressing those problems is a motivation for the present research work. The mission hereby assumed is to include sensorial and physiological data into a Framework that will be able to manage collected data towards human cognitive functions, supported by a new data model. By learning from selected human functional and behavioural models and reasoning over collected data, the Framework aims at providing evaluation on a person’s emotional state, for empowering human centric applications, along with the capability of storing episodic information on a person’s life with physiologic indicators on emotional states to be used by new generation applications

    On Leveraging Next-Generation Deep Learning Techniques for IoT Malware Classification, Family Attribution and Lineage Analysis

    Get PDF
    Recent years have witnessed the emergence of new and more sophisticated malware targeting insecure Internet of Things (IoT) devices, as part of orchestrated large-scale botnets. Moreover, the public release of the source code of popular malware families such as Mirai [1] has spawned diverse variants, making it harder to disambiguate their ownership, lineage, and correct label. Such a rapidly evolving landscape makes it also harder to deploy and generalize effective learning models against retired, updated, and/or new threat campaigns. To mitigate such threat, there is an utmost need for effective IoT malware detection, classification and family attribution, which provide essential steps towards initiating attack mitigation/prevention countermeasures, as well as understanding the evolutionary trajectories and tangled relationships of IoT malware. This is particularly challenging due to the lack of fine-grained empirical data about IoT malware, the diverse architectures of IoT-targeted devices, and the massive code reuse between IoT malware families. To address these challenges, in this thesis, we leverage the general lack of obfuscation in IoT malware to extract and combine static features from multi-modal views of the executable binaries (e.g., images, strings, assembly instructions), along with Deep Learning (DL) architectures for effective IoT malware classification and family attribution. Additionally, we aim to address concept drift and the limitations of inter-family classification due to the evolutionary nature of IoT malware, by detecting in-class evolving IoT malware variants and interpreting the meaning behind their mutations. To this end, we perform the following to achieve our objectives: First, we analyze 70,000 IoT malware samples collected by a specialized IoT honeypot and popular malware repositories in the past 3 years. Consequently, we utilize features extracted from strings- and image-based representations of IoT malware to implement a multi-level DL architecture that fuses the learned features from each sub-component (i.e, images, strings) through a neural network classifier. Our in-depth experiments with four prominent IoT malware families highlight the significant accuracy of the proposed approach (99.78%), which outperforms conventional single-level classifiers, by relying on different representations of the target IoT malware binaries that do not require expensive feature extraction. Additionally, we utilize our IoT-tailored approach for labeling unknown malware samples, while identifying new malware strains. Second, we seek to identify when the classifier shows signs of aging, by which it fails to effectively recognize new variants and adapt to potential changes in the data. Thus, we introduce a robust and effective method that uses contrastive learning and attentive Transformer models to learn and compare semantically meaningful representations of IoT malware binaries and codes without the need for expensive target labels. We find that the evolution of IoT binaries can be used as an augmentation strategy to learn effective representations to contrast (dis)similar variant pairs. We discuss the impact and findings of our analysis and present several evaluation studies to highlight the tangled relationships of IoT malware, as well as the efficiency of our contrastively learned fine-grained feature vectors in preserving semantics and reducing out-of-vocabulary size in cross-architecture IoT malware binaries. We conclude this thesis by summarizing our findings and discussing research gaps that lay the way for future work

    An Empirical Analysis of Cyber Deception Systems

    Get PDF

    Challenges and Opportunities in Applied System Innovation

    Get PDF
    This book introduces and provides solutions to a variety of problems faced by society, companies and individuals in a quickly changing and technology-dependent world. The wide acceptance of artificial intelligence, the upcoming fourth industrial revolution and newly designed 6G technologies are seen as the main enablers and game changers in this environment. The book considers these issues not only from a technological viewpoint but also on how society, labor and the economy are affected, leading to a circular economy that affects the way people design, function and deploy complex systems
    • 

    corecore