
    Scalable OS Fingerprinting: Classification Problems and Applications

    The Internet has become ubiquitous in our lives today. With its rapid adoption and widespread growth across the planet, it has drawn many research efforts that attempt to understand and characterize this complex system. One such direction tries to discover the types of devices that compose the Internet, which is the topic of this dissertation. To accomplish such a measurement, researchers have turned to a technique called OS fingerprinting, which is a method to determine the operating system (OS) of a remote host. However, because the Internet today has evolved into a massive public network, large-scale OS fingerprinting has become a challenging problem. Due to increasing security concerns, most networks today block many of the probes used by traditional fingerprinting tools (e.g., Nmap), thus requiring a different approach. Consequently, this has given rise to single-probe techniques, which offer low overhead and minimal intrusiveness but in turn require more sophisticated algorithms, since they receive limited information and many parameters can inject noise into the measurement (e.g., network delay, packet loss). This dissertation focuses on understanding the performance of single-probe algorithms. We study existing methods, formalize current problems in the field, and devise new algorithms to improve classification accuracy and automate the construction of fingerprint databases. We apply our work to multiple Internet-wide scans and discover that besides general-purpose machines, the Internet today has grown to include large numbers of publicly accessible peripheral devices (e.g., routers, printers, cameras) and cyber-physical systems (e.g., lighting controllers, medical sensors). We go on to recover empirical distributions of network delay and loss, as well as likelihoods of users re-configuring their devices. With our developed techniques and results, we show that single-probe algorithms are an effective approach for accomplishing wide-scale network measurements.
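
    The dissertation's specific algorithms are not spelled out in this abstract, so the following is only a minimal, hypothetical sketch of the single-probe idea: classify a host from features observable in one SYN-ACK reply (initial TTL, window size, TCP option order). The signature values and OS labels are illustrative, not drawn from any real fingerprint database.

        # Hypothetical single-probe OS classifier: match features observed in one
        # SYN-ACK reply against a toy signature database. Real systems additionally
        # model network noise (delay, loss) probabilistically.
        SIGNATURES = {
            "Linux":   {"ttl": 64,  "win": 29200, "opts": ("MSS", "SACK", "TS", "NOP", "WS")},
            "Windows": {"ttl": 128, "win": 8192,  "opts": ("MSS", "NOP", "WS", "SACK")},
            "Cisco":   {"ttl": 255, "win": 4128,  "opts": ("MSS",)},
        }

        def initial_ttl(observed_ttl):
            # Recover the likely initial TTL by rounding up to common defaults,
            # since each router hop decrements the observed value by one.
            for base in (32, 64, 128, 255):
                if observed_ttl <= base:
                    return base
            return 255

        def classify(observed_ttl, win, opts):
            ttl = initial_ttl(observed_ttl)
            best, best_score = None, -1
            for os_name, sig in SIGNATURES.items():
                score = (ttl == sig["ttl"]) + (win == sig["win"]) + (opts == sig["opts"])
                if score > best_score:
                    best, best_score = os_name, score
            return best

        print(classify(53, 29200, ("MSS", "SACK", "TS", "NOP", "WS")))  # -> Linux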

    Graph-Based Machine Learning for Passive Network Reconnaissance within Encrypted Networks

    Network reconnaissance identifies a network's vulnerabilities to both prevent and mitigate the impact of cyber-attacks. The difficulty of performing adequate network reconnaissance has been exacerbated by the rising complexity of modern networks (e.g., encryption). We identify that the majority of network reconnaissance solutions proposed in the literature are infeasible for widespread deployment in realistic modern networks. This thesis provides novel network reconnaissance solutions to address the limitations of the existing conventional approaches proposed in the literature. The existing approaches are limited by their reliance on large, heterogeneous feature sets, making them difficult to deploy under realistic network conditions. In contrast, we devise a bipartite graph-based representation to create network reconnaissance solutions that rely only on a single feature (e.g., the Internet protocol (IP) address field). We exploit a widely available feature set to provide network reconnaissance solutions that are scalable, independent of encryption, and deployable across diverse Internet (TCP/IP) networks. We design bipartite graph embeddings (BGE), a graph-based machine learning (ML) technique for extracting insight from the structural properties of the bipartite graph-based representation. BGE is the first known graph embedding technique designed explicitly for network reconnaissance. We validate the use of BGE through an evaluation of a university's enterprise network. BGE is shown to provide insight into crucial areas of network reconnaissance (e.g., device characterisation, service prediction, and network visualisation). We design an extension of BGE to acquire insight within a private network. Private networks, such as a virtual private network (VPN), have posed significant challenges for network reconnaissance as they deny direct visibility into their composition. Our extension of BGE provides the first known solution for inferring the composition of both the devices and applications acting behind diverse private networks. This thesis provides novel graph-based ML techniques for two crucial aims of network reconnaissance: device characterisation and intrusion detection. The techniques developed within this thesis provide unique cybersecurity solutions to both prevent and mitigate the impact of cyber-attacks. Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 202
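
    BGE itself is not specified in this abstract; as a rough, hypothetical illustration of embedding a bipartite host-service graph built from a single feature, the sketch below factorizes a biadjacency matrix of (client, service) flow pairs with a truncated SVD, so hosts with similar service profiles land close together. All names and flows are invented.

        import numpy as np

        # Flows as (client, service) pairs: only co-occurrence is used, so
        # payload encryption is irrelevant to this representation.
        flows = [("h1", "dns"), ("h1", "web"), ("h2", "web"),
                 ("h3", "dns"), ("h3", "ntp")]

        hosts = sorted({h for h, _ in flows})
        svcs = sorted({s for _, s in flows})
        B = np.zeros((len(hosts), len(svcs)))
        for h, s in flows:
            B[hosts.index(h), svcs.index(s)] = 1.0

        # Truncated SVD of the biadjacency matrix yields low-dimensional
        # embeddings for both node sets at once.
        U, S, Vt = np.linalg.svd(B, full_matrices=False)
        k = 2
        host_emb = U[:, :k] * S[:k]   # one row per host
        svc_emb = Vt[:k].T * S[:k]    # one row per service
        print(dict(zip(hosts, host_emb.round(2))))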

    One model to find them all: deep learning for multivariate time-series anomaly detection in mobile network data

    Network monitoring data generally consists of hundreds of counters periodically collected in the form of time-series, resulting in a complex-to-analyze multivariate time-series (MTS) process. Traditional time-series anomaly detection methods target univariate time-series analysis, which makes MTS analysis cumbersome and prohibitively complex. We present DC-VAE (Dilated Convolutional Variational Auto-Encoder), a novel approach to anomaly detection in MTS data, leveraging convolutional neural networks (CNNs) and variational autoencoders (VAEs). DC-VAE detects anomalies in MTS data through a single model, exploiting temporal information without sacrificing computational and memory resources. In particular, instead of using recursive neural networks, large causal filters, or many layers, DC-VAE relies on dilated convolutions (DC) to capture long- and short-term phenomena in the data. We evaluate DC-VAE on the detection of anomalies in the TELCO telecommunication-networks dataset, a large-scale, multi-dimensional network monitoring dataset collected at an operational mobile Internet Service Provider (ISP), where anomalous events were manually labeled by experts over seven months at a five-minute granularity. We benchmark DC-VAE against a broad set of traditional time-series anomaly detectors from the signal processing and machine learning domains. We also evaluate DC-VAE on open, publicly available datasets, comparing its performance against other multivariate anomaly detectors based on deep learning generative models. Results confirm the advantages of DC-VAE, both in terms of MTS data modeling and for anomaly detection. For the sake of reproducibility and as an additional contribution, we make the TELCO dataset publicly available to the community and openly release the code implementing DC-VAE. This work was partially supported by ANII-FMV under project FMV-1-2019-1-155850, "Anomaly Detection with Continual and Streaming Machine Learning on Big Data Telecommunications Networks"; by CSIC R&D project 22520220100371UD, "Anomaly Detection in Time Series: Generalization and Domain Change Adaptation"; by Telefónica; and by the Austrian FFG ICT-of-the-Future project DynAISEC (Adaptive AI/ML for Dynamic Cybersecurity Systems), project ID 887504. Gastón García was supported by ANII scholarship POS-FMV-2020-1-1009239, as well as by CSIC under the Movilidad e Intercambios Académicos 2022 program.
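
    As a flavor of the dilated-convolution building block described above (a minimal sketch, not the authors' released DC-VAE code), the PyTorch snippet below stacks causal dilated 1-D convolutions; the channel sizes and the 12-counter input are assumptions.

        import torch
        import torch.nn as nn

        class CausalDilatedConv1d(nn.Module):
            """Causal 1-D convolution: pad on the left only, so each output
            depends solely on past samples."""
            def __init__(self, c_in, c_out, kernel_size=3, dilation=1):
                super().__init__()
                self.pad = (kernel_size - 1) * dilation
                self.conv = nn.Conv1d(c_in, c_out, kernel_size, dilation=dilation)

            def forward(self, x):                        # x: (batch, channels, time)
                return self.conv(nn.functional.pad(x, (self.pad, 0)))

        # Dilations 1, 2, 4, 8 give a receptive field of 1 + 2*(1+2+4+8) = 31
        # past steps with only four layers, the reason for avoiding recurrence.
        encoder = nn.Sequential(
            CausalDilatedConv1d(12, 32, dilation=1), nn.ReLU(),
            CausalDilatedConv1d(32, 32, dilation=2), nn.ReLU(),
            CausalDilatedConv1d(32, 32, dilation=4), nn.ReLU(),
            CausalDilatedConv1d(32, 16, dilation=8),
        )
        x = torch.randn(1, 12, 288)   # one day of 12 counters at 5-minute bins
        print(encoder(x).shape)       # torch.Size([1, 16, 288])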

    Nemesis: Judging the Efficacy of OS Fingerprinting Systems

    The internet has gone from being a small network of niche uses and mostly academic interest to being a vital, foundational piece of modern infrastructure that the world depends on. Because of its importance, the internet has also become a target and a gateway for malicious entities. Indeed, it has spawned a whole new military dimension: cyberwarfare. Of vital importance to both attackers and defenders is the identification of vulnerable systems connected to the internet. OS fingerprinting is one mechanism by which vulnerable systems may be detected. Despite the importance and proliferation of OS fingerprinting tools, there has not been a systematic effort to evaluate their effectiveness. We propose dimensionality as a metric for evaluating OS fingerprinting systems and provide a framework to calculate this value. In addition, we identify NMAP as the premier OS fingerprinting tool in use today and apply our framework to it under various distortions to ascertain its performance based on our metric of dimensionality. Under its default configuration, NMAP struggles with firewalls, which are abundant on the internet, and performs poorly. Our framework can identify confounding signatures within the database that disproportionately harm NMAP's dimensionality and can remove them from the database. Further, we find that NMAP struggles modestly with network jitter, potentially even on local networks. This, combined with NMAP's difficulty with firewalls, suggests that it is ill-suited to the task of fingerprinting operating systems over the internet. Lastly, we identify which features are crucial to NMAP's ability to identify operating systems. This, in addition to our other findings, points toward directions for improving NMAP, both in its ability to identify an OS over the internet and in reducing the number of probes needed to do so.
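
    The abstract does not define the dimensionality metric itself, so the snippet below only illustrates a related intuition: counting which database signatures become indistinguishable once a distortion, such as a firewall dropping probe responses, is applied. The signatures and probe names are hypothetical.

        import itertools

        # Toy fingerprint database: each OS maps probes to expected responses.
        db = {
            "linux-5.x":  {"p1": "A", "p2": "B", "p3": "C"},
            "win-10":     {"p1": "A", "p2": "D", "p3": "E"},
            "freebsd-13": {"p1": "A", "p2": "B", "p3": "F"},
        }

        def confusable_pairs(db, dropped_probes):
            # Signatures collide when they agree on every probe that survives
            # the distortion; colliding pairs reduce discriminative power.
            kept = lambda sig: {p: v for p, v in sig.items() if p not in dropped_probes}
            return [(a, b) for a, b in itertools.combinations(db, 2)
                    if kept(db[a]) == kept(db[b])]

        # A firewall that blocks probe p3 makes Linux and FreeBSD indistinguishable.
        print(confusable_pairs(db, dropped_probes={"p3"}))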

    On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses

    In recent years, malware authors have drastically changed course on the subject of threat design and implementation. Malware authors, namely hackers or cyber-terrorists, perpetrate new forms of cyber-crime involving more innovative hacking techniques. Motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations' networks to collect and steal sensitive data, as well as to blackmail or scam people or scupper IT infrastructures. Accordingly, IT security experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes on the character of a continuous fight, where cyber-criminals are obsessed with the idea of outsmarting security defenses. As such, security experts have to elaborate an effective strategy to counter cyber-criminals. The generation of cyber-threat intelligence is of paramount importance, as stated in the following quote: "the field is owned by who owns the intelligence". In this thesis, we address the problem of generating timely and relevant cyber-threat intelligence for the purpose of detection, prevention, and mitigation of cyber-attacks. To do so, we initiate a research effort that falls into four parts. First, we analyze prominent cyber-crime toolkits to grasp the inner secrets and workings of advanced threats. We dissect prominent malware like the Zeus and Mariposa botnets to uncover the underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of cyber-threat intelligence for situational awareness. We adapt a graph-theoretic approach to study infrastructures used by malware to perpetrate malicious activities. We build a scoring mechanism based on a page-ranking algorithm to measure the badness of infrastructures' elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures over a period of one year. Third, we use machine learning techniques to fingerprint malicious IP traffic. By fingerprinting, we mean detecting malicious network flows and attributing them to malware families. This research effort relies on ground truth collected from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and implement a system that generates anomalies from passive DNS traffic. Due to the tremendous volume of DNS data, we build the system on top of a cluster computing framework, namely Apache Spark [70]. The integrated analytic system has the ability to detect anomalies observed in DNS records, which are potentially generated by widespread cyber-threats.
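
    Of the techniques listed, min-hashing is the most self-contained; the hypothetical sketch below estimates the Jaccard overlap between two infrastructures' indicator sets from compact MinHash signatures. The indicators and parameters are invented for illustration.

        import hashlib

        def minhash(items, num_hashes=64):
            # For each seeded hash function, keep the minimum hash over the set;
            # the probability that two sets share a minimum equals their Jaccard
            # similarity, so matching positions estimate the overlap.
            return [min(int(hashlib.sha1(f"{seed}:{it}".encode()).hexdigest(), 16)
                        for it in items)
                    for seed in range(num_hashes)]

        def estimated_jaccard(sig_a, sig_b):
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        infra_2013 = {"evil.example", "198.51.100.7", "203.0.113.9"}
        infra_2014 = {"evil.example", "198.51.100.7", "198.51.100.8"}
        print(estimated_jaccard(minhash(infra_2013), minhash(infra_2014)))  # ~0.5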

    Scalable and Efficient Network Anomaly Detection on Connection Data Streams

    Every day, security experts and analysts must face the huge increase in cyber-security threats that propagate rapidly across the Internet and threaten the security of hundreds of millions of users worldwide. Detecting such threats and attacks is of paramount importance to these experts in order to prevent them and mitigate their effects in the future. Thus, the need for security solutions that can prevent, detect, and mitigate such threats is pressing and must be addressed with scalable and efficient solutions. To this end, we propose a scalable framework, called Daedalus, to analyze streams of NIDS (network-based intrusion detection system) logs in near real-time and to extract useful threat security intelligence. The proposed system pre-processes massive amounts of connection stream logs received from different participating organizations and applies an elaborate anomaly detection technique in order to distinguish between normal and abnormal or anomalous network behaviors. As such, Daedalus detects network traffic anomalies by extracting a set of significant pre-defined features from the connection logs and then applying a time-series-based technique in order to detect abnormal behavior in near real-time. Moreover, we correlate IP blocks extracted from the logs with external security signature-based feeds that detect factual malicious activities (e.g., malware families and hashes, ransomware distribution, and command and control centers) in order to validate the proposed approach. The performed experiments demonstrate that Daedalus accurately identifies malicious activities with an average F1 score of 92.88%. We further compare our proposed approach with existing K-Means and deep learning (LSTM) approaches and demonstrate the accuracy and efficiency of our system.
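
    Daedalus's exact detection technique is not given in this abstract; as a minimal stand-in for the general idea, the sketch below flags connection-log feature values that deviate strongly from a trailing window and scores the result with F1. All data are synthetic.

        import numpy as np

        def rolling_zscore_anomalies(x, window=48, threshold=4.0):
            # Flag points deviating from the trailing window's mean by more
            # than `threshold` standard deviations.
            flags = np.zeros(len(x), dtype=bool)
            for t in range(window, len(x)):
                hist = x[t - window:t]
                std = hist.std() or 1e-9
                flags[t] = abs(x[t] - hist.mean()) / std > threshold
            return flags

        def f1_score(pred, truth):
            tp = np.sum(pred & truth)
            fp = np.sum(pred & ~truth)
            fn = np.sum(~pred & truth)
            return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

        rng = np.random.default_rng(0)
        x = rng.normal(100, 5, 500)        # e.g., connections per minute
        truth = np.zeros(500, dtype=bool)
        x[300] += 60; truth[300] = True    # one injected anomaly
        print(f1_score(rolling_zscore_anomalies(x), truth))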

    Human and environmental exposure to hydrocarbon pollution in the Niger Delta: A geospatial approach

    This study undertook an integrated geospatial assessment of human and environmental exposure to oil pollution in the Niger Delta using primary and secondary spatial data. This thesis begins by presenting a clear rationale for the study of extensive oil pollution in the Niger Delta, followed by a critical literature review of the potential application of geospatial techniques for monitoring and managing the problem. Three analytical chapters report on the methodological developments and applications of geospatial techniques that contribute to achieving the aim of the study. Firstly, a quantitative assessment of human and environmental exposure to oil pollution in the Niger Delta was performed using a government spill database. This was carried out using Spatial Analysis along Networks (SANET), a geostatistical tool, since oil spills in the region tend to follow the linear patterns of the pipelines. Spatial data on pipelines, oil spills, population and land cover data were analysed in order to quantify the extent of human and environmental exposure to oil pollution. The major causes of spills and spatial factors potentially reinforcing reported causes were analysed. Results show extensive general exposure and sabotage as the leading cause of oil pollution in the Niger Delta. Secondly, a method of delineating the river network in the Niger Delta using Sentinel-1 SAR data was developed, as a basis for modelling potential flow of pollutants in the distributary pathways of the network. The cloud penetration capabilities of SAR sensing are particularly valuable for this application since the Niger Delta is notorious for cloud cover. Vector and raster-based river networks derived from Sentinel-1 were compared to alternative river map products including those from the USGS and ESA. This demonstrated the superiority of the Sentinel-1 derived river network, which was subsequently used in a flow routing analysis to demonstrate the potential for understanding oil spill dispersion. Thirdly, the study applied optical remote sensing for indirect detection and mapping of oil spill impacts on vegetation. Multi-temporal Landsat data was used to delineate the spill impact footprint of a notable 2008 oil spill incident in Ogoniland and population exposure was evaluated. The optical data was effective in impact area delineation, demonstrating extensive and long-lasting population exposure to oil pollution. Overall, this study has successfully assembled and produced relevant spatial and attribute data sets and applied integrated geostatistical analytical techniques to understand the distribution and impacts of oil spills in the Niger Delta. The study has revealed the extensive level of human and environmental exposure to hydrocarbon pollution in the Niger Delta and introduced new methods that will be valuable fo
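
    As a toy illustration of network-constrained analysis in the spirit of SANET (not the tool itself), the sketch below snaps spill points to their nearest pipeline and bins them by distance along the line using shapely; all coordinates are invented.

        from shapely.geometry import LineString, Point

        # Spills cluster along pipelines, so planar density maps mislead;
        # counting events per unit of distance *along* the network is fairer.
        pipelines = {
            "trunk-A": LineString([(0, 0), (10, 0)]),
            "trunk-B": LineString([(0, 0), (0, 8)]),
        }
        spills = [Point(2.1, 0.3), Point(2.4, -0.2), Point(0.2, 5.0), Point(9.0, 0.1)]

        counts = {}
        for s in spills:
            # Assign each spill to its nearest pipeline, then bin it by its
            # projected distance along that line (1-unit bins).
            name, line = min(pipelines.items(), key=lambda kv: kv[1].distance(s))
            dist_bin = int(line.project(s))
            counts[(name, dist_bin)] = counts.get((name, dist_bin), 0) + 1
        print(counts)   # {('trunk-A', 2): 2, ('trunk-B', 5): 1, ('trunk-A', 9): 1}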

    Digital Forensics AI: on Practicality, Optimality, and Interpretability of Digital Evidence Mining Techniques

    Digital forensics as a field has progressed alongside technological advancements over the years, just as digital devices have become more robust and sophisticated. However, criminals and attackers have devised means of exploiting the vulnerabilities or sophistication of these devices to carry out malicious activities in unprecedented ways. Their belief is that electronic crimes can be committed without identities being revealed or trails being established. Several applications of artificial intelligence (AI) have demonstrated interesting and promising solutions to seemingly intractable societal challenges. This thesis aims to advance the concept of applying AI techniques in digital forensic investigation. Our approach involves experimenting with a complex case scenario in which suspects corresponded by e-mail and suspiciously deleted certain communications, presumably to conceal evidence. The purpose is to demonstrate the efficacy of Artificial Neural Networks (ANN) in learning and detecting communication patterns over time, and then predicting the possibility of missing communication(s) along with potential topics of discussion. To do this, we developed a novel approach and evaluated it alongside other existing models. The accuracy of our results is evaluated, and their performance on previously unseen data is measured. Second, we propose conceptualizing the term "Digital Forensics AI" (DFAI) to formalize the application of AI in digital forensics. The objective is to highlight the instruments that facilitate the best evidential outcomes and presentation mechanisms that are adaptable to the probabilistic output of AI models. Finally, we strengthen the case for applying AI in digital forensics by recommending methodologies and approaches for bridging trust gaps through the development of interpretable models that facilitate the admissibility of digital evidence in legal proceedings.
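
    The thesis's models are not detailed in this abstract; as a hypothetical sketch of the missing-communication idea, the PyTorch snippet below learns a pairwise e-mail rhythm with a tiny network and flags quiet weeks that the learned rhythm still rates as high-probability, i.e., candidate deletions. The data, window size, and threshold are all made up.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        truth = (torch.arange(104) % 2 == 0).float()   # a strict biweekly rhythm
        observed = truth.clone()
        observed[60] = 0.0                             # simulate one deleted e-mail

        k = 4                                          # look back four weeks
        X = torch.stack([observed[t - k:t] for t in range(k, len(observed))])
        y = observed[k:]

        model = nn.Sequential(nn.Linear(k, 8), nn.ReLU(), nn.Linear(8, 1))
        opt = torch.optim.Adam(model.parameters(), lr=0.05)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(300):
            opt.zero_grad()
            loss_fn(model(X).squeeze(1), y).backward()
            opt.step()

        with torch.no_grad():
            probs = torch.sigmoid(model(X).squeeze(1))
        # A week is suspicious if nothing was observed yet the model, having
        # learned the rhythm, still expects a communication with high confidence.
        suspicious = [i + k for i in range(len(y)) if y[i] == 0 and probs[i] > 0.9]
        print(suspicious)   # expected to single out week 60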