83 research outputs found

    A Deep-Learning Based Robust Framework Against Adversarial P.E. and Cryptojacking Malware

    This graduate thesis introduces novel, deep-learning based frameworks that are resilient to adversarial P.E. and cryptojacking malware. We propose a method that uses a convolutional neural network (CNN) to classify image representations of malware, which provides robustness against numerous adversarial attacks. Our evaluation concludes that the image-based malware classifier is significantly more robust to adversarial attacks than a state-of-the-art ML-based malware classifier, and drops the evasion rate of adversarial samples to 0% in certain attacks. Further, we develop MINOS, a novel, lightweight cryptojacking detection system that accurately detects the presence of unwarranted mining activity in real time. MINOS can detect mining activity with a low FNR and FPR, in an average of 25.9 milliseconds while using a maximum of 4% of CPU and 6.5% of RAM. Therefore, it can be concluded that the frameworks presented in this thesis attain high accuracy, are computationally inexpensive, and are resistant to adversarial perturbations.
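The image representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common byte-to-pixel mapping, in which each byte of a binary becomes one grayscale intensity; the abstract does not say which mapping the thesis uses, and the function name `bytes_to_image` and the default row width are hypothetical.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Map raw bytes to rows of 0-255 pixel intensities.

    Assumption: one byte per pixel, rows of fixed width, and the last
    row zero-padded. This is an illustrative mapping, not necessarily
    the one used in the thesis.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Round the length up to the next multiple of the row width.
    padded_len = -(-len(data) // width) * width
    padded = data + bytes(padded_len - len(data))
    # Slice the padded byte string into rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


if __name__ == "__main__":
    image = bytes_to_image(b"MZ\x90\x00\x03", width=2)
    for row in image:
        print(row)
```

A 2D array of this kind could then be fed to a CNN classifier in the same way as any single-channel image.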

    Robust Machine Learning In Computer Vision

    Deep neural networks have been shown to be successful in various computer vision tasks such as image classification and object detection. Although deep neural networks have exceeded human performance in many tasks, robustness and reliability remain concerns when using deep learning models. On the one hand, degraded images and videos reduce the performance of computer vision tasks. On the other hand, if the deep neural networks are under adversarial attack, the networks can be broken completely. Motivated by the vulnerability of deep neural networks, I analyze and develop image restoration and adversarial defense algorithms towards a vision of robust machine learning in computer vision. In this dissertation, I study two types of degradation that make deep neural networks vulnerable. The first part of the dissertation focuses on face recognition at long range, whose performance is severely degraded by atmospheric turbulence. The theme is improving the performance and robustness of various tasks in face recognition systems such as facial keypoint localization, feature extraction, and image restoration. The second part focuses on defending against adversarial attacks in the image classification task. The theme is exploring adversarial defense methods that can achieve good standard accuracy, robustness to adversarial attacks with known threat models, and good generalization to other unseen attacks.

    Machine Learning Based Detection and Evasion Techniques for Advanced Web Bots.

    Web bots are programs that can be used to browse the web and perform different types of automated actions, both benign and malicious. Such web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators, due to their browserlike fingerprint and humanlike behaviour, which reduce their detectability. Several effective behaviour-based web bot detection techniques have been proposed in the literature. However, the performance of these detection techniques when targeting malicious web bots that try to evade detection has not been examined in depth. Such evasive web bot behaviour is achieved by different techniques, including simple heuristics and statistical distributions, or more advanced machine learning based techniques. Motivated by the above, in this thesis we research novel web bot detection techniques and how effective these are against evasive web bots that try to evade detection using, among others, recent advances in machine learning. To this end, we initially evaluate state-of-the-art web bot detection techniques against web bots of different sophistication levels and show that, while the existing approaches achieve very high performance in general, such approaches are not very effective when faced with only advanced web bots that try to remain undetected. Thus, we propose a novel web bot detection framework that can be used to effectively detect bots of varying levels of sophistication, including advanced web bots. This framework combines two detection modules: (i) a detection module that extracts several features from web logs and uses them as input to several well-known machine learning algorithms, and (ii) a detection module that uses mouse trajectories as input to Convolutional Neural Networks (CNNs).
Moreover, we examine the case where advanced web bots themselves utilise recent advances in machine learning to evade detection. Specifically, we propose two novel evasive advanced web bot types: (i) web bots that use Reinforcement Learning (RL) to update their browsing behaviour based on whether they have been detected or not, and (ii) web bots that possess data on human behaviour and use it as input to Generative Adversarial Networks (GANs) to generate images of humanlike mouse trajectories. We show that both approaches increase the evasiveness of the web bots by reducing the performance of the detection framework utilised in each case. We conclude that malicious web bots can exhibit high sophistication levels and combine different techniques that increase their evasiveness. Even though web bot detection frameworks can combine different methods to effectively detect such bots, web bots can update their behaviours using, among others, recent advances in machine learning to increase their evasiveness. Thus, the detection techniques should be continuously updated to keep up with new techniques introduced by malicious web bots to evade detection.

    MRS Drone: A Modular Platform for Real-World Deployment of Aerial Multi-Robot Systems

    This paper presents a modular autonomous Unmanned Aerial Vehicle (UAV) platform called the Multi-robot Systems (MRS) Drone that can be used in a large range of indoor and outdoor applications. The MRS Drone features unique modularity with respect to changes in actuators, frames, and sensory configuration. As the name suggests, the platform is specially tailored for deployment within an MRS group. The MRS Drone contributes to the state of the art of UAV platforms by allowing smooth real-world deployment of multiple aerial robots, as well as by outperforming other platforms with its modularity. For real-world multi-robot deployment in various applications, the platform is easy to both assemble and modify. Moreover, it is accompanied by a realistic simulator to enable safe pre-flight testing and a smooth transition to complex real-world experiments. In this manuscript, we present mechanical and electrical designs, software architecture, and technical specifications to build a fully autonomous multi-UAV system. Finally, we demonstrate the full capabilities and the unique modularity of the MRS Drone in various real-world applications that required a diverse range of platform configurations. Comment: 49 pages, 39 figures, accepted for publication to the Journal of Intelligent & Robotic Systems.

    Hardware Assisted Solutions for Automobile Security

    In the past couple of decades, many in-vehicle features have been invented and deployed to make modern vehicles not only safer and more reliable but also connected and intelligent. Meanwhile, vehicular ad-hoc networks (VANETs) have been proposed to provide communications between vehicles and road-side stations as the foundation of an intelligent transportation system that offers efficient and safe transportation. To support these functions, a large amount of electronic equipment has been integrated into the car system. Although these add-on functions offer great help in driving assistance, they inevitably introduce new security vulnerabilities that threaten the safety of drivers, passengers, and pedestrians. This has been demonstrated by many well-documented attacks either on the in-vehicle bus system or on the wireless vehicular network communications. In this dissertation, we design and implement several hardware-oriented solutions to the emerging security issues on vehicles. More specifically, we focus on three important and representative problems: (1) how to secure the in-vehicle Controller Area Network (CAN), (2) how to secure the communication between the vehicle and the outside world, and (3) how to establish trust on VANETs. Current approaches based on cryptographic algorithms to secure the CAN bus violate the strict timing and limited resource constraints of CAN communications. We thus emphasize the alternative solution of an intrusion detection system (IDS) in this dissertation. We explore monitoring the changes of CAN message content or the physical delay of its transmission to detect attacks on the CAN bus. We first propose a new entropy-based IDS following the observation that all the known CAN message injection attacks need to alter the CAN identifier bits. Thus, analyzing the entropy changes of such bits can be an effective way to detect those attacks.
Next, we develop a delay-based IDS to protect the CAN network by identifying the location of the compromised Electronic Control Unit (ECU) from the difference in transmission delay to two terminals connected to the CAN bus. We demonstrate that both approaches can protect the integrity of the messages on the CAN bus, further improving the security and safety of autonomous vehicles. In the second part of this dissertation, we consider Plug-and-Secure, an industrial practice for key management in automotive CAN networks. It has been proven to be information-theoretically secure. However, we discover side-channel attacks based on the physical properties of the CAN bus that can leak almost all of the secret key bits. We analyze the fundamental characteristics that lead to such attacks and propose techniques to minimize information leakage at the hardware level. Next, we extend our study from secure in-vehicle CAN communication to the communication between the vehicle and the outside world. We take the example of the popular GPS spoofing attack and show how we can use the rich information from the CAN bus to build a cross-validation system to detect such attacks. Our approach is based on the belief that the local driving data from the in-vehicle network can be authenticated and thus trusted by secure CAN network mechanisms. Such data can be used to cross-validate the GPS signals from the satellite, which are vulnerable to spoofing attacks. We conduct driving tests on real roads to show that our proposed approach can defend against both GPS spoofing attacks and location-based attacks on VANETs. Finally, we propose a blockchain-based Anonymous Reputation System (BARS) to establish a privacy-preserving trust model for VANETs. Certificate and revocation transparency is implemented efficiently with proofs of presence and absence based on the extended blockchain technology.
To prevent the broadcast of forged messages, a reputation evaluation algorithm is presented that relies on both direct historical interactions with a vehicle and indirect opinions from other vehicles. This dissertation features solutions to vehicle security problems based on hardware or physical characteristics, instead of cryptographic algorithms. We believe that, given the critical timing requirements of vehicular systems and their very limited resources (such as the bandwidth on the CAN bus), this will be a very promising direction for securing vehicles and vehicular networks.
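The entropy-based IDS idea described in this abstract can be sketched roughly as follows. The sketch assumes 11-bit standard CAN identifiers, a fixed observation window, and a simple per-bit tolerance check against a baseline measured on clean traffic; the function names, the tolerance value, and the decision rule are illustrative assumptions and are not taken from the dissertation.

```python
import math


def bit_entropies(can_ids, n_bits=11):
    """Binary Shannon entropy (in bits) of each identifier bit over a window."""
    if not can_ids:
        raise ValueError("window must contain at least one CAN identifier")
    total = len(can_ids)
    entropies = []
    for bit in range(n_bits):
        # Fraction of frames in the window whose identifier has this bit set.
        p = sum((cid >> bit) & 1 for cid in can_ids) / total
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit carries no entropy
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies


def is_anomalous(window, baseline, tolerance=0.2):
    """Flag a window whose per-bit entropy drifts from the baseline.

    Assumption: `baseline` was computed with bit_entropies() on
    attack-free traffic; `tolerance` is a hypothetical tuning knob.
    """
    current = bit_entropies(window, n_bits=len(baseline))
    return any(abs(h - b) > tolerance for h, b in zip(current, baseline))


if __name__ == "__main__":
    normal = [0x100, 0x200] * 50
    baseline = bit_entropies(normal)
    injected = normal + [0x001] * 100  # flood of frames with a foreign identifier
    print(is_anomalous(normal, baseline), is_anomalous(injected, baseline))
```

Injected frames change how often individual identifier bits are set, so the per-bit entropy of the window moves away from the baseline even when the payloads look plausible.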

    Studying the origins of primary tumours and residual disease in breast cancer

    Breast cancer is the leading cause of death in women worldwide and these deaths are mostly attributed to metastasis and tumour recurrence following initially successful therapy. Metastasis refers to the development of invasive disease, wherein malignant cells dissociate from primary tumours, infiltrating other organs and tissues to give rise to secondary outgrowths. Previously, metastasis was thought to be initiated in advanced tumours, but breast cancer cells with metastatic potential have now been shown to disseminate very early from the primary site via largely unknown mechanisms. These early interactions of tumour cells with their cellular micro-environment and normal neighbours also result in early tumour cell heterogeneity and must therefore be elucidated so that we can prevent metastatic spread in patients and better treat the resulting heterogeneous tumours. However, studying tumour initiation is not possible in patients because it happens at a cellular level not detectable by current technology. Tumour recurrence is another major cause of breast cancer related death and is believed to be caused by residual disease cells that survive initial therapy. These are a reservoir of refractory cells that can lie dormant for many years (sometimes decades) before resulting in relapse tumours. They are also difficult to obtain from human patients, since they are very few and cannot be detected easily, and thus their molecular mechanisms have not been fully explored. In addition to the unavailability of human tissue, mouse models of breast cancer also fall short in helping us study early cancer initiation, because they allow oncogenic expression in all cells of the tissue instead of initiating cancer as in the human situation: one neoplastic transformed cell proliferating unchecked in a normal epithelium.
To address this issue, we used primary organoids from an inducible mouse model of breast cancer and lentivirally transduced single cells within these organoids to express oncogenes. We further optimized parameters for long-term imaging using light sheet microscopy and developed big data analysis pipelines that led us to discern that single transformed cells had a lower chance of establishing tumorigenic foci than clusters of cells. Thus, using the first ever in vitro stochastic breast tumorigenesis model system, we postulate a proximity-controlled signalling that is imperative to tumour initiation within epithelial tissues. This new stochastic tumorigenesis system can be further used to identify the molecular interactions in early breast cancer cells. Our group has already revealed distinct characteristics, such as dysregulated lipid metabolism, of the residual disease correlate obtained from an inducible mouse model. As survival mechanisms invoked by residual cells remain largely unknown, we analysed the dynamic transcriptome of regressing tumours at important timepoints during the establishment of residual disease. Key molecular players upregulated during regression, such as c-Jun and BCL6, were identified, and the inflammatory arm of the NF-κB cascade was found to be dysregulated, among others. Further validation of these molecular targets as potentially synthetic lethal interactors remains to be performed so that they can be used to limit the residual disease reservoir and eventually tumour recurrence.

    Towards adaptive anomaly detection systems using boolean combination of hidden Markov models

    Anomaly detection monitors for significant deviations from normal system behavior. Hidden Markov Models (HMMs) have been successfully applied in many intrusion detection applications, including anomaly detection from sequences of operating system calls. In practice, anomaly detection systems (ADSs) based on HMMs typically generate false alarms because they are designed using limited representative training data and prior knowledge. However, since new data may become available over time, an important feature of an ADS is the ability to accommodate newly acquired data incrementally, after it has originally been trained and deployed for operations. Incremental re-estimation of HMM parameters raises several challenges. HMM parameters should be updated from new data without requiring access to the previously learned training data, and without corrupting previously learned models of normal behavior. Standard techniques for training HMM parameters involve iterative batch learning, and hence must observe the entire training data prior to updating HMM parameters. Given new training data, these techniques must restart the training procedure using all (new and previously accumulated) data. Moreover, a single HMM system for incremental learning may not adequately approximate the underlying data distribution of the normal process, due to the many local maxima in the solution space. Ensemble methods have been shown to alleviate knowledge corruption by combining the outputs of classifiers trained independently on successive blocks of data. This thesis makes contributions at the HMM and decision levels towards improved accuracy, efficiency and adaptability of HMM-based ADSs. It first presents a survey of techniques found in the literature that may be suitable for incremental learning of HMM parameters, and assesses the challenges faced when these techniques are applied to incremental learning scenarios in which the new training data is limited or abundant.
Consequently, an efficient alternative to the Forward-Backward algorithm is first proposed to reduce the memory complexity without increasing the computational overhead of HMM parameter estimation from fixed-size abundant data. Improved techniques for incremental learning of HMM parameters are then proposed to accommodate new data over time, while maintaining a high level of performance. However, knowledge corruption caused by a single HMM with a fixed number of states remains an issue. To overcome such limitations, this thesis presents an efficient system to accommodate new data using a learn-and-combine approach at the decision level. When a new block of training data becomes available, a new pool of base HMMs is generated from the data using different numbers of HMM states and random initializations. The responses from the newly trained HMMs are then combined with those of the previously trained HMMs in receiver operating characteristic (ROC) space using novel Boolean combination (BC) techniques. The learn-and-combine approach allows the selection of a diversified ensemble of HMMs (EoHMMs) from the pool, and adapts the Boolean fusion functions and thresholds for improved performance, while it prunes redundant base HMMs. The proposed system is capable of changing its desired operating point during operations, and this point can be adjusted to changes in prior probabilities and costs of errors. During simulations conducted for incremental learning from successive data blocks using both synthetic and real-world system call data sets, the proposed learn-and-combine approach has been shown to achieve a higher level of accuracy than all related techniques. In particular, it can sustain a significantly higher level of accuracy than when the parameters of a single best HMM are re-estimated for each new block of data, using the reference batch learning and the proposed incremental learning techniques.
It also outperforms static fusion techniques such as majority voting for combining the responses of new and previously generated pools of HMMs. Ensemble selection techniques have been shown to form compact EoHMMs for operations, by selecting diverse and accurate base HMMs from the pool while maintaining or improving the overall system accuracy. Pruning has been shown to prevent pool sizes from increasing indefinitely with the number of data blocks acquired over time. Therefore, the storage space for accommodating HMM parameters and the computational costs of the selection techniques are reduced, without negatively affecting the overall system performance. The proposed techniques are general in that they can be employed to adapt HMM-based systems to new data, within a wide range of application domains. More importantly, the proposed Boolean combination techniques can be employed to combine diverse responses from any set of crisp or soft one- or two-class classifiers trained on different data or features or trained according to different parameters, or from different detectors trained on the same data. In particular, they can be effectively applied when training data is limited and test data is imbalanced.
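The idea of combining detector responses in ROC space can be illustrated with a deliberately simplified sketch. It exhaustively tries AND and OR fusion of two thresholded detectors and keeps the pair of thresholds with the highest true positive rate under a false positive budget. This is an assumption-laden stand-in, not the thesis's Boolean combination techniques; the function names and the `max_fpr` selection rule are hypothetical.

```python
from itertools import product


def roc_point(decisions, labels):
    """Return (FPR, TPR) for crisp decisions; labels must hold both classes."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = sum(1 for d, y in zip(decisions, labels) if d and y)
    fp = sum(1 for d, y in zip(decisions, labels) if d and not y)
    return fp / negatives, tp / positives


def best_boolean_combination(scores_a, scores_b, labels, max_fpr=0.1):
    """Search thresholds and Boolean functions for the best ROC point.

    Assumption: 'best' means highest TPR with FPR <= max_fpr, a
    simplification of selecting operating points in ROC space.
    Returns (tpr, fpr, function_name, threshold_a, threshold_b) or None.
    """
    functions = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}
    thresholds_a = sorted(set(scores_a))
    thresholds_b = sorted(set(scores_b))
    best = None
    for (name, fn), ta, tb in product(functions.items(), thresholds_a, thresholds_b):
        # Threshold each detector's soft score, then fuse the crisp decisions.
        decisions = [fn(sa >= ta, sb >= tb) for sa, sb in zip(scores_a, scores_b)]
        fpr, tpr = roc_point(decisions, labels)
        if fpr <= max_fpr and (best is None or tpr > best[0]):
            best = (tpr, fpr, name, ta, tb)
    return best


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = anomalous, 0 = normal
    scores_a = [0.9, 0.2, 0.1, 0.3]
    scores_b = [0.1, 0.8, 0.2, 0.1]
    print(best_boolean_combination(scores_a, scores_b, labels, max_fpr=0.0))
```

In the toy data each detector alone misses one anomaly at zero false positives, while the OR of the two catches both, which is the kind of gain a decision-level combination aims for.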