1,567 research outputs found

    Insider threat identification using the simultaneous neural learning of multi-source logs

    Get PDF
    Insider threat detection has drawn increasing attention in recent years. In order to capture a malicious insider's digital footprints that occur scatteredly across a wide range of audit data sources over a long period of time, existing approaches often leverage a scoring mechanism to orchestrate alerts generated from multiple sub-detectors, or require domain knowledge-based feature engineering to conduct a one-off analysis across multiple types of data. These approaches result in a high deployment complexity and incur additional costs for engaging security experts. In this paper, we present a novel approach that works with a variety of security logs. The security logs are transformed into texts in the same format and then arranged as a corpus. Using the model trained by Word2vec with the corpus, we are enabled to approximate the posterior probabilities for insider behaviours. Accordingly, we label the transformed events as suspicious if their behavioural probabilities are smaller than a given threshold, and a user is labelled as malicious if he/she is associated with multiple suspicious events. The experiments are undertaken with the Carnegie Mellon University (CMU) CERT Programs insider threat database v6.2, which not only demonstrate that the proposed approach is effective and scalable in practical applications but also provide a guidance for tuning the parameters and thresholds

    An Evasion Attack against ML-based Phishing URL Detectors

    Full text link
    Background: Over the year, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity to detect phishing URLs proactively. Despite this vogue, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study to understand the test time security vulnerabilities of the state-of-the-art MLPU systems, aiming at providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. To achieve this, we first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is: (i) effective as it evades all the models with an average success rate of 66% and 85% for famous (such as Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; (ii) realistic as it requires only 23ms to produce a new adversarial URL variant that is available for registration with a median cost of only $11.99/year. We also found that popular online services such as Google SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that Adversarial training (successful defence against evasion attack) does not significantly improve the robustness of these systems as it decreases the success rate of our attack by only 6% on average for all the models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings lead to promising directions for future research. Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but also highlights implications for future study towards assessing and improving these systems.Comment: Draft for ACM TOP

    Intrusion Detection System using Bayesian Network Modeling

    Get PDF
    Computer Network Security has become a critical and important issue due to ever increasing cyber-crimes. Cybercrimes are spanning from simple piracy crimes to information theft in international terrorism. Defence security agencies and other militarily related organizations are highly concerned about the confidentiality and access control of the stored data. Therefore, it is really important to investigate on Intrusion Detection System (IDS) to detect and prevent cybercrimes to protect these systems. This research proposes a novel distributed IDS to detect and prevent attacks such as denial service, probes, user to root and remote to user attacks. In this work, we propose an IDS based on Bayesian network classification modelling technique. Bayesian networks are popular for adaptive learning, modelling diversity network traffic data for meaningful classification details. The proposed model has an anomaly based IDS with an adaptive learning process. Therefore, Bayesian networks have been applied to build a robust and accurate IDS. The proposed IDS has been evaluated against the KDD DAPRA dataset which was designed for network IDS evaluation. The research methodology consists of four different Bayesian networks as classification models, where each of these classifier models are interconnected and communicated to predict on incoming network traffic data. Each designed Bayesian network model is capable of detecting a major category of attack such as denial of service (DoS). However, all four Bayesian networks work together to pass the information of the classification model to calibrate the IDS system. The proposed IDS shows the ability of detecting novel attacks by continuing learning with different datasets. The testing dataset constructed by sampling the original KDD dataset to contain balance number of attacks and normal connections. The experiments show that the proposed system is effective in detecting attacks in the test dataset and is highly accurate in detecting all major attacks recorded in DARPA dataset. The proposed IDS consists with a promising approach for anomaly based intrusion detection in distributed systems. Furthermore, the practical implementation of the proposed IDS system can be utilized to train and detect attacks in live network traffi

    Anomaly-based insider threat detection with expert feedback and descriptions

    Get PDF
    Abstract. Insider threat is one of the most significant security risks for organizations, hence insider threat detection is an important task. Anomaly detection is a one approach to insider threat detection. Anomaly detection techniques can be categorized into three categories with respect to how much labelled data is needed: unsupervised, semi-supervised and supervised. Obtaining accurate labels of all kinds of incidents for supervised learning is often expensive and impractical. Unsupervised methods do not require labelled data, but they have a high false positive rate because they operate on the assumption that anomalies are rarer than nominals. This can be mitigated by introducing feedback, known as expert-feedback or active learning. This allows the analyst to label a subset of the data. Another problem is the fact that models often are not interpretable, thus it is unclear why the model decided that a data instance is an anomaly. This thesis presents a literature review of insider threat detection, unsupervised and semi-supervised anomaly detection. The performance of various unsupervised anomaly detectors are evaluated. Knowledge is introduced into the system by using state-of-the-art feedback technique for ensembles, known as active anomaly discovery, which is incorporated into the anomaly detector, known as isolation forest. Additionally, to improve interpretability techniques of creating rule-based descriptions for the isolation forest are evaluated. Experiments were performed on CMU-CERT dataset, which is the only publicly available insider threat dataset with logon, removable device and HTTP log data. Models use usage count and session-based features that are computed for users on every day. The results show that active anomaly discovery helps in ranking true positives higher on the list, lowering the amount of data analysts have to analyse. Results also show that both compact description and Bayesian rulesets have the potential to be used in generating decision-rules that aid in analysing incidents; however, these rules are not correct in every instance.Poikkeamapohjainen sisäpiiriuhkien havainta palautteen ja kuvauksien avulla. Tiivistelmä. Sisäpiirinuhat ovat yksi vakavimmista riskeistä organisaatioille. Tästä syystä sisäpiiriuhkien havaitseminen on tärkeää. Sisäpiiriuhkia voidaan havaita poikkeamien havaitsemismenetelmillä. Nämä menetelmät voidaan luokitella kolmeen oppimisluokkaan saatavilla olevan tietomäärän perusteella: ohjaamaton, puoli-ohjattu ja ohjattu. Täysin oikein merkatun tiedon saaminen ohjattua oppimista varten voi olla hyvin kallista ja epäkäytännöllistä. Ohjaamattomat oppimismenetelmät eivät vaadi merkattua tietoa, mutta väärien positiivisten osuus on suurempi, koska nämä menetelmät perustuvat oletukseen että poikkeamat ovat harvinaisempia kuin normaalit tapaukset. Väärien positiivisten osuutta voidaan pienentää ottamalla käyttöön palaute, jolloin analyytikko voi merkata osan datasta. Tässä opinnäytetyössä tutustutaan ensin sisäpiiriuhkien havaitsemiseen, mitä tutkimuksia on tehty ja ohjaamattomaan ja puoli-ohjattuun poikkeamien havaitsemiseen. Muutamien lupaavien ohjaamattomien poikkeamatunnistimien toimintakyky arvioidaan. Järjestelmään lisätään tietoisuutta havaitsemisongelmasta käyttämällä urauurtavaa active anomaly discovery -palautemetelmää, joka on tehty havaitsinjoukoille (engl. ensembles). Tätä arvioidaan Isolation Forest -havaitsimen kanssa. Lisäksi, jotta analytiikko pystyisi paremmin käsittelemään havainnot, tässä työssä myös arvioidaan sääntöpohjaisten kuvausten luontimenetelmä Isolation Forest -havaitsimelle. Kokeilut suoritettiin käyttäen julkista CMU-CERT:in aineistoa, joka on ainoa julkinen aineisto, missä on muun muuassa kirjautumis-, USB-laite- ja HTTP-tapahtumia. Mallit käyttävät käyttöluku- ja istuntopohjaisia piirteitä, jotka luodaan jokaista käyttäjää ja päivää kohti. Tuloksien perusteella Active Anomaly Discovery auttaa epäilyttävämpien tapahtumien sijoittamisessa listan kärkeen vähentäen tiedon määrä, jonka analyytikon tarvitsee tutkia. Kompaktikuvakset (engl. compact descriptions)- ja Bayesian sääntöjoukko -menetelmät pystyvät luomaan sääntöjä, jotka kuvaavat minkä takia tapahtuma on epäilyttävä, mutta nämä säännöt eivät aina ole oikein

    Detecting and characterizing lateral phishing at scale

    Get PDF
    We present the first large-scale characterization of lateral phishing attacks, based on a dataset of 113 million employee-sent emails from 92 enterprise organizations. In a lateral phishing attack, adversaries leverage a compromised enterprise account to send phishing emails to other users, benefit-ting from both the implicit trust and the information in the hijacked user's account. We develop a classifier that finds hundreds of real-world lateral phishing emails, while generating under four false positives per every one-million employee-sent emails. Drawing on the attacks we detect, as well as a corpus of user-reported incidents, we quantify the scale of lateral phishing, identify several thematic content and recipient targeting strategies that attackers follow, illuminate two types of sophisticated behaviors that attackers exhibit, and estimate the success rate of these attacks. Collectively, these results expand our mental models of the 'enterprise attacker' and shed light on the current state of enterprise phishing attacks
    corecore