186 research outputs found

    Autoencoder based anomaly detection for SCADA networks

    Get PDF
    Supervisory control and data acquisition (SCADA) systems are industrial control systems that are used to monitor critical infrastructures such as airports, transport, health, and public services of national importance. These are cyber physical systems, which are increasingly integrated with networks and internet of things devices. However, this results in a larger attack surface for cyber threats, making it important to identify and thwart cyber-attacks by detecting anomalous network traffic patterns. Compared to other techniques, as well as detecting known attack patterns, machine learning can also detect new and evolving threats. Autoencoders are a type of neural network that generates a compressed representation of its input data and through reconstruction loss of inputs can help identify anomalous data. This paper proposes the use of autoencoders for unsupervised anomaly-based intrusion detection using an appropriate differentiating threshold from the loss distribution and demonstrate improvements in results compared to other techniques for SCADA gas pipeline dataset

    Anomaly-based insider threat detection with expert feedback and descriptions

    Get PDF
    Abstract. Insider threat is one of the most significant security risks for organizations, hence insider threat detection is an important task. Anomaly detection is a one approach to insider threat detection. Anomaly detection techniques can be categorized into three categories with respect to how much labelled data is needed: unsupervised, semi-supervised and supervised. Obtaining accurate labels of all kinds of incidents for supervised learning is often expensive and impractical. Unsupervised methods do not require labelled data, but they have a high false positive rate because they operate on the assumption that anomalies are rarer than nominals. This can be mitigated by introducing feedback, known as expert-feedback or active learning. This allows the analyst to label a subset of the data. Another problem is the fact that models often are not interpretable, thus it is unclear why the model decided that a data instance is an anomaly. This thesis presents a literature review of insider threat detection, unsupervised and semi-supervised anomaly detection. The performance of various unsupervised anomaly detectors are evaluated. Knowledge is introduced into the system by using state-of-the-art feedback technique for ensembles, known as active anomaly discovery, which is incorporated into the anomaly detector, known as isolation forest. Additionally, to improve interpretability techniques of creating rule-based descriptions for the isolation forest are evaluated. Experiments were performed on CMU-CERT dataset, which is the only publicly available insider threat dataset with logon, removable device and HTTP log data. Models use usage count and session-based features that are computed for users on every day. The results show that active anomaly discovery helps in ranking true positives higher on the list, lowering the amount of data analysts have to analyse. Results also show that both compact description and Bayesian rulesets have the potential to be used in generating decision-rules that aid in analysing incidents; however, these rules are not correct in every instance.Poikkeamapohjainen sisäpiiriuhkien havainta palautteen ja kuvauksien avulla. Tiivistelmä. Sisäpiirinuhat ovat yksi vakavimmista riskeistä organisaatioille. Tästä syystä sisäpiiriuhkien havaitseminen on tärkeää. Sisäpiiriuhkia voidaan havaita poikkeamien havaitsemismenetelmillä. Nämä menetelmät voidaan luokitella kolmeen oppimisluokkaan saatavilla olevan tietomäärän perusteella: ohjaamaton, puoli-ohjattu ja ohjattu. Täysin oikein merkatun tiedon saaminen ohjattua oppimista varten voi olla hyvin kallista ja epäkäytännöllistä. Ohjaamattomat oppimismenetelmät eivät vaadi merkattua tietoa, mutta väärien positiivisten osuus on suurempi, koska nämä menetelmät perustuvat oletukseen että poikkeamat ovat harvinaisempia kuin normaalit tapaukset. Väärien positiivisten osuutta voidaan pienentää ottamalla käyttöön palaute, jolloin analyytikko voi merkata osan datasta. Tässä opinnäytetyössä tutustutaan ensin sisäpiiriuhkien havaitsemiseen, mitä tutkimuksia on tehty ja ohjaamattomaan ja puoli-ohjattuun poikkeamien havaitsemiseen. Muutamien lupaavien ohjaamattomien poikkeamatunnistimien toimintakyky arvioidaan. Järjestelmään lisätään tietoisuutta havaitsemisongelmasta käyttämällä urauurtavaa active anomaly discovery -palautemetelmää, joka on tehty havaitsinjoukoille (engl. ensembles). Tätä arvioidaan Isolation Forest -havaitsimen kanssa. Lisäksi, jotta analytiikko pystyisi paremmin käsittelemään havainnot, tässä työssä myös arvioidaan sääntöpohjaisten kuvausten luontimenetelmä Isolation Forest -havaitsimelle. Kokeilut suoritettiin käyttäen julkista CMU-CERT:in aineistoa, joka on ainoa julkinen aineisto, missä on muun muuassa kirjautumis-, USB-laite- ja HTTP-tapahtumia. Mallit käyttävät käyttöluku- ja istuntopohjaisia piirteitä, jotka luodaan jokaista käyttäjää ja päivää kohti. Tuloksien perusteella Active Anomaly Discovery auttaa epäilyttävämpien tapahtumien sijoittamisessa listan kärkeen vähentäen tiedon määrä, jonka analyytikon tarvitsee tutkia. Kompaktikuvakset (engl. compact descriptions)- ja Bayesian sääntöjoukko -menetelmät pystyvät luomaan sääntöjä, jotka kuvaavat minkä takia tapahtuma on epäilyttävä, mutta nämä säännöt eivät aina ole oikein

    Insider threat identification using the simultaneous neural learning of multi-source logs

    Get PDF
    Insider threat detection has drawn increasing attention in recent years. In order to capture a malicious insider's digital footprints that occur scatteredly across a wide range of audit data sources over a long period of time, existing approaches often leverage a scoring mechanism to orchestrate alerts generated from multiple sub-detectors, or require domain knowledge-based feature engineering to conduct a one-off analysis across multiple types of data. These approaches result in a high deployment complexity and incur additional costs for engaging security experts. In this paper, we present a novel approach that works with a variety of security logs. The security logs are transformed into texts in the same format and then arranged as a corpus. Using the model trained by Word2vec with the corpus, we are enabled to approximate the posterior probabilities for insider behaviours. Accordingly, we label the transformed events as suspicious if their behavioural probabilities are smaller than a given threshold, and a user is labelled as malicious if he/she is associated with multiple suspicious events. The experiments are undertaken with the Carnegie Mellon University (CMU) CERT Programs insider threat database v6.2, which not only demonstrate that the proposed approach is effective and scalable in practical applications but also provide a guidance for tuning the parameters and thresholds

    Design and Deployment of an Access Control Module for Data Lakes

    Get PDF
    Nowadays big data is considered an extremely valued asset for companies, which are discovering new avenues to use it for their business profit. However, an organization’s ability to effectively extract valuable information from data is based on its knowledge management infrastructure. Thus, most organizations are transitioning from data warehouse (DW) storages to data lake (DL) infrastructures, from which further insights are derived. The present work is carried out as part of a cybersecurity project in a financial institution that manages vast volumes and variety of data that is kept in a data lake. Although DL is presented as the answer to the current big data scenario, this infrastructure presents certain flaws on authentication and access control. Preceding work on DL access control points out that the main goal is to avoid fraudulent behaviors derived from user’s access, such as secondary use1, that could result in business data being exposed to third parties. To overcome the risk, traditional mechanisms attempt to identify these behaviors based on rules, however, they cannot reveal all different kinds of fraud because they only look for known patterns of misuse. The present work proposes a novel access control system for data lakes, assisted by Oracle’s database audit trail and based on anomaly detection mechanisms, that automatically looks for events that do not conform the normal or expected behavior. Thus, the overall aim of this project is to develop and deploy an automated system for identifying abnormal accesses to the DL, which can be separated into four subgoals: explore the different technologies that could be applied in the domain of anomaly detection, design the solution, deploy it, and evaluate the results. For the purpose, feature engineering is performed, and four different unsupervised ML models are built and evaluated. According to the quality of the results, the better model is finally productionalized with Docker. To conclude, although anomaly detection has been a lasting yet active research area for several decades, there are still some unique problem complexities and challenges that leave the way open for the proposed solution to be further improved.Doble Grado en Ingeniería Informática y Administración de Empresa
    corecore