    A log mining approach for process monitoring in SCADA

    SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions that look legitimate but are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach to log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow.
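    The abstract describes the log-processing approach only at a high level. As a rough illustration of the underlying idea (flagging legitimate-looking but unusual user actions), the following is a minimal sketch of frequency-based anomaly flagging over SCADA-style logs; the (user, action) field names and the scoring rule are assumptions, not the paper's method.

```python
# Minimal sketch: flag log entries whose (user, action) pattern is rare.
# Field names and threshold are hypothetical; the paper does not specify
# its log schema or scoring rule.
from collections import Counter

def flag_rare_events(log_entries, threshold=0.01):
    """Return entries whose (user, action) pattern is rare in the history."""
    patterns = Counter((e["user"], e["action"]) for e in log_entries)
    total = sum(patterns.values())
    flagged = []
    for e in log_entries:
        freq = patterns[(e["user"], e["action"])] / total
        if freq < threshold:  # rare pattern -> candidate process-related threat
            flagged.append(e)
    return flagged

logs = [
    {"user": "operator1", "action": "open_valve"},
    {"user": "operator1", "action": "open_valve"},
    {"user": "operator2", "action": "change_setpoint"},
]
print(flag_rare_events(logs, threshold=0.4))  # flags the one-off setpoint change
```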

    Integrity Proofs for RDF Graphs

    Representing open datasets with the RDF model is becoming increasingly popular. An important aspect of this data model is that cryptographic hashing methods can be applied to verify the integrity of RDF graphs. In this paper, we first develop a number of metrics to compare the state-of-the-art integrity proof methods and then present two new approaches to generate an integrity proof of RDF datasets: (i) semantic-based and (ii) structure-based. The semantic-based approach leverages timestamps (or other inherent notions of ordering) as an indexing key to construct a sorted Merkle tree variation, where the timestamps are semantically extractable from the dataset. The structure-based approach utilizes the redundant structure of large RDF datasets to compress the dataset statements prior to generating a variation of a Merkle tree. We provide a theoretical analysis and an experimental evaluation of our two proposed methods. Compared to the Merkle tree and the sorted Merkle tree, the semantic-based approach achieves faster querying performance for large datasets. The structure-based approach is well suited when RDF datasets contain large amounts of semantic redundancies. We also evaluate our methods' resistance to adversarial threats.
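    As a sketch of the semantic-based idea, the snippet below sorts RDF statements by a timestamp extracted from the data and builds a plain Merkle tree over the sorted leaves. The triple format and the extract_timestamp helper are assumptions for illustration; the paper's actual sorted-tree variation differs in detail.

```python
# Sketch: sort statements by a semantic timestamp, hash each statement into
# a leaf, and fold the leaves into a Merkle root as the integrity proof.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes into a single Merkle root."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:             # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def rdf_integrity_proof(triples, extract_timestamp):
    """Order statements by their extractable timestamp, then hash into a tree."""
    ordered = sorted(triples, key=extract_timestamp)
    leaves = [sha256(" ".join(t).encode()) for t in ordered]
    return merkle_root(leaves)

triples = [
    ("ex:s1", "ex:created", "2021-03-01"),
    ("ex:s2", "ex:created", "2020-12-15"),
    ("ex:s3", "ex:created", "2022-01-09"),
]
root = rdf_integrity_proof(triples, extract_timestamp=lambda t: t[2])
print(root.hex())
```

    Keeping the leaves in timestamp order is what lets the tree double as a sorted index, which is the source of the faster querying the abstract reports for large datasets.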

    Combining K-Means and XGBoost Models for Anomaly Detection Using Log Datasets

    Computing and networking systems traditionally record their activity in log files, which have been used for multiple purposes, such as troubleshooting, accounting, post-incident analysis of security breaches, capacity planning and anomaly detection. In earlier systems those log files were processed manually by system administrators, or with the support of basic applications for filtering, compiling and pre-processing the logs for specific purposes. However, as the volume of these log files continues to grow (more logs per system, more systems per domain), it is becoming increasingly difficult to process those logs using traditional tools, especially for less straightforward purposes such as anomaly detection. On the other hand, as systems continue to become more complex, the potential of using large datasets built from logs of heterogeneous sources for detecting anomalies without prior domain knowledge becomes higher. Anomaly detection tools for such scenarios face two challenges: first, devising appropriate data analysis solutions for effectively detecting anomalies from large data sources, possibly without prior domain knowledge; second, adopting data processing platforms able to cope with the large datasets and complex data analysis algorithms required for such purposes. In this paper we address those challenges by proposing an integrated scalable framework that aims at efficiently detecting anomalous events in large amounts of unlabeled data logs. Detection is supported by clustering and classification methods that take advantage of parallel computing environments. We validate our approach using the well-known NASA Hypertext Transfer Protocol (HTTP) log datasets. Fourteen features were extracted in order to train a k-means model for separating anomalous and normal events into highly coherent clusters. A second model, making use of XGBoost, a gradient tree boosting system, uses the previous binary clustered data to produce a set of simple interpretable rules. These rules represent the rationale for generalizing its application over a massive number of unseen events in a distributed computing environment. The classified anomaly events produced by our framework can be used, for instance, as candidates for further forensic and compliance auditing analysis in security management.
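    The two-stage pipeline the abstract outlines can be sketched compactly: k-means produces binary pseudo-labels for unlabeled events, and an XGBoost classifier then learns rules from those labels. The 14-feature matrix below is synthetic stand-in data, not the paper's NASA HTTP features.

```python
# Sketch of the two-stage pipeline: unsupervised k-means clustering into two
# groups (anomalous vs. normal pseudo-labels), then a supervised XGBoost
# model trained on those labels to generalize to unseen events.
import numpy as np
from sklearn.cluster import KMeans
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 14))            # stand-in for 14 log-derived features
X[:50] += 5.0                              # a small block of outlying events

# Stage 1: binary clustering of unlabeled log events.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Stage 2: classifier trained on the cluster pseudo-labels; its shallow
# trees yield simple rules that can be applied to massive unseen data.
clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(X, labels)

new_events = rng.normal(size=(5, 14))
print(clf.predict(new_events))             # 0/1 cluster assignment per event
```

    Shallow trees (max_depth=3 here) keep the learned rules short and interpretable, which matches the abstract's motivation for the second stage.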