Provenance-based Auditing of Private Data Use
Across the world, organizations are required to comply with regulatory frameworks dictating how to manage personal information. Despite these frameworks, several cases of data leaks and exposure of private data to unauthorized recipients have been widely publicized. For authorities and system administrators to check compliance with regulations, auditing of private data processing becomes crucial in IT systems. Finding the origin of some data, determining how it is being used, and checking that its processing is compatible with the purpose for which it was captured are typical functions that an auditing capability should support, yet they are difficult to implement in a reusable manner. Such questions are so-called provenance questions, where provenance is defined as the process that led to some data being produced. The aim of this paper is to articulate how data provenance can be used as the underpinning approach of an auditing capability in IT systems. We present a case study based on requirements of the Data Protection Act and an application that audits the processing of private data, which we apply to an example manipulating private data in a university.
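The purpose-compatibility check the abstract describes can be sketched as a walk over a provenance trail: follow derivation links from a captured item and flag any use whose recorded purpose differs from the purpose of capture. The record schema and field names below are hypothetical illustrations, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """One step in the process that led to a piece of data (hypothetical schema)."""
    data_id: str
    action: str          # e.g. "captured", "copied", "used"
    actor: str
    purpose: str
    derived_from: list = field(default_factory=list)

def audit_purpose(records, data_id, allowed_purpose):
    """Flag any use of `data_id`, or of data derived from it, whose
    recorded purpose differs from the purpose of capture."""
    tainted = {data_id}
    violations = []
    for r in records:  # assume records are in causal order
        if set(r.derived_from) & tainted:
            tainted.add(r.data_id)
        if r.data_id in tainted and r.purpose != allowed_purpose:
            violations.append(r)
    return violations

# Toy trail: an email captured for admissions, later reused for marketing.
trail = [
    ProvenanceRecord("email#1", "captured", "registry", "admissions"),
    ProvenanceRecord("list#7", "copied", "office", "marketing",
                     derived_from=["email#1"]),
]
print([r.data_id for r in audit_purpose(trail, "email#1", "admissions")])  # ['list#7']
```

The taint-propagation loop is the provenance question "how is this data being used" reduced to reachability over derivation edges.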
LogShield: A Transformer-based APT Detection System Leveraging Self-Attention
Cyber attacks are often identified using system and network logs. Significant prior work has used provenance graphs and ML techniques to detect attacks, specifically advanced persistent threats (APTs), which are very difficult to detect. Lately, transformer-based language models have been used to detect various types of attacks from system logs, but no such attempts have been made for APTs. In addition, existing state-of-the-art techniques that use system provenance graphs lack a data processing framework generalized across datasets for optimal performance. To mitigate this limitation and to explore the effectiveness of transformer-based language models, this paper proposes LogShield, a framework designed to detect APT attack patterns by leveraging the power of self-attention in transformers. We incorporate customized embedding layers to effectively capture the context of event sequences derived from provenance graphs. While acknowledging the computational overhead associated with training transformer networks, our framework surpasses existing LSTM and language models at APT detection. We adopted the model parameters and training procedure from the RoBERTa model and conducted extensive experiments on well-known APT datasets (DARPA OpTC and DARPA TC E3). Our framework achieved superior F1 scores of 98% and 95% on the two datasets respectively, surpassing the F1 scores of 96% and 94% obtained by LSTM models. Our findings suggest that LogShield's performance benefits from larger datasets and demonstrate its potential for generalization across diverse domains. These findings contribute to the advancement of APT attack detection methods and underscore the significance of transformer-based architectures in addressing security challenges in computer systems.
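The mechanism the abstract leans on, self-attention over embedded event sequences, can be illustrated with a minimal sketch. The event vocabulary, one-hot "embedding", and identity Q/K/V projections below are simplifying assumptions for illustration, not LogShield's learned components:

```python
import math

# Hypothetical event vocabulary extracted from a provenance graph;
# a real model would learn embeddings rather than use one-hot vectors.
events = ["proc_create", "file_read", "net_connect", "file_write"]
vocab = {e: i for i, e in enumerate(events)}

def embed(seq, dim=4):
    """Toy one-hot 'embedding' of an event sequence."""
    return [[1.0 if j == vocab[e] else 0.0 for j in range(dim)] for e in seq]

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(x[0])
    scores = [[sum(q * k for q, k in zip(x[i], x[j])) / math.sqrt(d)
               for j in range(len(x))] for i in range(len(x))]
    out = []
    for row in scores:
        m = max(row)
        w = [math.exp(s - m) for s in row]          # numerically stable softmax
        z = sum(w)
        w = [v / z for v in w]
        out.append([sum(w[j] * x[j][k] for j in range(len(x)))
                    for k in range(d)])             # weighted mix of all events
    return out

seq = ["proc_create", "file_read", "net_connect"]
ctx = self_attention(embed(seq))
print(len(ctx), len(ctx[0]))  # 3 4
```

Each output vector is a context-weighted mixture of every event in the sequence, which is what lets a transformer relate a late-stage event (e.g. a network connection) to the earlier events that make it suspicious.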
Semantic Modeling of Analytic-based Relationships with Direct Qualification
Successfully modeling state- and analytics-based semantic relationships of documents enhances the representation, importance, relevancy, provenance, and priority of the document. These attributes are the core elements that form the machine-based knowledge representation for documents. However, modeling document relationships that can change over time can be inelegant, limited, complex, or overly burdensome for semantic technologies. In this paper, we present Direct Qualification (DQ), an approach for modeling any semantically referenced document, concept, or named graph with results from associated applied analytics. The proposed approach supplements traditional subject-object relationships by providing a third leg to the relationship: the qualification of how and why the relationship exists. To illustrate, we show a prototype of an event-based system with a realistic use case for applying DQ to relevancy analytics of PageRank and Hyperlink-Induced Topic Search (HITS).
Comment: Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015)
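The "third leg" that DQ adds to a subject-object relationship can be sketched as a qualification record attached to a triple, stating which analytic produced the relationship, with what score, and when. The class and field names below are hypothetical, not the paper's ontology:

```python
from dataclasses import dataclass

@dataclass
class Qualification:
    """How and why the relationship exists (hypothetical schema)."""
    analytic: str      # e.g. "PageRank" or "HITS"
    score: float
    computed_at: str

@dataclass
class QualifiedRelation:
    """A subject-predicate-object triple plus its qualification."""
    subject: str
    predicate: str
    obj: str
    qualification: Qualification

# A relevancy relationship qualified by the analytic that produced it.
rel = QualifiedRelation(
    "doc:42", "relevantTo", "topic:security",
    Qualification("PageRank", 0.17, "2015-01-05"),
)
print(rel.qualification.analytic)  # PageRank
```

Because the qualification carries the analytic's identity and timestamp, a relationship can be re-qualified as analytics are re-run, which is the time-varying case the abstract highlights.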
trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R
Research is an incremental, iterative process, with new results relying on and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
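trackr itself is an R framework; as a language-neutral illustration of the idea, here is a minimal Python sketch of automatically attaching extractable metadata to a result and then searching on it. All function and field names are hypothetical, not trackr's API:

```python
import datetime
import hashlib
import json

def annotate(artifact, user, tags):
    """Record automatically extractable metadata (hash, date) alongside user tags."""
    blob = json.dumps(artifact, sort_keys=True).encode()
    return {
        "artifact": artifact,
        "sha256": hashlib.sha256(blob).hexdigest(),
        "recorded_at": datetime.date.today().isoformat(),
        "user": user,
        "tags": tags,
    }

def search(store, tag):
    """Retrieve every recorded result carrying the given tag."""
    return [rec for rec in store if tag in rec["tags"]]

# Record a plot-like artifact, then rediscover it by tag.
store = [annotate({"kind": "plot", "title": "yield vs dose"},
                  "alice", ["dose-response"])]
hits = search(store, "dose-response")
print(hits[0]["artifact"]["title"])  # yield vs dose
```

The content hash plays the reproducibility role: two retrieved results with the same hash are byte-identical, so a rerun can be verified against the stored record.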